├── .gitignore ├── LICENSE ├── README.md ├── ad_simple.py ├── ad_tf.py ├── ad_tf_autoencoder.ipynb └── res ├── anomaly.pcap.gz ├── attack-trace.pcap.gz ├── attack-trace.pcap.readme ├── input.pcap.gz ├── input_normal_anomaly.pcap.readme ├── normal.pcap.gz └── trace.json /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | ad_test.pcap 3 | checkpoint 4 | model.* 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2015 The TensorFlow Authors. All rights reserved. 2 | 3 | Apache License 4 | Version 2.0, January 2004 5 | http://www.apache.org/licenses/ 6 | 7 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 8 | 9 | 1. Definitions. 10 | 11 | "License" shall mean the terms and conditions for use, reproduction, 12 | and distribution as defined by Sections 1 through 9 of this document. 13 | 14 | "Licensor" shall mean the copyright owner or entity authorized by 15 | the copyright owner that is granting the License. 16 | 17 | "Legal Entity" shall mean the union of the acting entity and all 18 | other entities that control, are controlled by, or are under common 19 | control with that entity. For the purposes of this definition, 20 | "control" means (i) the power, direct or indirect, to cause the 21 | direction or management of such entity, whether by contract or 22 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 23 | outstanding shares, or (iii) beneficial ownership of such entity. 24 | 25 | "You" (or "Your") shall mean an individual or Legal Entity 26 | exercising permissions granted by this License. 27 | 28 | "Source" form shall mean the preferred form for making modifications, 29 | including but not limited to software source code, documentation 30 | source, and configuration files. 
31 | 32 | "Object" form shall mean any form resulting from mechanical 33 | transformation or translation of a Source form, including but 34 | not limited to compiled object code, generated documentation, 35 | and conversions to other media types. 36 | 37 | "Work" shall mean the work of authorship, whether in Source or 38 | Object form, made available under the License, as indicated by a 39 | copyright notice that is included in or attached to the work 40 | (an example is provided in the Appendix below). 41 | 42 | "Derivative Works" shall mean any work, whether in Source or Object 43 | form, that is based on (or derived from) the Work and for which the 44 | editorial revisions, annotations, elaborations, or other modifications 45 | represent, as a whole, an original work of authorship. For the purposes 46 | of this License, Derivative Works shall not include works that remain 47 | separable from, or merely link (or bind by name) to the interfaces of, 48 | the Work and Derivative Works thereof. 49 | 50 | "Contribution" shall mean any work of authorship, including 51 | the original version of the Work and any modifications or additions 52 | to that Work or Derivative Works thereof, that is intentionally 53 | submitted to Licensor for inclusion in the Work by the copyright owner 54 | or by an individual or Legal Entity authorized to submit on behalf of 55 | the copyright owner. For the purposes of this definition, "submitted" 56 | means any form of electronic, verbal, or written communication sent 57 | to the Licensor or its representatives, including but not limited to 58 | communication on electronic mailing lists, source code control systems, 59 | and issue tracking systems that are managed by, or on behalf of, the 60 | Licensor for the purpose of discussing and improving the Work, but 61 | excluding communication that is conspicuously marked or otherwise 62 | designated in writing by the copyright owner as "Not a Contribution." 
63 | 64 | "Contributor" shall mean Licensor and any individual or Legal Entity 65 | on behalf of whom a Contribution has been received by Licensor and 66 | subsequently incorporated within the Work. 67 | 68 | 2. Grant of Copyright License. Subject to the terms and conditions of 69 | this License, each Contributor hereby grants to You a perpetual, 70 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 71 | copyright license to reproduce, prepare Derivative Works of, 72 | publicly display, publicly perform, sublicense, and distribute the 73 | Work and such Derivative Works in Source or Object form. 74 | 75 | 3. Grant of Patent License. Subject to the terms and conditions of 76 | this License, each Contributor hereby grants to You a perpetual, 77 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 78 | (except as stated in this section) patent license to make, have made, 79 | use, offer to sell, sell, import, and otherwise transfer the Work, 80 | where such license applies only to those patent claims licensable 81 | by such Contributor that are necessarily infringed by their 82 | Contribution(s) alone or by combination of their Contribution(s) 83 | with the Work to which such Contribution(s) was submitted. If You 84 | institute patent litigation against any entity (including a 85 | cross-claim or counterclaim in a lawsuit) alleging that the Work 86 | or a Contribution incorporated within the Work constitutes direct 87 | or contributory patent infringement, then any patent licenses 88 | granted to You under this License for that Work shall terminate 89 | as of the date such litigation is filed. 90 | 91 | 4. Redistribution. 
You may reproduce and distribute copies of the 92 | Work or Derivative Works thereof in any medium, with or without 93 | modifications, and in Source or Object form, provided that You 94 | meet the following conditions: 95 | 96 | (a) You must give any other recipients of the Work or 97 | Derivative Works a copy of this License; and 98 | 99 | (b) You must cause any modified files to carry prominent notices 100 | stating that You changed the files; and 101 | 102 | (c) You must retain, in the Source form of any Derivative Works 103 | that You distribute, all copyright, patent, trademark, and 104 | attribution notices from the Source form of the Work, 105 | excluding those notices that do not pertain to any part of 106 | the Derivative Works; and 107 | 108 | (d) If the Work includes a "NOTICE" text file as part of its 109 | distribution, then any Derivative Works that You distribute must 110 | include a readable copy of the attribution notices contained 111 | within such NOTICE file, excluding those notices that do not 112 | pertain to any part of the Derivative Works, in at least one 113 | of the following places: within a NOTICE text file distributed 114 | as part of the Derivative Works; within the Source form or 115 | documentation, if provided along with the Derivative Works; or, 116 | within a display generated by the Derivative Works, if and 117 | wherever such third-party notices normally appear. The contents 118 | of the NOTICE file are for informational purposes only and 119 | do not modify the License. You may add Your own attribution 120 | notices within Derivative Works that You distribute, alongside 121 | or as an addendum to the NOTICE text from the Work, provided 122 | that such additional attribution notices cannot be construed 123 | as modifying the License. 
124 | 125 | You may add Your own copyright statement to Your modifications and 126 | may provide additional or different license terms and conditions 127 | for use, reproduction, or distribution of Your modifications, or 128 | for any such Derivative Works as a whole, provided Your use, 129 | reproduction, and distribution of the Work otherwise complies with 130 | the conditions stated in this License. 131 | 132 | 5. Submission of Contributions. Unless You explicitly state otherwise, 133 | any Contribution intentionally submitted for inclusion in the Work 134 | by You to the Licensor shall be under the terms and conditions of 135 | this License, without any additional terms or conditions. 136 | Notwithstanding the above, nothing herein shall supersede or modify 137 | the terms of any separate license agreement you may have executed 138 | with Licensor regarding such Contributions. 139 | 140 | 6. Trademarks. This License does not grant permission to use the trade 141 | names, trademarks, service marks, or product names of the Licensor, 142 | except as required for reasonable and customary use in describing the 143 | origin of the Work and reproducing the content of the NOTICE file. 144 | 145 | 7. Disclaimer of Warranty. Unless required by applicable law or 146 | agreed to in writing, Licensor provides the Work (and each 147 | Contributor provides its Contributions) on an "AS IS" BASIS, 148 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 149 | implied, including, without limitation, any warranties or conditions 150 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 151 | PARTICULAR PURPOSE. You are solely responsible for determining the 152 | appropriateness of using or redistributing the Work and assume any 153 | risks associated with Your exercise of permissions under this License. 154 | 155 | 8. Limitation of Liability. 
In no event and under no legal theory, 156 | whether in tort (including negligence), contract, or otherwise, 157 | unless required by applicable law (such as deliberate and grossly 158 | negligent acts) or agreed to in writing, shall any Contributor be 159 | liable to You for damages, including any direct, indirect, special, 160 | incidental, or consequential damages of any character arising as a 161 | result of this License or out of the use or inability to use the 162 | Work (including but not limited to damages for loss of goodwill, 163 | work stoppage, computer failure or malfunction, or any and all 164 | other commercial damages or losses), even if such Contributor 165 | has been advised of the possibility of such damages. 166 | 167 | 9. Accepting Warranty or Additional Liability. While redistributing 168 | the Work or Derivative Works thereof, You may choose to offer, 169 | and charge a fee for, acceptance of support, warranty, indemnity, 170 | or other liability obligations and/or rights consistent with this 171 | License. However, in accepting such obligations, You may act only 172 | on Your own behalf and on Your sole responsibility, not on behalf 173 | of any other Contributor, and only if You agree to indemnify, 174 | defend, and hold each Contributor harmless for any liability 175 | incurred by, or claims asserted against, such Contributor by reason 176 | of your accepting any such warranty or additional liability. 177 | 178 | END OF TERMS AND CONDITIONS 179 | 180 | APPENDIX: How to apply the Apache License to your work. 181 | 182 | To apply the Apache License to your work, attach the following 183 | boilerplate notice, with the fields enclosed by brackets "[]" 184 | replaced with your own identifying information. (Don't include 185 | the brackets!) The text should be enclosed in the appropriate 186 | comment syntax for the file format. 
We also recommend that a 187 | file or class name and description of purpose be included on the 188 | same "printed page" as the copyright notice for easier 189 | identification within third-party archives. 190 | 191 | Copyright 2017, H21 lab, Martin Kacer 192 | 193 | Based on tensorflow classifier example wide_n_deep_tutorial.py 194 | Copyright 2017, The TensorFlow Authors. 195 | 196 | Licensed under the Apache License, Version 2.0 (the "License"); 197 | you may not use this file except in compliance with the License. 198 | You may obtain a copy of the License at 199 | 200 | http://www.apache.org/licenses/LICENSE-2.0 201 | 202 | Unless required by applicable law or agreed to in writing, software 203 | distributed under the License is distributed on an "AS IS" BASIS, 204 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 205 | See the License for the specific language governing permissions and 206 | limitations under the License. 207 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Unsupervised Anomaly Detection using tensorflow and tshark 2 | Unsupervised learning using an autoencoder neural network built with tensorflow. 3 | 4 | See the [ad_tf_autoencoder.ipynb](https://github.com/H21lab/Anomaly-Detection/blob/master/ad_tf_autoencoder.ipynb) 5 | 6 | 7 | # Supervised Anomaly Detection using tensorflow and tshark 8 | ```shell-session 9 | Script to help detect anomalies in a pcap file. 10 | Uses a tensorflow neural network classifier and tshark -T ek -x input. 11 | 12 | Input is tshark ek json generated by: 13 | ./tshark -T ek -x -r trace.pcap > input.json 14 | 15 | Run script: 16 | cat input.pcap.json | python ad_tf.py -i normal.pcap.json \ 17 | -a anomaly.pcap.json -f field_1 field_2 .... 
field_n 18 | 19 | For fields, use the field names from the ek json, e.g.: 20 | tshark -T ek -x -r ./res/input.pcap.gz | python ad_tf.py \ 21 | -i res/normal.json -a res/anomaly.json -f tcp_tcp_flags_raw \ 22 | tcp_tcp_dstport_raw 23 | 24 | Output pcap 25 | ad_test.pcap 26 | 27 | The script uses the tshark ek jsons including the raw hex data generated 28 | from pcaps by the command described above. The fields arguments are used for 29 | anomaly detection. The fields are used as columns, hashed, and fed as input 30 | to a tensorflow neural network classifier. 31 | 32 | The classifier is first trained with the normal.pcap.json input 33 | with label 0 and with the anomaly.pcap.json input with label 1. After training, 34 | input.pcap.json is read from stdin and evaluated. The neural 35 | network predicts the label. 36 | 37 | The output pcap then contains the frames predicted by the neural network as 38 | anomalies with label 1. 39 | ``` 40 | 41 | # Simple Anomaly Detection using tshark 42 | ```shell-session 43 | Simple script to help detect anomalies in a pcap file. 44 | 45 | Input is tshark ek json generated by: 46 | ./tshark -T ek -x -r trace.pcap > input.json 47 | 48 | Run script: 49 | cat input.json | python ad_simple.py field_1 field_2 .... field_n 50 | 51 | For fields, use the field names from the ek json, e.g.: 52 | cat input.json | python ad_simple.py ip_ip_src ip_ip_dst 53 | 54 | Output pcap 55 | ad_test.pcap 56 | 57 | The script reads the tshark ek json including the raw hex data. The input is 58 | generated from the pcap using tshark. The fields arguments are used for simple 59 | anomaly detection. The behavior is similar to an SQL GROUP BY command. The 60 | fields are hashed together and the output pcap contains the frames 61 | beginning with the rarest combinations of the selected fields and descending to 62 | the most frequent ones. 
63 | 64 | The following example 65 | cat input.json | python ad_simple.py ip_ip_src ip_ip_dst 66 | will generate a pcap starting with the least frequent combinations of source and 67 | dest IP pairs and descending to frames with common 68 | combinations. 69 | ``` 70 | 71 | ## Limitations 72 | 73 | The program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY. 74 | 75 | ## Attribution 76 | 77 | This code was created by Martin Kacer, H21 lab, Copyright 2020. 78 | https://www.h21lab.com 79 | 80 | -------------------------------------------------------------------------------- /ad_simple.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Simple anomaly detection using tshark 4 | # Input is tshark -T ek -x json and output is pcap 5 | # 6 | # Copyright 2020, H21 lab, Martin Kacer 7 | # All the content and resources have been provided in the hope that it will be useful. 8 | # The author does not take responsibility for any misapplication of it. 9 | # 10 | # Licensed under the Apache License, Version 2.0 (the "License"); 11 | # you may not use this file except in compliance with the License. 12 | # You may obtain a copy of the License at 13 | # 14 | # http://www.apache.org/licenses/LICENSE-2.0 15 | # 16 | # Unless required by applicable law or agreed to in writing, software 17 | # distributed under the License is distributed on an "AS IS" BASIS, 18 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 19 | # See the License for the specific language governing permissions and 20 | # limitations under the License. 
21 | # 22 | 23 | import sys 24 | import json 25 | import argparse 26 | import operator 27 | import subprocess 28 | import os 29 | import hashlib 30 | 31 | def to_pcap_file(filename, output_pcap_file): 32 | FNULL = open(os.devnull, 'w') 33 | subprocess.call(["text2pcap", filename, output_pcap_file], stdout=FNULL, stderr=subprocess.STDOUT) 34 | 35 | def hex_to_txt(hexstring, output_file): 36 | h = hexstring.lower() 37 | 38 | file = open(output_file, 'a') 39 | 40 | for i in range(0, len(h), 2): 41 | if(i%32 == 0): 42 | file.write(format(int(i/2), '06x') + ' ') 43 | 44 | file.write(h[i:i+2] + ' ') 45 | 46 | if(i%32 == 30): 47 | file.write('\n') 48 | 49 | file.write('\n') 50 | file.close() 51 | 52 | def json_collector(dict, name): 53 | r = [] 54 | if hasattr(dict, 'items'): 55 | for k, v in dict.items(): 56 | if (k in name): 57 | r.append(v) 58 | else: 59 | val = json_collector(v, name) 60 | if (len(val) > 0): 61 | r = r + val 62 | 63 | return r 64 | 65 | 66 | def main(_): 67 | 68 | df = [] # Table storing frames, score, hash_key, ... 
69 | d = {} # Hash dict storing counters 70 | i = 0 71 | 72 | # Read from stdin line by line 73 | for line in sys.stdin: 74 | # trim end of lines 75 | line = line.rstrip('\n') 76 | # skip empty lines 77 | if (line.rstrip() == ""): 78 | continue 79 | j = json.loads(line) 80 | 81 | # packet found in ek json input 82 | if ('layers' in j): 83 | 84 | # calculate one hash-key over all selected fields and store counters in d dict 85 | fj = json_collector(j, _) 86 | #print(fj) 87 | k = '' 88 | if len(fj) > 0: 89 | m = hashlib.md5() 90 | for f in fj: 91 | m.update(str(f).encode('utf-8')) 92 | k = m.hexdigest() 93 | # count each field combination once per frame 94 | if k in d: 95 | d[k] = d[k] + 1 96 | else: 97 | d[k] = 1 98 | 99 | 100 | 101 | 102 | 103 | 104 | # store in df list all the columns 105 | layers = j['layers'] 106 | 107 | linux_cooked_header = False 108 | if ('sll_raw' in layers): 109 | linux_cooked_header = True 110 | if ('frame_raw' in layers): 111 | # columns: frame_id, frame_raw, score, linux_cooked_header_flag, hash_key 112 | df.append([i, layers['frame_raw'], 0, linux_cooked_header, k]) 113 | i = i + 1 114 | 115 | #print(d) 116 | 117 | # Calculate score column in df table 118 | for index in range(0, len(df)): 119 | # score = frequency of the frame's field-combination hash 120 | df[index][2] = d[df[index][4]] 121 | 122 | #print(df) 123 | 124 | # sort the df table by score ascending 125 | sorted_df = sorted(df, key=operator.itemgetter(2), reverse=False) 126 | 127 | #print(sorted_df) 128 | 129 | # Generate output pcap 130 | # open TMP file used by text2pcap 131 | infile = 'ad_test' 132 | file = infile + '.tmp' 133 | f = open(file, 'w') 134 | 135 | # Iterate over packets in JSON 136 | for index in range(0, len(sorted_df)): 137 | 138 | linux_cooked_header = False 139 | 140 | frame_raw = sorted_df[index][1] 141 | 142 | # for Linux cooked header replace dest MAC and remove two bytes to reconstruct normal frame using text2pcap 143 | if (sorted_df[index][3]): 144 | frame_raw = "000000000000" + frame_raw[6*2:] # replace dest MAC 145 | frame_raw = frame_raw[:12*2] + frame_raw[14*2:] # remove two bytes before Protocol 146 | 147 | hex_to_txt(frame_raw, file) 148 | 149 | f.close() 150 | # Write out pcap 151 | to_pcap_file(infile + '.tmp', infile + '.pcap') 152 | print("Generated " + infile + ".pcap") 153 | os.remove(infile + '.tmp') 154 | 155 | 156 | if __name__ == "__main__": 157 | parser = argparse.ArgumentParser(description=""" 158 | Simple script to help detect anomalies in a pcap file. 159 | 160 | Input is tshark ek json generated by: 161 | ./tshark -T ek -x -r trace.pcap > input.json 162 | 163 | Run script: 164 | cat input.json | python ad_simple.py field_1 field_2 .... field_n 165 | 166 | For fields, use the field names from the ek json, e.g.: 167 | cat input.json | python ad_simple.py ip_ip_src ip_ip_dst 168 | 169 | Output pcap 170 | ad_test.pcap 171 | 172 | The script reads the tshark ek json including the raw hex data. The input is 173 | generated from the pcap using tshark. The fields arguments are used for simple 174 | anomaly detection. The behavior is similar to an SQL GROUP BY command. The 175 | fields are hashed together and the output pcap contains the frames 176 | beginning with the rarest combinations of the selected fields and descending to 177 | the most frequent ones. 178 | 179 | The following example 180 | cat input.json | python ad_simple.py ip_ip_src ip_ip_dst 181 | will generate a pcap starting with the least frequent combinations of source and 182 | dest IP pairs and descending to frames with common 183 | combinations. 
184 | 185 | """, formatter_class=argparse.RawTextHelpFormatter) 186 | parser.register("type", "bool", lambda v: v.lower() == "true") 187 | FLAGS, unparsed = parser.parse_known_args() 188 | main(unparsed) 189 | -------------------------------------------------------------------------------- /ad_tf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # 4 | # Anomaly detection using tensorflow and tshark 5 | # Supervised learning using neural network classifier 6 | # 7 | # Copyright 2020, H21 lab, Martin Kacer 8 | # All the content and resources have been provided in the hope that it will be useful. 9 | # The author does not take responsibility for any misapplication of it. 10 | # 11 | # Based on tensorflow classifier example wide_n_deep_tutorial.py 12 | # Copyright 2017, The TensorFlow Authors. 13 | # 14 | # Licensed under the Apache License, Version 2.0 (the "License"); 15 | # you may not use this file except in compliance with the License. 16 | # You may obtain a copy of the License at 17 | # 18 | # http://www.apache.org/licenses/LICENSE-2.0 19 | # 20 | # Unless required by applicable law or agreed to in writing, software 21 | # distributed under the License is distributed on an "AS IS" BASIS, 22 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 23 | # See the License for the specific language governing permissions and 24 | # limitations under the License. 
25 | # 26 | 27 | import sys 28 | import json 29 | import argparse 30 | import tempfile 31 | import pandas as pd 32 | import operator 33 | import subprocess 34 | import os 35 | import hashlib 36 | import tensorflow as tf 37 | tf.estimator.Estimator._validate_features_in_predict_input = lambda *args: None 38 | 39 | tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG) 40 | 41 | COLUMNS = [] 42 | LABEL_COLUMN = "label" 43 | CATEGORICAL_COLUMNS = [] 44 | CONTINUOUS_COLUMNS = [] 45 | 46 | FLAGS = None 47 | 48 | def build_model_columns(): 49 | """Builds a set of wide and deep feature columns.""" 50 | 51 | # Wide columns and deep columns. 52 | wide_columns = [] 53 | 54 | deep_columns = [] 55 | 56 | for c in COLUMNS: 57 | # Sparse base columns. 58 | print(">>>>>>>>>>>>>>>>>>>") 59 | print(c) 60 | column = tf.feature_column.categorical_column_with_hash_bucket(c, hash_bucket_size=10000) 61 | deep_columns.append(tf.feature_column.embedding_column(column, dimension=8)) 62 | #wide_columns.append(column) 63 | 64 | return wide_columns, deep_columns 65 | 66 | def build_estimator(model_dir, model_type): 67 | """Build an estimator appropriate for the given model type.""" 68 | wide_columns, deep_columns = build_model_columns() 69 | hidden_units = [100, 75, 50, 25] 70 | 71 | run_config = tf.estimator.RunConfig().replace(keep_checkpoint_max = 5, 72 | log_step_count_steps=20, save_checkpoints_steps=200) 73 | 74 | if model_type == 'wide': 75 | return tf.estimator.LinearClassifier( 76 | model_dir=model_dir, 77 | feature_columns=wide_columns, 78 | config=run_config) 79 | elif model_type == 'deep': 80 | return tf.estimator.DNNClassifier( 81 | model_dir=model_dir, 82 | feature_columns=deep_columns, 83 | hidden_units=hidden_units, 84 | config=run_config) 85 | else: 86 | return tf.estimator.DNNLinearCombinedClassifier( 87 | model_dir=model_dir, 88 | linear_feature_columns=wide_columns, 89 | dnn_feature_columns=deep_columns, 90 | dnn_hidden_units=hidden_units, 91 | config=run_config) 
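Each selected tshark field is turned by build_model_columns above into a hash-bucket categorical column and then embedded. The bucketing idea can be illustrated with a plain-Python sketch (md5 here stands in for TensorFlow's internal fingerprint function, so bucket ids will not match TensorFlow's; 10000 matches the script's hash_bucket_size):

```python
import hashlib

def hash_bucket(value, num_buckets=10000):
    # Map a raw field value (e.g. a tcp_tcp_flags_raw string) to one of
    # num_buckets ids. Collisions are possible and accepted, just as with
    # tf.feature_column.categorical_column_with_hash_bucket.
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets
```

Every occurrence of the same value lands in the same bucket, so the downstream embedding learns one vector per bucket rather than per distinct string.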
92 | 93 | def input_fn(df, num_epochs, shuffle, batch_size): 94 | """Input builder function.""" 95 | dataset = tf.data.Dataset.from_tensor_slices((dict(df[COLUMNS]), df['label'])) 96 | 97 | if shuffle: 98 | dataset = dataset.shuffle(1000) 99 | 100 | dataset = dataset.repeat(num_epochs) 101 | dataset = dataset.batch(batch_size) 102 | return dataset 103 | 104 | def df_to_pcap(j, df_predict, file): 105 | linux_cooked_header = df_predict.at[j, 'linux_cooked_header'] 106 | frame_raw = df_predict.at[j, 'frame_raw'] 107 | # for Linux cooked header replace dest MAC and remove two bytes to reconstruct normal frame using text2pcap 108 | if (linux_cooked_header): 109 | frame_raw = "000000000000" + frame_raw[6*2:] # replace dest MAC 110 | frame_raw = frame_raw[:12*2] + frame_raw[14*2:] # remove two bytes before Protocol 111 | hex_to_txt(frame_raw, file) 112 | 113 | def to_pcap_file(filename, output_pcap_file): 114 | FNULL = open(os.devnull, 'w') 115 | subprocess.call(["text2pcap", filename, output_pcap_file], stdout=FNULL, stderr=subprocess.STDOUT) 116 | 117 | def hex_to_txt(hexstring, output_file): 118 | h = hexstring.lower() 119 | 120 | file = open(output_file, 'a') 121 | 122 | for i in range(0, len(h), 2): 123 | if(i%32 == 0): 124 | file.write(format(int(i/2), '06x') + ' ') 125 | 126 | file.write(h[i:i+2] + ' ') 127 | 128 | if(i%32 == 30): 129 | file.write('\n') 130 | 131 | file.write('\n') 132 | file.close() 133 | 134 | def json_collector(dict, name): 135 | r = [] 136 | if hasattr(dict, 'items'): 137 | for k, v in dict.items(): 138 | if (k in name): 139 | r.append(v) 140 | else: 141 | val = json_collector(v, name) 142 | if (len(val) > 0): 143 | r = r + val 144 | 145 | return r 146 | 147 | def readJsonEKLine(df, line, label): 148 | # trim end of lines 149 | line = line.rstrip('\n') 150 | # skip empty lines 151 | if (line.rstrip() == ""): 152 | return 153 | 154 | j = json.loads(line) 155 | 156 | # frames 157 | if ('layers' in j): 158 | layers = j['layers'] 159 | 160 | 
linux_cooked_header = False 161 | if ('sll_raw' in layers): 162 | linux_cooked_header = True 163 | if ('frame_raw' in layers): 164 | 165 | i = len(df) 166 | 167 | df.loc[i, 'frame_raw'] = layers['frame_raw'] 168 | df.loc[i, 'linux_cooked_header'] = linux_cooked_header 169 | 170 | for c in COLUMNS: 171 | v = json_collector(j, [c]) 172 | if (len(v) > 0): 173 | v = v[0] 174 | else: 175 | v = '' 176 | df.loc[i, c] = v 177 | 178 | df.loc[i, 'label'] = label 179 | 180 | def readJsonEK(df, filename, label, limit = 0): 181 | i = 0 182 | while i <= limit: 183 | with open(filename) as f: 184 | for line in f: 185 | if (limit != 0 and i > limit): 186 | return i 187 | readJsonEKLine(df, line, label) 188 | i = i + 1 189 | return i 190 | 191 | def main(_): 192 | 193 | global COLUMNS 194 | global CATEGORICAL_COLUMNS 195 | COLUMNS = FLAGS.fields 196 | CATEGORICAL_COLUMNS = COLUMNS 197 | 198 | print('===============') 199 | print(COLUMNS) 200 | print(CATEGORICAL_COLUMNS) 201 | print(CONTINUOUS_COLUMNS) 202 | print('===============') 203 | 204 | df = pd.DataFrame() 205 | 206 | ln = readJsonEK(df, FLAGS.normal_tshark_ek_x_json, 0) 207 | readJsonEK(df, FLAGS.anomaly_tshark_ek_x_json, 1, ln) 208 | 209 | df = df.sample(frac=1).reset_index(drop=True) 210 | 211 | print(df) 212 | 213 | ##################################### 214 | # train neural network and evaluate # 215 | ##################################### 216 | model_dir = tempfile.mkdtemp() 217 | print("model directory = %s" % model_dir) 218 | 219 | print(">>>>>>>>>>>>>>>" + str(COLUMNS)) 220 | model = build_estimator(model_dir, 'wide_n_deep') 221 | 222 | # Train and evaluate the model every `FLAGS.epochs_per_eval` epochs. 
223 | train_epochs = 100 224 | epochs_per_eval = 20 225 | train_steps = 400 226 | for n in range(train_epochs // epochs_per_eval): 227 | model.train(input_fn=lambda: input_fn(df, train_epochs, True, train_steps)) 228 | 229 | results = model.evaluate(input_fn=lambda: input_fn(df, train_epochs, True, train_steps)) 230 | 231 | # Display evaluation metrics 232 | print('Results at epoch', (n + 1) * epochs_per_eval) 233 | print('-' * 60) 234 | 235 | for key in sorted(results): 236 | print('%s: %s' % (key, results[key])) 237 | 238 | ##################################### 239 | # read from stdin and predict # 240 | ##################################### 241 | # Generate pcap 242 | # open TMP file used by text2pcap 243 | 244 | infile = 'ad_test' 245 | file = infile + '.tmp' 246 | f = open(file, 'w') 247 | 248 | df_predict = pd.DataFrame() 249 | 250 | i = 0 251 | for line in sys.stdin: 252 | readJsonEKLine(df_predict, line, 0) 253 | 254 | i = i + 1 255 | 256 | #print(df_predict) 257 | 258 | # flush every 200 lines, EK JSON contains also index lines, not packets 259 | if (i%200) == 0: 260 | y = model.predict(input_fn=lambda: input_fn(df_predict, 1, False, 100)) 261 | #print("=======================") 262 | #print(y) 263 | #print("=======================") 264 | 265 | j = 0 266 | for val in y: 267 | #print("****") 268 | #print(val) 269 | #print("****") 270 | if (val['class_ids'][0] == 1): 271 | print(str(df_predict.iloc[[j]])) 272 | # pcap 273 | df_to_pcap(j, df_predict, file) 274 | 275 | j = j + 1 276 | 277 | # check predicted labels 278 | if len(df_predict) > 0: 279 | y = model.predict(input_fn=lambda: input_fn(df_predict, 1, False, 100)) 280 | j = 0 281 | for val in y: 282 | label = val['class_ids'][0] 283 | if (label == 1): 284 | print("index = " + str(j)) 285 | print("label = " + str(label)) 286 | print("Probability = " + str(val['probabilities'][label])) 287 | print(str(df_predict.iloc[[j]])) 288 | # pcap 289 | df_to_pcap(j, df_predict, file) 290 | 291 | j = j + 1 292 | 293 | # flush 294 
| df_predict = pd.DataFrame() 295 | 296 | # pcap 297 | f.close() 298 | to_pcap_file(infile + '.tmp', infile + '.pcap') 299 | os.remove(infile + '.tmp') 300 | print("Generated " + infile + ".pcap") 301 | 302 | 303 | if __name__ == "__main__": 304 | parser = argparse.ArgumentParser(description=""" 305 | Script to help detect anomalies in a pcap file. 306 | Uses a tensorflow neural network classifier and tshark -T ek -x input. 307 | 308 | Input is tshark ek json generated by: 309 | ./tshark -T ek -x -r trace.pcap > input.json 310 | 311 | Run script: 312 | cat input.pcap.json | python ad_tf.py -i normal.pcap.json \\ 313 | -a anomaly.pcap.json -f field_1 field_2 .... field_n 314 | 315 | For fields, use the field names from the ek json, e.g.: 316 | tshark -T ek -x -r ./res/input.pcap.gz | python ad_tf.py \\ 317 | -i res/normal.json -a res/anomaly.json -f tcp_tcp_flags_raw \\ 318 | tcp_tcp_dstport_raw 319 | 320 | Output pcap 321 | ad_test.pcap 322 | 323 | The script uses the tshark ek jsons including the raw hex data generated 324 | from pcaps by the command described above. The fields arguments are used for 325 | anomaly detection. The fields are used as columns, hashed, and fed as input 326 | to a tensorflow neural network classifier. 327 | 328 | The classifier is first trained with the normal.pcap.json input 329 | with label 0 and with the anomaly.pcap.json input with label 1. After training, 330 | input.pcap.json is read from stdin and evaluated. The neural 331 | network predicts the label. 332 | 333 | The output pcap then contains the frames predicted by the neural network as 334 | anomalies with label 1. 335 | """, formatter_class=argparse.RawTextHelpFormatter) 336 | parser.register("type", "bool", lambda v: v.lower() == "true") 337 | parser.add_argument( 338 | "-a", 339 | "--anomaly_tshark_ek_x_json", 340 | type=str, 341 | default="", 342 | help="Anomaly traffic. 
Json created by tshark -T ek -x from pcap.\nShall contain only frames considered as anomalies.", 343 | required=True 344 | ) 345 | parser.add_argument( 346 | "-i", 347 | "--normal_tshark_ek_x_json", 348 | type=str, 349 | default="", 350 | help="Regular traffic. Json created by tshark -T ek -x from pcap.\nShall contain only frames considered as normal.", 351 | required=True 352 | ) 353 | parser.add_argument( 354 | "-f", 355 | "--fields", 356 | nargs='+', 357 | help='field_1 field_2 .... field_n (e.g. ip_ip_src ip_ip_dst)', 358 | required=True 359 | ) 360 | 361 | FLAGS, unparsed = parser.parse_known_args() 362 | 363 | print("============") 364 | print(FLAGS.anomaly_tshark_ek_x_json) 365 | print(FLAGS.normal_tshark_ek_x_json) 366 | print(FLAGS.fields) 367 | print("============") 368 | 369 | main([sys.argv[0]] + unparsed) 370 | -------------------------------------------------------------------------------- /res/anomaly.pcap.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/H21lab/Anomaly-Detection/070c01bb6de2601c78dc6aea0a558a990c04acc2/res/anomaly.pcap.gz -------------------------------------------------------------------------------- /res/attack-trace.pcap.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/H21lab/Anomaly-Detection/070c01bb6de2601c78dc6aea0a558a990c04acc2/res/attack-trace.pcap.gz -------------------------------------------------------------------------------- /res/attack-trace.pcap.readme: -------------------------------------------------------------------------------- 1 | Created from pcaps publicly available on https://pcapr.net 2 | 3 | attack-trace.pcap 4 | 5 | Created by the following command 6 | tshark -T ek -x -r attack-trace.pcap > trace.json 7 | -------------------------------------------------------------------------------- /res/input.pcap.gz: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/H21lab/Anomaly-Detection/070c01bb6de2601c78dc6aea0a558a990c04acc2/res/input.pcap.gz -------------------------------------------------------------------------------- /res/input_normal_anomaly.pcap.readme: -------------------------------------------------------------------------------- 1 | input.pcap, normal.pcap, anomaly.pcap 2 | 3 | Original pcap was "DEF CON 23 ICS Village" 4 | Downloaded from https://www.netresec.com/?page=PcapFiles 5 | https://media.defcon.org/DEF%20CON%2023/DEF%20CON%2023%20villages/DEF%20CON%2023%20ics%20village/DEF%20CON%2023%20ICS%20Village%20packet%20captures.rar 6 | 7 | The pcap was afterwards split into 3 pcap files. 8 | -------------------------------------------------------------------------------- /res/normal.pcap.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/H21lab/Anomaly-Detection/070c01bb6de2601c78dc6aea0a558a990c04acc2/res/normal.pcap.gz --------------------------------------------------------------------------------
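The GROUP BY-style scoring that ad_simple.py applies to pcaps like the ones above can be sketched in a few self-contained lines (the frame dicts and field values below are illustrative, not taken from the repository's captures):

```python
import hashlib
from collections import Counter

def rarity_rank(frames, fields):
    # Hash the selected fields of each frame together, count how often each
    # combination occurs, then order frames from rarest to most common --
    # the same GROUP BY-like scoring ad_simple.py uses to build ad_test.pcap.
    def key(frame):
        m = hashlib.md5()
        for f in fields:
            m.update(str(frame.get(f, "")).encode("utf-8"))
        return m.hexdigest()

    counts = Counter(key(fr) for fr in frames)
    return sorted(frames, key=lambda fr: counts[key(fr)])

frames = [
    {"ip_ip_src": "10.0.0.1", "ip_ip_dst": "10.0.0.2"},
    {"ip_ip_src": "10.0.0.1", "ip_ip_dst": "10.0.0.2"},
    {"ip_ip_src": "192.168.1.9", "ip_ip_dst": "10.0.0.2"},
]
ranked = rarity_rank(frames, ["ip_ip_src", "ip_ip_dst"])
# The unique 192.168.1.9 -> 10.0.0.2 frame sorts first.
```

Frames with a one-off field combination surface at the start of the output, which is what makes the ordering useful for eyeballing anomalies.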