├── late.csv.zip
├── main6.py
└── README.md


/late.csv.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nb0309/Network-Traffic-Analysis-using-Machine-learning/HEAD/late.csv.zip


--------------------------------------------------------------------------------
/main6.py:
--------------------------------------------------------------------------------
import cv2
import numpy as np

# HSV bounds for green (OpenCV hue values run from 0 to 179).
LOWER_GREEN = np.array([40, 50, 50])
UPPER_GREEN = np.array([80, 255, 255])
# Minimum contour area in pixels; tune for object size and camera distance.
MIN_AREA = 1000


def identify_green(frame):
    """Detect a green object in the frame, outline it, and display the frame.

    Args:
        frame (numpy.ndarray): The BGR frame captured from the video stream.

    Returns:
        bool: True if a green contour larger than MIN_AREA was found.
    """
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_GREEN, UPPER_GREEN)
    # Two return values assumes OpenCV 4.x (OpenCV 3.x returns three).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    green_detected = False
    for cnt in contours:
        if cv2.contourArea(cnt) > MIN_AREA:
            green_detected = True
            cv2.drawContours(frame, [cnt], 0, (0, 255, 0), 2)
            break  # Exit loop after finding one green object

    # Signalling the solenoid valve (Raspberry Pi specific) is not implemented
    # here; the caller can act on the returned flag.

    cv2.imshow("Green Detection", frame)
    return green_detected


def main():
    cap = cv2.VideoCapture(0)
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                print("Error: Frame not captured")
                break
            identify_green(frame)
            # Press 'q' to quit.
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    finally:
        # Release the camera and close windows even if an error occurs.
        cap.release()
        cv2.destroyAllWindows()


if __name__ == "__main__":
    main()


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | 
  2 | # Anomaly Detection In Computer Network
  3 | Protecting computer networks from malicious activity and unusual behavior is a core cybersecurity task, and anomaly detection helps identify and mitigate potential threats in real time. This README describes anomaly detection in computer networks, with a focus on how it can be accelerated using Intel Extension for TensorFlow.
  4 | 
  5 | Anomaly detection is a sophisticated technique used to identify patterns and behaviors that deviate significantly from normal network activity. In computer networks, normal behavior is typically established by observing historical data, which is then used as a benchmark for detecting anomalies. Anomalies can be indicative of security breaches, network failures, or other irregularities that warrant immediate attention.
  6 | 
  7 | 
  8 | 
  9 | ## Data
 10 | Wireshark captures various data when monitoring network traffic:
 11 | ```
 12 | Packet Information: Individual network packets with details like source/destination IP addresses, port numbers, packet length, and payload content.
 13 | Protocol Analysis: Identification and decoding of network protocols (e.g., TCP, UDP, HTTP) for deeper inspection.
 14 | Decoding: Human-readable representation of packet data structures (e.g., DNS queries, HTTP requests).
 15 | Statistics: Captured packet statistics, including duration, data rates, and potential issues like packet loss.
 16 | Filtering: Tools to focus on specific traffic types or search for keywords.
 17 | Visualization: Packet flow visualization for understanding network communication patterns.
 18 | Expert Info: Flags warnings or errors related to packet issues.
 19 | Hierarchy: Displays the protocol breakdown in the capture.
 20 | Export: Allows saving captures for further analysis or reporting in different formats.
 21 | ```
 22 | Features:
 23 | ```
 24 | #   Column       Non-Null Count   Dtype  
 25 | ---  ------       --------------   -----  
 26 |  0   No.          117084 non-null  int64  
 27 |  1   Time         117084 non-null  float64
 28 |  2   Source       117084 non-null  object 
 29 |  3   Destination  117084 non-null  object 
 30 |  4   Protocol     117084 non-null  object 
 31 |  5   Length       117084 non-null  int64  
 32 |  6   Info         117084 non-null  object 
 33 |  ```
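
The column summary above has the shape of `pandas.DataFrame.info()` output. As a minimal sketch, assuming the capture was exported from Wireshark as a CSV with those seven columns, the data could be loaded and checked as follows. `load_capture` and `EXPECTED_COLUMNS` are hypothetical names introduced for illustration and are not part of this repository.

```python
import io

import pandas as pd

# Column names taken from the feature summary above.
EXPECTED_COLUMNS = ["No.", "Time", "Source", "Destination", "Protocol", "Length", "Info"]


def load_capture(source):
    """Read a Wireshark CSV export and verify it has the expected columns.

    `source` may be a file path or a file-like object.
    """
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"capture is missing columns: {missing}")
    return df


# Tiny in-memory sample standing in for a real export (made-up rows).
sample = io.StringIO(
    '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
    '1,0.0,"10.0.0.1","10.0.0.2","TCP",60,"SYN"\n'
    '2,0.5,"10.0.0.2","10.0.0.1","TCP",60,"SYN, ACK"\n'
)
df = load_capture(sample)
df.info()
```

`pandas.read_csv` infers zip compression from a `.zip` suffix when the archive holds a single file, so `load_capture("late.csv.zip")` may work directly; that archive layout is an assumption.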
 34 | 
 35 | ## Tech Stack:
 36 | ![App Screenshot](https://imgs.search.brave.com/3xxa-ZJZeey5h_Czsj0lckx9eJ_irq7jN5oO680hyCM/rs:fit:560:320:1/g:ce/aHR0cHM6Ly91cGxv/YWQud2lraW1lZGlh/Lm9yZy93aWtpcGVk/aWEvZW4vdGh1bWIv/Zi9mYS9PbmVBUEkt/cmdiLTMwMDAucG5n/LzUxMnB4LU9uZUFQ/SS1yZ2ItMzAwMC5w/bmc)
 37 | 
 38 | 
 39 | #
 40 | 
 41 | 
 42 | 
 43 | Intel Extension for TensorFlow*:
 44 | 1. Plug into TensorFlow 2.10 or later to accelerate training and inference on Intel GPU hardware with no code changes.
 45 | 2. Accelerate AI performance with Intel oneAPI Deep Neural Network Library (oneDNN) features such as graph optimizations and memory pool allocation.
 46 | 3. Automatically use Intel Deep Learning Boost instruction set features to parallelize and accelerate AI workloads.
 47 | 4. Enable the optimizations by setting this environment variable:
 48 | ```
 49 | TF_ENABLE_ONEDNN_OPTS=1
 50 | ```
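
The variable can also be set from inside a Python script instead of the shell. The sketch below assumes the flag has to be in the environment before TensorFlow is first imported, since that is when TensorFlow reads it; the commented import only marks where TensorFlow would be loaded.

```python
import os

# Set the flag before TensorFlow is imported so it is visible at load time.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

# import tensorflow as tf  # import TensorFlow only after the variable is set
```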
 51 | ##
 52 | Intel Distribution for Python*:
 53 | 1. The distribution is designed to scale efficiently across multiple CPU cores and threads. This scalability is essential for applications that require high-performance computing.
 54 | 2. It provides essential Python bindings that ease the integration of Intel native tools into a Python project, and it works with Intel software and libraries.
 55 | 3. Intel Distribution for Python maintains compatibility with the standard Python distribution (CPython), so most existing Python packages and libraries can be used with it unchanged.
 56 | 
 57 | ##
 58 | 
 59 | 
 60 | Intel Extension for scikit-learn*:
 61 | 1. Intel Extension for scikit-learn can accelerate scikit-learn algorithms by up to 100x, which can significantly reduce the time it takes to train and deploy machine learning models.
 62 | 2. The extension is seamlessly integrated with scikit-learn, so you can continue to use the same API and code.
 63 | 3. The Intel extension supports multiple devices, including CPUs, GPUs, and FPGAs. This allows you to choose the best device for your specific application and workload.
 64 |    
 65 | Add two lines of code to patch all compatible algorithms in your Python script.
 66 | ```
 67 | from sklearnex import patch_sklearn
 68 | patch_sklearn()
 69 | ```    
 70 | 
 71 | 
 72 | Wireshark:
 73 | Data packet sniffing tool
 74 | ![App Screenshot](https://imgs.search.brave.com/eZPcDy6jX155eTNG-TC_-d6jzFp5rparfpL5l_zuycM/rs:fit:560:320:1/g:ce/aHR0cHM6Ly91cGxv/YWQud2lraW1lZGlh/Lm9yZy93aWtpcGVk/aWEvY29tbW9ucy90/aHVtYi9jL2NmL1dp/cmVzaGFya18zLjZf/c2NyZWVuc2hvdC5w/bmcvNTEycHgtV2ly/ZXNoYXJrXzMuNl9z/Y3JlZW5zaG90LnBu/Zw)
 75 | 
 76 | ## Model:
 77 | In the pursuit of robust anomaly detection in computer network data, a combination of two powerful techniques has been employed: Isolation Forest and Autoencoders. This dual approach harnesses the strengths of both methodologies to enhance the precision and effectiveness of anomaly detection in complex network environments.
 78 | 
 79 | Autoencoders Architecture:
 80 | 
 81 | Input Layer:
 82 | ```
 83 | Neurons: Number of input features (determined by input_dim).
 84 | 
 85 | Activation: None (raw input).
 86 | ```
 87 | Encoding Layers:
 88 | ```
 89 | Layer 1: Dense layer with 64 neurons and ReLU activation.
 90 | 
 91 | Dropout: 20% dropout for regularization.
 92 | Layer 2: Dense layer with 32 neurons and ReLU activation.
 93 | 
 94 | Encoding Bottleneck: Dense layer with encoding_dim (10) neurons and ReLU activation.
 95 | ```
 96 | Decoding Layers:
 97 | ```
 98 | Layer 1: Dense layer with 32 neurons and ReLU activation.
 99 | 
100 | Dropout: 20% dropout for regularization.
101 | 
102 | Layer 2: Dense layer with 64 neurons and ReLU activation.
103 | 
104 | Output Layer: Dense layer with the same number of neurons as input features (specified by input_dim) and sigmoid activation.
105 | ```
106 | Model Compilation:
107 | ```
108 | Optimizer: Adam optimizer.
109 | Loss Function: Mean Squared Error (MSE) for reconstruction loss.
110 | Training:
111 | 
112 | Input and Target: Scaled input data (X_scaled) used as both input and target.
113 | Epochs: 20.
114 | Batch Size: 32.
115 | ```
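
The layer list and compilation settings above can be sketched in Keras as follows. This is an illustration under stated assumptions, not the repository's training script: `build_autoencoder` is a hypothetical helper name, and the functional-API layout is one of several ways to express the same stack. The layer sizes, dropout rate, activations, optimizer, and loss come from the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_autoencoder(input_dim, encoding_dim=10):
    """Build and compile a dense autoencoder matching the description above."""
    inputs = keras.Input(shape=(input_dim,))

    # Encoder: 64 -> dropout -> 32 -> bottleneck
    x = layers.Dense(64, activation="relu")(inputs)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(32, activation="relu")(x)
    encoded = layers.Dense(encoding_dim, activation="relu")(x)

    # Decoder: 32 -> dropout -> 64 -> output
    x = layers.Dense(32, activation="relu")(encoded)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(input_dim, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    # Adam optimizer with mean squared error as the reconstruction loss.
    model.compile(optimizer="adam", loss="mse")
    return model
```

Training would then use the scaled data as both input and target, for example `model.fit(X_scaled, X_scaled, epochs=20, batch_size=32)`. Because the output layer is a sigmoid, this presumes `X_scaled` lies in the range 0 to 1.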
116 | Anomaly Detection:
117 | ```
118 | After training, the model calculates MSE between original data and its reconstruction.
119 | Anomaly threshold is set at the 99.9th percentile of MSE values.
120 | ```
121 | Identifying Anomalies:
122 | ```
123 | Data points with MSE above the threshold are considered anomalies.
124 | ```
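
The thresholding step can be sketched with NumPy alone. `flag_anomalies` is a hypothetical helper name; it assumes the original data and the autoencoder's reconstruction are 2-D arrays of the same shape, with one row per packet.

```python
import numpy as np


def flag_anomalies(X, X_reconstructed, percentile=99.9):
    """Return per-row MSE, the percentile threshold, and a boolean anomaly mask."""
    X = np.asarray(X, dtype=float)
    X_reconstructed = np.asarray(X_reconstructed, dtype=float)
    # Reconstruction error per data point (mean over features).
    errors = np.mean(np.square(X - X_reconstructed), axis=1)
    # Threshold at the chosen percentile of the error distribution.
    threshold = np.percentile(errors, percentile)
    return errors, threshold, errors > threshold
```

With a trained model, the reconstruction would typically come from `model.predict(X_scaled)`, and the mask can be used to index the original DataFrame.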
125 | 
126 | Ensemble Method:
127 | 
128 | Combining both RandomForestClassifier and Isolation Forest.
129 | 
130 | Isolation Forest (IsolationForest):
131 | ```
132 | Isolation Forest is used for initial anomaly score estimation.
133 | contamination is set to 0.0045, and random_state is 42.
134 | Labels are predicted for data points, where -1 indicates anomalies and 1 indicates normal data.
135 | ```
136 | Random Forest Classifier (RandomForestClassifier):
137 | ```
138 | Random Forest Classifier refines the anomaly detection process.
139 | n_estimators is 100, and random_state is 42.
140 | It is trained on features and anomaly labels derived from the Isolation Forest.
141 | Anomaly predictions are made, and anomalies are identified where the prediction is 0 (anomaly).
142 | ```
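
A minimal scikit-learn sketch of this two-stage ensemble follows. The hyperparameters come from the description above. Mapping the Isolation Forest output of -1 to class 0 (anomaly) and 1 to class 1 (normal) is an assumption made here so that a prediction of 0 means anomaly, and `ensemble_anomaly_mask` is a hypothetical helper name.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier


def ensemble_anomaly_mask(X):
    """Label points with Isolation Forest, then refine with a Random Forest."""
    X = np.asarray(X, dtype=float)

    # Stage 1: unsupervised labels, -1 for anomalies and 1 for normal points.
    iso = IsolationForest(contamination=0.0045, random_state=42)
    iso_labels = iso.fit_predict(X)

    # Assumed mapping: -1 -> 0 (anomaly), 1 -> 1 (normal).
    y = np.where(iso_labels == -1, 0, 1)

    # Stage 2: supervised classifier trained on the derived labels.
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X, y)

    # A prediction of 0 marks an anomaly.
    return clf.predict(X) == 0
```

Since the classifier is trained and evaluated on the same rows, it largely reproduces the Isolation Forest labels; holding out part of the data would be needed to judge how well it generalizes.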
143 | ## Anomaly points:
144 | 
145 | ```
146 |            No.         Time                                   Source  \
147 | 11901    11902   379.582211  2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236   
148 | 19689    19690   424.147757             2405:200:1630:a03::312c:c5b3   
149 | 19831    19832   424.232626             2405:200:1630:a03::312c:c5b3   
150 | 20090    20091   424.369842             2405:200:1630:a03::312c:c5b3   
151 | 20118    20119   424.388097             2405:200:1630:a03::312c:c5b3   
152 | ...        ...          ...                                      ...   
153 | 100885  100886  5232.917759  2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236   
154 | 100888  100889  5232.938174  2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236   
155 | 100897  100898  5233.018599  2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236   
156 | 115756  115757  6767.014347  2409:40f4:100b:c1b6:5a68:c2f9:5206:af46   
157 | 115757  115758  6767.014478                           192.168.239.25   
158 | 
159 |                                     Destination  Protocol  Length  \
160 | 11901                  2404:6800:4007:816::2002         5    1294   
161 | 19689   2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236        12    2662   
162 | 19831   2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236        12    2662   
163 | 20090   2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236        12    2662   
164 | 20118   2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236        12    2662   
165 | ...                                         ...       ...     ...   
166 | 100885                 2404:6800:4007:810::2002         5    1294   
167 | 100888                 2404:6800:4007:810::2002         5    1294   
168 | 100897                 2404:6800:4007:810::2002         5    1294   
169 | 115756                                 ff02::fb         7     221   
170 | 115757                              224.0.0.251         7     201   
171 | 
172 |                                                      Info  anomaly  
173 | 11901          Destination Unreachable (Port unreachable)        1  
174 | 19689                   Encrypted Data, Continuation Data        1  
175 | 19831                   Encrypted Data, Continuation Data        1  
176 | 20090                   Encrypted Data, Continuation Data        1  
177 | 20118                   Encrypted Data, Continuation Data        1  
178 | ...                                                   ...      ...  
179 | 100885         Destination Unreachable (Port unreachable)        1  
180 | 100888         Destination Unreachable (Port unreachable)        1  
181 | 100897         Destination Unreachable (Port unreachable)        1  
182 | 115756  Standard query 0x0000 PTR _nfs._tcp.local, "QM...        1  
183 | 115757  Standard query 0x0000 PTR _nfs._tcp.local, "QM...        1  
184 | ```
185 | ## Epoch:
186 | Without Intel Extension for TensorFlow:
187 | ![image](https://github.com/nb0309/Network-Traffic-Analysis-using-Machine-learning/assets/93106796/f985f3b9-d78f-472d-8ef4-99b9871a1f66)
188 | 
189 | 
190 | With Intel Extension for TensorFlow:
191 | ![image](https://github.com/nb0309/Network-Traffic-Analysis-using-Machine-learning/assets/93106796/f6c8fe7f-30cd-4412-bf53-8e341ba3a609)
192 | 
193 | 
194 | 
195 | ## Contributors:
196 | 
197 | - [@navabhaarathi](https://github.com/nb0309)
198 | - [@balasuriya](https://github.com/balasuriyaranganathan/balasuriyaranganathan)
199 | 
200 | 
201 | ## Acknowledgements
202 | 
203 |  - [Computer Network Intrusion and anomaly detection](https://www.hindawi.com/journals/misy/2022/6576023/)
204 |  - [Intel Distribution for Python](https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html)
205 |  - [Intel oneAPI](https://www.oneapi.io/)
206 |  - [Wireshark](https://www.wireshark.org/)
207 | 


--------------------------------------------------------------------------------