├── late.csv.zip ├── main6.py └── README.md /late.csv.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nb0309/Network-Traffic-Analysis-using-Machine-learning/HEAD/late.csv.zip -------------------------------------------------------------------------------- /main6.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | 4 | def identify_green(frame): 5 | """Detects green objects in the frame and sends a signal to a solenoid valve (Raspberry Pi specific). 6 | 7 | Args: 8 | frame (numpy.ndarray): The frame captured from the video stream. 9 | 10 | Returns: 11 | None 12 | """ 13 | 14 | hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV) 15 | lower_green = np.array([40, 50, 50]) 16 | upper_green = np.array([80, 255, 255]) 17 | mask = cv2.inRange(hsv, lower_green, upper_green) 18 | contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) 19 | 20 | green_detected = False 21 | for cnt in contours: 22 | # Calculate area of the contour 23 | area = cv2.contourArea(cnt) 24 | # Adjust minimum area threshold as needed 25 | if area > 1000: # Adjust threshold based on object size and camera distance 26 | green_detected = True 27 | cv2.drawContours(frame, [cnt], 0, (0, 255, 0), 2) 28 | break # Exit loop after finding one green object 29 | 30 | # Send signal to solenoid valve (Raspberry Pi specific) 31 | 32 | 33 | cv2.imshow("Green Detection", frame) 34 | 35 | 36 | 37 | 38 | cap = cv2.VideoCapture(0) 39 | while True: 40 | ret, frame = cap.read() 41 | if ret: 42 | identify_green(frame) 43 | if cv2.waitKey(1) & 0xFF == ord('q'): 44 | break 45 | else: 46 | print("Error: Frame not captured") 47 | break 48 | 49 | cap.release() 50 | cv2.destroyAllWindows() 51 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 
| 2 | # Anomaly Detection In Computer Network 3 | In the ever-evolving landscape of cybersecurity, safeguarding computer networks from malicious activities and unusual behavior has become paramount. Anomaly detection plays a pivotal role in identifying and mitigating potential threats in real-time. This description explores the concept of anomaly detection in computer networks, with a focus on how it can be optimized using Intel's extension for TensorFlow. 4 | 5 | Anomaly detection is a sophisticated technique used to identify patterns and behaviors that deviate significantly from normal network activity. In computer networks, normal behavior is typically established by observing historical data, which is then used as a benchmark for detecting anomalies. Anomalies can be indicative of security breaches, network failures, or other irregularities that warrant immediate attention. 6 | 7 | 8 | 9 | ## Data 10 | Wireshark captures various data when monitoring network traffic: 11 | ``` 12 | Packet Information: Individual network packets with details like source/destination IP addresses, port numbers, packet length, and payload content. 13 | Protocol Analysis: Identification and decoding of network protocols (e.g., TCP, UDP, HTTP) for deeper inspection. 14 | Decoding: Human-readable representation of packet data structures (e.g., DNS queries, HTTP requests). 15 | Statistics: Captured packet statistics, including duration, data rates, and potential issues like packet loss. 16 | Filtering: Tools to focus on specific traffic types or search for keywords. 17 | Visualization: Packet flow visualization for understanding network communication patterns. 18 | Expert Info: Flags warnings or errors related to packet issues. 19 | Hierarchy: Displays the protocol breakdown in the capture. 20 | Export: Allows saving captures for further analysis or reporting in different formats. 
21 | ``` 22 | Features: 23 | ``` 24 | # Column Non-Null Count Dtype 25 | --- ------ -------------- ----- 26 | 0 No. 117084 non-null int64 27 | 1 Time 117084 non-null float64 28 | 2 Source 117084 non-null object 29 | 3 Destination 117084 non-null object 30 | 4 Protocol 117084 non-null object 31 | 5 Length 117084 non-null int64 32 | 6 Info 117084 non-null object 33 | ``` 34 | 35 | ## Tech Stack: 36 | ![App Screenshot](https://imgs.search.brave.com/3xxa-ZJZeey5h_Czsj0lckx9eJ_irq7jN5oO680hyCM/rs:fit:560:320:1/g:ce/aHR0cHM6Ly91cGxv/YWQud2lraW1lZGlh/Lm9yZy93aWtpcGVk/aWEvZW4vdGh1bWIv/Zi9mYS9PbmVBUEkt/cmdiLTMwMDAucG5n/LzUxMnB4LU9uZUFQ/SS1yZ2ItMzAwMC5w/bmc) 37 | 38 | 39 | # 40 | 41 | 42 | 43 | Intel Extension for TensorFlow*: 44 | 1. Plug into TensorFlow 2.10 or later to accelerate training and inference on Intel GPU hardware with no code changes. 45 | 2. Accelerate AI performance with Intel oneAPI Deep Neural Network Library (oneDNN) features such as graph optimizations and memory pool allocation. 46 | 3. Automatically use Intel Deep Learning Boost instruction set features to parallelize and accelerate AI workloads. 47 | 4. Enable oneDNN optimizations by setting the following environment variable: 48 | ``` 49 | TF_ENABLE_ONEDNN_OPTS=1 50 | ``` 51 | ## 52 | Intel Distribution for Python*: 53 | 1. The distribution is designed to scale efficiently across multiple CPU cores and threads. This scalability is essential for applications that require high-performance computing. 54 | 2. It provides essential Python bindings that ease integration of Intel native tools into a Python project, and it works seamlessly with Intel software and libraries. 55 | 3. Intel Distribution for Python maintains compatibility with the standard Python distribution (CPython). This means that most existing Python packages and libraries can be used with this distribution unchanged. 56 | 57 | ## 58 | 59 | 60 | Intel Extension for scikit-learn*: 61 | 1.
The Intel extension can accelerate scikit-learn algorithms by up to 100x, which can significantly reduce the time it takes to train and deploy machine learning models. 62 | 2. The extension is seamlessly integrated with scikit-learn, so you can continue to use the same API and code. 63 | 3. The Intel extension supports multiple devices, including CPUs, GPUs, and FPGAs. This allows you to choose the best device for your specific application and workload. 64 | 65 | Add two lines of code to patch all compatible algorithms in your Python script. 66 | ``` 67 | from sklearnex import patch_sklearn 68 | patch_sklearn() 69 | ``` 70 | 71 | 72 | Wireshark: 73 | A network packet capture and analysis tool. 74 | ![App Screenshot](https://imgs.search.brave.com/eZPcDy6jX155eTNG-TC_-d6jzFp5rparfpL5l_zuycM/rs:fit:560:320:1/g:ce/aHR0cHM6Ly91cGxv/YWQud2lraW1lZGlh/Lm9yZy93aWtpcGVk/aWEvY29tbW9ucy90/aHVtYi9jL2NmL1dp/cmVzaGFya18zLjZf/c2NyZWVuc2hvdC5w/bmcvNTEycHgtV2ly/ZXNoYXJrXzMuNl9z/Y3JlZW5zaG90LnBu/Zw) 75 | 76 | ## Model: 77 | To detect anomalies in computer network data, this project combines two techniques: Isolation Forest and autoencoders. Using both is intended to improve the precision and effectiveness of anomaly detection in complex network environments. 78 | 79 | Autoencoder Architecture: 80 | 81 | Input Layer: 82 | ``` 83 | Neurons: Number of input features (determined by input_dim). 84 | 85 | Activation: None (raw input). 86 | ``` 87 | Encoding Layers: 88 | ``` 89 | Layer 1: Dense layer with 64 neurons and ReLU activation. 90 | 91 | Dropout: 20% dropout for regularization. 92 | Layer 2: Dense layer with 32 neurons and ReLU activation. 93 | 94 | Encoding Bottleneck: Dense layer with encoding_dim (10) neurons and ReLU activation. 95 | ``` 96 | Decoding Layers: 97 | ``` 98 | Layer 1: Dense layer with 32 neurons and ReLU activation. 99 | 100 | Dropout: 20% dropout for regularization.
101 | 102 | Layer 2: Dense layer with 64 neurons and ReLU activation. 103 | 104 | Output Layer: Dense layer with the same number of neurons as input features (specified by input_dim) and sigmoid activation. 105 | ``` 106 | Model Compilation: 107 | ``` 108 | Optimizer: Adam optimizer. 109 | Loss Function: Mean Squared Error (MSE) for reconstruction loss. 110 | Training: 111 | 112 | Input and Target: Scaled input data (X_scaled) used as both input and target. 113 | Epochs: 20. 114 | Batch Size: 32. 115 | ``` 116 | Anomaly Detection: 117 | ``` 118 | After training, the MSE between the original data and its reconstruction is calculated for each data point. 119 | The anomaly threshold is set at the 99.9th percentile of the MSE values. 120 | ``` 121 | Identifying Anomalies: 122 | ``` 123 | Data points with MSE above the threshold are considered anomalies. 124 | ``` 125 | 126 | Ensemble Method: 127 | 128 | Combines RandomForestClassifier and IsolationForest. 129 | 130 | Isolation Forest (IsolationForest): 131 | ``` 132 | Isolation Forest is used for initial anomaly score estimation. 133 | contamination is set to 0.0045, and random_state is 42. 134 | Anomaly labels are predicted for data points, where -1 indicates anomalies and 1 indicates normal data. 135 | ``` 136 | Random Forest Classifier (RandomForestClassifier): 137 | ``` 138 | Random Forest Classifier refines the anomaly detection process. 139 | n_estimators is 100, and random_state is 42. 140 | It is trained on features and anomaly labels derived from the Isolation Forest. 141 | Anomaly predictions are made, and anomalies are identified where the prediction is 0 (anomaly). 142 | ``` 143 | ## Anomaly points: 144 | 145 | ```Anomaly points: 146 | No.
Time Source \ 147 | 11901 11902 379.582211 2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236 148 | 19689 19690 424.147757 2405:200:1630:a03::312c:c5b3 149 | 19831 19832 424.232626 2405:200:1630:a03::312c:c5b3 150 | 20090 20091 424.369842 2405:200:1630:a03::312c:c5b3 151 | 20118 20119 424.388097 2405:200:1630:a03::312c:c5b3 152 | ... ... ... ... 153 | 100885 100886 5232.917759 2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236 154 | 100888 100889 5232.938174 2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236 155 | 100897 100898 5233.018599 2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236 156 | 115756 115757 6767.014347 2409:40f4:100b:c1b6:5a68:c2f9:5206:af46 157 | 115757 115758 6767.014478 192.168.239.25 158 | 159 | Destination Protocol Length \ 160 | 11901 2404:6800:4007:816::2002 5 1294 161 | 19689 2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236 12 2662 162 | 19831 2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236 12 2662 163 | 20090 2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236 12 2662 164 | 20118 2409:40f4:100b:c1b6:b9fb:3ec3:5675:a236 12 2662 165 | ... ... ... ... 166 | 100885 2404:6800:4007:810::2002 5 1294 167 | 100888 2404:6800:4007:810::2002 5 1294 168 | 100897 2404:6800:4007:810::2002 5 1294 169 | 115756 ff02::fb 7 221 170 | 115757 224.0.0.251 7 201 171 | 172 | Info anomaly 173 | 11901 Destination Unreachable (Port unreachable) 1 174 | 19689 Encrypted Data, Continuation Data 1 175 | 19831 Encrypted Data, Continuation Data 1 176 | 20090 Encrypted Data, Continuation Data 1 177 | 20118 Encrypted Data, Continuation Data 1 178 | ... ... ... 179 | 100885 Destination Unreachable (Port unreachable) 1 180 | 100888 Destination Unreachable (Port unreachable) 1 181 | 100897 Destination Unreachable (Port unreachable) 1 182 | 115756 Standard query 0x0000 PTR _nfs._tcp.local, "QM... 1 183 | 115757 Standard query 0x0000 PTR _nfs._tcp.local, "QM... 
1 184 | ``` 185 | ## Epoch: 186 | Without Intel Extension for TensorFlow: 187 | ![image](https://github.com/nb0309/Network-Traffic-Analysis-using-Machine-learning/assets/93106796/f985f3b9-d78f-472d-8ef4-99b9871a1f66) 188 | 189 | 190 | With Intel Extension for TensorFlow: 191 | ![image](https://github.com/nb0309/Network-Traffic-Analysis-using-Machine-learning/assets/93106796/f6c8fe7f-30cd-4412-bf53-8e341ba3a609) 192 | 193 | 194 | 195 | ## Contributors: 196 | 197 | - [@navabhaarathi](https://github.com/nb0309) 198 | - [@balasuriya](https://github.com/balasuriyaranganathan/balasuriyaranganathan) 199 | 200 | 201 | ## Acknowledgements 202 | 203 | - [Computer Network Intrusion and anomaly detection](https://www.hindawi.com/journals/misy/2022/6576023/) 204 | - [Intel Distribution for Python](https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html) 205 | - [Intel oneAPI](https://www.oneapi.io/) 206 | - [Wireshark](https://www.wireshark.org/) 207 | --------------------------------------------------------------------------------
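The thresholding step described under "Anomaly Detection" in the Model section can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `flag_anomalies` is hypothetical, and the data is synthetic, with a noisy copy of `X_scaled` standing in for the autoencoder's reconstruction. In the real pipeline the reconstruction would presumably come from the trained model's prediction on `X_scaled`.

```python
import numpy as np


def flag_anomalies(X_scaled, X_reconstructed, percentile=99.9):
    """Flag rows whose reconstruction MSE exceeds a percentile threshold.

    `flag_anomalies` is a hypothetical helper name used only for this sketch.
    """
    X_scaled = np.asarray(X_scaled, dtype=float)
    X_reconstructed = np.asarray(X_reconstructed, dtype=float)
    # Per-row mean squared error between the input and its reconstruction.
    mse = np.mean((X_scaled - X_reconstructed) ** 2, axis=1)
    # Threshold at the requested percentile of all MSE values.
    threshold = np.percentile(mse, percentile)
    # Rows strictly above the threshold are treated as anomalies.
    return mse, threshold, mse > threshold


# Synthetic stand-in data (assumption): 2000 rows with 6 scaled features.
rng = np.random.default_rng(42)
X_scaled = rng.random((2000, 6))
# A noisy copy plays the role of the autoencoder's reconstruction.
X_reconstructed = X_scaled + rng.normal(0.0, 0.01, size=X_scaled.shape)
# Make the first two rows reconstruct badly so they stand out.
X_reconstructed[:2] += 0.5

mse, threshold, is_anomaly = flag_anomalies(X_scaled, X_reconstructed)
print(f"threshold={threshold:.6f}, anomalies={int(is_anomaly.sum())}")
```

A percentile threshold flags a fixed fraction of the data regardless of how extreme the errors are, so the 99.9th percentile marks roughly 0.1% of rows as anomalous by construction.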