├── .bashrc
├── set_path.sh
├── pipeline.pkl
├── kddcup.data_10_percent
├── requirements.txt
├── LICENSE
├── anomaly_detection.py
├── .github
│   └── workflows
│       └── python-app.yml
├── templates
│   └── index.html
├── app.py
├── inspect_pipeline.py
└── README.md

/.bashrc:
--------------------------------------------------------------------------------
1 | source ~/network-anomaly-detection/set_path.sh
--------------------------------------------------------------------------------
/set_path.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | export PATH="$HOME/.local/bin:$PATH"
--------------------------------------------------------------------------------
/pipeline.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/webpro255/network-anomaly-detection/HEAD/pipeline.pkl
--------------------------------------------------------------------------------
/kddcup.data_10_percent:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:f8c8267ebcd9c0ed1fd7d6277fe5bfff8732e9b7db8e61b873542b2a534b6f9a
3 | size 74889749
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | Flask==3.0.3
2 | pandas==2.2.2
3 | scikit-learn==1.5.0
4 | joblib==1.4.2
5 | blinker==1.8.2
6 | click==8.1.7
7 | itsdangerous==2.2.0
8 | Jinja2==3.1.4
9 | numpy==2.0.0
10 | python-dateutil==2.9.0.post0
11 | pytz==2024.1
12 | scipy==1.13.1
13 | threadpoolctl==3.5.0
14 | tzdata==2024.1
15 | Werkzeug==3.0.3
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2024 David Grice
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/anomaly_detection.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import joblib
3 |
4 | # Load the trained pipeline
5 | pipeline = joblib.load('pipeline.pkl')
6 |
7 | def detect_anomalies(data):
8 |     try:
9 |         df = pd.DataFrame(data)
10 |
11 |         # Preprocess data
12 |         categorical_columns = ['column2', 'column3']  # Update with actual categorical columns
13 |         numeric_columns = ['column1']  # Update with actual numeric columns
14 |
15 |         df[categorical_columns] = df[categorical_columns].astype(str)
16 |         df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric)
17 |
18 |         preprocessed_data = pipeline.named_steps['preprocessor'].transform(df)
19 |         predictions = pipeline.named_steps['classifier'].predict(preprocessed_data)
20 |
21 |         return predictions
22 |     except Exception as e:
23 |         raise ValueError(f"Error during preprocessing or prediction: {e}")
24 |
25 | # Example usage
26 | if __name__ == '__main__':
27 |     sample_data = [{'column1': 0, 'column2': '0', 'column3': '0'}]  # Update with actual sample data
28 |     try:
29 |         result = detect_anomalies(sample_data)
30 |         print("Predictions:", result)
31 |     except ValueError as e:
32 |         print(e)
33 |
--------------------------------------------------------------------------------
/.github/workflows/python-app.yml:
--------------------------------------------------------------------------------
1 | name: Python application
2 |
3 | on:
4 |   push:
5 |     branches:
6 |       - main
7 |   pull_request:
8 |     branches:
9 |       - main
10 |
11 | jobs:
12 |   build:
13 |
14 |     runs-on: ubuntu-latest
15 |
16 |     steps:
17 |       - name: Checkout code
18 |         uses: actions/checkout@v2
19 |
20 |       - name: Set up Python
21 |         uses: actions/setup-python@v2
22 |         with:
23 |           python-version: '3.10'
24 |
25 |       - name: Install dependencies
26 |         run: |
27 |           python -m pip install --upgrade pip
28 |           pip install flask pandas scikit-learn joblib
29 |
30 |       - name: Run Flask server in background
31 |         run: |
32 |           nohup python3 app.py &
33 |           sleep 5  # Give the server time to start
34 |
35 |       - name: Run tests
36 |         run: |
37 |           # Add your test commands here
38 |           curl -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" -d '[{"0": 0, "1": "0", "2": "0", "3": "0", "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0, "13": 0, "14": 0, "15": 0, "16": 0, "17": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0, "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0, "31": 0, "32": 0, "33": 0, "34": 0, "35": 0, "36": 0, "37": 0, "38": 0, "39": 0, "40": 0}]'
39 |
40 |       - name: Kill Flask server
41 |         run: |
42 |           pkill -f "python3 app.py"
--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
[The markup of this file was lost during extraction; only its visible text survives. The page is titled "Network Anomaly Detection" and shows a matching heading; the original file ran to 49 lines, with lines 17-47 apparently a single stripped script or style block.]
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | from flask import Flask, request, jsonify, render_template
2 | import pandas as pd
3 | import joblib
4 | import numpy as np
5 |
6 | app = Flask(__name__)
7 |
8 | # Load the trained pipeline
9 | pipeline = joblib.load('pipeline.pkl')
10 |
11 | @app.route('/')
12 | def index():
13 |     return render_template('index.html')
14 |
15 | @app.route('/predict', methods=['POST'])
16 | def predict():
17 |     try:
18 |         json_data = request.get_json()
19 |         df = pd.DataFrame(json_data)
20 |
21 |         # Ensure all column names are strings
22 |         df.columns = df.columns.astype(str)
23 |
24 |         # Define the expected column names based on the training data
25 |         expected_columns = [str(i) for i in range(41)]
26 |
27 |         # Ensure the input data has all expected columns, adding missing ones with default values
28 |         for col in expected_columns:
29 |             if col not in df.columns:
30 |                 df[col] = 0
31 |
32 |         # Keep only the expected columns
33 |         df = df[expected_columns]
34 |
35 |         # Define categorical and numeric columns
36 |         categorical_columns = ['1', '2', '3']
37 |         numeric_columns = [col for col in df.columns if col not in categorical_columns]
38 |
39 |         # Convert categorical columns to string type and fill NaNs with a placeholder
40 |         df[categorical_columns] = df[categorical_columns].astype(str).fillna('missing')
41 |
42 |         # Convert numeric columns to numeric type and fill NaNs with 0
43 |         df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors='coerce').fillna(0)
44 |
45 |         # Look up the fitted one-hot encoder inside the pipeline
46 |         preprocessor = pipeline.named_steps['preprocessor']
47 |         onehot = preprocessor.named_transformers_['cat']
48 |
49 |         # Map values the encoder has never seen to a known category, so the
50 |         # one-hot output keeps the width the model was trained on
51 |         for i, col in enumerate(categorical_columns):
52 |             known = onehot.categories_[i]
53 |             df[col] = df[col].apply(lambda x, known=known: x if x in known else known[0])
54 |
55 |         # Process the data through the preprocessor
56 |         preprocessed_data = preprocessor.transform(df)
57 |
58 |         # Predict using the model step
59 |         predictions = pipeline.named_steps['model'].predict(preprocessed_data)
60 |
61 |         return jsonify({'prediction': predictions.tolist()})
62 |     except Exception as e:
63 |         return jsonify({'error': str(e)}), 400
64 |
65 | if __name__ == '__main__':
66 |     app.run(debug=True, host='0.0.0.0')
--------------------------------------------------------------------------------
/inspect_pipeline.py:
--------------------------------------------------------------------------------
1 | import joblib
2 | import pandas as pd
3 | import numpy as np
4 |
5 | # Load the trained pipeline
6 | try:
7 |     pipeline = joblib.load('pipeline.pkl')
8 |     print("Pipeline loaded successfully.")
9 | except Exception as e:
10 |     print(f"Error loading pipeline: {e}")
11 |     raise
12 |
13 | # Sample data (the same as used in the curl request)
14 | data = [{'0': 0, '1': '0', '2': '0', '3': '0', '4': 0, '5': 0, '6': 0, '7': 0, '8': 0, '9': 0,
15 |          '10': 0, '11': 0, '12': 0, '13': 0, '14': 0, '15': 0, '16': 0, '17': 0, '18': 0,
16 |          '19': 0, '20': 0, '21': 0, '22': 0, '23': 0, '24': 0, '25': 0, '26': 0, '27': 0,
17 |          '28': 0, '29': 0, '30': 0, '31': 0, '32': 0, '33': 0, '34': 0, '35': 0, '36': 0,
18 |          '37': 0, '38': 0, '39': 0, '40': 0}]
19 |
20 | # Convert data to DataFrame
21 | df = pd.DataFrame(data)
22 | print(f"DataFrame created: {df}")
23 | print(f"DataFrame dtypes before conversion: {df.dtypes}")
24 |
25 | # Define categorical and numeric columns
26 | categorical_columns = ['1', '2', '3']
27 | numeric_columns = [col for col in df.columns if col not in categorical_columns]
28 |
29 | # Convert categorical columns to string
30 | df[categorical_columns] = df[categorical_columns].astype(str)
31 | print(f"Categorical columns converted to string:\n{df[categorical_columns].dtypes}")
32 |
33 | # Convert numeric columns to numeric type (int)
34 | df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric)
35 | print(f"Numeric columns converted to numeric:\n{df[numeric_columns].dtypes}")
36 |
37 | # Check for NaN values
38 | if df.isnull().values.any():
39 |     print(f"DataFrame contains NaN values:\n{df[df.isnull().any(axis=1)]}")
40 |     raise ValueError("Input data contains NaN values after conversion")
41 |
42 | # Reset column names to be consistent with the pipeline's expectations
43 | df.columns = df.columns.astype(str)
44 | print(f"DataFrame with reset column names:\n{df.head()}")
45 |
46 | # Convert DataFrame to numpy array
47 | array_data = df.to_numpy()
48 | print(f"Numpy array data:\n{array_data}")
49 | print(f"Numpy array dtype: {array_data.dtype}")
50 |
51 | # Inspect the preprocessor step in the pipeline
52 | preprocessor = pipeline.named_steps['preprocessor']
53 | print(f"Preprocessor steps: {preprocessor}")
54 |
55 | # Handle OneHotEncoder categories
56 | onehot = preprocessor.named_transformers_['cat']
57 | print(f"OneHotEncoder categories: {onehot.categories_}")
58 |
59 | # Ensure input matches the categories in the encoder
60 | for i, col in enumerate(categorical_columns):
61 |     if df[col].values[0] not in onehot.categories_[i]:
62 |         df[col] = onehot.categories_[i][0]  # Replace with a known category
63 |
64 | # Process the data through the preprocessor
65 | try:
66 |     preprocessed_data = preprocessor.transform(df)
67 |     print(f"Preprocessed data:\n{preprocessed_data}")
68 | except Exception as e:
69 |     print(f"Error in preprocessor: {e}")
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Network Anomaly Detection
2 | ![Python version](https://img.shields.io/badge/python-3.10+-blue)
3 | ![GitHub Actions](https://github.com/webpro255/network-anomaly-detection/actions/workflows/python-app.yml/badge.svg)
4 | ![GitHub last commit](https://img.shields.io/github/last-commit/webpro255/network-anomaly-detection)
5 | ![GitHub issues](https://img.shields.io/github/issues/webpro255/network-anomaly-detection)
6 | [![MIT License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/webpro255/network-anomaly-detection/blob/main/LICENSE)
7 | ![GitHub contributors](https://img.shields.io/github/contributors/webpro255/network-anomaly-detection)
8 |
9 | Welcome to the Network Anomaly Detection project! This repository showcases a practical application of machine learning in cybersecurity by monitoring a network and detecting unusual activity.
10 |
11 | ## Features
12 | - **Real-time Anomaly Detection**: Monitors network traffic and identifies anomalies in real time using machine learning.
13 | - **REST API**: Provides a RESTful API for easy integration and real-time anomaly detection.
14 | - **Extensible Design**: Easily adaptable to different network environments and customizable for various use cases.
15 |
16 | ## Tools and Technologies
17 | - **Python**: The core programming language used in the project.
18 | - **Flask**: A micro web framework for building the REST API.
19 | - **Scikit-learn**: For training the anomaly detection model.
20 | - **Pandas**: For data manipulation and preprocessing.
21 | - **Joblib**: For saving and loading machine learning models.
22 | - **Wireshark/tcpdump**: For capturing network traffic data.
23 |
24 | ## Use Cases
25 | ### Home Network Monitoring
26 | Monitor and analyze traffic in your home network to detect unusual activities, such as unauthorized access attempts or unusual data transfers.
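The `/predict` endpoint served by app.py accepts the same 41-column records that the CI workflow posts with curl. As a minimal client-side sketch of building such a payload (the `make_record` helper is illustrative and not part of the repository; it assumes the string-keyed "0"-"40" column convention used by app.py and the workflow):

```python
import json

# Build one record in the shape app.py expects: string keys "0".."40",
# with columns "1", "2", "3" (the categorical slots in the KDD Cup
# layout) sent as strings and everything else numeric.
def make_record(**overrides):
    record = {str(i): 0 for i in range(41)}
    for cat in ("1", "2", "3"):
        record[cat] = "0"   # categorical columns go through the OneHotEncoder
    record.update(overrides)
    return record

payload = json.dumps([make_record()])
print(payload[:48])

# To send it (with the server from app.py running locally):
#   curl -X POST http://127.0.0.1:5000/predict \
#        -H "Content-Type: application/json" -d "$payload"
```

Sending all 41 keys explicitly is not strictly required, since app.py fills missing columns with 0, but it keeps the request unambiguous.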
27 |
28 | ### Small Business Network Security
29 | Deploy in a small business environment to enhance network security by identifying potential threats and anomalies in network traffic.
30 |
31 | ### Educational Tool
32 | Serve as an educational tool for students and professionals learning about network security and machine learning applications in cybersecurity.
33 |
34 | ## Getting Started
35 |
36 | ### 1. Clone the Repository
37 | ```sh
38 | git clone https://github.com/webpro255/network-anomaly-detection.git
39 | cd network-anomaly-detection
40 | ```
41 | ### 2. Install Dependencies
42 |
43 | Ensure you have Python 3.10+ installed, then install the necessary Python packages:
44 | ```sh
45 | pip install -r requirements.txt
46 | ```
47 | ### 3. Run the Flask Application
48 |
49 | Start the Flask application to serve the REST API:
50 | ```sh
51 | python3 app.py
52 | ```
53 | ### 4. Test the Application
54 |
55 | Use the following curl command to test the API with sample data (replace 127.0.0.1 with your server's address if testing remotely):
56 | ```sh
57 | curl -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" -d '[{"0": 0, "1": "0", "2": "0", "3": "0", "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0, "13": 0, "14": 0, "15": 0, "16": 0, "17": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0, "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0, "31": 0, "32": 0, "33": 0, "34": 0, "35": 0, "36": 0, "37": 0, "38": 0, "39": 0, "40": 0}]'
58 | ```
59 | ### API Endpoints
60 | `/`
61 | Serves the web interface from templates/index.html.
62 | `/predict`
63 | Accepts JSON records of network traffic and returns anomaly predictions. The pipeline is trained on the 41 KDD Cup features, so each record uses string keys "0" through "40"; columns "1", "2" and "3" are categorical and are sent as strings, and any missing column defaults to 0. For example:
64 | ```json
65 | [
66 |   {
67 |     "0": 0,
68 |     "1": "0",
69 |     "2": "0",
70 |     "3": "0",
71 |     "4": 0
72 |   }
73 | ]
74 | ```
75 |
76 | ### Project Structure
77 | - **app.py**: The main Flask application file.
78 | - **anomaly_detection.py**: Contains the anomaly detection logic.
79 | - **requirements.txt**: Python dependencies.
80 | - **kddcup.data_10_percent**: Sample network traffic data (for training and testing).
81 | - **pipeline.pkl**: Serialized machine learning model.
82 | - **templates**: HTML templates for the web interface.
83 |
84 | ### Future Improvements
85 | - **Integration with Real-time Traffic Capture**: Use Wireshark or tcpdump for real-time traffic capture.
86 | - **Dashboard**: Develop a real-time dashboard for monitoring network traffic and visualizing anomalies.
87 | - **Enhanced Model**: Improve the anomaly detection model using more advanced machine learning techniques.
88 |
89 | ### Contributing
90 | We welcome contributions! Please read our contributing guidelines before making any changes.
91 |
92 | ### License
93 | This project is licensed under the MIT License.
94 |
95 | ## Follow and Support
96 |
97 | Thank you for your interest in the Network Anomaly Detection project! Your support and engagement are crucial for the continued development and improvement of this tool. Here are a few ways you can follow and support the project:
98 |
99 | ### GitHub
100 | - **Star the Repository**: If you find this project helpful, please star the repository on GitHub. This helps increase its visibility and shows your appreciation.
101 | - **Watch for Updates**: Click on the "Watch" button to get notified about updates, new features, and important discussions.
102 | - **Fork and Contribute**: If you're interested in contributing, fork the repository and submit your pull requests. We welcome contributions of all kinds, from bug fixes to new features.
103 |
104 | ### Social Media
105 | - **LinkedIn**: Connect with me on [LinkedIn](https://www.linkedin.com/in/davidgrice-cybersecurity/) for professional updates and networking. Feel free to reach out with any questions or collaboration ideas.
106 | - **Twitter**: Follow me on Twitter [@webpro25](https://twitter.com/webpro25) for the latest updates, news, and discussions related to cybersecurity and this project. Join the conversation and share your thoughts!
107 |
108 | ### Community and Feedback
109 | - **Issues and Discussions**: Open an issue on GitHub if you encounter any problems or have suggestions for improvement. Join the discussions to provide feedback and help shape the future of this project.
110 | - **Spread the Word**: Share this project with your network. Whether it's through social media, blog posts, or word of mouth, your support in spreading the word is invaluable.
111 |
112 | ### Support the Developer
113 | - **Buy Me a Coffee**: If you would like to support the development of this project financially, consider [buying me a coffee](https://www.buymeacoffee.com/webpro255). Your contributions help cover the costs of development and hosting.
114 |
115 | Your support is greatly appreciated and helps ensure the continued success and improvement of the Network Anomaly Detection project. Thank you for being part of this journey!
--------------------------------------------------------------------------------