├── .bashrc
├── set_path.sh
├── pipeline.pkl
├── kddcup.data_10_percent
├── requirements.txt
├── LICENSE
├── anomaly_detection.py
├── .github
│   └── workflows
│       └── python-app.yml
├── templates
│   └── index.html
├── app.py
├── inspect_pipeline.py
└── README.md
/.bashrc:
--------------------------------------------------------------------------------
source ~/network-anomaly-detection/set_path.sh
--------------------------------------------------------------------------------
/set_path.sh:
--------------------------------------------------------------------------------
#!/bin/bash
export PATH="$HOME/.local/bin:$PATH"
--------------------------------------------------------------------------------
/pipeline.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/webpro255/network-anomaly-detection/HEAD/pipeline.pkl
--------------------------------------------------------------------------------
/kddcup.data_10_percent:
--------------------------------------------------------------------------------
version https://git-lfs.github.com/spec/v1
oid sha256:f8c8267ebcd9c0ed1fd7d6277fe5bfff8732e9b7db8e61b873542b2a534b6f9a
size 74889749
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
Flask==3.0.3
pandas==2.2.2
scikit-learn==1.5.0
joblib==1.4.2
blinker==1.8.2
click==8.1.7
itsdangerous==2.2.0
Jinja2==3.1.4
numpy==2.0.0
python-dateutil==2.9.0.post0
pytz==2024.1
scipy==1.13.1
threadpoolctl==3.5.0
tzdata==2024.1
Werkzeug==3.0.3
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 David Grice

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/anomaly_detection.py:
--------------------------------------------------------------------------------
import pandas as pd
import joblib

# Load the trained pipeline
pipeline = joblib.load('pipeline.pkl')

def detect_anomalies(data):
    try:
        df = pd.DataFrame(data)

        # Preprocess data
        categorical_columns = ['column2', 'column3']  # Update with actual categorical columns
        numeric_columns = ['column1']  # Update with actual numeric columns

        df[categorical_columns] = df[categorical_columns].astype(str)
        # pd.to_numeric works on one column at a time, so apply it per column
        df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric)

        # Run the full pipeline (preprocessing + classification) in one call,
        # which avoids hard-coding the pipeline's internal step names
        predictions = pipeline.predict(df)

        return predictions
    except Exception as e:
        raise ValueError(f"Error during preprocessing or prediction: {e}")

# Example usage
if __name__ == '__main__':
    sample_data = [{'column1': 0, 'column2': '0', 'column3': '0'}]  # Update with actual sample data
    try:
        result = detect_anomalies(sample_data)
        print("Predictions:", result)
    except ValueError as e:
        print(e)
--------------------------------------------------------------------------------
/.github/workflows/python-app.yml:
--------------------------------------------------------------------------------
name: Python application

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Run Flask server in background
        run: |
          nohup python3 app.py &
          sleep 5  # Give the server time to start

      - name: Run tests
        run: |
          # Smoke-test the /predict endpoint; --fail turns an HTTP error response into a job failure
          curl --fail -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" -d '[{"0": 0, "1": "0", "2": "0", "3": "0", "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0, "13": 0, "14": 0, "15": 0, "16": 0, "17": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0, "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0, "31": 0, "32": 0, "33": 0, "34": 0, "35": 0, "36": 0, "37": 0, "38": 0, "39": 0, "40": 0}]'

      - name: Kill Flask server
        run: |
          pkill -f "python3 app.py"
--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
[HTML markup was not preserved when this dump was generated; only the page's text content survives. Recoverable content: the page title and top-level heading, both "Network Anomaly Detection". The remaining markup is not recoverable here.]
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
from flask import Flask, request, jsonify, render_template
import pandas as pd
import joblib

app = Flask(__name__)

# Load the trained pipeline
pipeline = joblib.load('pipeline.pkl')

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        json_data = request.get_json()
        df = pd.DataFrame(json_data)

        # Ensure all column names are strings
        df.columns = df.columns.astype(str)

        # Define the expected column names based on the training data
        expected_columns = [str(i) for i in range(41)]

        # Ensure the input data has all expected columns, adding missing ones with default values
        for col in expected_columns:
            if col not in df.columns:
                df[col] = 0

        # Keep only the expected columns, in training order
        df = df[expected_columns]

        # Define categorical and numeric columns
        categorical_columns = ['1', '2', '3']
        numeric_columns = [col for col in df.columns if col not in categorical_columns]

        # Fill NaNs in categorical columns before casting; astype(str) would
        # otherwise turn NaN into the literal string 'nan'
        df[categorical_columns] = df[categorical_columns].fillna('missing').astype(str)

        # Convert numeric columns to numeric type and fill NaNs with 0
        df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors='coerce').fillna(0)

        # Inspect the preprocessor step in the pipeline
        preprocessor = pipeline.named_steps['preprocessor']
        onehot = preprocessor.named_transformers_['cat']

        # Map values the encoder never saw during training to a known category.
        # Appending new categories to a fitted encoder would change the one-hot
        # feature count and break the downstream model, so fall back to the
        # first trained category instead.
        for i, col in enumerate(categorical_columns):
            known = onehot.categories_[i]
            df[col] = df[col].where(df[col].isin(known), known[0])

        # Process the data through the preprocessor
        preprocessed_data = preprocessor.transform(df)

        # Predict using the model step
        predictions = pipeline.named_steps['model'].predict(preprocessed_data)

        return jsonify({'prediction': predictions.tolist()})
    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    # debug=True is for local development only
    app.run(debug=True, host='0.0.0.0')
--------------------------------------------------------------------------------
/inspect_pipeline.py:
--------------------------------------------------------------------------------
import joblib
import pandas as pd

# Load the trained pipeline
try:
    pipeline = joblib.load('pipeline.pkl')
    print("Pipeline loaded successfully.")
except Exception as e:
    # Nothing below can run without the pipeline, so stop here
    raise SystemExit(f"Error loading pipeline: {e}")

# Sample data (the same as used in your curl request)
data = [{'0': 0, '1': '0', '2': '0', '3': '0', '4': 0, '5': 0, '6': 0, '7': 0, '8': 0, '9': 0,
         '10': 0, '11': 0, '12': 0, '13': 0, '14': 0, '15': 0, '16': 0, '17': 0, '18': 0,
         '19': 0, '20': 0, '21': 0, '22': 0, '23': 0, '24': 0, '25': 0, '26': 0, '27': 0,
         '28': 0, '29': 0, '30': 0, '31': 0, '32': 0, '33': 0, '34': 0, '35': 0, '36': 0,
         '37': 0, '38': 0, '39': 0, '40': 0}]

# Convert data to DataFrame
df = pd.DataFrame(data)
print(f"DataFrame created: {df}")
print(f"DataFrame dtypes before conversion: {df.dtypes}")

# Define categorical and numeric columns
categorical_columns = ['1', '2', '3']
numeric_columns = [col for col in df.columns if col not in categorical_columns]

# Convert categorical columns to string
df[categorical_columns] = df[categorical_columns].astype(str)
print(f"Categorical columns converted to string:\n{df[categorical_columns].dtypes}")

# Convert numeric columns to numeric type
df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric)
print(f"Numeric columns converted to numeric:\n{df[numeric_columns].dtypes}")

# Check for NaN values
if df.isnull().values.any():
    print(f"DataFrame contains NaN values:\n{df[df.isnull().any(axis=1)]}")
    raise ValueError("Input data contains NaN values after conversion")

# Ensure column names are strings, consistent with the pipeline's expectations
df.columns = df.columns.astype(str)
print(f"DataFrame with string column names:\n{df.head()}")

# Convert DataFrame to numpy array
array_data = df.to_numpy()
print(f"Numpy array data:\n{array_data}")
print(f"Numpy array dtype: {array_data.dtype}")

# Inspect the preprocessor step in the pipeline
preprocessor = pipeline.named_steps['preprocessor']
print(f"Preprocessor steps: {preprocessor}")

# Handle OneHotEncoder categories
onehot = preprocessor.named_transformers_['cat']
print(f"OneHotEncoder categories: {onehot.categories_}")

# Ensure input matches the categories in the encoder
for i, col in enumerate(categorical_columns):
    if df[col].values[0] not in onehot.categories_[i]:
        df[col] = onehot.categories_[i][0]  # Replace with a known category

# Process the data through the preprocessor
try:
    preprocessed_data = preprocessor.transform(df)
    print(f"Preprocessed data:\n{preprocessed_data}")
except Exception as e:
    print(f"Error in preprocessor: {e}")
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Network Anomaly Detection

[License: MIT](https://github.com/webpro255/network-anomaly-detection/blob/main/LICENSE)

Welcome to the Network Anomaly Detection project! This repository showcases a practical application of machine learning in cybersecurity by monitoring and detecting unusual activities in a network.

## Features
- **Machine-Learning Anomaly Detection**: Classifies network connection records with a trained scikit-learn pipeline to flag unusual activity.
- **REST API**: Provides a RESTful API for easy integration and on-demand anomaly detection.
- **Extensible Design**: Easily adaptable to different network environments and customizable for various use cases.

## Tools and Technologies
- **Python**: The core programming language used in the project.
- **Flask**: A micro web framework for building the REST API.
- **Scikit-learn**: For training the anomaly detection model.
- **Pandas**: For data manipulation and preprocessing.
- **Joblib**: For saving and loading machine learning models.
- **Wireshark/tcpdump**: For capturing network traffic data (planned; see Future Improvements).

## Use Cases
### Home Network Monitoring
Monitor and analyze traffic in your home network to detect unusual activities, such as unauthorized access attempts or unusual data transfers.

### Small Business Network Security
Deploy in a small business environment to enhance network security by identifying potential threats and anomalies in network traffic.

### Educational Tool
Serve as an educational tool for students and professionals learning about network security and machine learning applications in cybersecurity.

## Getting Started

### 1. Clone the Repository
```sh
git clone https://github.com/webpro255/network-anomaly-detection.git
cd network-anomaly-detection
```
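
The dataset file `kddcup.data_10_percent` is stored with Git LFS (the repository holds an LFS pointer for it), so if your clone only contains the pointer file, fetch the real data:
```sh
# Requires git-lfs: https://git-lfs.github.com
git lfs install
git lfs pull
```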
### 2. Install Dependencies

Ensure you have Python 3.10+ installed, then install the necessary Python packages:
```sh
pip install -r requirements.txt
```
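
To keep the pinned versions from interfering with system packages, you can do the install inside a virtual environment instead:
```sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```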
### 3. Run the Flask Application

Start the Flask application to serve the REST API (it listens on port 5000 on all interfaces):
```sh
python3 app.py
```
### 4. Test the Application

Use the following curl command to test the API with sample data (any of the 41 features you omit are filled in with 0 by the server):
```sh
curl -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" -d '[{"0": 0, "1": "tcp", "2": "http", "3": "SF", "4": 181, "5": 5450}]'
```
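
The same request can be sent from Python using only the standard library; a minimal sketch, assuming the server is running locally on port 5000:
```python
import json
import urllib.request

# A partial record; app.py fills any of the 41 feature columns you omit with 0
payload = [{"0": 0, "1": "tcp", "2": "http", "3": "SF", "4": 181, "5": 5450}]

req = urllib.request.Request(
    "http://127.0.0.1:5000/predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # a JSON object with a 'prediction' list on success
```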
## API Endpoints

`GET /`
Serves the web interface (`templates/index.html`).

`POST /predict`
Accepts JSON data representing network traffic and returns a prediction for each record. Each record carries the 41 KDD Cup '99 connection features, keyed `"0"` through `"40"`; columns `"1"`, `"2"`, and `"3"` (protocol type, service, flag) are categorical strings and the rest are numeric. Missing keys default to 0. For example:
```json
[
  {
    "0": 0,
    "1": "tcp",
    "2": "http",
    "3": "SF",
    "4": 181,
    "5": 5450
  }
]
```
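
Since the features mirror the columns of `kddcup.data_10_percent`, a payload can be built straight from a raw line of that file. A hypothetical helper (`kdd_line_to_record` is not part of this repo; numeric fields may stay strings, since the server coerces them with `pd.to_numeric`):
```python
import json

def kdd_line_to_record(line: str) -> dict:
    """Map one comma-separated KDD Cup line onto the keys /predict expects."""
    fields = line.strip().split(',')[:41]  # keep the 41 features, drop the trailing label
    return {str(i): field for i, field in enumerate(fields)}

# Made-up line: first six features shown, the rest zero-filled
sample = "0,tcp,http,SF,181,5450," + ",".join(["0"] * 35)
print(json.dumps([kdd_line_to_record(sample)]))
```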
## Project Structure
- **app.py**: The main Flask application file.
- **anomaly_detection.py**: Contains the anomaly detection logic for standalone use.
- **inspect_pipeline.py**: Debug script for inspecting the serialized pipeline.
- **requirements.txt**: Python dependencies.
- **kddcup.data_10_percent**: Sample network traffic data (for training and testing), stored via Git LFS.
- **pipeline.pkl**: Serialized machine learning pipeline (see the training sketch below).
- **templates**: HTML templates for the web interface.
- **.github/workflows/python-app.yml**: CI workflow that boots the server and smoke-tests `/predict`.
83 |
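The training script itself is not included in the repository; `pipeline.pkl` ships pre-trained. The sketch below shows one way a compatible pipeline could be rebuilt from `kddcup.data_10_percent`, assuming the step and transformer names that `app.py` relies on (`preprocessor` containing a `cat` one-hot encoder, followed by `model`). The `RandomForestClassifier` is illustrative only; the shipped model may use a different estimator.
```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# KDD Cup '99 lines: 41 comma-separated features followed by a label
df = pd.read_csv('kddcup.data_10_percent', header=None)
X = df.iloc[:, :41].copy()
X.columns = [str(i) for i in range(41)]        # string column names, as app.py expects
y = (df.iloc[:, 41] != 'normal.').astype(int)  # 1 = anomalous connection

categorical_columns = ['1', '2', '3']          # protocol type, service, flag
preprocessor = ColumnTransformer(
    transformers=[('cat', OneHotEncoder(handle_unknown='ignore'), categorical_columns)],
    remainder='passthrough',
)
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', RandomForestClassifier(n_estimators=100, n_jobs=-1)),
])
pipeline.fit(X, y)
joblib.dump(pipeline, 'pipeline.pkl')
```
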
## Future Improvements
- **Integration with Real-time Traffic Capture**: Use Wireshark or tcpdump for real-time traffic capture.
- **Dashboard**: Develop a real-time dashboard for monitoring network traffic and visualizing anomalies.
- **Enhanced Model**: Improve the anomaly detection model using more advanced machine learning techniques.

## Contributing
We welcome contributions! Open an issue or submit a pull request on GitHub.

## License
This project is licensed under the MIT License.

## Follow and Support

Thank you for your interest in the Network Anomaly Detection project! Your support and engagement are crucial for the continued development and improvement of this tool. Here are a few ways you can follow and support the project:

### GitHub
- **Star the Repository**: If you find this project helpful, please star the repository on GitHub. This helps increase its visibility and shows your appreciation.
- **Watch for Updates**: Click on the "Watch" button to get notified about updates, new features, and important discussions.
- **Fork and Contribute**: If you're interested in contributing, fork the repository and submit your pull requests. We welcome contributions of all kinds, from bug fixes to new features.

### Social Media
- **LinkedIn**: Connect with me on [LinkedIn](https://www.linkedin.com/in/davidgrice-cybersecurity/) for professional updates and networking. Feel free to reach out with any questions or collaboration ideas.
- **Twitter**: Follow me on Twitter [@webpro25](https://twitter.com/webpro25) for the latest updates, news, and discussions related to cybersecurity and this project. Join the conversation and share your thoughts!

### Community and Feedback
- **Issues and Discussions**: Open an issue on GitHub if you encounter any problems or have suggestions for improvement. Join the discussions to provide feedback and help shape the future of this project.
- **Spread the Word**: Share this project with your network. Whether it's through social media, blog posts, or word of mouth, your support in spreading the word is invaluable.

### Support the Developer
- **Buy Me a Coffee**: If you would like to support the development of this project financially, consider [buying me a coffee](https://www.buymeacoffee.com/webpro255). Your contributions help cover the costs of development and hosting.

Your support is greatly appreciated and helps ensure the continued success and improvement of the Network Anomaly Detection project. Thank you for being part of this journey!
--------------------------------------------------------------------------------