├── .bashrc
├── set_path.sh
├── pipeline.pkl
├── kddcup.data_10_percent
├── requirements.txt
├── LICENSE
├── anomaly_detection.py
├── .github
│   └── workflows
│       └── python-app.yml
├── templates
│   └── index.html
├── app.py
├── inspect_pipeline.py
└── README.md
/.bashrc:
--------------------------------------------------------------------------------
source ~/network-anomaly-detection/set_path.sh
--------------------------------------------------------------------------------
/set_path.sh:
--------------------------------------------------------------------------------
#!/bin/bash
export PATH="$HOME/.local/bin:$PATH"
--------------------------------------------------------------------------------
/pipeline.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/webpro255/network-anomaly-detection/HEAD/pipeline.pkl
--------------------------------------------------------------------------------
/kddcup.data_10_percent:
--------------------------------------------------------------------------------
version https://git-lfs.github.com/spec/v1
oid sha256:f8c8267ebcd9c0ed1fd7d6277fe5bfff8732e9b7db8e61b873542b2a534b6f9a
size 74889749
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
Flask==3.0.3
pandas==2.2.2
scikit-learn==1.5.0
joblib==1.4.2
blinker==1.8.2
click==8.1.7
itsdangerous==2.2.0
Jinja2==3.1.4
numpy==2.0.0
python-dateutil==2.9.0.post0
pytz==2024.1
scipy==1.13.1
threadpoolctl==3.5.0
tzdata==2024.1
Werkzeug==3.0.3
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 David Grice

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/anomaly_detection.py:
--------------------------------------------------------------------------------
import pandas as pd
import joblib

# Load the trained pipeline
pipeline = joblib.load('pipeline.pkl')

def detect_anomalies(data):
    try:
        df = pd.DataFrame(data)

        # Preprocess data
        categorical_columns = ['column2', 'column3']  # Update with actual categorical columns
        numeric_columns = ['column1']  # Update with actual numeric columns

        df[categorical_columns] = df[categorical_columns].astype(str)
        # pd.to_numeric works on one column at a time, so apply it per column
        df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric)

        # Run the full pipeline (preprocessing + classification) in one call,
        # which avoids hard-coding the pipeline's internal step names
        predictions = pipeline.predict(df)

        return predictions
    except Exception as e:
        raise ValueError(f"Error during preprocessing or prediction: {e}")

# Example usage
if __name__ == '__main__':
    sample_data = [{'column1': 0, 'column2': '0', 'column3': '0'}]  # Update with actual sample data
    try:
        result = detect_anomalies(sample_data)
        print("Predictions:", result)
    except ValueError as e:
        print(e)
--------------------------------------------------------------------------------
/.github/workflows/python-app.yml:
--------------------------------------------------------------------------------
name: Python application

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Run Flask server in background
        run: |
          nohup python3 app.py &
          sleep 5  # Give the server time to start

      - name: Run tests
        run: |
          # Smoke-test the /predict endpoint; --fail turns an HTTP error response into a job failure
          curl --fail -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" -d '[{"0": 0, "1": "0", "2": "0", "3": "0", "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0, "13": 0, "14": 0, "15": 0, "16": 0, "17": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0, "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0, "31": 0, "32": 0, "33": 0, "34": 0, "35": 0, "36": 0, "37": 0, "38": 0, "39": 0, "40": 0}]'

      - name: Kill Flask server
        run: |
          pkill -f "python3 app.py"
--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
[HTML markup was not preserved when this dump was generated; only the page's text content survives. Recoverable content: the page title and top-level heading, both "Network Anomaly Detection". The remaining markup is not recoverable here.]
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
from flask import Flask, request, jsonify, render_template
import pandas as pd
import joblib

app = Flask(__name__)

# Load the trained pipeline
pipeline = joblib.load('pipeline.pkl')

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        json_data = request.get_json()
        df = pd.DataFrame(json_data)

        # Ensure all column names are strings
        df.columns = df.columns.astype(str)

        # Define the expected column names based on the training data
        expected_columns = [str(i) for i in range(41)]

        # Ensure the input data has all expected columns, adding missing ones with default values
        for col in expected_columns:
            if col not in df.columns:
                df[col] = 0

        # Keep only the expected columns, in training order
        df = df[expected_columns]

        # Define categorical and numeric columns
        categorical_columns = ['1', '2', '3']
        numeric_columns = [col for col in df.columns if col not in categorical_columns]

        # Fill NaNs in categorical columns before casting; astype(str) would
        # otherwise turn NaN into the literal string 'nan'
        df[categorical_columns] = df[categorical_columns].fillna('missing').astype(str)

        # Convert numeric columns to numeric type and fill NaNs with 0
        df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors='coerce').fillna(0)

        # Inspect the preprocessor step in the pipeline
        preprocessor = pipeline.named_steps['preprocessor']
        onehot = preprocessor.named_transformers_['cat']

        # Map values the encoder never saw during training to a known category.
        # Appending new categories to a fitted encoder would change the one-hot
        # feature count and break the downstream model, so fall back to the
        # first trained category instead.
        for i, col in enumerate(categorical_columns):
            known = onehot.categories_[i]
            df[col] = df[col].where(df[col].isin(known), known[0])

        # Process the data through the preprocessor
        preprocessed_data = preprocessor.transform(df)

        # Predict using the model step
        predictions = pipeline.named_steps['model'].predict(preprocessed_data)

        return jsonify({'prediction': predictions.tolist()})
    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    # debug=True is for local development only
    app.run(debug=True, host='0.0.0.0')
--------------------------------------------------------------------------------
/inspect_pipeline.py:
--------------------------------------------------------------------------------
import joblib
import pandas as pd

# Load the trained pipeline
try:
    pipeline = joblib.load('pipeline.pkl')
    print("Pipeline loaded successfully.")
except Exception as e:
    # Nothing below can run without the pipeline, so stop here
    raise SystemExit(f"Error loading pipeline: {e}")

# Sample data (the same as used in your curl request)
data = [{'0': 0, '1': '0', '2': '0', '3': '0', '4': 0, '5': 0, '6': 0, '7': 0, '8': 0, '9': 0,
         '10': 0, '11': 0, '12': 0, '13': 0, '14': 0, '15': 0, '16': 0, '17': 0, '18': 0,
         '19': 0, '20': 0, '21': 0, '22': 0, '23': 0, '24': 0, '25': 0, '26': 0, '27': 0,
         '28': 0, '29': 0, '30': 0, '31': 0, '32': 0, '33': 0, '34': 0, '35': 0, '36': 0,
         '37': 0, '38': 0, '39': 0, '40': 0}]

# Convert data to DataFrame
df = pd.DataFrame(data)
print(f"DataFrame created: {df}")
print(f"DataFrame dtypes before conversion: {df.dtypes}")

# Define categorical and numeric columns
categorical_columns = ['1', '2', '3']
numeric_columns = [col for col in df.columns if col not in categorical_columns]

# Convert categorical columns to string
df[categorical_columns] = df[categorical_columns].astype(str)
print(f"Categorical columns converted to string:\n{df[categorical_columns].dtypes}")

# Convert numeric columns to numeric type
df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric)
print(f"Numeric columns converted to numeric:\n{df[numeric_columns].dtypes}")

# Check for NaN values
if df.isnull().values.any():
    print(f"DataFrame contains NaN values:\n{df[df.isnull().any(axis=1)]}")
    raise ValueError("Input data contains NaN values after conversion")

# Ensure column names are strings, consistent with the pipeline's expectations
df.columns = df.columns.astype(str)
print(f"DataFrame with string column names:\n{df.head()}")

# Convert DataFrame to numpy array
array_data = df.to_numpy()
print(f"Numpy array data:\n{array_data}")
print(f"Numpy array dtype: {array_data.dtype}")

# Inspect the preprocessor step in the pipeline
preprocessor = pipeline.named_steps['preprocessor']
print(f"Preprocessor steps: {preprocessor}")

# Handle OneHotEncoder categories
onehot = preprocessor.named_transformers_['cat']
print(f"OneHotEncoder categories: {onehot.categories_}")

# Ensure input matches the categories in the encoder
for i, col in enumerate(categorical_columns):
    if df[col].values[0] not in onehot.categories_[i]:
        df[col] = onehot.categories_[i][0]  # Replace with a known category

# Process the data through the preprocessor
try:
    preprocessed_data = preprocessor.transform(df)
    print(f"Preprocessed data:\n{preprocessed_data}")
except Exception as e:
    print(f"Error in preprocessor: {e}")
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Network Anomaly Detection

[License: MIT](https://github.com/webpro255/network-anomaly-detection/blob/main/LICENSE)

Welcome to the Network Anomaly Detection project! This repository showcases a practical application of machine learning in cybersecurity by monitoring and detecting unusual activities in a network.

## Features
- **Machine-Learning Anomaly Detection**: Classifies network connection records with a trained scikit-learn pipeline to flag unusual activity.
- **REST API**: Provides a RESTful API for easy integration and on-demand anomaly detection.
- **Extensible Design**: Easily adaptable to different network environments and customizable for various use cases.

## Tools and Technologies
- **Python**: The core programming language used in the project.
- **Flask**: A micro web framework for building the REST API.
- **Scikit-learn**: For training the anomaly detection model.
- **Pandas**: For data manipulation and preprocessing.
- **Joblib**: For saving and loading machine learning models.
- **Wireshark/tcpdump**: For capturing network traffic data (planned; see Future Improvements).

## Use Cases
### Home Network Monitoring
Monitor and analyze traffic in your home network to detect unusual activities, such as unauthorized access attempts or unusual data transfers.

### Small Business Network Security
Deploy in a small business environment to enhance network security by identifying potential threats and anomalies in network traffic.

### Educational Tool
Serve as an educational tool for students and professionals learning about network security and machine learning applications in cybersecurity.

## Getting Started

### 1. Clone the Repository
```sh
git clone https://github.com/webpro255/network-anomaly-detection.git
cd network-anomaly-detection
```
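
The dataset file `kddcup.data_10_percent` is stored with Git LFS (the repository holds an LFS pointer for it), so if your clone only contains the pointer file, fetch the real data:
```sh
# Requires git-lfs: https://git-lfs.github.com
git lfs install
git lfs pull
```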
### 2. Install Dependencies

Ensure you have Python 3.10+ installed, then install the necessary Python packages:
```sh
pip install -r requirements.txt
```
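
To keep the pinned versions from interfering with system packages, you can do the install inside a virtual environment instead:
```sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```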
### 3. Run the Flask Application

Start the Flask application to serve the REST API (it listens on port 5000 on all interfaces):
```sh
python3 app.py
```
### 4. Test the Application

Use the following curl command to test the API with sample data (any of the 41 features you omit are filled in with 0 by the server):
```sh
curl -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" -d '[{"0": 0, "1": "tcp", "2": "http", "3": "SF", "4": 181, "5": 5450}]'
```
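
The same request can be sent from Python using only the standard library; a minimal sketch, assuming the server is running locally on port 5000:
```python
import json
import urllib.request

# A partial record; app.py fills any of the 41 feature columns you omit with 0
payload = [{"0": 0, "1": "tcp", "2": "http", "3": "SF", "4": 181, "5": 5450}]

req = urllib.request.Request(
    "http://127.0.0.1:5000/predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # a JSON object with a 'prediction' list on success
```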
## API Endpoints

`GET /`
Serves the web interface (`templates/index.html`).

`POST /predict`
Accepts JSON data representing network traffic and returns a prediction for each record. Each record carries the 41 KDD Cup '99 connection features, keyed `"0"` through `"40"`; columns `"1"`, `"2"`, and `"3"` (protocol type, service, flag) are categorical strings and the rest are numeric. Missing keys default to 0. For example:
```json
[
  {
    "0": 0,
    "1": "tcp",
    "2": "http",
    "3": "SF",
    "4": 181,
    "5": 5450
  }
]
```
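
Since the features mirror the columns of `kddcup.data_10_percent`, a payload can be built straight from a raw line of that file. A hypothetical helper (`kdd_line_to_record` is not part of this repo; numeric fields may stay strings, since the server coerces them with `pd.to_numeric`):
```python
import json

def kdd_line_to_record(line: str) -> dict:
    """Map one comma-separated KDD Cup line onto the keys /predict expects."""
    fields = line.strip().split(',')[:41]  # keep the 41 features, drop the trailing label
    return {str(i): field for i, field in enumerate(fields)}

# Made-up line: first six features shown, the rest zero-filled
sample = "0,tcp,http,SF,181,5450," + ",".join(["0"] * 35)
print(json.dumps([kdd_line_to_record(sample)]))
```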
## Project Structure
- **app.py**: The main Flask application file.
- **anomaly_detection.py**: Contains the anomaly detection logic for standalone use.
- **inspect_pipeline.py**: Debug script for inspecting the serialized pipeline.
- **requirements.txt**: Python dependencies.
- **kddcup.data_10_percent**: Sample network traffic data (for training and testing), stored via Git LFS.
- **pipeline.pkl**: Serialized machine learning pipeline (see the training sketch below).
- **templates**: HTML templates for the web interface.
- **.github/workflows/python-app.yml**: CI workflow that boots the server and smoke-tests `/predict`.
83 |
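The training script itself is not included in the repository; `pipeline.pkl` ships pre-trained. The sketch below shows one way a compatible pipeline could be rebuilt from `kddcup.data_10_percent`, assuming the step and transformer names that `app.py` relies on (`preprocessor` containing a `cat` one-hot encoder, followed by `model`). The `RandomForestClassifier` is illustrative only; the shipped model may use a different estimator.
```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# KDD Cup '99 lines: 41 comma-separated features followed by a label
df = pd.read_csv('kddcup.data_10_percent', header=None)
X = df.iloc[:, :41].copy()
X.columns = [str(i) for i in range(41)]        # string column names, as app.py expects
y = (df.iloc[:, 41] != 'normal.').astype(int)  # 1 = anomalous connection

categorical_columns = ['1', '2', '3']          # protocol type, service, flag
preprocessor = ColumnTransformer(
    transformers=[('cat', OneHotEncoder(handle_unknown='ignore'), categorical_columns)],
    remainder='passthrough',
)
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', RandomForestClassifier(n_estimators=100, n_jobs=-1)),
])
pipeline.fit(X, y)
joblib.dump(pipeline, 'pipeline.pkl')
```
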
## Future Improvements
- **Integration with Real-time Traffic Capture**: Use Wireshark or tcpdump for real-time traffic capture.
- **Dashboard**: Develop a real-time dashboard for monitoring network traffic and visualizing anomalies.
- **Enhanced Model**: Improve the anomaly detection model using more advanced machine learning techniques.

## Contributing
We welcome contributions! Open an issue or submit a pull request on GitHub.

## License
This project is licensed under the MIT License.

## Follow and Support

Thank you for your interest in the Network Anomaly Detection project! Your support and engagement are crucial for the continued development and improvement of this tool. Here are a few ways you can follow and support the project:

### GitHub
- **Star the Repository**: If you find this project helpful, please star the repository on GitHub. This helps increase its visibility and shows your appreciation.
- **Watch for Updates**: Click on the "Watch" button to get notified about updates, new features, and important discussions.
- **Fork and Contribute**: If you're interested in contributing, fork the repository and submit your pull requests. We welcome contributions of all kinds, from bug fixes to new features.

### Social Media
- **LinkedIn**: Connect with me on [LinkedIn](https://www.linkedin.com/in/davidgrice-cybersecurity/) for professional updates and networking. Feel free to reach out with any questions or collaboration ideas.
- **Twitter**: Follow me on Twitter [@webpro25](https://twitter.com/webpro25) for the latest updates, news, and discussions related to cybersecurity and this project. Join the conversation and share your thoughts!

### Community and Feedback
- **Issues and Discussions**: Open an issue on GitHub if you encounter any problems or have suggestions for improvement. Join the discussions to provide feedback and help shape the future of this project.
- **Spread the Word**: Share this project with your network. Whether it's through social media, blog posts, or word of mouth, your support in spreading the word is invaluable.

### Support the Developer
- **Buy Me a Coffee**: If you would like to support the development of this project financially, consider [buying me a coffee](https://www.buymeacoffee.com/webpro255). Your contributions help cover the costs of development and hosting.

Your support is greatly appreciated and helps ensure the continued success and improvement of the Network Anomaly Detection project. Thank you for being part of this journey!
--------------------------------------------------------------------------------