├── .dockerignore
├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── add-new-model.md
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   ├── PULL_REQUEST_TEMPLATE.md
│   ├── pull_request.md
│   └── workflows
│       └── validate-pr.yaml
├── .gitignore
├── App.py
├── Dockerfile
├── LICENSE
├── assets
│   └── images
│       └── machine_learning.png
├── contributing.md
├── dev-requirements.txt
├── docker-compose.debug.yml
├── docker-compose.yml
├── docs
│   ├── project-structure.md
│   └── tutorial.md
├── form_configs
│   ├── business_performance_forecasting.json
│   ├── credit_card_fraud.json
│   ├── customer_income.json
│   ├── gold_price_prediction.json
│   ├── house_price.json
│   ├── insurance_cost_predictor.json
│   ├── loan_eligibility.json
│   ├── parkinson_detection.json
│   ├── sleep_prediction.json
│   └── stress_detection.json
├── form_handler.py
├── machine-learning.gif
├── models
│   ├── PDF_malware_detection
│   │   ├── Data
│   │   │   └── PDFMalware2022.csv
│   │   ├── model.ipynb
│   │   ├── pdf_extraction.py
│   │   ├── predict.py
│   │   └── saved_models
│   │       └── random_forest_model.pkl
│   ├── business_performance_forecasting
│   │   ├── data
│   │   │   └── 50_Startups.csv
│   │   ├── model.py
│   │   ├── notebooks
│   │   │   └── business_performance_forecasting.ipynb
│   │   ├── predict.py
│   │   └── saved_models
│   │       ├── evaluation_results.pkl
│   │       ├── model.pkl
│   │       └── scaler.pkl
│   ├── credit_card_fraud
│   │   ├── data
│   │   │   └── creditcardcsvpresent.csv
│   │   ├── model.py
│   │   ├── modelEvaluation.py
│   │   ├── predict.py
│   │   └── saved_models
│   │       └── creditCardFraud_svc_model.pkl
│   ├── customer_income
│   │   ├── data
│   │   │   └── train.csv
│   │   ├── model.py
│   │   ├── notebooks
│   │   │   └── customer_income.ipynb
│   │   ├── predict.py
│   │   └── saved_models
│   │       ├── CImodel.pkl
│   │       ├── CIscaler.pkl
│   │       └── feature_names.pkl
│   ├── gold_price_prediction
│   │   ├── data
│   │   │   └── gold_price_data.csv
│   │   ├── model.py
│   │   ├── notebooks
│   │   │   └── Gold_Price_Prediction.ipynb
│   │   ├── predict.py
│   │   └── saved_models
│   │       └── random_forest_model.joblib
│   ├── house_price
│   │   ├── ImprovedModel.py
│   │   ├── ModelEvaluation.py
│   │   ├── data
│   │   │   └── housing.csv
│   │   ├── model.py
│   │   ├── predict.py
│   │   └── saved_models
│   │       ├── model_01.pkl
│   │       ├── model_02.pkl
│   │       ├── scaler_01.pkl
│   │       └── scaler_02.pkl
│   ├── insurance_cost_predictor
│   │   ├── data
│   │   │   └── insurance.csv
│   │   ├── model.py
│   │   ├── notebooks
│   │   │   └── ins.ipynb
│   │   ├── predict.py
│   │   └── saved_models
│   │       └── insurance_model.pkl
│   ├── loan_eligibility
│   │   ├── model.py
│   │   └── predict.py
│   ├── parkinson_disease_detector
│   │   ├── data
│   │   │   └── dataset.csv
│   │   ├── notebooks
│   │   │   └── Parkinson's_Disease.ipynb
│   │   ├── parkinson_model.py
│   │   ├── parkinson_predict.py
│   │   └── saved_models
│   │       ├── MinMaxScaler.sav
│   │       └── Model_Prediction.sav
│   ├── sleep_disorder_predictor
│   │   ├── data
│   │   │   └── dataset.csv
│   │   ├── model.py
│   │   ├── notebooks
│   │   │   └── sleep-disorder.ipynb
│   │   ├── predict.py
│   │   └── saved_models
│   │       ├── Model_Prediction.sav
│   │       └── preprocessor.sav
│   ├── stress_level_detect
│   │   ├── model.py
│   │   ├── notebooks
│   │   │   └── stress_level_detection.ipynb
│   │   ├── predict.py
│   │   └── saved_models
│   │       └── random_forest_model.joblib
│   ├── text_sumarization
│   │   └── predict.py
│   └── translator_app
│       ├── README.md
│       ├── assets
│       │   └── styles.css
│       ├── translation.py
│       └── utils.py
├── packages.txt
├── page_handler.py
├── pages
│   ├── Business_Performance_Forecasting.py
│   ├── Credit_Card_Fraud_Estimator.py
│   ├── Customer_Income_Estimator.py
│   ├── Gold_Price_Predictor.py
│   ├── House_Price_Estimator.py
│   ├── Insurance_Cost_Predictor.py
│   ├── Loan_Eligibility_Estimator.py
│   ├── PDF_Malware_Detection.py
│   ├── Parkinson_Disease_Detector.py
│   ├── Sleep_Disorder_Predictor.py
│   ├── Stress_Level_Detector.py
│   ├── Text Summarizer.py
│   ├── Translator.py
│   └── pages.json
├── readme.md
├── requirements.txt
└── todo.md
/.dockerignore:
--------------------------------------------------------------------------------
1 | **/__pycache__
2 | **/.venv
3 | **/.classpath
4 | **/.dockerignore
5 | **/.env
6 | **/.git
7 | **/.gitignore
8 | **/.project
9 | **/.settings
10 | **/.toolstarget
11 | **/.vs
12 | **/.vscode
13 | **/*.*proj.user
14 | **/*.dbmdl
15 | **/*.jfm
16 | **/bin
17 | **/charts
18 | **/docker-compose*
19 | **/compose*
20 | **/Dockerfile*
21 | **/node_modules
22 | **/npm-debug.log
23 | **/obj
24 | **/secrets.dev.yaml
25 | **/values.dev.yaml
26 | LICENSE
27 | README.md
28 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/add-new-model.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Add New Model
3 | about: Add a new model to the project
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | 🔍 **Problem Description**:
11 |
12 |
13 | 🧠 **Model Description**:
14 |
15 |
16 | ⏲️ **Estimated Time for Completion**:
17 |
18 |
19 | 🎯 **Expected Outcome**:
20 |
21 |
22 | 📄 **Additional Context**:
23 |
24 |
25 | **To be Mentioned while taking the issue**:
26 | - What is your participant role?
27 |
28 |
29 | **Note:**
30 | - Please review the project documentation and ensure your code aligns with the project structure.
31 | - Please ensure that either the `predict.py` file includes a properly implemented `model_details()` function or the notebook contains this function to print a detailed model report. The model will not be accepted without this function in place, as it is essential for generating the necessary model details.
32 | - Prefer using a new branch to resolve the issue, as it helps keep the main branch stable and makes it easier to manage and review your changes.
33 | - Strictly use the pull request template provided in the repository to create a pull request.
--------------------------------------------------------------------------------
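The `model_details()` requirement in the note above can be sketched as a minimal hypothetical `predict.py`; the model name, metrics, and report fields below are placeholders for illustration, not the project's actual output:

```python
# Hypothetical minimal model_details() implementation for a predict.py.
# All report values are placeholders, not real evaluation results.

def model_details():
    """Print a detailed report of the trained model and return it."""
    report = {
        "model": "RandomForestClassifier",      # placeholder model name
        "accuracy": 0.95,                       # placeholder metric
        "features": ["feature_1", "feature_2"], # placeholder feature list
    }
    for key, value in report.items():
        print(f"{key}: {value}")
    return report
```

Returning the report dictionary in addition to printing it makes the function easy to test and reuse.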
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | 🐞 **Describe the Bug**:
11 |
12 |
13 | 🔁 **To Reproduce**:
14 |
19 |
20 | 💡 **Expected Behavior**:
21 |
22 |
23 | 🖥️ **Device Information**:
24 |
29 |
30 | 📸 **Screenshots**:
31 |
32 |
33 | 📄 **Additional Context**:
34 |
35 |
36 | **To be Mentioned while taking the issue**:
37 | - What is your participant role?
38 |
39 |
40 | **Note:**
41 | - Please review the project documentation and ensure your code aligns with the project structure. If applicable, consider adding a `model_details` function for additional insights.
42 | - Prefer using a new branch to resolve the issue, as it helps keep the main branch stable and makes it easier to manage and review your changes.
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Feature request
3 | about: Suggest an idea for this project
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | 🌟 **Is your feature request related to a problem?**:
11 |
12 |
13 | 💡 **Describe the solution you'd like**:
14 |
15 |
16 | 🔀 **Describe alternatives considered**:
17 |
18 |
19 | 📄 **Additional Context**:
20 |
21 |
22 | **To be Mentioned while taking the issue**:
23 | - What is your participant role?
24 |
25 |
26 | **Note:**
27 | - Please review the project documentation and ensure your code aligns with the project structure.
28 | - Please ensure that either the `predict.py` file includes a properly implemented `model_details()` function or the notebook contains this function to print a detailed model report. The model will not be accepted without this function in place, as it is essential for generating the necessary model details.
29 | - Prefer using a new branch to resolve the issue, as it helps keep the main branch stable and makes it easier to manage and review your changes.
30 | - Strictly use the pull request template provided in the repository to create a pull request.
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | ### Description
2 |
3 |
4 |
5 | ### Issue Resolved
6 |
7 |
8 |
9 | ### Changes Made
10 |
11 |
12 |
13 | ### Screenshots or Videos
14 |
15 |
16 |
17 | ### Additional Details
18 |
19 |
20 |
21 |
22 | ### Checklist
23 |
24 | - [ ] My code follows the current [project structure](https://github.com/yashasvini121/predictive-calc/blob/master/docs/project-structure.md)
25 | - [ ] I have thoroughly reviewed and updated the `requirements.txt` file to include any new packages
26 | - [ ] The `predict.py` file includes a properly implemented `model_details()` function, or the notebook contains this function to print a detailed model report. **The model will not be accepted without this function**, as it is essential for generating the necessary model details.
27 | - [ ] I have added relevant tests (if necessary).
28 | - [ ] I have added comments in the code where needed.
29 | - [ ] This PR is submitted under **Hacktoberfest**.
30 | - [ ] This PR is submitted under **GirlScript Summer of Code (GSSoC-Extd)**.
--------------------------------------------------------------------------------
/.github/pull_request.md:
--------------------------------------------------------------------------------
1 | ### Description
2 |
3 |
4 |
5 | ### Issue Resolved
6 |
7 |
8 |
9 | ### Changes Made
10 |
11 |
12 |
13 | ### Screenshots or Videos
14 |
15 |
16 |
17 | ### Additional Details
18 |
19 |
20 |
21 |
22 | ### Checklist
23 |
24 | - [ ] My code follows the current [project structure](https://github.com/yashasvini121/predictive-calc/blob/master/docs/project-structure.md)
25 | - [ ] I have thoroughly reviewed and updated the `requirements.txt` file to include any new packages
26 | - [ ] The `predict.py` file includes a properly implemented `model_details()` function, or the notebook contains this function to print a detailed model report. **The model will not be accepted without this function**, as it is essential for generating the necessary model details.
27 | - [ ] I have added relevant tests (if necessary).
28 | - [ ] I have added comments in the code where needed.
29 | - [ ] This PR is submitted under **Hacktoberfest**.
30 | - [ ] This PR is submitted under **GirlScript Summer of Code (GSSoC-Extd)**.
--------------------------------------------------------------------------------
/.github/workflows/validate-pr.yaml:
--------------------------------------------------------------------------------
1 | name: Validate PR
2 |
3 | on:
4 | pull_request:
5 | branches: [ master ]
6 | types: [opened, synchronize, reopened, ready_for_review]
7 |
8 | jobs:
9 | build:
10 |
11 | runs-on: ubuntu-latest
12 |
13 | strategy:
14 | matrix:
15 | python-version: [3.12]
16 |
17 | steps:
18 | - name: Checkout repository
19 | uses: actions/checkout@v3
20 |
21 | - name: Install Ubuntu packages
22 | run: |
23 | sudo apt-get update
24 | sudo xargs -a packages.txt apt-get install -y
25 | shell: bash
26 |
27 | - name: Set up Python ${{ matrix.python-version }}
28 | uses: actions/setup-python@v4
29 | with:
30 | python-version: ${{ matrix.python-version }}
31 | architecture: 'x64'
32 |
33 | - name: Install Python dependencies
34 | run: |
35 | python -m pip install --upgrade pip
36 | pip install -r requirements.txt
37 |
38 | - name: Validate dependencies with pip-check
39 | run: |
40 | pip install pip-check
41 | pip-check
42 | continue-on-error: false # Fail the workflow if dependencies are invalid
43 |
44 | - name: Test Streamlit App
45 | run: |
46 | pip install streamlit
47 | streamlit run App.py --server.headless true --browser.gatherUsageStats false &
48 | sleep 10 # Wait for the app to start
49 | curl --retry 5 --retry-delay 5 http://localhost:8501
50 | env:
51 | STREAMLIT_SERVER_HEADLESS: true
52 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
2 | *.log
3 |
--------------------------------------------------------------------------------
/App.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 |
3 | st.set_page_config(page_title="Predictive Calc - Machine Learning Models", page_icon="🤖")
4 |
5 | st.title("Welcome to Predictive Calc!")
6 |
7 | st.markdown("""
8 | ## Explore Cutting-Edge Machine Learning Models
9 | **Predictive Calc** offers a powerful suite of machine learning models designed to assist you in making informed decisions. Whether it's predicting house prices, determining loan eligibility, or evaluating health risks, we have you covered.
10 | """)
11 |
12 | st.markdown("""
13 | ## Why Choose Predictive Calc? """)
14 | features = [
15 | {
16 | "title": "Accurate Predictions",
17 | "icon": "🔍",
18 | "description": "Harness cutting-edge machine learning algorithms that provide reliable and precise predictions."
19 | },
20 | {
21 | "title": "User-Friendly Interface",
22 | "icon": "💻",
23 | "description": "Enjoy a seamless, intuitive experience with models designed for practical applications across various domains."
24 | },
25 | {
26 | "title": "Comprehensive Calculators",
27 | "icon": "📊",
28 | "description": "Access a diverse set of models for financial analysis, health assessments, security checks, and more, all in one place."
29 | },
30 | {
31 | "title": "Health & Financial Insights",
32 | "icon": "🏥💰",
33 | "description": "From estimating house prices and checking loan eligibility to evaluating health risks like Parkinson’s and stress levels, Predictive Calc offers essential tools for everyday decision-making."
34 | },
35 | {
36 | "title": "Enhanced Document Analysis & Language Tools",
37 | "icon": "📄🌐",
38 | "description": "With a built-in text summarizer and translator, streamline your reading experience and break language barriers effortlessly. PDF Malware Detection also helps keep your documents safe."
39 | }
40 | ]
41 |
42 | # Display features in a structured card format
43 | for feature in features:
44 | st.markdown(
45 | f"""
46 |
56 |
{feature["icon"]}
57 |
58 |
{feature["title"]}
59 |
{feature["description"]}
60 |
61 |
62 | """,
63 | unsafe_allow_html=True
64 | )
65 |
66 | st.markdown("---")
67 | st.markdown("## Available Calculators")
68 |
69 | # Calculator information in a structured format
70 | calculators = [
71 | {
72 | "name": "Income Estimator",
73 | "description": "Estimate the annual income based on socio-economic and demographic information.",
74 | "details": """
75 | This calculator uses demographic and socio-economic variables to predict income level, providing insights into income patterns.
76 | """
77 | },
78 | {
79 | "name": "Gold Price Predictor",
80 | "description": "Predict future gold prices using various financial metrics.",
81 | "details": """
82 | ### Introduction
83 | The Gold Price Predictor leverages financial metrics and machine learning algorithms to forecast the price of gold (GLD). Gold prices are influenced by various economic factors, and this tool aims to provide accurate predictions based on historical data.
84 |
85 | ### Gold Price Dataset
86 | The dataset used for this model contains daily financial data, including stock market indices, commodity prices, and currency exchange rates. The goal is to predict the gold price (GLD) using features such as the S&P 500 Index (SPX), crude oil price (USO), silver price (SLV), and the EUR/USD exchange rate.
87 |
88 | ### Additional Variable Information
89 | - **SPX**: The S&P 500 index value, which tracks the performance of 500 large companies listed on stock exchanges in the United States.
90 | - **USO**: The price of United States Oil Fund (USO), which reflects crude oil prices.
91 | - **SLV**: The price of iShares Silver Trust (SLV), which reflects silver prices.
92 | - **EUR/USD**: The Euro-to-U.S. Dollar exchange rate, which indicates the strength of the euro relative to the U.S. dollar.
93 | - **GLD**: The price of SPDR Gold Shares (GLD), which is the target variable representing gold prices.
94 | """
95 | },
96 | {
97 | "name": "House Price Estimator",
98 | "description": "Predict the price of a house based on various features.",
99 | "details": """
100 | Using historical and current market data, this tool predicts the house price based on features like location, size, and amenities.
101 | """
102 | },
103 | {
104 | "name": "Loan Eligibility",
105 | "description": "Check your eligibility for various types of loans based on your financial profile.",
106 | "details": """
107 | This calculator assesses loan eligibility by analyzing credit scores, income, and other relevant financial details.
108 | """
109 | },
110 | {
111 | "name": "Parkinson's Disease",
112 | "description": "Assess your risk of Parkinson's Disease with advanced ML algorithms.",
113 | "details": """
114 | ### Introduction
115 | Parkinson's disease (PD) is a progressive neurodegenerative disorder that primarily affects movement. It often starts with subtle symptoms such as tremors, stiffness, and slow movement.
116 |
117 | ### Oxford Parkinson's Disease Detection Dataset (UCI ML Repository)
118 | The dataset contains biomedical voice measurements from 31 people, 23 of whom have Parkinson's disease (PD). The main goal is to differentiate between healthy individuals and those with PD using the "status" column, where 0 indicates healthy and 1 indicates PD.
119 |
120 | ### Additional Variable Information
121 | - **MDVP_Fo(Hz)**: Average vocal fundamental frequency.
122 | - **MDVP_Fhi(Hz)**: Maximum vocal fundamental frequency.
123 | - **MDVP_Flo(Hz)**: Minimum vocal fundamental frequency.
124 | - **MDVP_Jitter(%)**, **MDVP_Jitter(Abs)**, **MDVP_RAP**, **MDVP_PPQ**, **Jitter_DDP**: Measures of variation in fundamental frequency.
125 | - **MDVP_Shimmer**, **MDVP_Shimmer(dB)**, **Shimmer_APQ3**, **Shimmer_APQ5**, **MDVP_APQ**, **Shimmer_DDA**: Measures of variation in amplitude.
126 | - **NHR**, **HNR**: Noise-to-tonal ratio measures in the voice.
127 | - **status**: Health status of the subject (1 - Parkinson's, 0 - healthy).
128 | - **RPDE**, **D2**: Nonlinear dynamical complexity measures.
129 | - **DFA**: Signal fractal scaling exponent.
130 | - **spread1**, **spread2**, **PPE**: Nonlinear measures of fundamental frequency variation.
131 | """
132 | },
133 | {
134 | "name": "Sleep Disorder Prediction",
135 | "description": "Assess your risk of developing a sleep disorder using advanced ML algorithms.",
136 | "details": """
137 | ### Introduction
138 | Sleep disorders can have a significant impact on an individual's overall health and well-being. These disorders often result from a combination of poor sleep habits, lifestyle factors, stress, and underlying medical conditions.
139 |
140 | ### Sleep Health and Lifestyle Dataset
141 | The dataset consists of sleep, lifestyle, and health metrics collected from 400 individuals. The main goal is to predict the likelihood of an individual having a sleep disorder using the "Sleep Disorder" column, which contains categorical values indicating the presence or absence of a sleep disorder.
142 |
143 | """
144 | },
145 | {
146 | "name": "PDF Malware Detector",
147 | "description": "Identify and alert users about potential malware in PDF files.",
148 | "details": """
149 | ### Overview
150 | The PDF Malware Detector scans uploaded PDF files for malicious content, ensuring user safety and data protection.
151 |
152 | ### Key Features
153 | - **File Upload**: Simple drag-and-drop interface for easy file submission.
154 | - **Malware Detection**: Comprehensive analysis to detect harmful elements within PDFs.
155 | - **File Size Limit**: Supports files up to 200MB.
156 |
157 | ### Use Cases
158 | Perfect for users needing to verify the integrity of PDF documents before opening or sharing.
159 | """
160 | },
161 | {
162 | "name": "Stress Level Detector",
163 | "description": "Analyze your mental stress levels based on social media interactions.",
164 | "details": """
165 | The model uses text analysis on social media data to identify signs of stress, helping users understand their mental health patterns.
166 | """
167 | },
168 | {
169 | "name": "Text Summarizer",
170 | "description": "Save time with concise, professional summaries of lengthy texts.",
171 | "details": """
172 | Generate quick and comprehensive summaries of lengthy documents, ideal for students, researchers, and professionals.
173 | """
174 | },
175 | {
176 | "name": "Real-Time Language Translator",
177 | "description": "Translate spoken language into other languages instantly for seamless communication.",
178 | "details": """
179 | ### Overview
180 | The Real-Time Language Translator uses advanced speech recognition and NLP to provide immediate translations between languages, enhancing communication in diverse settings.
181 |
182 | ### Key Features
183 | - **Instant Translation**: Real-time spoken language translation.
184 | - **Multiple Languages**: Supports a variety of source and target languages.
185 | - **User-Friendly Interface**: Easy to navigate for all users.
186 |
187 | ### Use Cases
188 | Ideal for travel, business meetings, and language learning, breaking down language barriers effortlessly.
189 | """
190 | },
191 | {
192 | "name": "Business Performance Forecaster",
193 | "description": "Forecast business profits based on various investment areas for better financial planning and budget allocation.",
194 | "details": """
195 | ### Overview
196 | The Business Performance Forecaster predicts company profit based on investment in R&D, administration, and marketing, using machine learning to analyze investment patterns and optimize budget allocation.
197 |
198 | ### Key Features
199 | - **Profit Prediction**: Provides an estimated profit based on investment data.
200 | - **Investment Analysis**: Evaluates how different spending areas impact overall profit.
201 | - **Multi-Input Support**: Accounts for multiple variables like R&D, administration, and marketing expenses.
202 |
203 | ### Use Cases
204 | Useful for companies looking to plan budgets, assess the impact of investments, and improve decision-making processes in financial forecasting.
205 | """
206 | }
207 | ]
208 |
209 | # Define shades of blue for calculators
210 | blue_shades = [
211 | "#D1E8E2", # Light Blue
212 | "#A0D6E0", # Soft Blue
213 | "#7FB3E8", # Sky Blue
214 | ]
215 |
216 | # Display calculators in a table layout with two columns per row
217 | for i in range(0, len(calculators), 2):
218 | cols = st.columns(2)
219 | for j, col in enumerate(cols):
220 | if i + j < len(calculators):
221 | calc = calculators[i + j]
222 | # Use modulo to cycle through the blue shades
223 | color_index = (i + j) % len(blue_shades)
224 | color = blue_shades[color_index]
225 | with col:
226 | # Styled container for heading and description with different blue shades
227 | st.markdown(
228 | f"""
229 |
235 |
{calc['name']}
236 |
{calc['description']}
237 |
238 | """,
239 | unsafe_allow_html=True
240 | )
241 | # More Info expander
242 | with st.expander("More Info"):
243 | st.write(calc["details"])
244 | st.markdown("---")
245 |
246 | st.markdown("## Get Started Today!")
247 | st.markdown("Explore our calculators and take control of your predictive analytics journey!")
248 |
249 | st.write("Developed with ❤️ by Yashasvini Sharma | [Github](https://www.github.com/yashasvini121) | [LinkedIn](https://www.linkedin.com/in/yashasvini121/)")
250 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | # app/Dockerfile
2 |
3 | FROM python:3.10-slim
4 |
5 | WORKDIR /app
6 |
7 | RUN apt-get update && apt-get install -y \
8 | build-essential \
9 | portaudio19-dev \
10 | curl \
11 | software-properties-common \
12 | git \
13 | && rm -rf /var/lib/apt/lists/*
14 |
15 | # RUN git clone https://github.com/streamlit/streamlit-example.git .
16 |
17 | COPY requirements.txt /app/
18 |
19 | RUN pip3 install -r requirements.txt
20 |
21 | COPY . /app
22 |
23 | EXPOSE 8501
24 |
25 | HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
26 |
27 | ENTRYPOINT ["streamlit", "run", "App.py"]
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2024 Yashasvini Sharma
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/assets/images/machine_learning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/assets/images/machine_learning.png
--------------------------------------------------------------------------------
/contributing.md:
--------------------------------------------------------------------------------
1 | # How to Contribute:
2 | 1. Select an area of interest from the sections below.
3 | 2. Fork the repository and create a new branch for your contribution.
4 | 3. Implement your changes and submit a pull request with a clear description.
5 | 4. You can also create issues to discuss new ideas, suggest features, or report bugs.
6 | 5. Alternatively, review existing issues and contribute towards resolving them.
7 |
8 | ### Frontend Development (UI/UX Enhancements)
9 | - Help improve the design, responsiveness, and user experience of the web interface.
10 | - Key areas for enhancement include form layouts, interaction feedback, accessibility features, and mobile responsiveness.
11 |
12 | ### Machine Learning Contributions
13 | - Expand the scope of the project by adding new machine learning models for different prediction use cases.
14 | - **Notebook Contributions**: Share your model via a Jupyter notebook under the `models/<model-name>/notebooks/` directory.
15 | - **Full Model Integration**: Submit fully integrated models with optimized parameters, preprocessing steps, and final outputs.
16 | - You can also contribute by optimizing existing models, tuning hyperparameters, or improving dataset handling for better performance.
17 |
18 | ### Backend Development & System Integration
19 | - Help integrate new or existing machine learning models into the application’s backend using Python APIs.
20 | - Enhance the system's performance, develop API endpoints, and improve data handling capabilities for larger datasets.
21 |
22 | ### Documentation & Tutorials
23 | - Improve the project's documentation to help new contributors understand the structure and flow of the application.
24 | - Create and share tutorials or example use cases on building and integrating custom models into the system.
25 |
26 | ### Testing & Deployment
27 | - Contribute towards testing the application, preferably using a framework such as pytest.
28 |
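A starting point for such tests, written pytest-style; the prediction function below is a stand-in for a real `models/<model>/predict.py`, and its linear pricing rule is invented purely for illustration:

```python
# Hypothetical pytest-style test sketch. get_prediction() here is a
# stand-in with an invented linear rule, not the project's real model.

def get_prediction(area, bedrooms):
    """Stand-in predictor: base price plus per-area and per-bedroom terms."""
    if area <= 0:
        raise ValueError("area must be positive")
    return 50_000 + 100 * area + 10_000 * bedrooms

def test_prediction_is_positive():
    assert get_prediction(area=1000, bedrooms=2) > 0

def test_prediction_grows_with_area():
    assert get_prediction(2000, 2) > get_prediction(1000, 2)
```

Running `pytest` in the repository root would discover and execute these `test_*` functions automatically.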
29 | ### Logging & Monitoring
30 | - Implement logging in the project
31 |
--------------------------------------------------------------------------------
/dev-requirements.txt:
--------------------------------------------------------------------------------
1 | # Development dependencies for the project.
2 | # Includes Jupyter and other tools used during development, but not required for the Streamlit app in production.
3 |
4 | jupyterlab
5 | notebook
6 |
7 | # jupyterlab==4.2.5
8 | # jupyterlab_pygments==0.3.0
9 | # jupyterlab_server==2.27.3
10 | # jupyterlab_widgets==3.0.13
--------------------------------------------------------------------------------
/docker-compose.debug.yml:
--------------------------------------------------------------------------------
1 | version: '3.4'
2 |
3 | services:
4 | dockerupdate:
5 | image: dockerupdate
6 | build:
7 | context: .
8 | dockerfile: ./Dockerfile
9 | command: ["sh", "-c", "pip install debugpy -t /tmp && python /tmp/debugpy --wait-for-client --listen 0.0.0.0:5678 App.py "]
10 | ports:
11 | - 5678:5678
12 |
--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
1 | version: '3'
2 |
3 | services:
4 | webapp:
5 | build:
6 | context: .
7 | dockerfile: Dockerfile # Path to the Dockerfile
8 | ports:
9 | - "8501:8501" # Expose the container's port 8501 to the host
10 | environment:
11 | - PYTHONUNBUFFERED=1 # Ensures logs are immediately flushed to stdout
12 | healthcheck:
13 | test: ["CMD", "curl", "--fail", "http://localhost:8501/_stcore/health"]
14 | interval: 30s
15 | timeout: 10s
16 | retries: 3
17 | volumes:
18 | - .:/app # Mount the current directory to /app in the container
--------------------------------------------------------------------------------
/docs/project-structure.md:
--------------------------------------------------------------------------------
1 | # Project Structure
2 |
3 | The directory layout of **Predictive Calc** is organized in a way that separates concerns between model development, frontend interaction, and configuration management. Below is a breakdown of the key folders and files:
4 |
5 | ```
6 | predictive-calc/
7 | ├── app.py # Main entry point for the Streamlit web app
8 | ├── docs/ # Documentation files
9 | │ ├── project-structure.md # Directory layout of the repository
10 | │ ├── tutorial.md # Steps to integrate a new machine learning model into the repository
11 | ├── form_configs/ # Configuration files for the forms
12 | │ ├── house_price.json # JSON configuration for house price model form input fields
13 | │ ├── loan_eligibility.json # JSON configuration for loan eligibility model form input fields
14 | │ ├── ...
15 | ├── models/ # Folder containing machine learning models
16 | │ ├── house_price/ # Example model directory (for house price prediction)
17 | │ │ ├── data/ # Datasets used by the models
18 | │ │ ├── notebooks/ # Jupyter notebooks for dataset exploration and model training
19 | │ │ │ ├── house_price.ipynb
20 | │ │ ├── saved_models/ # Serialized (pickled) model files and scalers
21 | │ │ │ ├── model.pkl # Trained model for predictions
22 | │ │ │ ├── scaler.pkl # Scaler for data normalization
23 | │ │ ├── model.py # Code to define and train the model
24 | │ │ ├── predict.py # Code to make predictions using the trained model. Contain get_prediction() function
25 | │ ├── modelEvaluation.py # Model evaluation class generates plots and metrics
26 | │ ├── ...
27 | ├── pages/ # Streamlit pages representing different calculators
28 | │ ├── pages.json # Configuration file for managing page details and settings
29 | │ ├── House_Price_Estimator.py
30 | │ ├── Loan_Eligibility_Estimator.py
31 | │ ├── ...
32 | ├── assets/
33 | │ ├── images/ # Image assets used in the Streamlit app
34 | │ │ ├── machine-learning.png
35 | │ │ ├── machine-learning.gif
36 | │ │ ├── ...
37 | ├── form_handler.py # Class to handle dynamic form generation based on JSON configs
38 | ├── page_handler.py # Class to manage the page rendering logic and handles model predictions
39 | ├── requirements.txt # List of Python dependencies required for the Streamlit App
40 | ├── dev-requirements.txt # Development-only dependencies (e.g. Jupyter) that are not needed for the production Streamlit app
41 | ├── packages.txt # Ubuntu packages required by the Streamlit app
42 | ├── Dockerfile # Instructions for building a Docker Image
43 | ├── docker-compose.yml # Sets up a Docker container for the Streamlit app
44 | ├── docker-compose.debug.yml # Enables debugging inside the Docker container using Debugpy
45 | ├── readme.md # Overview of the project and setup instructions
46 | ```
47 |
48 | ### Key Components
49 |
50 | #### 1. `app.py`
51 | This is the entry point for the **Streamlit** application. It initializes the app, renders the home page, and loads the model calculators whose pages live in the `pages/` directory.
52 |
53 | #### 2. `page_handler.py`
54 | The `page_handler.py` file manages page rendering in Predictive Calc by reading configurations from the `pages/pages.json` file. It dynamically loads each page's title, icon, and model paths, integrates the model prediction logic, and uses `FormHandler` to generate dynamic forms from the specified configurations. It also manages multiple tabs per page, making it easy to update or add models while keeping the interface consistent.
55 |
56 | #### 3. `form_handler.py`
57 | This script dynamically generates the input forms based on the JSON configuration files. It maps user inputs to the model’s expected parameters and passes the data to the prediction logic.
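
The mapping step can be pictured as follows (a simplified, hypothetical stand-in for part of what `FormHandler` does; the real class also renders the Streamlit widgets):

```python
import json

def map_form_values(fields: dict, raw_values: dict) -> dict:
    """Translate user-facing labels into the model's expected parameter names."""
    return {
        attrs.get("field_name", label): raw_values[label]
        for label, attrs in fields.items()
    }

# A trimmed-down config in the same shape as form_configs/house_price.json
config = json.loads("""
{
  "House Price Form": {
    "Area (in square feet)": {"type": "number", "field_name": "area"},
    "Number of Bedrooms": {"type": "number", "field_name": "bedrooms"}
  }
}
""")
fields = config["House Price Form"]
mapped = map_form_values(fields, {"Area (in square feet)": 1000, "Number of Bedrooms": 3})
# mapped == {"area": 1000, "bedrooms": 3}
```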
58 |
59 | #### 4. `models/`
60 | Each model gets its own folder within the `models/` directory, which contains all necessary files for that particular model. This includes:
61 | - **notebooks/**: Jupyter notebooks for model training and experimentation.
62 | - **model.py**: Code that defines and trains the final chosen machine learning model.
63 | - **predict.py**: Code to load the trained model and make predictions based on user input.
64 | - **saved_models/**: Directory where the trained model (`model.pkl`) and any preprocessing objects like scalers (`scaler.pkl`) are stored.
65 | - **data/**: Raw datasets used for training the model.
66 | - **modelEvaluation.py**: Scripts for model evaluation and reporting.
67 |
68 | #### 5. `pages/`
69 | Each model has a corresponding Streamlit page in the `pages/` folder. This page handles the frontend logic, rendering the forms for user input and displaying the prediction results. For example, the `House_Price_Estimator.py` page contains the interface for the house price prediction model.
70 |
71 | #### 6. `pages/pages.json`
72 | This file manages the configuration for all pages in the application, defining attributes such as titles, icons, model paths, and tab configurations for each calculator. This centralizes the configuration for easier updates and management of multiple pages.
73 |
74 | #### 7. `form_configs/`
75 | The `form_configs/` folder contains JSON configuration files that define the input fields required by each model. These JSON files dictate how the forms are dynamically generated by `form_handler.py`. For example, the `house_price.json` file specifies the input fields (e.g., square footage, number of bedrooms, etc.) needed for the house price prediction model.
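
For instance, the opening of `form_configs/house_price.json` looks like this (see the full file for the remaining fields):

```json
{
  "House Price Form": {
    "Area (in square feet)": {
      "type": "number",
      "min_value": 500,
      "max_value": 10000,
      "default_value": 1000,
      "step": 100,
      "field_name": "area"
    }
  }
}
```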
76 |
77 | #### 8. `docs/`
78 | Contains the project's documentation, covering how the system is structured and how to contribute; see in particular the `tutorial.md` and `project-structure.md` files.
79 |
80 | #### 9. `requirements.txt`, `dev-requirements.txt`, `packages.txt`
81 | - `requirements.txt` contains the Python dependencies required to run the Streamlit application.
82 | - `dev-requirements.txt` includes additional dependencies for development purposes, such as Jupyter notebooks and other tools.
83 | - `packages.txt` lists the Ubuntu packages required for the Streamlit application to run correctly.
84 |
85 | #### 10. `Dockerfile`, `docker-compose.yml`, `docker-compose.debug.yml`
86 | - `Dockerfile` contains the instructions for building a Docker image that can run the Streamlit application.
87 | - `docker-compose.yml` sets up a Docker container for the Streamlit application.
88 | - `docker-compose.debug.yml` allows debugging inside the Docker container using Debugpy.
89 |
90 | #### 11. `assets/`
91 | This folder contains static assets, such as the images and GIFs used in the Streamlit app.
92 |
--------------------------------------------------------------------------------
/docs/tutorial.md:
--------------------------------------------------------------------------------
1 | ## How to Integrate Your Model
2 |
3 | To integrate a new machine learning model into **Predictive Calc**, follow these steps:
4 |
5 | ### 1. Create Your Model Directory
6 | Navigate to the `models/` directory and create a new folder for your model. The folder should be named based on the problem your model addresses (e.g., `models/loan_eligibility/`).
7 |
8 | Inside your folder, you’ll need the following structure:
9 | ```
10 | models/
11 | ├── loan_eligibility/
12 | │ ├── data/ # Folder for storing the dataset used for training
13 | │ ├── notebooks/ # Jupyter notebooks for training and experimentation
14 | │ ├── saved_models/ # Folder for saving the trained model and any scalers
15 | │ ├── model.py # Script to define and train your model
16 | │ ├── predict.py # Script to load the trained model and make predictions
17 | │ ├── modelEvaluation.py # (Optional) Script for model evaluation and testing
18 | ```
19 |
20 | ### 2. Train Your Model
21 | Train your model in a Jupyter notebook and save the final trained model as a pickle file (`model.pkl`) inside the `saved_models/` folder. If your model requires preprocessing steps like scaling or encoding, save these objects as well (e.g., `scaler.pkl`).
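
The saving step can be sketched as follows (the helper name and default path are illustrative, not part of the repository):

```python
import pickle
from pathlib import Path

def save_artifacts(model, scaler, out_dir="models/loan_eligibility/saved_models"):
    """Pickle the trained estimator and its scaler into the saved_models/ folder."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(out / "scaler.pkl", "wb") as f:
        pickle.dump(scaler, f)
```

Call `save_artifacts(model, scaler)` at the end of your notebook once you are happy with the trained model.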
22 |
23 | ### 3. Define the Prediction Logic
24 | In `predict.py`, load your saved model and write the logic for making predictions. This file should accept input data (coming from the web form) and return the prediction result.
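
A skeleton for `predict.py` might look like this (the paths and feature order are illustrative; adapt them to your model, and note that the keyword arguments must match the `field_name` values from your form config):

```python
import pickle
from pathlib import Path

MODEL_DIR = Path("models/loan_eligibility/saved_models")  # assumed location

def _load(name):
    """Unpickle a saved artifact from the model's saved_models/ folder."""
    with open(MODEL_DIR / name, "rb") as f:
        return pickle.load(f)

def get_prediction(**form_data):
    """Entry point called by the form handler with the mapped field values."""
    model = _load("model.pkl")
    scaler = _load("scaler.pkl")  # only if a scaler was saved alongside the model
    features = [[form_data["income"], form_data["loan_amount"]]]  # illustrative order
    return model.predict(scaler.transform(features))[0]
```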
25 |
26 | ### 4. Configure the Input Form
27 | In the `form_configs/` folder, create a new JSON configuration file (e.g., `loan_eligibility.json`). This file defines the fields that will appear in the input form and maps them to the prediction model’s expected input parameters.
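
For reference, the repository's existing `form_configs/loan_eligibility.json` begins like this; a new config should follow the same shape:

```json
{
  "Loan Eligibility Form": {
    "Income": {
      "field_name": "income",
      "type": "number",
      "min_value": 10000,
      "max_value": 1000000,
      "default_value": 50000,
      "step": 5000
    }
  }
}
```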
28 |
29 | ### 5. Configure the Page Settings
30 | In the `pages/pages.json` file, add an entry for your new model. This configuration file manages the settings for all pages in the application, including titles, icons, model paths, and tab configurations.
31 |
32 | ### 6. Add a Streamlit Page
33 | - In the `pages/` directory, create a new Python file for the web interface (e.g., `Loan_Eligibility_Estimator.py`). This file will call the `page_handler.py` script to render the form and display the prediction results.
34 | - The page name shown on the sidebar will be the same as the file name.
35 |
36 | ### 7. Update the Main App
37 | In `app.py`, update the list of available pages.
38 |
39 | ### 8. Update and Test the Dependencies
40 | - If you've added or updated any dependencies (packages/modules/libraries) for your model, update the `requirements.txt` file accordingly.
41 | - After updating, test the entire project to ensure there are no version conflicts between packages. This helps maintain a stable, reproducible environment for all contributors.
--------------------------------------------------------------------------------
/form_configs/business_performance_forecasting.json:
--------------------------------------------------------------------------------
1 | {
2 | "Business Forecast Form": {
3 | "R&D Spend": {
4 | "type": "number",
5 | "min_value": 0.0,
6 | "default_value": 100000.0,
7 | "step": 1000.0,
8 | "field_name": "RnD_Spend"
9 | },
10 | "Administration": {
11 | "type": "number",
12 | "min_value": 0.0,
13 | "default_value": 50000.0,
14 | "step": 1000.0,
15 | "field_name": "Administration"
16 | },
17 | "Marketing Spend": {
18 | "type": "number",
19 | "min_value": 0.0,
20 | "default_value": 100000.0,
21 | "step": 1000.0,
22 | "field_name": "Marketing_Spend"
23 | },
24 | "State": {
25 | "type": "dropdown",
26 | "options": ["New York", "California", "Florida"],
27 | "default_value": "New York",
28 | "field_name": "State"
29 | }
30 | }
31 | }
32 |
--------------------------------------------------------------------------------
/form_configs/credit_card_fraud.json:
--------------------------------------------------------------------------------
1 | {
2 | "Credit Card Fraud Estimator": {
3 | "Average Amount per Transaction per Day": {
4 | "type": "number",
5 | "min_value": 0,
6 | "max_value": 100000,
7 | "default_value": 100,
8 | "step": 100,
9 | "field_name": "avg_amount_per_day"
10 | },
11 | "Transaction Amount": {
12 | "type": "number",
13 | "min_value": 0,
14 | "max_value": 100000,
15 | "default_value": 3000,
16 | "step": 100,
17 | "field_name": "transaction_amount"
18 | },
19 | "Is Declined": {
20 | "type": "dropdown",
21 | "options": [
22 | "Yes",
23 | "No"
24 | ],
25 | "default_value": "No",
26 | "field_name": "Is_declined"
27 | },
28 | "Total Number of Declines per Day": {
29 | "type": "number",
30 | "min_value": 0,
31 | "max_value": 100,
32 | "default_value": 0,
33 | "step": 1,
34 | "field_name": "no_of_declines_per_day"
35 | },
36 | "Is Foreign Transaction": {
37 | "type": "dropdown",
38 | "options": [
39 | "Yes",
40 | "No"
41 | ],
42 | "default_value": "No",
43 | "field_name": "Is_Foreign_transaction"
44 | },
45 | "Is High-Risk Country": {
46 | "type": "dropdown",
47 | "options": [
48 | "Yes",
49 | "No"
50 | ],
51 | "default_value": "No",
52 | "field_name": "Is_High_Risk_country"
53 | },
54 | "Daily Chargeback Average Amount": {
55 | "type": "number",
56 | "min_value": 0,
57 | "max_value": 10000,
58 | "default_value": 0,
59 | "step": 100,
60 | "field_name": "Daily_chargeback_avg_amt"
61 | },
62 | "6-Month Average Chargeback Amount": {
63 | "type": "number",
64 | "min_value": 0,
65 | "max_value": 10000,
66 | "default_value": 0,
67 | "step": 100,
68 | "field_name": "six_month_avg_chbk_amt"
69 | },
70 | "6-Month Chargeback Frequency": {
71 | "type": "number",
72 | "min_value": 0,
73 | "max_value": 100,
74 | "default_value": 0,
75 | "step": 1,
76 | "field_name": "six_month_chbk_freq"
77 | }
78 | }
79 | }
--------------------------------------------------------------------------------
/form_configs/customer_income.json:
--------------------------------------------------------------------------------
1 | {
2 | "Customer Income Estimation Form": {
3 | "Age": {
4 | "field_name": "age",
5 | "type": "number",
6 | "min_value": 18,
7 | "max_value": 100,
8 | "default_value": 30,
9 | "step": 1
10 | },
11 | "Workclass": {
12 | "field_name": "workclass",
13 | "type": "dropdown",
14 | "options": ["Private", "State-gov", "Self-emp-not-inc", "Federal-gov",
15 | "Local-gov", "Self-emp-inc", "Without-pay"],
16 | "default_value": "Private"
17 | },
18 | "Financial Weight (fnlwgt)": {
19 | "field_name": "fnlwgt",
20 | "type": "number",
21 | "min_value": 0,
22 | "max_value": 1000000,
23 | "default_value": 100000,
24 | "step": 1000
25 | },
26 | "Education Level": {
27 | "field_name": "education",
28 | "type": "dropdown",
29 | "options": ["Doctorate", "12th", "Bachelors", "7th-8th", "Some-college",
30 | "HS-grad", "9th", "10th", "11th", "Masters", "Preschool",
31 | "5th-6th", "Prof-school", "Assoc-voc", "Assoc-acdm", "1st-4th"],
32 | "default_value": "Bachelors"
33 | },
34 | "Marital Status": {
35 | "field_name": "marital_status",
36 | "type": "dropdown",
37 | "options": ["Divorced", "Never-married", "Married-civ-spouse", "Widowed",
38 | "Separated", "Married-spouse-absent", "Married-AF-spouse"],
39 | "default_value": "Never-married"
40 | },
41 | "Occupation": {
42 | "field_name": "occupation",
43 | "type": "dropdown",
44 | "options": ["Exec-managerial", "Other-service", "Transport-moving",
45 | "Adm-clerical", "Machine-op-inspct", "Sales", "Handlers-cleaners",
46 | "Farming-fishing", "Protective-serv", "Prof-specialty",
47 | "Craft-repair", "Tech-support", "Priv-house-serv", "Armed-Forces"],
48 | "default_value": "Exec-managerial"
49 | },
50 | "Relationship": {
51 | "field_name": "relationship",
52 | "type": "dropdown",
53 | "options": ["Not-in-family", "Own-child", "Husband", "Wife", "Unmarried",
54 | "Other-relative"],
55 | "default_value": "Unmarried"
56 | },
57 | "Race": {
58 | "field_name": "race",
59 | "type": "dropdown",
60 | "options": ["White", "Black", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other"],
61 | "default_value": "White"
62 | },
63 | "Sex": {
64 | "field_name": "sex",
65 | "type": "dropdown",
66 | "options": ["Male", "Female"],
67 | "default_value": "Male"
68 | },
69 | "Capital Gain": {
70 | "field_name": "capital_gain",
71 | "type": "number",
72 | "min_value": 0,
73 | "max_value": 100000,
74 | "default_value": 0,
75 | "step": 1000
76 | },
77 | "Capital Loss": {
78 | "field_name": "capital_loss",
79 | "type": "number",
80 | "min_value": 0,
81 | "max_value": 100000,
82 | "default_value": 0,
83 | "step": 1000
84 | },
85 | "Earn per Hour": {
86 | "field_name": "hours_per_week",
87 | "type": "number",
88 | "min_value": 0,
89 | "max_value": 100,
90 | "default_value": 25,
91 | "step": 1
92 | },
93 | "Native Country": {
94 | "field_name": "native_country",
95 | "type": "dropdown",
96 | "options": ["United-States", "Japan", "South", "Portugal", "Italy", "Mexico",
97 | "Ecuador", "England", "Philippines", "China", "Germany",
98 | "Dominican-Republic", "Jamaica", "Vietnam", "Thailand",
99 | "Puerto-Rico", "Cuba", "India", "Cambodia", "Yugoslavia", "Iran",
100 | "El-Salvador", "Poland", "Greece", "Ireland", "Canada",
101 | "Guatemala", "Scotland", "Columbia", "Outlying-US(Guam-USVI-etc)",
102 | "Haiti", "Peru", "Nicaragua", "Trinadad&Tobago", "Laos", "Taiwan",
103 | "France", "Hungary", "Honduras", "Hong", "Holand-Netherlands"],
104 | "default_value": "United-States"
105 | }
106 | }
107 | }
--------------------------------------------------------------------------------
/form_configs/gold_price_prediction.json:
--------------------------------------------------------------------------------
1 | {
2 | "Gold Price Form": {
3 | "SPX": {
4 | "type": "slider",
5 | "min_value": 0.0,
6 | "default_value": 1447.16,
7 | "step": 1,
8 | "field_name": "spx"
9 | },
10 | "USO": {
11 | "type": "number",
12 | "min_value": 0.0,
13 | "default_value": 78.47,
14 | "step": 0.01,
15 | "field_name": "uso"
16 | },
17 | "SLV": {
18 | "type": "number",
19 | "min_value": 0.0,
20 | "default_value": 15.18,
21 | "step": 0.01,
22 | "field_name": "slv"
23 | },
24 | "EUR/USD": {
25 | "type": "number",
26 | "min_value": 0.0,
27 | "default_value": 1.47,
28 | "step": 0.01,
29 | "field_name": "eur_usd"
30 | }
31 | }
32 | }
33 |
--------------------------------------------------------------------------------
/form_configs/house_price.json:
--------------------------------------------------------------------------------
1 | {
2 | "House Price Form": {
3 | "Area (in square feet)": {
4 | "type": "number",
5 | "min_value": 500,
6 | "max_value": 10000,
7 | "default_value": 1000,
8 | "step": 100,
9 | "field_name": "area"
10 | },
11 | "Near Main Road": {
12 | "type": "dropdown",
13 | "options": ["Yes", "No"],
14 | "default_value": "No",
15 | "field_name": "mainroad"
16 | },
17 | "Guest Room": {
18 | "type": "dropdown",
19 | "options": ["Yes", "No"],
20 | "default_value": "No",
21 | "field_name": "guestroom"
22 | },
23 | "Basement": {
24 | "type": "dropdown",
25 | "options": ["Yes", "No"],
26 | "default_value": "No",
27 | "field_name": "basement"
28 | },
29 | "Hot Water Heating": {
30 | "type": "dropdown",
31 | "options": ["Yes", "No"],
32 | "default_value": "No",
33 | "field_name": "hotwaterheating"
34 | },
35 | "Air Conditioning": {
36 | "type": "dropdown",
37 | "options": ["Yes", "No"],
38 | "default_value": "No",
39 | "field_name": "airconditioning"
40 | },
41 | "Preferred Area": {
42 | "type": "dropdown",
43 | "options": ["Yes", "No"],
44 | "default_value": "No",
45 | "field_name": "prefarea"
46 | },
47 | "Number of Bedrooms": {
48 | "type": "number",
49 | "min_value": 0,
50 | "max_value": 6,
51 | "default_value": 3,
52 | "step": 1,
53 | "field_name": "bedrooms"
54 | },
55 | "Number of Bathrooms": {
56 | "type": "number",
57 | "min_value": 0,
58 | "max_value": 4,
59 | "default_value": 2,
60 | "step": 1,
61 | "field_name": "bathrooms"
62 | },
63 | "Number of Stories": {
64 | "type": "number",
65 | "min_value": 1,
66 | "max_value": 4,
67 | "default_value": 2,
68 | "step": 1,
69 | "field_name": "stories"
70 | },
71 | "Parking Spaces": {
72 | "type": "number",
73 | "min_value": 0,
74 | "max_value": 3,
75 | "default_value": 1,
76 | "step": 1,
77 | "field_name": "parking"
78 | },
79 | "Furnishing Status": {
80 | "type": "dropdown",
81 | "field_name": "furnishingstatus",
82 | "options": ["semi-furnished", "unfurnished", "furnished"],
83 | "default_value": "semi-furnished"
84 | }
85 | }
86 | }
87 |
--------------------------------------------------------------------------------
/form_configs/insurance_cost_predictor.json:
--------------------------------------------------------------------------------
1 | {
2 | "Insurance Cost Form": {
3 | "Age": {
4 | "type": "slider",
5 | "min_value": 0,
6 | "default_value": 30,
7 | "step": 1,
8 | "field_name": "age"
9 | },
10 | "Sex": {
11 | "type": "dropdown",
12 | "options": ["Male", "Female"],
13 | "default_value": "Male",
14 | "field_name": "sex"
15 | },
16 | "BMI": {
17 | "type": "number",
18 | "min_value": 0.0,
19 | "default_value": 25.0,
20 | "step": 0.1,
21 | "field_name": "bmi"
22 | },
23 | "Children": {
24 | "type": "number",
25 | "min_value": 0,
26 | "default_value": 0,
27 | "step": 1,
28 | "field_name": "children"
29 | },
30 | "Smoker": {
31 | "type": "dropdown",
32 | "options": ["Yes", "No"],
33 | "default_value": "No",
34 | "field_name": "smoker"
35 | },
36 | "Region": {
37 | "type": "dropdown",
38 | "options": ["Southeast", "Southwest", "Northeast", "Northwest"],
39 | "default_value": "Southeast",
40 | "field_name": "region"
41 | }
42 | }
43 | }
44 |
--------------------------------------------------------------------------------
/form_configs/loan_eligibility.json:
--------------------------------------------------------------------------------
1 | {
2 | "Loan Eligibility Form": {
3 | "Income": {
4 | "field_name": "income",
5 | "type": "number",
6 | "min_value": 10000,
7 | "max_value": 1000000,
8 | "default_value": 50000,
9 | "step": 5000
10 | },
11 | "Loan Amount": {
12 | "field_name": "loan_amount",
13 | "type": "number",
14 | "min_value": 5000,
15 | "max_value": 500000,
16 | "default_value": 20000,
17 | "step": 1000
18 | },
19 | "Credit Score": {
20 | "field_name": "credit_score",
21 | "type": "range",
22 | "min_value": 300,
23 | "max_value": 850,
24 | "default_value": [600, 700]
25 | }
26 | }
27 | }
--------------------------------------------------------------------------------
/form_configs/parkinson_detection.json:
--------------------------------------------------------------------------------
1 | {
2 | "Parkinson Detection Form": {
3 | "MDVP_Fo_Hz": {
4 | "field_name": "MDVP_Fo_Hz",
5 | "type": "float",
6 | "min_value": 88.00000,
7 | "max_value": 260.00000,
8 | "default_value": 88.00000,
9 | "step": 0.000001
10 | },
11 | "MDVP_Fhi_Hz": {
12 | "field_name": "MDVP_Fhi_Hz",
13 | "type": "float",
14 | "min_value": 102.00000,
15 | "max_value": 592.00000,
16 | "default_value": 102.00000,
17 | "step": 0.000001
18 | },
19 | "MDVP_Flo_Hz": {
20 | "field_name": "MDVP_Flo_Hz",
21 | "type": "float",
22 | "min_value": 65.00000,
23 | "max_value": 240.00000,
24 | "default_value": 65.00000,
25 | "step": 0.000001
26 | },
27 | "MDVP_Jitter_percent": {
28 | "field_name": "MDVP_Jitter_percent",
29 | "type": "float",
30 | "min_value": 0.00100,
31 | "max_value": 0.03300,
32 | "default_value": 0.00100,
33 | "step": 0.000001
34 | },
35 | "MDVP_Jitter_Abs": {
36 | "field_name": "MDVP_Jitter_Abs",
37 | "type": "float",
38 | "min_value": 0.000020,
39 | "max_value": 0.000200,
40 | "default_value": 0.000020,
41 | "step": 0.000001
42 | },
43 | "MDVP_RAP": {
44 | "field_name": "MDVP_RAP",
45 | "type": "float",
46 | "min_value": 0.000600,
47 | "max_value": 0.020000,
48 | "default_value": 0.000600,
49 | "step": 0.000001
50 | },
51 | "MDVP_PPQ": {
52 | "field_name": "MDVP_PPQ",
53 | "type": "float",
54 | "min_value": 0.000900,
55 | "max_value": 0.020000,
56 | "default_value": 0.000900,
57 | "step": 0.000001
58 | },
59 | "Jitter_DDP": {
60 | "field_name": "Jitter_DDP",
61 | "type": "float",
62 | "min_value": 0.00200,
63 | "max_value": 0.06500,
64 | "default_value": 0.00200,
65 | "step": 0.000001
66 | },
67 | "MDVP_Shimmer": {
68 | "field_name": "MDVP_Shimmer",
69 | "type": "float",
70 | "min_value": 0.00900,
71 | "max_value": 0.12000,
72 | "default_value": 0.00900,
73 | "step": 0.000001
74 | },
75 | "MDVP_Shimmer_dB": {
76 | "field_name": "MDVP_Shimmer_dB",
77 | "type": "float",
78 | "min_value": 0.08500,
79 | "max_value": 1.30200,
80 | "default_value": 0.08500,
81 | "step": 0.000001
82 | },
83 | "Shimmer_APQ3": {
84 | "field_name": "Shimmer_APQ3",
85 | "type": "float",
86 | "min_value": 0.00400,
87 | "max_value": 0.05600,
88 | "default_value": 0.00400,
89 | "step": 0.000001
90 | },
91 | "Shimmer_APQ5": {
92 | "field_name": "Shimmer_APQ5",
93 | "type": "float",
94 | "min_value": 0.00500,
95 | "max_value": 0.08000,
96 | "default_value": 0.00500,
97 | "step": 0.000001
98 | },
99 | "MDVP_APQ": {
100 | "field_name": "MDVP_APQ",
101 | "type": "float",
102 | "min_value": 0.00700,
103 | "max_value": 0.14000,
104 | "default_value": 0.00700,
105 | "step": 0.000001
106 | },
107 | "Shimmer_DDA": {
108 | "field_name": "Shimmer_DDA",
109 | "type": "float",
110 | "min_value": 0.01300,
111 | "max_value": 0.17000,
112 | "default_value": 0.01300,
113 | "step": 0.000001
114 | },
115 | "NHR": {
116 | "field_name": "NHR",
117 | "type": "float",
118 | "min_value": 0.000600,
119 | "max_value": 0.310000,
120 | "default_value": 0.000600,
121 | "step": 0.000001
122 | },
123 | "HNR": {
124 | "field_name": "HNR",
125 | "type": "float",
126 | "min_value": 8.00000,
127 | "max_value": 33.00000,
128 | "default_value": 8.00000,
129 | "step": 0.000001
130 | },
131 | "RPDE": {
132 | "field_name": "RPDE",
133 | "type": "float",
134 | "min_value": 0.25000,
135 | "max_value": 0.68000,
136 | "default_value": 0.25000,
137 | "step": 0.000001
138 | },
139 | "DFA": {
140 | "field_name": "DFA",
141 | "type": "float",
142 | "min_value": 0.57000,
143 | "max_value": 0.82000,
144 | "default_value": 0.57000,
145 | "step": 0.000001
146 | },
147 | "Spread1": {
148 | "field_name": "spread1",
149 | "type": "float",
150 | "min_value": -7.00000,
151 | "max_value": -2.00000,
152 | "default_value": -7.00000,
153 | "step": 0.01
154 | },
155 | "Spread2": {
156 | "field_name": "spread2",
157 | "type": "float",
158 | "min_value": 0.00600,
159 | "max_value": 0.45000,
160 | "default_value": 0.00600,
161 | "step": 0.000001
162 | },
163 | "D2": {
164 | "field_name": "D2",
165 | "type": "float",
166 | "min_value": 1.42000,
167 | "max_value": 3.67000,
168 | "default_value": 1.42000,
169 | "step": 0.000001
170 | },
171 | "PPE": {
172 | "field_name": "PPE",
173 | "type": "float",
174 | "min_value": 0.04000,
175 | "max_value": 0.50000,
176 | "default_value": 0.04000,
177 | "step": 0.000001
178 | }
179 | }
180 | }
181 |
--------------------------------------------------------------------------------
/form_configs/sleep_prediction.json:
--------------------------------------------------------------------------------
1 | {
2 | "Sleep Prediction Form": {
3 | "Age": {
4 | "field_name": "Age",
5 | "type": "number",
6 | "min_value": 0,
7 | "max_value": 120,
8 | "default_value": 25,
9 | "step": 1
10 | },
11 | "Sleep_Duration": {
12 | "field_name": "Sleep_Duration",
13 | "type": "float",
14 | "min_value": 0.0,
15 | "max_value": 24.0,
16 | "default_value": 8.0,
17 | "step": 0.1
18 | },
19 | "Heart_Rate": {
20 | "field_name": "Heart_Rate",
21 | "type": "number",
22 | "min_value": 10,
23 | "max_value": 200,
24 | "default_value": 72,
25 | "step": 1
26 | },
27 | "Daily_Steps": {
28 | "field_name": "Daily_Steps",
29 | "type": "number",
30 | "min_value": 0,
31 | "max_value": 1000000,
32 | "default_value": 0,
33 | "step": 10
34 | },
35 | "Systolic": {
36 | "field_name": "Systolic",
37 | "type": "float",
38 | "min_value": 0.0,
39 | "max_value": 250.0,
40 | "default_value": 120.0,
41 | "step": 0.1
42 | },
43 | "Diastolic": {
44 | "field_name": "Diastolic",
45 | "type": "float",
46 | "min_value": 0.0,
47 | "max_value": 250.0,
48 | "default_value": 80.0,
49 | "step": 0.1
50 | },
51 | "Occupation": {
52 | "field_name": "Occupation",
53 | "type": "dropdown",
54 | "options": [
55 | "Software Engineer" ,"Doctor" ,"Sales Representative","Teacher" ,"Nurse",
56 | "Engineer" ,"Accountant" ,"Scientist", "Lawyer" ,"Salesperson" ,"Manager"
57 | ],
58 | "default_value": "Software Engineer"
59 | },
60 | "Quality_of_Sleep": {
61 | "field_name": "Quality_of_Sleep",
62 | "type": "number",
63 | "min_value": 0,
64 | "max_value": 10,
65 | "default_value": 0,
66 | "step": 1
67 | },
68 | "Gender": {
69 | "field_name": "Gender",
70 | "type": "dropdown",
71 | "options": [
72 | "Male",
73 | "Female"
74 | ],
75 | "default_value": "Male"
76 |
77 | },
78 | "Physical_Activity_Level": {
79 | "field_name": "Physical_Activity_Level",
80 | "type": "number",
81 | "min_value": 0,
82 | "max_value": 200,
83 | "default_value": 0,
84 | "step": 1
85 | },
86 | "Stress_Level": {
87 | "field_name": "Stress_Level",
88 | "type": "number",
89 | "min_value": 0,
90 | "max_value": 10,
91 | "default_value": 0,
92 | "step": 1
93 | },
94 | "BMI_Category": {
95 | "field_name": "BMI_Category",
96 | "type": "dropdown",
97 | "options": [
98 | "Normal Weight",
99 | "Obese",
100 | "Overweight"
101 | ],
102 | "default_value": "Normal Weight"
103 | }
104 | }
105 | }
--------------------------------------------------------------------------------
/form_configs/stress_detection.json:
--------------------------------------------------------------------------------
1 | {
2 | "Stress Detection Form": {
3 | "Age": {
4 | "field_name": "age",
5 | "type": "number",
6 | "min_value": 0,
7 | "max_value": 100,
8 | "default_value": 25,
9 | "step": 1
10 | },
11 | "Frequency of Using Social Media Without Purpose": {
12 | "field_name": "freq_no_purpose",
13 | "type": "slider",
14 | "min_value": 1,
15 | "max_value": 5,
16 | "default_value": 3,
17 | "step": 1
18 | },
19 | "Frequency of Feeling Distracted": {
20 | "field_name": "freq_distracted",
21 | "type": "slider",
22 | "min_value": 1,
23 | "max_value": 5,
24 | "default_value": 3,
25 | "step": 1
26 | },
27 | "Restlessness Level": {
28 | "field_name": "restless",
29 | "type": "slider",
30 | "min_value": 1,
31 | "max_value": 5,
32 | "default_value": 3,
33 | "step": 1
34 | },
35 | "Worry Level": {
36 | "field_name": "worry_level",
37 | "type": "slider",
38 | "min_value": 1,
39 | "max_value": 5,
40 | "default_value": 3,
41 | "step": 1
42 | },
43 | "Difficulty Concentrating": {
44 | "field_name": "difficulty_concentrating",
45 | "type": "slider",
46 | "min_value": 1,
47 | "max_value": 5,
48 | "default_value": 3,
49 | "step": 1
50 | },
51 | "Comparison to Successful People": {
52 | "field_name": "compare_to_successful_people",
53 | "type": "slider",
54 | "min_value": 1,
55 | "max_value": 5,
56 | "default_value": 3,
57 | "step": 1
58 | },
59 | "Feelings About Comparisons": {
60 | "field_name": "feelings_about_comparisons",
61 | "type": "slider",
62 | "min_value": 1,
63 | "max_value": 5,
64 | "default_value": 3,
65 | "step": 1
66 | },
67 | "Frequency of Seeking Validation": {
68 | "field_name": "freq_seeking_validation",
69 | "type": "slider",
70 | "min_value": 1,
71 | "max_value": 5,
72 | "default_value": 3,
73 | "step": 1
74 | },
75 | "Frequency of Feeling Depressed": {
76 | "field_name": "freq_feeling_depressed",
77 | "type": "slider",
78 | "min_value": 1,
79 | "max_value": 5,
80 | "default_value": 3,
81 | "step": 1
82 | },
83 | "Interest Fluctuation": {
84 | "field_name": "interest_fluctuation",
85 | "type": "slider",
86 | "min_value": 1,
87 | "max_value": 5,
88 | "default_value": 3,
89 | "step": 1
90 | },
91 | "Sleep Issues": {
92 | "field_name": "sleep_issues",
93 | "type": "slider",
94 | "min_value": 1,
95 | "max_value": 5,
96 | "default_value": 3,
97 | "step": 1
98 | }
99 | }
100 | }
101 |
--------------------------------------------------------------------------------
/form_handler.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | import json
3 | from typing import Dict, Any
4 |
5 |
6 | class FormHandler:
7 | """
8 | A class to handle rendering a form in Streamlit based on a configuration file.
9 |
10 | Attributes:
11 | name (str): The name of the form to be rendered.
12 | button_label (str): The label for the submit button.
13 | model (callable): The model function to be called with form data.
14 | config_path (str): Path to the configuration JSON file.
15 |
16 | Structure of the configuration JSON file:
17 | {
18 | "Form Name": {
19 | "field_label": {
20 | "field_name": "field_name",
21 | "type": "field_type",
22 | "default_value": "default_value",
23 | "min_value": min_value,
24 | "max_value": max_value,
25 | "step": step,
26 | "options": ["option1", "option2"],
27 | },
28 | ...
29 | }
30 | }
31 | """
32 |
33 | def __init__(
34 | self, name: str, button_label: str, model: callable, config_path: str
35 | ) -> None:
36 | """
37 | Initializes the FormHandler with the provided parameters.
38 |
39 | Parameters:
40 | name (str): The name of the form to be rendered.
41 | button_label (str): The label for the submit button.
42 | model (callable): The model function to be called with form data.
43 | config_path (str): Path to the configuration JSON file.
44 | """
45 | self.name = name
46 | self.button_label = button_label
47 | self.model = model
48 | self.config_path = config_path
49 | self.fields = self.load_fields_from_config()
50 |
51 | def load_fields_from_config(self) -> Dict[str, Dict[str, Any]]:
52 | """
53 | Loads form fields from the configuration JSON file.
54 |
55 | Returns:
56 | Dict[str, Dict[str, Any]]: A dictionary of form fields and their attributes.
57 | """
58 |
59 | # Handle a missing config file or malformed JSON gracefully
60 | try:
61 | with open(self.config_path, "r") as f:
62 | config = json.load(f)
63 | return config.get(self.name, {})
64 | except FileNotFoundError:
65 | st.error(f"Configuration file not found: {self.config_path}")
66 | return {}
67 | except json.JSONDecodeError:
68 | st.error(f"Error parsing the configuration file: {self.config_path}")
69 | return {}
70 |
71 |
72 | def render(self) -> None:
73 | """
74 | Renders the form in the Streamlit application.
75 |
76 | This method collects user input and, upon form submission, calls the specified model
77 | with the collected data mapped to the appropriate field names from the config file.
78 | """
79 | # Dictionary to hold form data
80 | form_data: Dict[str, Any] = {}
81 |
82 | # Loop over the fields in the form
83 | for label, attributes in self.fields.items():
84 | field_type = attributes.get("type")
85 | field_name = attributes.get(
86 | "field_name", label
87 | ) # Use field_name from the config
88 |
89 | # Handle different types of input fields
90 | if field_type == "number":
91 | form_data[field_name] = st.number_input(
92 | label,
93 | value=attributes.get("default_value"),
94 | min_value=attributes.get("min_value"),
95 | max_value=attributes.get("max_value"),
96 | step=attributes.get("step", 1),
97 | )
98 |
99 | elif field_type == "float": # New case for float values
100 | form_data[field_name] = st.number_input(
101 | label,
102 | value=attributes.get("default_value"),
103 | min_value=attributes.get("min_value"),
104 | max_value=attributes.get("max_value"),
105 | step=attributes.get("step"),
106 | format="%.6f" # format to 6 decimal places
107 | )
108 |
109 | elif field_type == "dropdown":
110 | form_data[field_name] = st.selectbox(
111 | label,
112 | options=attributes.get("options"),
113 | index=attributes.get("options").index(
114 | attributes.get("default_value")
115 | ),
116 | )
117 | elif field_type == "range":
118 | form_data[field_name] = st.slider(
119 | label,
120 | min_value=attributes.get("min_value"),
121 | max_value=attributes.get("max_value"),
122 | value=tuple(attributes.get("default_value")), # type: ignore
123 | )
124 | elif field_type == "multiselect":
125 | form_data[field_name] = st.multiselect(
126 | label,
127 | options=attributes.get("options"),
128 | default=attributes.get("default_value"),
129 | )
130 | elif field_type == "text":
131 | form_data[field_name] = st.text_input(
132 | label, value=attributes.get("default_value")
133 | )
134 | elif field_type == "checkbox":
135 | form_data[field_name] = st.checkbox(
136 | label, value=attributes.get("default_value", False)
137 | )
138 | elif field_type == "radio":
139 | form_data[field_name] = st.radio(
140 | label,
141 | options=attributes.get("options"),
142 | index=attributes.get("options").index(
143 | attributes.get("default_value")
144 | ),
145 | )
146 | elif field_type == "slider":
147 | form_data[field_name] = st.slider(
148 | label,
149 | min_value=attributes.get("min_value"),
150 | max_value=attributes.get("max_value"),
151 | value=attributes.get("default_value"),
152 | )
153 | elif field_type == "date":
154 | form_data[field_name] = st.date_input(label)
155 | elif field_type == "time":
156 | form_data[field_name] = st.time_input(label)
157 | elif field_type == "file":
158 | form_data[field_name] = st.file_uploader(label)
159 | elif field_type == "password":
160 | form_data[field_name] = st.text_input(label, type="password")
161 | elif field_type == "image":
162 | form_data[field_name] = st.file_uploader(label, type=["png", "jpg", "jpeg"])  # st.image only displays; an uploader is needed to collect an image
163 | else:
164 | st.warning(f"Unknown field type: {field_type}")
165 |
166 | # Submit button
167 | if st.button(self.button_label):
168 | # Call the model with the form data
169 | result = self.model(**form_data)
170 |
171 | # Display the result
172 | st.success(f"Result: {result}")
173 |
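For reference, the config-driven pattern above can be exercised without Streamlit. A minimal sketch, assuming an illustrative config (the `house_price` key and the fields below are made up, shaped like the JSON files in `form_configs/`):

```python
import json

# Illustrative config in the shape load_fields_from_config() expects:
# a top-level form name mapping field labels to their attributes.
config_text = """
{
  "house_price": {
    "Area (sq ft)": {"type": "number", "field_name": "area", "default_value": 1000},
    "Location": {"type": "dropdown", "field_name": "location",
                 "options": ["Urban", "Rural"], "default_value": "Urban"}
  }
}
"""

fields = json.loads(config_text).get("house_price", {})

# Mirror render(): key the collected value by field_name, falling back
# to the label, and seed it with the configured default.
form_data = {attrs.get("field_name", label): attrs.get("default_value")
             for label, attrs in fields.items()}
print(form_data)  # → {'area': 1000, 'location': 'Urban'}
```

The `field_name` indirection is what lets `render()` call `self.model(**form_data)` with keyword names that match the model's signature rather than the display labels.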
--------------------------------------------------------------------------------
/machine-learning.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/machine-learning.gif
--------------------------------------------------------------------------------
/models/PDF_malware_detection/pdf_extraction.py:
--------------------------------------------------------------------------------
1 | from pdfid import pdfid
2 | import fitz
3 | from os.path import exists
4 | import sys
5 |
6 | # Function to convert bytes to kilobytes
7 | def bytes_to_kb(bytes):
8 | return bytes / 1024
9 |
10 | # Main function to extract features from the PDF file
11 | def extract_pdf_features(pdf_file):
12 | # Checking if the file exists
13 | if not exists(pdf_file):
14 | print(f"File {pdf_file} not found")
15 | return None
16 |
17 | features = {'FileName': pdf_file}
18 |
19 | # Open the PDF
20 | pdf = fitz.open(pdf_file)
21 |
22 | # Extract basic PDF metadata using PyMuPDF
23 | try:
24 | features['Pages'] = pdf.page_count
25 | except:
26 | features['Pages'] = -1
27 |
28 | try:
29 | features['XrefLength'] = pdf.xref_length()
30 | except:
31 | features['XrefLength'] = -1
32 |
33 | try:
34 | features['TitleCharacters'] = len(pdf.metadata.get('title', ''))
35 | except:
36 | features['TitleCharacters'] = -1
37 |
38 | features['isEncrypted'] = 1 if pdf.is_encrypted else 0
39 |
40 | # Extract image-related features
41 | images_count = 0
42 | for i in range(pdf.page_count):
43 | images_count += len(pdf.get_page_images(i))
44 |
45 | features['Images'] = images_count
46 |
47 | # Extract embedded file details
48 | emb_count = pdf.embfile_count()
49 | emb_size_sum = 0
50 | if emb_count != 0:
51 | try:
52 | for i in range(emb_count):
53 | emb_size_sum += pdf.embfile_info(i)["size"]  # embfile_info returns a dict; sum the embedded file sizes
54 | except:
55 | features['EmbeddedFiles'] = -1
56 | else:
57 | features['EmbeddedFiles'] = emb_size_sum / emb_count
58 | else:
59 | features['EmbeddedFiles'] = 0
60 |
61 | # Extract presence of text in the PDF
62 | text = 0
63 | for page in pdf:
64 | if len(page.get_text().split()):
65 | text = 1
66 | break
67 | features['Text'] = text
68 |
69 | # Close the PDF after processing
70 | pdf.close()
71 |
72 | # Extract additional PDF features using pdfid
73 | try:
74 | options = pdfid.get_fake_options()
75 | options.scan = True
76 | options.json = True
77 | list_of_dict = pdfid.PDFiDMain([pdf_file], options)
78 | pdf_features = list_of_dict['reports'][0]
79 | del pdf_features['version']
80 |
81 | # Rename features to correspond to dataset names
82 | diff_in_feature_name = {
83 | 'header': 'Header',
84 | 'obj': 'Obj',
85 | 'endobj': 'Endobj',
86 | 'stream': 'Stream',
87 | 'endstream': 'Endstream',
88 | 'xref': 'Xref',
89 | 'trailer': 'Trailer',
90 | 'startxref': 'StartXref',
91 | '/Page': 'PageNo',
92 | '/Encrypt': 'Encrypt',
93 | '/ObjStm': 'ObjStm',
94 | '/JS': 'JS',
95 | '/JavaScript': 'JavaScript',
96 | '/AA': 'AA',
97 | '/OpenAction': 'OpenAction',
98 | '/AcroForm': 'AcroForm',
99 | '/JBIG2Decode': 'JBIG2Decode',
100 | '/RichMedia': 'RichMedia',
101 | '/Launch': 'Launch',
102 | '/EmbeddedFile': 'EmbeddedFile',
103 | '/XFA': 'XFA',
104 | '/Colors > 2^24': 'Colors'
105 | }
106 |
107 | for curr_name, new_name in diff_in_feature_name.items():
108 | pdf_features[new_name] = pdf_features.pop(curr_name, -1)  # rename within the pdfid report; -1 if the key is absent
109 |
110 | features.update(pdf_features)
111 | except Exception as e:
112 | print(f"Error extracting pdfid features: {e}")
113 | features.update({
114 | 'Header': '-1',
115 | 'Obj': -1,
116 | 'Endobj': -1,
117 | 'Stream': -1,
118 | 'Endstream': -1,
119 | 'Xref': -1,
120 | 'Trailer': -1,
121 | 'StartXref': -1,
122 | 'PageNo': -1,
123 | 'Encrypt': -1,
124 | 'ObjStm': -1,
125 | 'JS': -1,
126 | 'JavaScript': -1,
127 | 'AA': -1,
128 | 'OpenAction': -1,
129 | 'AcroForm': -1,
130 | 'JBIG2Decode': -1,
131 | 'RichMedia': -1,
132 | 'Launch': -1,
133 | 'EmbeddedFile': -1,
134 | 'XFA': -1,
135 | 'Colors': -1
136 | })
137 |
138 | return features
139 |
140 | # If run as a script
141 | if __name__ == '__main__':
142 | if len(sys.argv) != 2:
143 | print("Usage: python pdf_extraction.py <pdf_file>")
144 | sys.exit(1)
145 |
146 | pdf_file = sys.argv[1]
147 | features = extract_pdf_features(pdf_file)
148 |
149 | if features:
150 | print(features)
151 |
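The key-renaming step above can be sketched in isolation; the raw counters live in the pdfid report itself, so the rename pops from that report (the report values here are invented for illustration):

```python
# Hypothetical pdfid-style report (values made up for illustration)
report = {'header': '%PDF-1.7', 'obj': 12, '/JS': 1, '/OpenAction': 0}

# Subset of the dataset-name mapping used above
rename = {'header': 'Header', 'obj': 'Obj', '/JS': 'JS',
          '/OpenAction': 'OpenAction', '/XFA': 'XFA'}

# Pop each raw key and re-insert it under the dataset column name,
# defaulting to -1 when pdfid did not emit that counter.
features = {new: report.pop(old, -1) for old, new in rename.items()}
print(features)
# → {'Header': '%PDF-1.7', 'Obj': 12, 'JS': 1, 'OpenAction': 0, 'XFA': -1}
```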
--------------------------------------------------------------------------------
/models/PDF_malware_detection/predict.py:
--------------------------------------------------------------------------------
1 | import subprocess
2 | import json
3 | import joblib
4 | import os
5 | # from pdf_extraction import extract_pdf_features
6 |
7 | from pdfid import pdfid
8 | import fitz
9 | from os.path import exists
10 | import sys
11 |
12 | # Function to convert bytes to kilobytes
13 | def bytes_to_kb(bytes):
14 | return bytes / 1024
15 |
16 | # Main function to extract features from the PDF file
17 | def extract_pdf_features(pdf_file):
18 | # Checking if the file exists
19 | if not exists(pdf_file):
20 | print(f"File {pdf_file} not found")
21 | return None
22 |
23 | features = {'FileName': pdf_file}
24 |
25 | # Open the PDF
26 | pdf = fitz.open(pdf_file)
27 |
28 | # Extract basic PDF metadata using PyMuPDF
29 | try:
30 | features['Pages'] = pdf.page_count
31 | except:
32 | features['Pages'] = -1
33 |
34 | try:
35 | features['XrefLength'] = pdf.xref_length()
36 | except:
37 | features['XrefLength'] = -1
38 |
39 | try:
40 | features['TitleCharacters'] = len(pdf.metadata.get('title', ''))
41 | except:
42 | features['TitleCharacters'] = -1
43 |
44 | features['isEncrypted'] = 1 if pdf.is_encrypted else 0
45 |
46 | # Extract image-related features
47 | images_count = 0
48 | for i in range(pdf.page_count):
49 | images_count += len(pdf.get_page_images(i))
50 |
51 | features['Images'] = images_count
52 |
53 | # Extract embedded file details
54 | emb_count = pdf.embfile_count()
55 | emb_size_sum = 0
56 | if emb_count != 0:
57 | try:
58 | for i in range(emb_count):
59 | emb_size_sum += pdf.embfile_info(i)["size"]  # embfile_info returns a dict; sum the embedded file sizes
60 | except:
61 | features['EmbeddedFiles'] = -1
62 | else:
63 | features['EmbeddedFiles'] = emb_size_sum / emb_count
64 | else:
65 | features['EmbeddedFiles'] = 0
66 |
67 | # Extract presence of text in the PDF
68 | text = 0
69 | for page in pdf:
70 | if len(page.get_text().split()):
71 | text = 1
72 | break
73 | features['Text'] = text
74 |
75 | # Close the PDF after processing
76 | pdf.close()
77 |
78 | # Extract additional PDF features using pdfid
79 | try:
80 | options = pdfid.get_fake_options()
81 | options.scan = True
82 | options.json = True
83 | list_of_dict = pdfid.PDFiDMain([pdf_file], options)
84 | pdf_features = list_of_dict['reports'][0]
85 | del pdf_features['version']
86 |
87 | # Rename features to correspond to dataset names
88 | diff_in_feature_name = {
89 | 'header': 'Header',
90 | 'obj': 'Obj',
91 | 'endobj': 'Endobj',
92 | 'stream': 'Stream',
93 | 'endstream': 'Endstream',
94 | 'xref': 'Xref',
95 | 'trailer': 'Trailer',
96 | 'startxref': 'StartXref',
97 | '/Page': 'PageNo',
98 | '/Encrypt': 'Encrypt',
99 | '/ObjStm': 'ObjStm',
100 | '/JS': 'JS',
101 | '/JavaScript': 'JavaScript',
102 | '/AA': 'AA',
103 | '/OpenAction': 'OpenAction',
104 | '/AcroForm': 'AcroForm',
105 | '/JBIG2Decode': 'JBIG2Decode',
106 | '/RichMedia': 'RichMedia',
107 | '/Launch': 'Launch',
108 | '/EmbeddedFile': 'EmbeddedFile',
109 | '/XFA': 'XFA',
110 | '/Colors > 2^24': 'Colors'
111 | }
112 |
113 | for curr_name, new_name in diff_in_feature_name.items():
114 | pdf_features[new_name] = features.pop(curr_name, -1)
115 |
116 | features.update(pdf_features)
117 | except Exception as e:
118 | print(f"Error extracting pdfid features: {e}")
119 | features.update({
120 | 'Header': '-1',
121 | 'Obj': -1,
122 | 'Endobj': -1,
123 | 'Stream': -1,
124 | 'Endstream': -1,
125 | 'Xref': -1,
126 | 'Trailer': -1,
127 | 'StartXref': -1,
128 | 'PageNo': -1,
129 | 'Encrypt': -1,
130 | 'ObjStm': -1,
131 | 'JS': -1,
132 | 'JavaScript': -1,
133 | 'AA': -1,
134 | 'OpenAction': -1,
135 | 'AcroForm': -1,
136 | 'JBIG2Decode': -1,
137 | 'RichMedia': -1,
138 | 'Launch': -1,
139 | 'EmbeddedFile': -1,
140 | 'XFA': -1,
141 | 'Colors': -1
142 | })
143 |
144 | return features
145 | # Wrapper to extract features from the PDF (the subprocess call to pdf_extraction.py below was replaced by a direct call)
146 | def extract_features(pdf_file):
147 | # command = f'python pdf_feature_extraction.py "{pdf_file}"'
148 | # result = subprocess.run(command, shell=True, capture_output=True, text=True)
149 |
150 | # if result.returncode != 0:
151 | # raise ValueError(f"Error in feature extraction: {result.stderr}")
152 |
153 | # # Parse the output JSON string to a dictionary
154 | # features = json.loads(result.stdout)
155 | features = extract_pdf_features(pdf_file)
156 | return features
157 |
158 | def header_to_numeric(header):
159 | if header.startswith('%PDF-'):
160 | return float(header.split('-')[1]) # Extract the version number
161 | return 0
162 |
163 | # Function to predict if the PDF contains malware
164 | def predict_malware(pdf_file, model_path = os.path.join(os.path.dirname(__file__), 'saved_models', 'random_forest_model.pkl')):
165 | # Extract features
166 | features = extract_features(pdf_file)
167 | print(features)
168 |
169 | # Load pre-trained model (replace with the actual path of your model)
170 | model = joblib.load(model_path)
171 |
172 | # Select the required features for prediction
173 | feature_vector = [
174 | header_to_numeric(features.get('header', '')),
175 | features.get('obj',0),
176 | features.get('endobj',0),
177 | features.get('stream',0),
178 | features.get('endstream',0),
179 | features.get('xref',0),
180 | features.get('trailer',0),
181 | features.get('startxref',0),
182 | features.get('/Page', 0),
183 | features.get('/Encrypt', 0),
184 | features.get('/ObjStm', 0),  # pdfid emits this key with a leading slash, like its neighbours
185 | features.get('/JS',0),
186 | features.get('/JavaScript',0),
187 | features.get('/AA',0),
188 | features.get('/OpenAction',0),
189 | features.get('/AcroForm',0),
190 | features.get('/JBIG2Decode',0),
191 | features.get('/RichMedia',0),
192 | features.get('/Launch',0),
193 | features.get('/EmbeddedFile', 0),
194 | features.get('/XFA',0),
195 | features.get('/Colors',0),
196 | # Add more features as required by the model
197 | ]
198 |
199 | # Predict malware (assuming binary classification: 0 = benign, 1 = malicious)
200 | prediction = model.predict([feature_vector])
201 |
202 | if prediction[0] == 1:
203 | print("The PDF contains malware.")
204 | else:
205 | print("The PDF is clean.")
206 | return prediction[0]
207 |
208 | # Example usage
209 | if __name__ == "__main__":
210 | pdf_file = r"C:\Users\agraw\Downloads\DAA_UNIT-II_BinarySearch_Notes.pdf" # Replace with the actual PDF file path
211 | # model_path = "malware_model.pkl" # Replace with the actual path of the trained model
212 | predict_malware(pdf_file)
213 |
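`header_to_numeric` above reduces pdfid's header string to a float version number; its behavior can be checked standalone (the function is copied verbatim from the file):

```python
def header_to_numeric(header):
    # '%PDF-1.7' -> 1.7; missing or malformed headers fall back to 0
    if header.startswith('%PDF-'):
        return float(header.split('-')[1])  # Extract the version number
    return 0

print(header_to_numeric('%PDF-1.7'))  # → 1.7
print(header_to_numeric('-1'))        # → 0 (the fallback Header value)
```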
--------------------------------------------------------------------------------
/models/PDF_malware_detection/saved_models/random_forest_model.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/PDF_malware_detection/saved_models/random_forest_model.pkl
--------------------------------------------------------------------------------
/models/business_performance_forecasting/data/50_Startups.csv:
--------------------------------------------------------------------------------
1 | R&D Spend,Administration,Marketing Spend,State,Profit
2 | 165349.2,136897.8,471784.1,New York,192261.83
3 | 162597.7,151377.59,443898.53,California,191792.06
4 | 153441.51,101145.55,407934.54,Florida,191050.39
5 | 144372.41,118671.85,383199.62,New York,182901.99
6 | 142107.34,91391.77,366168.42,Florida,166187.94
7 | 131876.9,99814.71,362861.36,New York,156991.12
8 | 134615.46,147198.87,127716.82,California,156122.51
9 | 130298.13,145530.06,323876.68,Florida,155752.6
10 | 120542.52,148718.95,311613.29,New York,152211.77
11 | 123334.88,108679.17,304981.62,California,149759.96
12 | 101913.08,110594.11,229160.95,Florida,146121.95
13 | 100671.96,91790.61,249744.55,California,144259.4
14 | 93863.75,127320.38,249839.44,Florida,141585.52
15 | 91992.39,135495.07,252664.93,California,134307.35
16 | 119943.24,156547.42,256512.92,Florida,132602.65
17 | 114523.61,122616.84,261776.23,New York,129917.04
18 | 78013.11,121597.55,264346.06,California,126992.93
19 | 94657.16,145077.58,282574.31,New York,125370.37
20 | 91749.16,114175.79,294919.57,Florida,124266.9
21 | 86419.7,153514.11,0,New York,122776.86
22 | 76253.86,113867.3,298664.47,California,118474.03
23 | 78389.47,153773.43,299737.29,New York,111313.02
24 | 73994.56,122782.75,303319.26,Florida,110352.25
25 | 67532.53,105751.03,304768.73,Florida,108733.99
26 | 77044.01,99281.34,140574.81,New York,108552.04
27 | 64664.71,139553.16,137962.62,California,107404.34
28 | 75328.87,144135.98,134050.07,Florida,105733.54
29 | 72107.6,127864.55,353183.81,New York,105008.31
30 | 66051.52,182645.56,118148.2,Florida,103282.38
31 | 65605.48,153032.06,107138.38,New York,101004.64
32 | 61994.48,115641.28,91131.24,Florida,99937.59
33 | 61136.38,152701.92,88218.23,New York,97483.56
34 | 63408.86,129219.61,46085.25,California,97427.84
35 | 55493.95,103057.49,214634.81,Florida,96778.92
36 | 46426.07,157693.92,210797.67,California,96712.8
37 | 46014.02,85047.44,205517.64,New York,96479.51
38 | 28663.76,127056.21,201126.82,Florida,90708.19
39 | 44069.95,51283.14,197029.42,California,89949.14
40 | 20229.59,65947.93,185265.1,New York,81229.06
41 | 38558.51,82982.09,174999.3,California,81005.76
42 | 28754.33,118546.05,172795.67,California,78239.91
43 | 27892.92,84710.77,164470.71,Florida,77798.83
44 | 23640.93,96189.63,148001.11,California,71498.49
45 | 15505.73,127382.3,35534.17,New York,69758.98
46 | 22177.74,154806.14,28334.72,California,65200.33
47 | 1000.23,124153.04,1903.93,New York,64926.08
48 | 1315.46,115816.21,297114.46,Florida,49490.75
49 | 0,135426.92,0,California,42559.73
50 | 542.05,51743.15,0,New York,35673.41
51 | 0,116983.8,45173.06,California,14681.4
--------------------------------------------------------------------------------
/models/business_performance_forecasting/model.py:
--------------------------------------------------------------------------------
1 | import pickle
2 | import os
3 | model_path = os.path.join(os.path.dirname(__file__), 'saved_models', 'model.pkl')
4 | scaler_path = os.path.join(os.path.dirname(__file__), 'saved_models', 'scaler.pkl')
5 |
6 |
7 | # Load the saved model and scaler
8 | def load_model_and_scaler():
9 | with open(model_path, 'rb') as model_file:
10 | model = pickle.load(model_file)
11 | with open(scaler_path, 'rb') as scaler_file:
12 | scaler = pickle.load(scaler_file)
13 |
14 | return model, scaler
15 |
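`load_model_and_scaler` simply unpickles the two artifacts the notebook saved. The round-trip pattern, sketched with a stand-in dict instead of the real sklearn objects:

```python
import os
import pickle
import tempfile

# Stand-in for the trained artifacts (the real files hold a fitted
# LinearRegression and the ColumnTransformer used for preprocessing).
stand_in = {"coef": [1.0, 2.0], "intercept": 0.5}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model.pkl")

    # Save, as the notebook does with pickle.dump(...)
    with open(path, "wb") as f:
        pickle.dump(stand_in, f)

    # Load, as load_model_and_scaler() does
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == stand_in)  # → True
```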
--------------------------------------------------------------------------------
/models/business_performance_forecasting/notebooks/business_performance_forecasting.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "CazISR8X_HUG"
7 | },
8 | "source": [
9 | "# Multiple Linear Regression"
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {
15 | "id": "pOyqYHTk_Q57"
16 | },
17 | "source": [
18 | "## Importing the libraries"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {
25 | "id": "T_YHJjnD_Tja"
26 | },
27 | "outputs": [],
28 | "source": [
29 | "import numpy as np\n",
30 | "import matplotlib.pyplot as plt\n",
31 | "import pandas as pd\n",
32 | "import pickle"
33 | ]
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {
38 | "id": "vgC61-ah_WIz"
39 | },
40 | "source": [
41 | "## Importing the dataset"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": null,
47 | "metadata": {
48 | "id": "UrxyEKGn_ez7"
49 | },
50 | "outputs": [],
51 | "source": [
52 | "dataset = pd.read_csv('50_Startups.csv')\n",
53 | "X = dataset.iloc[:, :-1].values\n",
54 | "y = dataset.iloc[:, -1].values"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": null,
60 | "metadata": {
61 | "colab": {
62 | "base_uri": "https://localhost:8080/",
63 | "height": 874
64 | },
65 | "id": "GOB3QhV9B5kD",
66 | "outputId": "905a7bca-1889-4d04-920f-5f3ed8211585"
67 | },
68 | "outputs": [
69 | {
70 | "name": "stdout",
71 | "output_type": "stream",
72 | "text": [
73 | "[[165349.2 136897.8 471784.1 'New York']\n",
74 | " [162597.7 151377.59 443898.53 'California']\n",
75 | " [153441.51 101145.55 407934.54 'Florida']\n",
76 | " [144372.41 118671.85 383199.62 'New York']\n",
77 | " [142107.34 91391.77 366168.42 'Florida']\n",
78 | " [131876.9 99814.71 362861.36 'New York']\n",
79 | " [134615.46 147198.87 127716.82 'California']\n",
80 | " [130298.13 145530.06 323876.68 'Florida']\n",
81 | " [120542.52 148718.95 311613.29 'New York']\n",
82 | " [123334.88 108679.17 304981.62 'California']\n",
83 | " [101913.08 110594.11 229160.95 'Florida']\n",
84 | " [100671.96 91790.61 249744.55 'California']\n",
85 | " [93863.75 127320.38 249839.44 'Florida']\n",
86 | " [91992.39 135495.07 252664.93 'California']\n",
87 | " [119943.24 156547.42 256512.92 'Florida']\n",
88 | " [114523.61 122616.84 261776.23 'New York']\n",
89 | " [78013.11 121597.55 264346.06 'California']\n",
90 | " [94657.16 145077.58 282574.31 'New York']\n",
91 | " [91749.16 114175.79 294919.57 'Florida']\n",
92 | " [86419.7 153514.11 0.0 'New York']\n",
93 | " [76253.86 113867.3 298664.47 'California']\n",
94 | " [78389.47 153773.43 299737.29 'New York']\n",
95 | " [73994.56 122782.75 303319.26 'Florida']\n",
96 | " [67532.53 105751.03 304768.73 'Florida']\n",
97 | " [77044.01 99281.34 140574.81 'New York']\n",
98 | " [64664.71 139553.16 137962.62 'California']\n",
99 | " [75328.87 144135.98 134050.07 'Florida']\n",
100 | " [72107.6 127864.55 353183.81 'New York']\n",
101 | " [66051.52 182645.56 118148.2 'Florida']\n",
102 | " [65605.48 153032.06 107138.38 'New York']\n",
103 | " [61994.48 115641.28 91131.24 'Florida']\n",
104 | " [61136.38 152701.92 88218.23 'New York']\n",
105 | " [63408.86 129219.61 46085.25 'California']\n",
106 | " [55493.95 103057.49 214634.81 'Florida']\n",
107 | " [46426.07 157693.92 210797.67 'California']\n",
108 | " [46014.02 85047.44 205517.64 'New York']\n",
109 | " [28663.76 127056.21 201126.82 'Florida']\n",
110 | " [44069.95 51283.14 197029.42 'California']\n",
111 | " [20229.59 65947.93 185265.1 'New York']\n",
112 | " [38558.51 82982.09 174999.3 'California']\n",
113 | " [28754.33 118546.05 172795.67 'California']\n",
114 | " [27892.92 84710.77 164470.71 'Florida']\n",
115 | " [23640.93 96189.63 148001.11 'California']\n",
116 | " [15505.73 127382.3 35534.17 'New York']\n",
117 | " [22177.74 154806.14 28334.72 'California']\n",
118 | " [1000.23 124153.04 1903.93 'New York']\n",
119 | " [1315.46 115816.21 297114.46 'Florida']\n",
120 | " [0.0 135426.92 0.0 'California']\n",
121 | " [542.05 51743.15 0.0 'New York']\n",
122 | " [0.0 116983.8 45173.06 'California']]\n"
123 | ]
124 | }
125 | ],
126 | "source": [
127 | "print(X)"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {
133 | "id": "VadrvE7s_lS9"
134 | },
135 | "source": [
136 | "## Encoding categorical data"
137 | ]
138 | },
139 | {
140 | "cell_type": "code",
141 | "execution_count": null,
142 | "metadata": {
143 | "id": "wV3fD1mbAvsh"
144 | },
145 | "outputs": [],
146 | "source": [
147 | "from sklearn.compose import ColumnTransformer\n",
148 | "from sklearn.preprocessing import OneHotEncoder\n",
149 | "ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')\n",
150 | "X = np.array(ct.fit_transform(X))"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": null,
156 | "metadata": {
157 | "colab": {
158 | "base_uri": "https://localhost:8080/",
159 | "height": 874
160 | },
161 | "id": "4ym3HdYeCGYG",
162 | "outputId": "9bd9e71a-bae0-45cb-fa26-9a0d480bb560"
163 | },
164 | "outputs": [
165 | {
166 | "name": "stdout",
167 | "output_type": "stream",
168 | "text": [
169 | "[[0.0 0.0 1.0 165349.2 136897.8 471784.1]\n",
170 | " [1.0 0.0 0.0 162597.7 151377.59 443898.53]\n",
171 | " [0.0 1.0 0.0 153441.51 101145.55 407934.54]\n",
172 | " [0.0 0.0 1.0 144372.41 118671.85 383199.62]\n",
173 | " [0.0 1.0 0.0 142107.34 91391.77 366168.42]\n",
174 | " [0.0 0.0 1.0 131876.9 99814.71 362861.36]\n",
175 | " [1.0 0.0 0.0 134615.46 147198.87 127716.82]\n",
176 | " [0.0 1.0 0.0 130298.13 145530.06 323876.68]\n",
177 | " [0.0 0.0 1.0 120542.52 148718.95 311613.29]\n",
178 | " [1.0 0.0 0.0 123334.88 108679.17 304981.62]\n",
179 | " [0.0 1.0 0.0 101913.08 110594.11 229160.95]\n",
180 | " [1.0 0.0 0.0 100671.96 91790.61 249744.55]\n",
181 | " [0.0 1.0 0.0 93863.75 127320.38 249839.44]\n",
182 | " [1.0 0.0 0.0 91992.39 135495.07 252664.93]\n",
183 | " [0.0 1.0 0.0 119943.24 156547.42 256512.92]\n",
184 | " [0.0 0.0 1.0 114523.61 122616.84 261776.23]\n",
185 | " [1.0 0.0 0.0 78013.11 121597.55 264346.06]\n",
186 | " [0.0 0.0 1.0 94657.16 145077.58 282574.31]\n",
187 | " [0.0 1.0 0.0 91749.16 114175.79 294919.57]\n",
188 | " [0.0 0.0 1.0 86419.7 153514.11 0.0]\n",
189 | " [1.0 0.0 0.0 76253.86 113867.3 298664.47]\n",
190 | " [0.0 0.0 1.0 78389.47 153773.43 299737.29]\n",
191 | " [0.0 1.0 0.0 73994.56 122782.75 303319.26]\n",
192 | " [0.0 1.0 0.0 67532.53 105751.03 304768.73]\n",
193 | " [0.0 0.0 1.0 77044.01 99281.34 140574.81]\n",
194 | " [1.0 0.0 0.0 64664.71 139553.16 137962.62]\n",
195 | " [0.0 1.0 0.0 75328.87 144135.98 134050.07]\n",
196 | " [0.0 0.0 1.0 72107.6 127864.55 353183.81]\n",
197 | " [0.0 1.0 0.0 66051.52 182645.56 118148.2]\n",
198 | " [0.0 0.0 1.0 65605.48 153032.06 107138.38]\n",
199 | " [0.0 1.0 0.0 61994.48 115641.28 91131.24]\n",
200 | " [0.0 0.0 1.0 61136.38 152701.92 88218.23]\n",
201 | " [1.0 0.0 0.0 63408.86 129219.61 46085.25]\n",
202 | " [0.0 1.0 0.0 55493.95 103057.49 214634.81]\n",
203 | " [1.0 0.0 0.0 46426.07 157693.92 210797.67]\n",
204 | " [0.0 0.0 1.0 46014.02 85047.44 205517.64]\n",
205 | " [0.0 1.0 0.0 28663.76 127056.21 201126.82]\n",
206 | " [1.0 0.0 0.0 44069.95 51283.14 197029.42]\n",
207 | " [0.0 0.0 1.0 20229.59 65947.93 185265.1]\n",
208 | " [1.0 0.0 0.0 38558.51 82982.09 174999.3]\n",
209 | " [1.0 0.0 0.0 28754.33 118546.05 172795.67]\n",
210 | " [0.0 1.0 0.0 27892.92 84710.77 164470.71]\n",
211 | " [1.0 0.0 0.0 23640.93 96189.63 148001.11]\n",
212 | " [0.0 0.0 1.0 15505.73 127382.3 35534.17]\n",
213 | " [1.0 0.0 0.0 22177.74 154806.14 28334.72]\n",
214 | " [0.0 0.0 1.0 1000.23 124153.04 1903.93]\n",
215 | " [0.0 1.0 0.0 1315.46 115816.21 297114.46]\n",
216 | " [1.0 0.0 0.0 0.0 135426.92 0.0]\n",
217 | " [0.0 0.0 1.0 542.05 51743.15 0.0]\n",
218 | " [1.0 0.0 0.0 0.0 116983.8 45173.06]]\n"
219 | ]
220 | }
221 | ],
222 | "source": [
223 | "print(X)"
224 | ]
225 | },
226 | {
227 | "cell_type": "markdown",
228 | "metadata": {
229 | "id": "WemVnqgeA70k"
230 | },
231 | "source": [
232 | "## Splitting the dataset into the Training set and Test set"
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": null,
238 | "metadata": {
239 | "id": "Kb_v_ae-A-20"
240 | },
241 | "outputs": [],
242 | "source": [
243 | "from sklearn.model_selection import train_test_split\n",
244 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"
245 | ]
246 | },
247 | {
248 | "cell_type": "markdown",
249 | "metadata": {
250 | "id": "k-McZVsQBINc"
251 | },
252 | "source": [
253 | "## Training the Multiple Linear Regression model on the Training set"
254 | ]
255 | },
256 | {
257 | "cell_type": "code",
258 | "execution_count": null,
259 | "metadata": {
260 | "colab": {
261 | "base_uri": "https://localhost:8080/",
262 | "height": 34
263 | },
264 | "id": "ywPjx0L1BMiD",
265 | "outputId": "3417c2b0-6871-423c-a81f-643e35ae9f3e"
266 | },
267 | "outputs": [
268 | {
269 | "data": {
270 | "text/plain": [
271 | "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)"
272 | ]
273 | },
274 | "execution_count": 7,
275 | "metadata": {
276 | "tags": []
277 | },
278 | "output_type": "execute_result"
279 | }
280 | ],
281 | "source": [
282 | "from sklearn.linear_model import LinearRegression\n",
283 | "regressor = LinearRegression()\n",
284 | "regressor.fit(X_train, y_train)"
285 | ]
286 | },
287 | {
288 | "cell_type": "markdown",
289 | "metadata": {
290 | "id": "xNkXL1YQBiBT"
291 | },
292 | "source": [
293 | "## Predicting the Test set results"
294 | ]
295 | },
296 | {
297 | "cell_type": "code",
298 | "execution_count": null,
299 | "metadata": {
300 | "colab": {
301 | "base_uri": "https://localhost:8080/",
302 | "height": 188
303 | },
304 | "id": "TQKmwvtdBkyb",
305 | "outputId": "72da0067-f2e3-48d3-fae7-86ddbf597e5e"
306 | },
307 | "outputs": [
308 | {
309 | "name": "stdout",
310 | "output_type": "stream",
311 | "text": [
312 | "[[103015.2 103282.38]\n",
313 | " [132582.28 144259.4 ]\n",
314 | " [132447.74 146121.95]\n",
315 | " [ 71976.1 77798.83]\n",
316 | " [178537.48 191050.39]\n",
317 | " [116161.24 105008.31]\n",
318 | " [ 67851.69 81229.06]\n",
319 | " [ 98791.73 97483.56]\n",
320 | " [113969.44 110352.25]\n",
321 | " [167921.07 166187.94]]\n"
322 | ]
323 | }
324 | ],
325 | "source": [
326 | "y_pred = regressor.predict(X_test)\n",
327 | "np.set_printoptions(precision=2)\n",
328 | "print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))"
329 | ]
330 | },
331 | {
332 | "cell_type": "markdown",
333 | "metadata": {
334 | "id": "MC-XRwjE6x6M"
335 | },
336 | "source": [
337 | "# Saving the model\n"
338 | ]
339 | },
340 | {
341 | "cell_type": "code",
342 | "execution_count": null,
343 | "metadata": {
344 | "id": "HaEuLbtg_76M"
345 | },
346 | "outputs": [],
347 | "source": [
348 | "import os\n",
349 | "model_path = os.path.abspath(\"model.pkl\")\n",
350 | "scaler_path = os.path.abspath(\"scaler.pkl\")\n",
351 | "\n",
352 | "# Save the model and preprocessing objects\n",
353 | "with open(model_path, 'wb') as model_file:\n",
354 | "    pickle.dump(regressor, model_file)\n",
355 | "\n",
356 | "with open(scaler_path, 'wb') as scaler_file:\n",
357 | " pickle.dump(ct, scaler_file)\n",
358 | "\n",
359 | "print(f\"Model saved at: {model_path}\")\n",
360 | "print(f\"Preprocessor saved at: {scaler_path}\")"
361 | ]
362 | },
363 | {
364 | "cell_type": "markdown",
365 | "metadata": {
366 | "id": "UwurUG9r63EK"
367 | },
368 | "source": [
369 | "# Model Evaluation"
370 | ]
371 | },
372 | {
373 | "cell_type": "code",
374 | "execution_count": null,
375 | "metadata": {
376 | "id": "wmPoacS7eWMt"
377 | },
378 | "outputs": [],
379 | "source": [
380 | "def model_evaluation(train_X, train_Y, test_X, test_Y, output_file=\"evaluation_results.pkl\"):\n",
381 | "    # Import here in case the earlier cells did not import it\n",
382 | "    from sklearn.metrics import r2_score\n",
383 | "\n",
384 | "    # Calculate R^2 scores using the trained regressor\n",
385 | "    train_r2 = r2_score(train_Y, regressor.predict(train_X))\n",
386 | "    test_r2 = r2_score(test_Y, regressor.predict(test_X))\n",
387 | "\n",
388 | "    # Package results\n",
389 | "    results = {\n",
390 | "        \"Train_R2\": train_r2,\n",
391 | "        \"Test_R2\": test_r2\n",
392 | "    }\n",
393 | "\n",
394 | "\n",
395 | " # Save results to a pickle file\n",
396 | " with open(output_file, \"wb\") as f:\n",
397 | " pickle.dump(results, f)\n",
398 | "\n",
399 | " print(f\"Evaluation data saved to {output_file}\")\n",
400 | " \n",
401 | "# Run this function once to generate the evaluation file\n",
402 | "model_evaluation(X_train, y_train, X_test, y_test)"
403 | ]
404 | }
405 | ],
406 | "metadata": {
407 | "colab": {
408 | "provenance": []
409 | },
410 | "kernelspec": {
411 | "display_name": "Python 3",
412 | "name": "python3"
413 | }
414 | },
415 | "nbformat": 4,
416 | "nbformat_minor": 0
417 | }
418 |
--------------------------------------------------------------------------------
/models/business_performance_forecasting/predict.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | import pandas as pd
4 | import seaborn as sns
5 | import matplotlib.pyplot as plt
6 | import pickle
7 | from models.business_performance_forecasting.model import load_model_and_scaler # Import the function from model.py
8 |
9 | # Define the prediction function
10 | def get_prediction(RnD_Spend, Administration, Marketing_Spend, State):
11 | # Load the model and scaler
12 | model, scaler = load_model_and_scaler()
13 | # Prepare input features as a NumPy array
14 | input_data = np.array([[RnD_Spend, Administration, Marketing_Spend, State]])
15 |
16 | # Apply the scaler
17 | scaled_data = scaler.transform(input_data)
18 | scaled_data = scaled_data.astype(float)
19 |
20 | # Make prediction using the loaded model
21 | prediction = model.predict(scaled_data)
22 |
23 | return prediction[0] # Return the predicted profit
24 |
25 |
26 | class ModelEvaluation:
27 | def __init__(self):
28 | metrics_file = os.path.join(os.path.dirname(__file__), 'saved_models', 'evaluation_results.pkl')
29 | # Load evaluation metrics from a pickle file
30 | with open(metrics_file, "rb") as f:
31 | self.metrics = pickle.load(f)
32 | print("Loaded metrics:", self.metrics)
33 | def evaluate(self):
34 | metrics = self.metrics
35 | return metrics, None, None, None
36 |
37 | def model_details():
38 | evaluator = ModelEvaluation()
39 | return evaluator
40 |
41 |
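`get_prediction` above feeds `[R&D, Administration, Marketing, State]` through the saved ColumnTransformer, which one-hot encodes State into three leading columns (visible in the notebook's printout of the transformed X). A pure-Python sketch of that encoding, assuming the alphabetical category order OneHotEncoder infers by default:

```python
# OneHotEncoder orders categories alphabetically by default, so the
# encoded columns come out as [California, Florida, New York].
STATES = ["California", "Florida", "New York"]

def encode_row(rnd_spend, administration, marketing_spend, state):
    one_hot = [1.0 if state == s else 0.0 for s in STATES]
    return one_hot + [rnd_spend, administration, marketing_spend]

row = encode_row(165349.2, 136897.8, 471784.1, "New York")
print(row)  # → [0.0, 0.0, 1.0, 165349.2, 136897.8, 471784.1]
```

This matches the first transformed row printed in the notebook, which is why `get_prediction` must pass State as the fourth column before the scaler is applied.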
--------------------------------------------------------------------------------
/models/business_performance_forecasting/saved_models/evaluation_results.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/business_performance_forecasting/saved_models/evaluation_results.pkl
--------------------------------------------------------------------------------
/models/business_performance_forecasting/saved_models/model.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/business_performance_forecasting/saved_models/model.pkl
--------------------------------------------------------------------------------
/models/business_performance_forecasting/saved_models/scaler.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/business_performance_forecasting/saved_models/scaler.pkl
--------------------------------------------------------------------------------
/models/credit_card_fraud/model.py:
--------------------------------------------------------------------------------
1 | # importing libraries
2 | from sklearn.model_selection import train_test_split
3 | from sklearn.svm import SVC
4 | import pandas as pd
5 | import warnings
6 | from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
8 | import pickle
9 | from models.credit_card_fraud.modelEvaluation import ModelEvaluation
10 | warnings.filterwarnings("ignore")
11 |
12 |
13 | # reading dataset
14 | data = pd.read_csv("models/credit_card_fraud/data/creditcardcsvpresent.csv")
15 | df = data.copy(deep=True)
16 |
17 | # df.info()
18 |
19 | # remove transaction_date all values are null
20 | # and also remove merchant id
21 | df = df.drop(columns=['Merchant_id', 'Transaction date'], axis=1)
22 |
23 |
24 | # encoding for qualitative variables
25 | code = {
26 | "N": 0,
27 | "Y": 1 }
28 |
29 | for obj in df.select_dtypes("object"):
30 | df[obj] = df[obj].map(code)
31 |
32 | # Target and Feature Identification
33 | target = "isFradulent"
34 | features = [col for col in df.columns if col != target]
35 |
36 | X = df[features] # Create a DataFrame for the features
37 | y = df[target] # Create a Series for the target
38 |
39 |
40 | # Split the dataset
41 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
42 |
43 | # Train SVM Classifier
44 | svm_model = SVC(kernel='rbf', class_weight='balanced', random_state=42) # RBF kernel (default) is good for non-linear problems
45 | svm_model.fit(X_train, y_train)
46 |
47 | # Make predictions
48 | y_pred = svm_model.predict(X_test)
49 |
50 | # Function to prepare input data into a DataFrame
51 | def prepare_input_data(
52 | avg_amount_per_day,
53 | transaction_amount,
54 | Is_declined,
55 | no_of_declines_per_day,
56 | Is_Foreign_transaction,
57 | Is_High_Risk_country,
58 | Daily_chargeback_avg_amt,
59 | six_month_avg_chbk_amt,
60 | six_month_chbk_freq,
61 | ):
62 | # Create a DataFrame with the input data
63 | input_data = {
64 | "Average Amount/transaction/day": [avg_amount_per_day],
65 | "Transaction_amount": [transaction_amount],
66 | "Is declined": [Is_declined],
67 | "Total Number of declines/day": [no_of_declines_per_day],
68 | "isForeignTransaction": [Is_Foreign_transaction],
69 | "isHighRiskCountry": [Is_High_Risk_country],
70 | "Daily_chargeback_avg_amt": [Daily_chargeback_avg_amt],
71 | "6_month_avg_chbk_amt": [six_month_avg_chbk_amt],
72 | "6-month_chbk_freq": [six_month_chbk_freq],
73 | }
74 |
75 | return pd.DataFrame(input_data)
76 |
77 | def get_prediction(
78 | avg_amount_per_day,
79 | transaction_amount,
80 | Is_declined,
81 | no_of_declines_per_day,
82 | Is_Foreign_transaction,
83 | Is_High_Risk_country,
84 | Daily_chargeback_avg_amt,
85 | six_month_avg_chbk_amt,
86 | six_month_chbk_freq,
87 | ):
88 | # Prepare the input data
89 | input_df = prepare_input_data(
90 | avg_amount_per_day,
91 | transaction_amount,
92 | Is_declined,
93 | no_of_declines_per_day,
94 | Is_Foreign_transaction,
95 | Is_High_Risk_country,
96 | Daily_chargeback_avg_amt,
97 | six_month_avg_chbk_amt,
98 | six_month_chbk_freq,
99 | )
100 |     # Predict with the trained SVM classifier
101 | predicted_value = svm_model.predict(input_df)
102 |
103 | # Return "Fraud" if fraud (1), else "Not a Fraud"
104 | return "Fraud" if predicted_value[0] == 1 else "Not a Fraud"
105 |
106 |
107 | # Function to save the trained SVM model
108 | def save_model():
109 |     # Save the SVM model into the saved_models directory
110 |     model_filename = 'models/credit_card_fraud/saved_models/creditCardFraud_svc_model.pkl'
111 | with open(model_filename, 'wb') as file:
112 | pickle.dump(svm_model, file)
113 |
114 | # Function to build the model evaluator
115 | def get_evaluator():
116 | evaluator = ModelEvaluation(svm_model, X_train, y_train, X_test, y_test)
117 | return evaluator
118 |
119 | # save_model()
120 |
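The `map(code)` loop above turns every object-dtype column's Y/N flags into 1/0 while leaving numeric columns alone. A minimal, self-contained illustration of that encoding (toy columns, not the real dataset):

```python
import pandas as pd

# Same Y/N encoding dictionary as the model code
code = {"N": 0, "Y": 1}

df = pd.DataFrame({
    "Transaction_amount": [120.5, 300.0],   # numeric column, left untouched
    "Is declined": ["N", "Y"],              # object columns get mapped to 0/1
    "isForeignTransaction": ["Y", "N"],
})

# Only object-dtype (string) columns are selected and mapped
for obj in df.select_dtypes("object"):
    df[obj] = df[obj].map(code)
```

Note that `map` returns `NaN` for any value missing from the dictionary, so unexpected labels would surface as nulls rather than raising an error.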
--------------------------------------------------------------------------------
/models/credit_card_fraud/modelEvaluation.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 | import matplotlib.pyplot as plt
4 | import seaborn as sns
5 | from sklearn.metrics import accuracy_score, confusion_matrix
6 |
7 |
8 | class ModelEvaluation:
9 | def __init__(self, model, train_X, train_Y, test_X, test_Y):
10 | self.model = model
11 | self.train_X = train_X
12 | self.train_Y = train_Y
13 | self.test_X = test_X
14 | self.test_Y = test_Y
15 | self.evaluation_matrix = pd.DataFrame(
16 | np.zeros([1, 8]),
17 | columns=[
18 | "Train-R2",
19 | "Test-R2",
20 | "Train-RSS",
21 | "Test-RSS",
22 | "Train-MSE",
23 | "Test-MSE",
24 | "Train-RMSE",
25 | "Test-RMSE",
26 | ],
27 | )
28 | self.random_column = np.random.choice(
29 | train_X.columns[train_X.nunique() >= 50], 1, replace=False
30 | )[0]
31 |
32 | def evaluate(self):
33 | pred_train = self.model.predict(self.train_X)
34 | pred_test = self.model.predict(self.test_X)
35 |
36 | self.update_evaluation_matrix(pred_train, pred_test)
37 | metrics = self.get_metrics()
38 | prediction_plot = self.plot_predictions(pred_train)
39 | error_plot = self.plot_error_terms(pred_train)
40 |
41 | # adding performance graph of the model
42 | performance_plot = self.plot_performance_graph()
43 |
44 | return metrics, prediction_plot, error_plot, performance_plot
45 |
46 | def get_metrics(self):
47 | """Return a dictionary of evaluation metrics for easy integration."""
48 | pred_train = self.model.predict(self.train_X)
49 | pred_test = self.model.predict(self.test_X)
50 |
51 | metrics = {
52 | "Train_R2": accuracy_score(self.train_Y, pred_train),
53 | "Test_R2": accuracy_score(self.test_Y, pred_test),
54 | "Train_RSS": np.sum(np.square(self.train_Y - pred_train)),
55 | "Test_RSS": np.sum(np.square(self.test_Y - pred_test))
56 | }
57 | return metrics
58 |
59 | def plot_predictions(self, pred_train):
60 | # Predict on test data
61 | pred_test = self.model.predict(self.test_X)
62 |
63 | # Calculate confusion matrix
64 | cm = confusion_matrix(self.test_Y, pred_test)
65 |
66 | # Plot confusion matrix
67 | fig, ax = plt.subplots(figsize=(10, 6))
68 | sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
69 |
70 | ax.set_title("Confusion Matrix")
71 | ax.set_xlabel("Predicted Labels")
72 | ax.set_ylabel("True Labels")
73 |
74 | plt.tight_layout()
75 | return fig
76 |
77 |     def update_evaluation_matrix(self, pred_train, pred_test):
78 |         return  # no-op: the regression-style matrix does not apply to this classifier
79 |
80 | # making a separate function for plotting error terms
81 | def plot_error_terms(self, pred_train):
82 | fig, axes = plt.subplots(figsize=(15, 6))
83 |
84 | # Plotting error distribution
85 | sns.histplot(self.train_Y - pred_train, bins=30, kde=True, ax=axes)
86 | axes.set_title("Error Terms Distribution")
87 | axes.set_xlabel("Errors")
88 |
89 | plt.tight_layout()
90 |         return fig  # return the figure created here
91 |
92 | def plot_performance_graph(self):
93 |         # Predict on test data (same confusion-matrix view as plot_predictions, kept so evaluate() returns a performance plot)
94 | pred_test = self.model.predict(self.test_X)
95 |
96 | # Calculate confusion matrix
97 | cm = confusion_matrix(self.test_Y, pred_test)
98 |
99 | # Plot confusion matrix
100 | fig, ax = plt.subplots(figsize=(10, 6))
101 | sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
102 |
103 | ax.set_title("Confusion Matrix")
104 | ax.set_xlabel("Predicted Labels")
105 | ax.set_ylabel("True Labels")
106 |
107 | plt.tight_layout()
108 | return fig
109 |
--------------------------------------------------------------------------------
/models/credit_card_fraud/predict.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import pickle
3 | from models.credit_card_fraud.model import get_evaluator
4 |
5 |
6 | def load_model(model_path):
7 | """ Load the trained Random Forest model from the specified path. """
8 | with open(model_path, 'rb') as file:
9 | return pickle.load(file)
10 |
11 |
12 | def prepare_input_data(
13 | avg_amount_per_day,
14 | transaction_amount,
15 | Is_declined,
16 | no_of_declines_per_day,
17 | Is_Foreign_transaction,
18 | Is_High_Risk_country,
19 | Daily_chargeback_avg_amt,
20 | six_month_avg_chbk_amt,
21 | six_month_chbk_freq,
22 | ):
23 | # Create a DataFrame with the input data
24 | input_data = {
25 | "Average Amount/transaction/day": [avg_amount_per_day],
26 | "Transaction_amount": [transaction_amount],
27 | "Is declined": [Is_declined],
28 | "Total Number of declines/day": [no_of_declines_per_day],
29 | "isForeignTransaction": [Is_Foreign_transaction],
30 | "isHighRiskCountry": [Is_High_Risk_country],
31 | "Daily_chargeback_avg_amt": [Daily_chargeback_avg_amt],
32 | "6_month_avg_chbk_amt": [six_month_avg_chbk_amt],
33 | "6-month_chbk_freq": [six_month_chbk_freq],
34 | }
35 |
36 | return pd.DataFrame(input_data)
37 |
38 |
39 | def get_prediction(
40 | avg_amount_per_day,
41 | transaction_amount,
42 | Is_declined,
43 | no_of_declines_per_day,
44 | Is_Foreign_transaction,
45 | Is_High_Risk_country,
46 | Daily_chargeback_avg_amt,
47 | six_month_avg_chbk_amt,
48 | six_month_chbk_freq,
49 | ):
50 |
51 | # Convert "no" to 0 and "yes" to 1 for the relevant fields
52 | Is_declined = 0 if Is_declined.lower() == "no" else 1
53 | Is_Foreign_transaction = 0 if Is_Foreign_transaction.lower() == "no" else 1
54 | Is_High_Risk_country = 0 if Is_High_Risk_country.lower() == "no" else 1
55 |
56 | # Prepare the input data
57 | input_df = prepare_input_data(
58 | avg_amount_per_day,
59 | transaction_amount,
60 | Is_declined,
61 | no_of_declines_per_day,
62 | Is_Foreign_transaction,
63 | Is_High_Risk_country,
64 | Daily_chargeback_avg_amt,
65 | six_month_avg_chbk_amt,
66 | six_month_chbk_freq,
67 | )
68 | # print(input_df.values)
69 | # Load the model
70 | svm_model = load_model("models/credit_card_fraud/saved_models/creditCardFraud_svc_model.pkl")
71 |
72 |     # Predict with the loaded SVM model
73 | predicted_value = svm_model.predict(input_df)
74 |
75 | # Return "Fraud" if fraud (1), else "Not a Fraud"
76 | return "Fraud" if predicted_value[0] == 1 else "Not a Fraud"
77 |
78 | def model_details():
79 | """Returns model evaluation details."""
80 | return get_evaluator()
81 |
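`load_model` above simply unpickles the fitted SVC, and the loaded object must predict exactly like the original. That round-trip can be verified in isolation with a toy classifier (synthetic two-feature data, not the real fraud features):

```python
import pickle
import numpy as np
from sklearn.svm import SVC

# Tiny separable toy problem: label is 1 when the second feature is large
X = np.array([[0.0, 0.0], [1.0, 0.1], [0.2, 5.0], [0.1, 6.0]])
y = np.array([0, 0, 1, 1])

# Same estimator configuration as model.py
svm_model = SVC(kernel="rbf", class_weight="balanced", random_state=42).fit(X, y)

# Serialize and reload, as predict.py does with creditCardFraud_svc_model.pkl
loaded = pickle.loads(pickle.dumps(svm_model))

# The reloaded model makes identical predictions
original_preds = svm_model.predict(X).tolist()
reloaded_preds = loaded.predict(X).tolist()
```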
--------------------------------------------------------------------------------
/models/credit_card_fraud/saved_models/creditCardFraud_svc_model.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/credit_card_fraud/saved_models/creditCardFraud_svc_model.pkl
--------------------------------------------------------------------------------
/models/customer_income/model.py:
--------------------------------------------------------------------------------
1 | import pickle
2 | import numpy as np
3 |
4 |
5 | def load_model():
6 | with open('models/customer_income/saved_models/CImodel.pkl', 'rb') as model_file:
7 | model = pickle.load(model_file)
8 | with open('models/customer_income/saved_models/CIscaler.pkl', 'rb') as scaler_file:
9 | scaler = pickle.load(scaler_file)
10 | return model, scaler
11 |
12 |
13 | def predict(features):
14 | model, scaler = load_model()
15 | features_array = np.array(features).reshape(1, -1)
16 | scaled_features = scaler.transform(features_array)
17 | prediction = model.predict(scaled_features)
18 | return prediction
19 |
20 |
21 |
--------------------------------------------------------------------------------
/models/customer_income/predict.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import pickle
3 | from models.customer_income.model import predict
4 |
5 | def load_feature_names():
6 |     with open('models/customer_income/saved_models/feature_names.pkl', 'rb') as feature_file:
7 |         feature_names = pickle.load(feature_file)
8 |     return feature_names
9 | 
10 | def get_prediction(age, workclass, fnlwgt, education, marital_status, relationship, occupation, sex, race, capital_gain, capital_loss, hours_per_week, native_country):
11 |     input_dict = {
12 |         'age': age,
13 |         'fnlwgt': fnlwgt,
14 |         'capital-gain': capital_gain,
15 |         'capital-loss': capital_loss,
16 |         'hours-per-week': hours_per_week,
17 |         'workclass_' + workclass: 1,
18 |         'education_' + education: 1,
19 |         'marital-status_' + marital_status: 1,
20 |         'relationship_' + relationship: 1,
21 |         'occupation_' + occupation: 1,
22 |         'sex_' + sex: 1,
23 |         'race_' + race: 1,
24 |         'native-country_' + native_country: 1,
25 |     }
26 | 
27 |     feature_names = load_feature_names()
28 |
29 | input_df = pd.DataFrame(0, index=[0], columns=feature_names)
30 |
31 | for key, value in input_dict.items():
32 | if key in input_df.columns:
33 | input_df[key] = value
34 | else:
35 | print(f"Warning: {key} not found in feature columns.")
36 |
37 |
38 | result = predict(input_df)
39 |
40 |     if result[0] == 1:
41 | return "The person earns more than $50,000 per year."
42 | else:
43 | return "The person earns less than or equal to $50,000 per year."
44 |
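The zero-initialized `input_df` trick above guarantees that the one-hot columns line up with the training-time order: unset dummies stay 0 and unknown keys are simply skipped. A sketch with a made-up feature list (the real one comes from feature_names.pkl):

```python
import pandas as pd

# Hypothetical training-time feature order, stand-in for the pickled feature_names list
feature_names = ["age", "fnlwgt", "workclass_Private", "workclass_State-gov", "sex_Male"]

# Raw inputs expressed as sparse one-hot keys, as get_prediction builds them
input_dict = {"age": 39, "fnlwgt": 77516, "workclass_State-gov": 1, "sex_Male": 1}

# Start from an all-zero row so column order matches training and absent dummies stay 0
input_df = pd.DataFrame(0, index=[0], columns=feature_names)
for key, value in input_dict.items():
    if key in input_df.columns:
        input_df[key] = value

row = input_df.iloc[0].tolist()
```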
--------------------------------------------------------------------------------
/models/customer_income/saved_models/CImodel.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/customer_income/saved_models/CImodel.pkl
--------------------------------------------------------------------------------
/models/customer_income/saved_models/CIscaler.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/customer_income/saved_models/CIscaler.pkl
--------------------------------------------------------------------------------
/models/customer_income/saved_models/feature_names.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/customer_income/saved_models/feature_names.pkl
--------------------------------------------------------------------------------
/models/gold_price_prediction/model.py:
--------------------------------------------------------------------------------
1 | from joblib import load
2 |
3 | # Load the trained model for gold price prediction
4 | model = load('models/gold_price_prediction/saved_models/random_forest_model.joblib')
5 |
6 | def gold_price_prediction(spx, uso, slv, eur_usd):
7 | # Feature extraction
8 | features = [
9 | float(spx),
10 | float(uso),
11 | float(slv),
12 | float(eur_usd)
13 | ]
14 |
15 | # Predict the gold price (GLD)
16 | prediction = model.predict([features])[0]
17 |
18 | return prediction
19 |
--------------------------------------------------------------------------------
/models/gold_price_prediction/predict.py:
--------------------------------------------------------------------------------
1 | from models.gold_price_prediction.model import gold_price_prediction
2 |
3 | def get_prediction(spx, uso, slv, eur_usd):
4 | # Call the function that makes the prediction using input features
5 | return gold_price_prediction(spx, uso, slv, eur_usd)
6 |
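The gold-price model above is persisted with joblib rather than pickle. The dump/load round-trip looks like this on a toy regressor (temporary file and synthetic SPX/USO/SLV/EUR-USD stand-ins, not the real training data):

```python
import os
import tempfile
import numpy as np
from joblib import dump, load
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins for the four input features vs. a GLD target
X = np.array([[1.0, 2.0, 3.0, 1.1], [2.0, 1.0, 4.0, 1.2],
              [3.0, 2.5, 5.0, 1.3], [4.0, 3.0, 6.0, 1.4]])
y = np.array([100.0, 110.0, 120.0, 130.0])

model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X, y)

# Round-trip through a .joblib file, mirroring saved_models/random_forest_model.joblib
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "random_forest_model.joblib")
    dump(model, path)
    loaded = load(path)
    predictions_match = bool((model.predict(X) == loaded.predict(X)).all())
```

joblib is generally preferred over plain pickle for scikit-learn estimators that carry large NumPy arrays, which is presumably why this model uses it.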
--------------------------------------------------------------------------------
/models/gold_price_prediction/saved_models/random_forest_model.joblib:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/gold_price_prediction/saved_models/random_forest_model.joblib
--------------------------------------------------------------------------------
/models/house_price/ImprovedModel.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | from sklearn.model_selection import train_test_split, cross_val_score
3 | from sklearn.preprocessing import StandardScaler, OneHotEncoder
4 | from sklearn.compose import ColumnTransformer
5 | from sklearn.pipeline import Pipeline
6 | from sklearn.feature_selection import RFE
7 | from sklearn.linear_model import LinearRegression
8 | from sklearn.ensemble import RandomForestRegressor
9 | import warnings
10 | import pickle
11 | from .ModelEvaluation import ModelEvaluation
12 | import os
13 | import logging
14 | import streamlit as st
15 | import numpy as np
16 | warnings.filterwarnings("ignore")
17 |
18 | # Define the directory for logs
19 | log_directory = 'models/house_price/logs'
20 | os.makedirs(log_directory, exist_ok=True) # Create the directory if it doesn't exist
21 |
22 | # Set up logging
23 | log_file = os.path.join(log_directory, 'model_training.log')
24 | logging.basicConfig(
25 | filename=log_file,
26 | level=logging.INFO,
27 | format='%(asctime)s - %(levelname)s - %(message)s'
28 | )
29 |
30 | df = pd.read_csv("models/house_price/data/housing.csv")
31 | original_df = df.copy(deep=True)
32 |
33 | # Target and Feature Identification
34 | target = "price"
35 | features = [col for col in df.columns if col != target]
36 |
37 | # Separates numerical and categorical features based on unique values
38 | nu = df[features].nunique()
39 | numerical_features = [col for col in features if nu[col] > 16]
40 | categorical_features = [col for col in features if nu[col] <= 16]
41 |
42 | # Removing outliers using IQR
43 | def remove_outliers(df, numerical_features):
44 | for feature in numerical_features:
45 | Q1 = df[feature].quantile(0.25)
46 | Q3 = df[feature].quantile(0.75)
47 | IQR = Q3 - Q1
48 | df = df[(df[feature] >= (Q1 - 1.5 * IQR)) & (df[feature] <= (Q3 + 1.5 * IQR))]
49 | return df.reset_index(drop=True)
50 |
51 |
52 | # Handling missing values
53 | def handle_missing_values(df):
54 | null_summary = df.isnull().sum()
55 | null_percentage = (null_summary / df.shape[0]) * 100
56 | return pd.DataFrame(
57 | {"Total Null Values": null_summary, "Percentage": null_percentage}
58 | ).sort_values(by="Percentage", ascending=False)
59 |
60 |
61 | # Removes outliers from numerical features
62 | df = remove_outliers(df, numerical_features)
63 |
64 | # Filters categorical features without missing values
65 | null_value_summary = handle_missing_values(df)
66 | valid_categorical_features = [
67 | col
68 | for col in categorical_features
69 | if col not in null_value_summary[null_value_summary["Percentage"] != 0].index
70 | ]
71 |
72 | # Encoding categorical features
73 | def encode_categorical_features(df, categorical_features):
74 | for feature in categorical_features:
75 | # Binary encoding for features with 2 unique values
76 | if df[feature].nunique() == 2:
77 | df[feature] = pd.get_dummies(df[feature], drop_first=True, prefix=feature)
78 | # Dummy encoding for features with more than 2 unique values
79 | elif 2 < df[feature].nunique() <= 16:
80 | df = pd.concat(
81 | [
82 | df.drop([feature], axis=1),
83 | pd.get_dummies(df[feature], drop_first=True, prefix=feature),
84 | ],
85 | axis=1,
86 | )
87 | return df
88 |
89 | df = encode_categorical_features(df, valid_categorical_features)
90 |
91 | # Renames columns to avoid invalid characters
92 | df.columns = [col.replace("-", "_").replace(" ", "_") for col in df.columns]
93 |
94 | # Splitting the data into training & testing sets
95 | X = df.drop([target], axis=1)
96 | Y = df[target]
97 | Train_X, Test_X, Train_Y, Test_Y = train_test_split(
98 | X, Y, train_size=0.8, test_size=0.2, random_state=100
99 | )
100 |
101 | # Feature Scaling (Standardization)
102 | std = StandardScaler()
103 | Train_X_std = pd.DataFrame(std.fit_transform(Train_X), columns=X.columns)
104 | Test_X_std = pd.DataFrame(std.transform(Test_X), columns=X.columns)
105 |
106 | # Random Forest regressor
107 | rf_model = RandomForestRegressor(random_state=42, n_estimators=200, max_depth=8, min_samples_split=12)
108 | rf_model.fit(Train_X_std, Train_Y)
109 |
110 |
111 | pred_train = rf_model.predict(Train_X_std)
112 | pred_test = rf_model.predict(Test_X_std)
113 |
114 | # Calculate RMSE for train and test sets
115 | # train_rmse = np.sqrt(mean_squared_error(Train_Y, pred_train))
116 | # test_rmse = np.sqrt(mean_squared_error(Test_Y, pred_test))
117 |
118 |
119 | def prepare_input_data(
120 | area,
121 | mainroad,
122 | guestroom,
123 | basement,
124 | hotwaterheating,
125 | airconditioning,
126 | prefarea,
127 | additional_bedrooms,
128 | bathrooms,
129 | stories,
130 | parking,
131 | furnishingstatus,
132 | ):
133 | # Creates a dictionary for the input features
134 | input_data = {
135 | "area": [area],
136 | "mainroad": True if mainroad == "Yes" else False,
137 | "guestroom": True if guestroom == "Yes" else False,
138 | "basement": True if basement == "Yes" else False,
139 | "hotwaterheating": True if hotwaterheating == "Yes" else False,
140 | "airconditioning": True if airconditioning == "Yes" else False,
141 | "prefarea": True if prefarea == "Yes" else False,
142 | "bedrooms_2": additional_bedrooms == 2,
143 | "bedrooms_3": additional_bedrooms == 3,
144 | "bedrooms_4": additional_bedrooms == 4,
145 | "bedrooms_5": additional_bedrooms == 5,
146 | "bedrooms_6": additional_bedrooms == 6,
147 | "bathrooms_2": bathrooms == 2,
148 | "bathrooms_3": bathrooms == 3,
149 | "bathrooms_4": bathrooms == 4,
150 | "stories_2": stories == 2,
151 | "stories_3": stories == 3,
152 | "stories_4": stories == 4,
153 | "parking_1": parking == 1,
154 | "parking_2": parking == 2,
155 | "parking_3": parking == 3,
156 | "furnishingstatus_semi_furnished": furnishingstatus == "semi_furnished",
157 | "furnishingstatus_unfurnished": furnishingstatus == "unfurnished",
158 | }
159 |
160 | return pd.DataFrame(input_data)
161 |
162 | # Note: this function is kept because predict.py still depends on it
163 |
164 |
165 | ### Final Endpoint ###
166 | def get_predicted(area=0, mainroad=False, guestroom=False, basement=False, hotwaterheating=False,
167 | airconditioning=False, prefarea=False,bedrooms=0, bathrooms=2,stories=1, parking=1,
168 | furnishingstatus="semi_furnished",):
169 |
170 | input_df = prepare_input_data(area, mainroad, guestroom,basement, hotwaterheating, airconditioning, prefarea,
171 | bedrooms, bathrooms, stories, parking, furnishingstatus)
172 |
173 | input_std = pd.DataFrame(std.transform(input_df), columns=input_df.columns)
174 | predicted_price = rf_model.predict(input_std)
175 | return round(predicted_price[0],2)
176 |
177 | def save_model():
178 | # todo: Ask the user for the model name, and warn that the model will be overwritten
179 |
180 | with open("./saved_models/model_02.pkl", "wb") as file:
181 | pickle.dump(rf_model, file)
182 |
183 |
184 | def save_scaler():
185 | with open("./saved_models/scaler_02.pkl", "wb") as file:
186 | pickle.dump(std, file)
187 |
188 |
189 | def get_evaluator():
190 | evaluator = ModelEvaluation(rf_model, Train_X_std, Train_Y, Test_X_std, Test_Y)
191 | return evaluator
192 |
193 | if __name__ == "__main__":
194 | save_model()
195 | save_scaler()
196 | # model_evaluation()
197 |
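`remove_outliers` above keeps only rows inside the [Q1 − 1.5·IQR, Q3 + 1.5·IQR] whiskers for each numerical feature. Its behaviour can be checked on a toy column (made-up values, same function body as the model code):

```python
import pandas as pd

def remove_outliers(df, numerical_features):
    # Same rule as the model code: drop rows outside the 1.5*IQR whiskers
    for feature in numerical_features:
        Q1 = df[feature].quantile(0.25)
        Q3 = df[feature].quantile(0.75)
        IQR = Q3 - Q1
        df = df[(df[feature] >= (Q1 - 1.5 * IQR)) & (df[feature] <= (Q3 + 1.5 * IQR))]
    return df.reset_index(drop=True)

# Five plausible areas plus one obvious outlier
df = pd.DataFrame({"area": [3000, 3200, 3100, 3300, 3150, 50000]})
cleaned = remove_outliers(df, ["area"])
```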
--------------------------------------------------------------------------------
/models/house_price/ModelEvaluation.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 | import matplotlib.pyplot as plt
4 | import seaborn as sns
5 | from sklearn.metrics import r2_score, mean_squared_error
6 |
7 |
8 | class ModelEvaluation:
9 | def __init__(self, model, train_X, train_Y, test_X, test_Y):
10 | self.model = model
11 | self.train_X = train_X
12 | self.train_Y = train_Y
13 | self.test_X = test_X
14 | self.test_Y = test_Y
15 | self.evaluation_matrix = pd.DataFrame(
16 | np.zeros([1, 8]),
17 | columns=[
18 | "Train-R2",
19 | "Test-R2",
20 | "Train-RSS",
21 | "Test-RSS",
22 | "Train-MSE",
23 | "Test-MSE",
24 | "Train-RMSE",
25 | "Test-RMSE",
26 | ],
27 | )
28 | self.random_column = np.random.choice(
29 | train_X.columns[train_X.nunique() >= 50], 1, replace=False
30 | )[0]
31 |
32 | def evaluate(self):
33 | pred_train = self.model.predict(self.train_X)
34 | pred_test = self.model.predict(self.test_X)
35 |
36 | self.update_evaluation_matrix(pred_train, pred_test)
37 | metrics = self.get_metrics()
38 | prediction_plot = self.plot_predictions(pred_train)
39 | error_plot = self.plot_error_terms(pred_train)
40 |
41 | #adding performance graph of the model
42 | performance_plot = self.plot_performance_graph()
43 |
44 | return metrics, prediction_plot, error_plot, performance_plot
45 |
46 | def get_metrics(self):
47 | """Return a dictionary of evaluation metrics for easy integration."""
48 | pred_train = self.model.predict(self.train_X)
49 | pred_test = self.model.predict(self.test_X)
50 |
51 | metrics = {
52 | "Train_R2": r2_score(self.train_Y, pred_train),
53 | "Test_R2": r2_score(self.test_Y, pred_test),
54 | "Train_RSS": np.sum(np.square(self.train_Y - pred_train)),
55 | "Test_RSS": np.sum(np.square(self.test_Y - pred_test)),
56 | "Train_MSE": mean_squared_error(self.train_Y, pred_train),
57 | "Test_MSE": mean_squared_error(self.test_Y, pred_test),
58 | "Train_RMSE": np.sqrt(mean_squared_error(self.train_Y, pred_train)),
59 | "Test_RMSE": np.sqrt(mean_squared_error(self.test_Y, pred_test)),
60 | }
61 | return metrics
62 |
63 | def plot_predictions(self, pred_train):
64 | fig, axes = plt.subplots(figsize=(15, 6))
65 |
66 | # Plotting actual vs predicted
67 | axes.scatter(self.train_Y, pred_train, alpha=0.6)
68 | axes.plot(
69 | [self.train_Y.min(), self.train_Y.max()],
70 | [self.train_Y.min(), self.train_Y.max()],
71 | "r--",
72 | )
73 | axes.set_title("Actual vs Predicted Prices")
74 | axes.set_xlabel("Actual Price")
75 | axes.set_ylabel("Predicted Price")
76 |
77 | plt.legend()
78 | plt.grid()
79 | plt.tight_layout()
80 |
81 |         return fig  # return the figure created here
82 |
83 | def update_evaluation_matrix(self, pred_train, pred_test):
84 | self.evaluation_matrix.loc[0] = [
85 | r2_score(self.train_Y, pred_train),
86 | r2_score(self.test_Y, pred_test),
87 | np.sum(np.square(self.train_Y - pred_train)),
88 | np.sum(np.square(self.test_Y - pred_test)),
89 | mean_squared_error(self.train_Y, pred_train),
90 | mean_squared_error(self.test_Y, pred_test),
91 | np.sqrt(mean_squared_error(self.train_Y, pred_train)),
92 | np.sqrt(mean_squared_error(self.test_Y, pred_test)),
93 | ]
94 |
95 | #making a separate function for plotting error terms
96 | def plot_error_terms(self, pred_train):
97 | fig, axes = plt.subplots( figsize=(15, 6))
98 |
99 | # Plotting error distribution
100 | sns.histplot(self.train_Y - pred_train, bins=30, kde=True, ax=axes)
101 | axes.set_title("Error Terms Distribution")
102 | axes.set_xlabel("Errors")
103 |
104 | plt.tight_layout()
105 |         return fig  # return the figure created here
106 |
107 | def plot_performance_graph(self):
108 | metrics = self.get_metrics()
109 | performance_data = {
110 | "Metric": ["Train RMSE", "Test RMSE"],
111 | "Value": [metrics["Train_RMSE"], metrics["Test_RMSE"]],
112 | }
113 | performance_df = pd.DataFrame(performance_data)
114 |
115 | fig, axes = plt.subplots( figsize=(15, 6))
116 |         sns.barplot(x="Metric", y="Value", data=performance_df, ax=axes)
117 | axes.set_title("Model Performance Comparison")
118 | axes.set_ylabel("RMSE")
119 |
120 | plt.tight_layout()
121 |         return fig  # return the figure created here
122 |
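The metric definitions in `get_metrics` (R2, RSS, MSE, RMSE) can be checked by hand on a tiny vector of toy numbers:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 8.0])

rss = np.sum(np.square(y_true - y_pred))   # (-1)^2 + 0^2 + 1^2 = 2
mse = mean_squared_error(y_true, y_pred)   # RSS / n = 2/3
rmse = np.sqrt(mse)                        # sqrt(2/3)
r2 = r2_score(y_true, y_pred)              # 1 - RSS/TSS, with TSS = sum((y - mean)^2) = 8
```

These are exactly the four quantities the class computes for both the train and test splits.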
--------------------------------------------------------------------------------
/models/house_price/model.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | from sklearn.model_selection import train_test_split
3 | from sklearn.preprocessing import StandardScaler
4 | from sklearn.feature_selection import RFE
5 | from sklearn.linear_model import LinearRegression
6 | import warnings
7 | import pickle
8 | from .ModelEvaluation import ModelEvaluation
9 | import os
10 | import logging
11 | warnings.filterwarnings("ignore")
12 |
13 | # Define the directory for logs
14 | log_directory = 'models/house_price/logs'
15 | os.makedirs(log_directory, exist_ok=True) # Create the directory if it doesn't exist
16 |
17 | # Set up logging
18 | log_file = os.path.join(log_directory, 'model_training.log')
19 | logging.basicConfig(
20 | filename=log_file,
21 | level=logging.INFO,
22 | format='%(asctime)s - %(levelname)s - %(message)s'
23 | )
24 |
25 | df = pd.read_csv("models/house_price/data/housing.csv")
26 | original_df = df.copy(deep=True)
27 |
28 | # Target and Feature Identification
29 | target = "price"
30 | features = [col for col in df.columns if col != target]
31 |
32 | # Separates numerical and categorical features based on unique values
33 | nu = df[features].nunique()
34 | numerical_features = [col for col in features if nu[col] > 16]
35 | categorical_features = [col for col in features if nu[col] <= 16]
36 |
37 | # Removing outliers using IQR
38 | def remove_outliers(df, numerical_features):
39 | for feature in numerical_features:
40 | Q1 = df[feature].quantile(0.25)
41 | Q3 = df[feature].quantile(0.75)
42 | IQR = Q3 - Q1
43 | df = df[(df[feature] >= (Q1 - 1.5 * IQR)) & (df[feature] <= (Q3 + 1.5 * IQR))]
44 | return df.reset_index(drop=True)
45 |
46 |
47 | # Handling missing values
48 | def handle_missing_values(df):
49 | null_summary = df.isnull().sum()
50 | null_percentage = (null_summary / df.shape[0]) * 100
51 | return pd.DataFrame(
52 | {"Total Null Values": null_summary, "Percentage": null_percentage}
53 | ).sort_values(by="Percentage", ascending=False)
54 |
55 |
56 | # Removes outliers from numerical features
57 | df = remove_outliers(df, numerical_features)
58 |
59 | # Filters categorical features without missing values
60 | null_value_summary = handle_missing_values(df)
61 | valid_categorical_features = [
62 | col
63 | for col in categorical_features
64 | if col not in null_value_summary[null_value_summary["Percentage"] != 0].index
65 | ]
66 |
67 | # Encoding categorical features
68 | def encode_categorical_features(df, categorical_features):
69 | for feature in categorical_features:
70 | # Binary encoding for features with 2 unique values
71 | if df[feature].nunique() == 2:
72 | df[feature] = pd.get_dummies(df[feature], drop_first=True, prefix=feature)
73 | # Dummy encoding for features with more than 2 unique values
74 | elif 2 < df[feature].nunique() <= 16:
75 | df = pd.concat(
76 | [
77 | df.drop([feature], axis=1),
78 | pd.get_dummies(df[feature], drop_first=True, prefix=feature),
79 | ],
80 | axis=1,
81 | )
82 | return df
83 |
84 | df = encode_categorical_features(df, valid_categorical_features)
85 |
86 | # Renames columns to avoid invalid characters
87 | df.columns = [col.replace("-", "_").replace(" ", "_") for col in df.columns]
88 |
89 | # Splitting the data into training & testing sets
90 | X = df.drop([target], axis=1)
91 | Y = df[target]
92 | Train_X, Test_X, Train_Y, Test_Y = train_test_split(
93 | X, Y, train_size=0.8, test_size=0.2, random_state=100
94 | )
95 |
96 | # Feature Scaling (Standardization)
97 | std = StandardScaler()
98 | Train_X_std = pd.DataFrame(std.fit_transform(Train_X), columns=X.columns)
99 | Test_X_std = pd.DataFrame(std.transform(Test_X), columns=X.columns)
100 |
101 | # Multiple Linear Regression with sklearn
102 | MLR = LinearRegression().fit(Train_X_std, Train_Y)
103 | pred_train = MLR.predict(Train_X_std)
104 | pred_test = MLR.predict(Test_X_std)
105 |
106 | # Calculate RMSE for train and test sets
107 | # train_rmse = np.sqrt(mean_squared_error(Train_Y, pred_train))
108 | # test_rmse = np.sqrt(mean_squared_error(Test_Y, pred_test))
109 |
110 |
111 | def prepare_input_data(
112 | area,
113 | mainroad,
114 | guestroom,
115 | basement,
116 | hotwaterheating,
117 | airconditioning,
118 | prefarea,
119 | additional_bedrooms,
120 | bathrooms,
121 | stories,
122 | parking,
123 | furnishingstatus,
124 | ):
125 | # Creates a dictionary for the input features
126 | input_data = {
127 | "area": [area],
128 | "mainroad": mainroad == "Yes",
129 | "guestroom": guestroom == "Yes",
130 | "basement": basement == "Yes",
131 | "hotwaterheating": hotwaterheating == "Yes",
132 | "airconditioning": airconditioning == "Yes",
133 | "prefarea": prefarea == "Yes",
134 | "bedrooms_2": additional_bedrooms == 2,
135 | "bedrooms_3": additional_bedrooms == 3,
136 | "bedrooms_4": additional_bedrooms == 4,
137 | "bedrooms_5": additional_bedrooms == 5,
138 | "bedrooms_6": additional_bedrooms == 6,
139 | "bathrooms_2": bathrooms == 2,
140 | "bathrooms_3": bathrooms == 3,
141 | "bathrooms_4": bathrooms == 4,
142 | "stories_2": stories == 2,
143 | "stories_3": stories == 3,
144 | "stories_4": stories == 4,
145 | "parking_1": parking == 1,
146 | "parking_2": parking == 2,
147 | "parking_3": parking == 3,
148 | "furnishingstatus_semi_furnished": furnishingstatus == "semi_furnished",
149 | "furnishingstatus_unfurnished": furnishingstatus == "unfurnished",
150 | }
151 |
152 | return pd.DataFrame(input_data)
153 |
154 | # Note: not removing this function because of the warning in the predict.py file
155 |
156 |
157 | ### Final Endpoint ###
158 | # Predicts the price of a house based on the input features
159 | def get_prediction(
160 | area=0,
161 | mainroad=False,
162 | guestroom=False,
163 | basement=False,
164 | hotwaterheating=False,
165 | airconditioning=False,
166 | prefarea=False,
167 | bedrooms=0,
168 | bathrooms=2,
169 | stories=1,
170 | parking=1,
171 | furnishingstatus="semi_furnished",
172 | ):
173 | # Modifying the input data to match the model's input format
174 | input_df = prepare_input_data(
175 | area,
176 | mainroad,
177 | guestroom,
178 | basement,
179 | hotwaterheating,
180 | airconditioning,
181 | prefarea,
182 | bedrooms,
183 | bathrooms,
184 | stories,
185 | parking,
186 | furnishingstatus,
187 | )
188 |
189 | # Standardizes the input data
190 | input_std = pd.DataFrame(std.transform(input_df), columns=input_df.columns)
191 |
192 | # Predicts the price
193 | predicted_price = MLR.predict(input_std)
194 |
195 | return round(predicted_price[0], 2)
196 |
197 |
198 | def save_model():
199 | # todo: Ask the user for the model name, and warn that the model will be overwritten
200 |
201 | with open("models/house_price/saved_models/model_01.pkl", "wb") as file:
202 | pickle.dump(MLR, file)
203 |
204 |
205 | def save_scaler():
206 | with open("models/house_price/saved_models/scaler_01.pkl", "wb") as file:
207 | pickle.dump(std, file)
208 |
209 |
210 | def get_evaluator():
211 | evaluator = ModelEvaluation(MLR, Train_X_std, Train_Y, Test_X_std, Test_Y)
212 | return evaluator
213 |
214 | # if __name__ == "__main__":
215 | # save_model()
216 | # save_scaler()
217 | # get_evaluator()
218 |
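The `nunique`-based rule in `encode_categorical_features` above can be sketched on a toy frame (illustrative only: the column names and values are hypothetical, and `.iloc[:, 0]` is used here just to extract the single dummy column):

```python
import pandas as pd

# Toy frame: one binary and one multi-category column (hypothetical names).
toy = pd.DataFrame({
    "mainroad": ["yes", "no", "yes"],
    "furnishing": ["full", "semi", "none"],
})

# 2 unique values -> a single 0/1 indicator (drop_first keeps one column).
toy["mainroad"] = pd.get_dummies(
    toy["mainroad"], drop_first=True, prefix="mainroad"
).iloc[:, 0]

# 3-16 unique values -> k-1 dummy columns concatenated in place of the original.
toy = pd.concat(
    [
        toy.drop(columns="furnishing"),
        pd.get_dummies(toy["furnishing"], drop_first=True, prefix="furnishing"),
    ],
    axis=1,
)
print(sorted(map(str, toy.columns)))
# -> ['furnishing_none', 'furnishing_semi', 'mainroad']
```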
--------------------------------------------------------------------------------
/models/house_price/predict.py:
--------------------------------------------------------------------------------
1 | import pickle
2 | import pandas as pd
3 | # from models.house_price.model import get_evaluator
4 | from models.house_price.ImprovedModel import get_evaluator
5 |
6 | """
7 | Predict.py file:
8 | Contains the following functions:
9 | - load_model: Loads a model from a pickle file.
10 | - prepare_input_data: Prepares the input data for the model.
11 | - [IMPORTANT] get_prediction: Predicts the price of a house based on the input features.
12 | - test_house_price_prediction: Tests the house price prediction model.
13 | - [IMPORTANT] model_details: Returns the details of the model.
14 | """
15 |
16 |
17 | def load_model(filepath):
18 | """Loads a model from the given pickle file path."""
19 | with open(filepath, "rb") as file:
20 | model = pickle.load(file)
21 | return model
22 |
23 |
24 | def prepare_input_data(
25 | area,
26 | mainroad,
27 | guestroom,
28 | basement,
29 | hotwaterheating,
30 | airconditioning,
31 | prefarea,
32 | additional_bedrooms,
33 | bathrooms,
34 | stories,
35 | parking,
36 | furnishingstatus,
37 | ):
38 | """
39 | Prepares the input data for the model by converting user inputs into a
40 | structured DataFrame format.
41 | """
42 | input_data = {
43 | "area": [area],
44 | "mainroad": mainroad == "Yes",
45 | "guestroom": guestroom == "Yes",
46 | "basement": basement == "Yes",
47 | "hotwaterheating": hotwaterheating == "Yes",
48 | "airconditioning": airconditioning == "Yes",
49 | "prefarea": prefarea == "Yes",
50 | "bedrooms_2": additional_bedrooms == 2,
51 | "bedrooms_3": additional_bedrooms == 3,
52 | "bedrooms_4": additional_bedrooms == 4,
53 | "bedrooms_5": additional_bedrooms == 5,
54 | "bedrooms_6": additional_bedrooms == 6,
55 | "bathrooms_2": bathrooms == 2,
56 | "bathrooms_3": bathrooms == 3,
57 | "bathrooms_4": bathrooms == 4,
58 | "stories_2": stories == 2,
59 | "stories_3": stories == 3,
60 | "stories_4": stories == 4,
61 | "parking_1": parking == 1,
62 | "parking_2": parking == 2,
63 | "parking_3": parking == 3,
64 | "furnishingstatus_semi_furnished": furnishingstatus == "semi_furnished",
65 | "furnishingstatus_unfurnished": furnishingstatus == "unfurnished",
66 | }
67 |
68 | return pd.DataFrame(input_data)
69 |
70 |
71 | def get_prediction(
72 | area=0,
73 | mainroad=False,
74 | guestroom=False,
75 | basement=False,
76 | hotwaterheating=False,
77 | airconditioning=False,
78 | prefarea=False,
79 | bedrooms=0,
80 | bathrooms=2,
81 | stories=1,
82 | parking=1,
83 | furnishingstatus="semi_furnished",
84 | ):
85 | """
86 | Predicts the house price based on the input features.
87 | Returns the predicted house price rounded to two decimal places.
88 | """
89 | # Prepare input data
90 | input_df = prepare_input_data(
91 | area,
92 | mainroad,
93 | guestroom,
94 | basement,
95 | hotwaterheating,
96 | airconditioning,
97 | prefarea,
98 | bedrooms,
99 | bathrooms,
100 | stories,
101 | parking,
102 | furnishingstatus,
103 | )
104 |
105 | # Load the model and the scaler
106 | model = load_model("models/house_price/saved_models/model_02.pkl")
107 | scaler = load_model("models/house_price/saved_models/scaler_02.pkl")
108 |
109 | # Scale the input data
110 | input_scaled = scaler.transform(input_df)
111 | scaled_df = pd.DataFrame(input_scaled, columns=scaler.get_feature_names_out())
112 |
113 | # Predict the house price
114 | predicted_price = model.predict(scaled_df)
115 |
116 | return round(predicted_price[0], 2)
117 |
118 |
119 | def test_house_price_prediction():
120 | """Test function to predict a sample house price."""
121 | # Sample inputs
122 | sample_input = {
123 | "area": 3000,
124 | "mainroad": "Yes",
125 | "guestroom": "No",
126 | "basement": "Yes",
127 | "hotwaterheating": "No",
128 | "airconditioning": "Yes",
129 | "prefarea": "Yes",
130 | "bedrooms": 2,
131 | "bathrooms": 3,
132 | "stories": 2,
133 | "parking": 2,
134 | "furnishingstatus": "semi_furnished",
135 | }
136 |
137 | predicted_price = get_prediction(**sample_input)
138 |
139 | print("Predicted House Price: Rs.", predicted_price)
140 |
141 |
142 | def model_details():
143 | """Returns model evaluation details."""
144 | return get_evaluator()
145 |
--------------------------------------------------------------------------------
/models/house_price/saved_models/model_01.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/house_price/saved_models/model_01.pkl
--------------------------------------------------------------------------------
/models/house_price/saved_models/model_02.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/house_price/saved_models/model_02.pkl
--------------------------------------------------------------------------------
/models/house_price/saved_models/scaler_01.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/house_price/saved_models/scaler_01.pkl
--------------------------------------------------------------------------------
/models/house_price/saved_models/scaler_02.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/house_price/saved_models/scaler_02.pkl
--------------------------------------------------------------------------------
/models/insurance_cost_predictor/model.py:
--------------------------------------------------------------------------------
1 | from joblib import load
2 |
3 | # Load the trained model for insurance cost prediction
4 | model = load("models/insurance_cost_predictor/saved_models/insurance_model.pkl")
5 |
6 | def insurance_cost_prediction(age, sex, bmi, children, smoker, region):
7 | # Feature extraction and conversions
8 | sex_value = 0 if sex.lower() == 'male' else 1 # 0 for male, 1 for female
9 | smoker_value = 0 if smoker.lower() == 'yes' else 1 # 0 for smoker, 1 for non-smoker
10 | region_dict = {'southeast': 0, 'southwest': 1, 'northeast': 2, 'northwest': 3}
11 | region_value = region_dict.get(region.lower(), -1) # Convert region to numerical value
12 |
13 | # Prepare features for prediction
14 | features = [
15 | float(age),
16 | float(sex_value),
17 | float(bmi),
18 | int(children),
19 | float(smoker_value),
20 | float(region_value)
21 | ]
22 |
23 | # Predict the insurance cost (charges)
24 | prediction = model.predict([features])[0]
25 |
26 | return prediction
27 |
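`model.predict([features])` above wraps the flat feature list in an outer list because scikit-learn estimators expect a 2-D array of shape `(n_samples, n_features)`. A minimal sketch with a toy estimator (not the saved insurance model; the fitted values are illustrative):

```python
from sklearn.linear_model import LinearRegression

# Toy model fitted on two one-feature samples so that y = 2 * x (illustrative).
toy_model = LinearRegression().fit([[1.0], [2.0]], [2.0, 4.0])

features = [3.0]                      # one sample's flat feature list
pred = toy_model.predict([features])  # the extra brackets give shape (1, 1)
print(round(float(pred[0]), 2))       # -> 6.0
```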
--------------------------------------------------------------------------------
/models/insurance_cost_predictor/predict.py:
--------------------------------------------------------------------------------
1 | from models.insurance_cost_predictor.model import insurance_cost_prediction
2 |
3 | def get_prediction(age, sex, bmi, children, smoker, region):
4 | # Call the function that makes the insurance cost prediction using input features
5 | return insurance_cost_prediction(age, sex, bmi, children, smoker, region)
6 |
--------------------------------------------------------------------------------
/models/insurance_cost_predictor/saved_models/insurance_model.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/insurance_cost_predictor/saved_models/insurance_model.pkl
--------------------------------------------------------------------------------
/models/loan_eligibility/model.py:
--------------------------------------------------------------------------------
1 | def loan_eligibility(income, loan_amount, credit_score):
2 | # Placeholder logic for loan eligibility prediction
3 | if income > 50000 and credit_score > 700:
4 | return "Loan approved"
5 | else:
6 | return "Loan denied"
7 |
--------------------------------------------------------------------------------
/models/loan_eligibility/predict.py:
--------------------------------------------------------------------------------
1 | from models.loan_eligibility.model import loan_eligibility
2 |
3 | def get_prediction(income, loan_amount, credit_score):
4 | return loan_eligibility(income, loan_amount, credit_score)
--------------------------------------------------------------------------------
/models/parkinson_disease_detector/parkinson_model.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | import pickle
3 | import numpy as np
4 | import warnings
5 | warnings.filterwarnings("ignore")
6 |
7 | # Load the model and the scaler
8 | model_path = 'models/parkinson_disease_detector/saved_models/Model_Prediction.sav'
9 | scaler_path = 'models/parkinson_disease_detector/saved_models/MinMaxScaler.sav'
10 |
11 | # Load the pre-trained model and scaler using pickle
12 | loaded_model = pickle.load(open(model_path, 'rb'))
13 | scaler = pickle.load(open(scaler_path, 'rb'))
14 |
15 | # Define the prediction function
16 | def disease_get_prediction(MDVP_Fo_Hz, MDVP_Fhi_Hz, MDVP_Flo_Hz,
17 | MDVP_Jitter_percent, MDVP_Jitter_Abs,
18 | MDVP_RAP, MDVP_PPQ, Jitter_DDP,
19 | MDVP_Shimmer, MDVP_Shimmer_dB,
20 | Shimmer_APQ3, Shimmer_APQ5,
21 | MDVP_APQ, Shimmer_DDA, NHR,
22 | HNR, RPDE, DFA, spread1,
23 | spread2, D2, PPE):
24 | features = np.array([[
25 | float(MDVP_Fo_Hz), float(MDVP_Fhi_Hz), float(MDVP_Flo_Hz),
26 | float(MDVP_Jitter_percent), float(MDVP_Jitter_Abs),
27 | float(MDVP_RAP), float(MDVP_PPQ), float(Jitter_DDP),
28 | float(MDVP_Shimmer), float(MDVP_Shimmer_dB),
29 | float(Shimmer_APQ3), float(Shimmer_APQ5),
30 | float(MDVP_APQ), float(Shimmer_DDA),
31 | float(NHR), float(HNR),
32 | float(RPDE), float(DFA),
33 | float(spread1), float(spread2),
34 | float(D2), float(PPE)
35 | ]])
36 |
37 | # Apply the scaler
38 | scaled_data = scaler.transform(features)
39 |
40 | # Make prediction
41 | prediction = loaded_model.predict(scaled_data)
42 |
43 | return prediction
44 |
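`scaler.transform` above likewise requires a 2-D array, which is why the features are built as `np.array([[...]])`. A toy `MinMaxScaler` sketch (not the saved `.sav` scaler; the fitted range is illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy scaler fitted on one feature spanning [0, 10] (illustrative values).
toy_scaler = MinMaxScaler().fit(np.array([[0.0], [10.0]]))

row = np.array([[5.0]])             # shape (1, 1): one sample, one feature
scaled = toy_scaler.transform(row)  # midpoint of the fitted range
print(float(scaled[0][0]))          # -> 0.5
```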
--------------------------------------------------------------------------------
/models/parkinson_disease_detector/parkinson_predict.py:
--------------------------------------------------------------------------------
1 | from models.parkinson_disease_detector.parkinson_model import disease_get_prediction
2 |
3 | def get_prediction(MDVP_Fo_Hz, MDVP_Fhi_Hz, MDVP_Flo_Hz, MDVP_Jitter_percent, MDVP_Jitter_Abs, MDVP_RAP, MDVP_PPQ, Jitter_DDP, MDVP_Shimmer, MDVP_Shimmer_dB, Shimmer_APQ3, Shimmer_APQ5, MDVP_APQ, Shimmer_DDA, NHR, HNR, RPDE, DFA, spread1, spread2, D2, PPE):
4 |
5 | prediction = disease_get_prediction(MDVP_Fo_Hz, MDVP_Fhi_Hz, MDVP_Flo_Hz, MDVP_Jitter_percent, MDVP_Jitter_Abs, MDVP_RAP, MDVP_PPQ, Jitter_DDP, MDVP_Shimmer, MDVP_Shimmer_dB, Shimmer_APQ3, Shimmer_APQ5, MDVP_APQ, Shimmer_DDA, NHR, HNR, RPDE, DFA, spread1, spread2, D2, PPE)
6 |
7 | message = ""
8 |
9 | # Provide message based on the prediction value
10 | if prediction[0] == 1:
11 | message = "The prediction indicates you may have Parkinson's Disease. Please consult a doctor."
12 | elif prediction[0] == 0:
13 | message = "The prediction indicates you are healthy."
14 | else:
15 | message = "Invalid details."
16 |
17 | return message
--------------------------------------------------------------------------------
/models/parkinson_disease_detector/saved_models/MinMaxScaler.sav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/parkinson_disease_detector/saved_models/MinMaxScaler.sav
--------------------------------------------------------------------------------
/models/parkinson_disease_detector/saved_models/Model_Prediction.sav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/parkinson_disease_detector/saved_models/Model_Prediction.sav
--------------------------------------------------------------------------------
/models/sleep_disorder_predictor/data/dataset.csv:
--------------------------------------------------------------------------------
1 | Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
2 | 1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,None
3 | 2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,None
4 | 3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,None
5 | 4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
6 | 5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
7 | 6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia
8 | 7,Male,29,Teacher,6.3,6,40,7,Obese,140/90,82,3500,Insomnia
9 | 8,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None
10 | 9,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None
11 | 10,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None
12 | 11,Male,29,Doctor,6.1,6,30,8,Normal,120/80,70,8000,None
13 | 12,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None
14 | 13,Male,29,Doctor,6.1,6,30,8,Normal,120/80,70,8000,None
15 | 14,Male,29,Doctor,6,6,30,8,Normal,120/80,70,8000,None
16 | 15,Male,29,Doctor,6,6,30,8,Normal,120/80,70,8000,None
17 | 16,Male,29,Doctor,6,6,30,8,Normal,120/80,70,8000,None
18 | 17,Female,29,Nurse,6.5,5,40,7,Normal Weight,132/87,80,4000,Sleep Apnea
19 | 18,Male,29,Doctor,6,6,30,8,Normal,120/80,70,8000,Sleep Apnea
20 | 19,Female,29,Nurse,6.5,5,40,7,Normal Weight,132/87,80,4000,Insomnia
21 | 20,Male,30,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None
22 | 21,Male,30,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
23 | 22,Male,30,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
24 | 23,Male,30,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
25 | 24,Male,30,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
26 | 25,Male,30,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None
27 | 26,Male,30,Doctor,7.9,7,75,6,Normal,120/80,70,8000,None
28 | 27,Male,30,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None
29 | 28,Male,30,Doctor,7.9,7,75,6,Normal,120/80,70,8000,None
30 | 29,Male,30,Doctor,7.9,7,75,6,Normal,120/80,70,8000,None
31 | 30,Male,30,Doctor,7.9,7,75,6,Normal,120/80,70,8000,None
32 | 31,Female,30,Nurse,6.4,5,35,7,Normal Weight,130/86,78,4100,Sleep Apnea
33 | 32,Female,30,Nurse,6.4,5,35,7,Normal Weight,130/86,78,4100,Insomnia
34 | 33,Female,31,Nurse,7.9,8,75,4,Normal Weight,117/76,69,6800,None
35 | 34,Male,31,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None
36 | 35,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
37 | 36,Male,31,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None
38 | 37,Male,31,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None
39 | 38,Male,31,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None
40 | 39,Male,31,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None
41 | 40,Male,31,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None
42 | 41,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
43 | 42,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
44 | 43,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
45 | 44,Male,31,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None
46 | 45,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
47 | 46,Male,31,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None
48 | 47,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
49 | 48,Male,31,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None
50 | 49,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
51 | 50,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,Sleep Apnea
52 | 51,Male,32,Engineer,7.5,8,45,3,Normal,120/80,70,8000,None
53 | 52,Male,32,Engineer,7.5,8,45,3,Normal,120/80,70,8000,None
54 | 53,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None
55 | 54,Male,32,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None
56 | 55,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None
57 | 56,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None
58 | 57,Male,32,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
59 | 58,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None
60 | 59,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None
61 | 60,Male,32,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None
62 | 61,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None
63 | 62,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None
64 | 63,Male,32,Doctor,6.2,6,30,8,Normal,125/80,72,5000,None
65 | 64,Male,32,Doctor,6.2,6,30,8,Normal,125/80,72,5000,None
66 | 65,Male,32,Doctor,6.2,6,30,8,Normal,125/80,72,5000,None
67 | 66,Male,32,Doctor,6.2,6,30,8,Normal,125/80,72,5000,None
68 | 67,Male,32,Accountant,7.2,8,50,6,Normal Weight,118/76,68,7000,None
69 | 68,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,Insomnia
70 | 69,Female,33,Scientist,6.2,6,50,6,Overweight,128/85,76,5500,None
71 | 70,Female,33,Scientist,6.2,6,50,6,Overweight,128/85,76,5500,None
72 | 71,Male,33,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None
73 | 72,Male,33,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None
74 | 73,Male,33,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None
75 | 74,Male,33,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None
76 | 75,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None
77 | 76,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None
78 | 77,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None
79 | 78,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None
80 | 79,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None
81 | 80,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None
82 | 81,Female,34,Scientist,5.8,4,32,8,Overweight,131/86,81,5200,Sleep Apnea
83 | 82,Female,34,Scientist,5.8,4,32,8,Overweight,131/86,81,5200,Sleep Apnea
84 | 83,Male,35,Teacher,6.7,7,40,5,Overweight,128/84,70,5600,None
85 | 84,Male,35,Teacher,6.7,7,40,5,Overweight,128/84,70,5600,None
86 | 85,Male,35,Software Engineer,7.5,8,60,5,Normal Weight,120/80,70,8000,None
87 | 86,Female,35,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
88 | 87,Male,35,Engineer,7.2,8,60,4,Normal,125/80,65,5000,None
89 | 88,Male,35,Engineer,7.2,8,60,4,Normal,125/80,65,5000,None
90 | 89,Male,35,Engineer,7.3,8,60,4,Normal,125/80,65,5000,None
91 | 90,Male,35,Engineer,7.3,8,60,4,Normal,125/80,65,5000,None
92 | 91,Male,35,Engineer,7.3,8,60,4,Normal,125/80,65,5000,None
93 | 92,Male,35,Engineer,7.3,8,60,4,Normal,125/80,65,5000,None
94 | 93,Male,35,Software Engineer,7.5,8,60,5,Normal Weight,120/80,70,8000,None
95 | 94,Male,35,Lawyer,7.4,7,60,5,Obese,135/88,84,3300,Sleep Apnea
96 | 95,Female,36,Accountant,7.2,8,60,4,Normal,115/75,68,7000,Insomnia
97 | 96,Female,36,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
98 | 97,Female,36,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
99 | 98,Female,36,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
100 | 99,Female,36,Teacher,7.1,8,60,4,Normal,115/75,68,7000,None
101 | 100,Female,36,Teacher,7.1,8,60,4,Normal,115/75,68,7000,None
102 | 101,Female,36,Teacher,7.2,8,60,4,Normal,115/75,68,7000,None
103 | 102,Female,36,Teacher,7.2,8,60,4,Normal,115/75,68,7000,None
104 | 103,Female,36,Teacher,7.2,8,60,4,Normal,115/75,68,7000,None
105 | 104,Male,36,Teacher,6.6,5,35,7,Overweight,129/84,74,4800,Sleep Apnea
106 | 105,Female,36,Teacher,7.2,8,60,4,Normal,115/75,68,7000,Sleep Apnea
107 | 106,Male,36,Teacher,6.6,5,35,7,Overweight,129/84,74,4800,Insomnia
108 | 107,Female,37,Nurse,6.1,6,42,6,Overweight,126/83,77,4200,None
109 | 108,Male,37,Engineer,7.8,8,70,4,Normal Weight,120/80,68,7000,None
110 | 109,Male,37,Engineer,7.8,8,70,4,Normal Weight,120/80,68,7000,None
111 | 110,Male,37,Lawyer,7.4,8,60,5,Normal,130/85,68,8000,None
112 | 111,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
113 | 112,Male,37,Lawyer,7.4,8,60,5,Normal,130/85,68,8000,None
114 | 113,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
115 | 114,Male,37,Lawyer,7.4,8,60,5,Normal,130/85,68,8000,None
116 | 115,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
117 | 116,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
118 | 117,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
119 | 118,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
120 | 119,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
121 | 120,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
122 | 121,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
123 | 122,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
124 | 123,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
125 | 124,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
126 | 125,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None
127 | 126,Female,37,Nurse,7.5,8,60,4,Normal Weight,120/80,70,8000,None
128 | 127,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None
129 | 128,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
130 | 129,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None
131 | 130,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None
132 | 131,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
133 | 132,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None
134 | 133,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None
135 | 134,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
136 | 135,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None
137 | 136,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None
138 | 137,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
139 | 138,Male,38,Lawyer,7.1,8,60,5,Normal,130/85,68,8000,None
140 | 139,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
141 | 140,Male,38,Lawyer,7.1,8,60,5,Normal,130/85,68,8000,None
142 | 141,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
143 | 142,Male,38,Lawyer,7.1,8,60,5,Normal,130/85,68,8000,None
144 | 143,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
145 | 144,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None
146 | 145,Male,38,Lawyer,7.1,8,60,5,Normal,130/85,68,8000,Sleep Apnea
147 | 146,Female,38,Lawyer,7.4,7,60,5,Obese,135/88,84,3300,Sleep Apnea
148 | 147,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,Insomnia
149 | 148,Male,39,Engineer,6.5,5,40,7,Overweight,132/87,80,4000,Insomnia
150 | 149,Female,39,Lawyer,6.9,7,50,6,Normal Weight,128/85,75,5500,None
151 | 150,Female,39,Accountant,8,9,80,3,Normal Weight,115/78,67,7500,None
152 | 151,Female,39,Accountant,8,9,80,3,Normal Weight,115/78,67,7500,None
153 | 152,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
154 | 153,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
155 | 154,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
156 | 155,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
157 | 156,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
158 | 157,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
159 | 158,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
160 | 159,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
161 | 160,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
162 | 161,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None
163 | 162,Female,40,Accountant,7.2,8,55,6,Normal Weight,119/77,73,7300,None
164 | 163,Female,40,Accountant,7.2,8,55,6,Normal Weight,119/77,73,7300,None
165 | 164,Male,40,Lawyer,7.9,8,90,5,Normal,130/85,68,8000,None
166 | 165,Male,40,Lawyer,7.9,8,90,5,Normal,130/85,68,8000,None
167 | 166,Male,41,Lawyer,7.6,8,90,5,Normal,130/85,70,8000,Insomnia
168 | 167,Male,41,Engineer,7.3,8,70,6,Normal Weight,121/79,72,6200,None
169 | 168,Male,41,Lawyer,7.1,7,55,6,Overweight,125/82,72,6000,None
170 | 169,Male,41,Lawyer,7.1,7,55,6,Overweight,125/82,72,6000,None
171 | 170,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None
172 | 171,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None
173 | 172,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None
174 | 173,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None
175 | 174,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None
176 | 175,Male,41,Lawyer,7.6,8,90,5,Normal,130/85,70,8000,None
177 | 176,Male,41,Lawyer,7.6,8,90,5,Normal,130/85,70,8000,None
178 | 177,Male,41,Lawyer,7.6,8,90,5,Normal,130/85,70,8000,None
179 | 178,Male,42,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
180 | 179,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None
181 | 180,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None
182 | 181,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None
183 | 182,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None
184 | 183,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None
185 | 184,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None
186 | 185,Female,42,Teacher,6.8,6,45,7,Overweight,130/85,78,5000,Sleep Apnea
187 | 186,Female,42,Teacher,6.8,6,45,7,Overweight,130/85,78,5000,Sleep Apnea
188 | 187,Female,43,Teacher,6.7,7,45,4,Overweight,135/90,65,6000,Insomnia
189 | 188,Male,43,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
190 | 189,Female,43,Teacher,6.7,7,45,4,Overweight,135/90,65,6000,Insomnia
191 | 190,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
192 | 191,Female,43,Teacher,6.7,7,45,4,Overweight,135/90,65,6000,Insomnia
193 | 192,Male,43,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia
194 | 193,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
195 | 194,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
196 | 195,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
197 | 196,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
198 | 197,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
199 | 198,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
200 | 199,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
201 | 200,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
202 | 201,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
203 | 202,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,Insomnia
204 | 203,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,Insomnia
205 | 204,Male,43,Engineer,6.9,6,47,7,Normal Weight,117/76,69,6800,None
206 | 205,Male,43,Engineer,7.6,8,75,4,Overweight,122/80,68,6800,None
207 | 206,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None
208 | 207,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None
209 | 208,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None
210 | 209,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None
211 | 210,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None
212 | 211,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None
213 | 212,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None
214 | 213,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None
215 | 214,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None
216 | 215,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None
217 | 216,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None
218 | 217,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None
219 | 218,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None
220 | 219,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,Sleep Apnea
221 | 220,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Sleep Apnea
222 | 221,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
223 | 222,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia
224 | 223,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
225 | 224,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia
226 | 225,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
227 | 226,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
228 | 227,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
229 | 228,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
230 | 229,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
231 | 230,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
232 | 231,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
233 | 232,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
234 | 233,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
235 | 234,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
236 | 235,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
237 | 236,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
238 | 237,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia
239 | 238,Female,44,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia
240 | 239,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
241 | 240,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia
242 | 241,Female,44,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia
243 | 242,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
244 | 243,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia
245 | 244,Female,44,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia
246 | 245,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
247 | 246,Female,44,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia
248 | 247,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
249 | 248,Male,44,Engineer,6.8,7,45,7,Overweight,130/85,78,5000,Insomnia
250 | 249,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,None
251 | 250,Male,44,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,None
252 | 251,Female,45,Teacher,6.8,7,30,6,Overweight,135/90,65,6000,Insomnia
253 | 252,Female,45,Teacher,6.8,7,30,6,Overweight,135/90,65,6000,Insomnia
254 | 253,Female,45,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia
255 | 254,Female,45,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia
256 | 255,Female,45,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia
257 | 256,Female,45,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia
258 | 257,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
259 | 258,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
260 | 259,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
261 | 260,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
262 | 261,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia
263 | 262,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,None
264 | 263,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,None
265 | 264,Female,45,Manager,6.9,7,55,5,Overweight,125/82,75,5500,None
266 | 265,Male,48,Doctor,7.3,7,65,5,Obese,142/92,83,3500,Insomnia
267 | 266,Female,48,Nurse,5.9,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
268 | 267,Male,48,Doctor,7.3,7,65,5,Obese,142/92,83,3500,Insomnia
269 | 268,Female,49,Nurse,6.2,6,90,8,Overweight,140/95,75,10000,None
270 | 269,Female,49,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
271 | 270,Female,49,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
272 | 271,Female,49,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
273 | 272,Female,49,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
274 | 273,Female,49,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
275 | 274,Female,49,Nurse,6.2,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
276 | 275,Female,49,Nurse,6.2,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
277 | 276,Female,49,Nurse,6.2,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
278 | 277,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea
279 | 278,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea
280 | 279,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Insomnia
281 | 280,Female,50,Engineer,8.3,9,30,3,Normal,125/80,65,5000,None
282 | 281,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,None
283 | 282,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
284 | 283,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
285 | 284,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
286 | 285,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
287 | 286,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
288 | 287,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
289 | 288,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
290 | 289,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
291 | 290,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
292 | 291,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
293 | 292,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
294 | 293,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
295 | 294,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
296 | 295,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
297 | 296,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
298 | 297,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
299 | 298,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
300 | 299,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
301 | 300,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
302 | 301,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
303 | 302,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
304 | 303,Female,51,Nurse,7.1,7,55,6,Normal Weight,125/82,72,6000,None
305 | 304,Female,51,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
306 | 305,Female,51,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
307 | 306,Female,51,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea
308 | 307,Female,52,Accountant,6.5,7,45,7,Overweight,130/85,72,6000,Insomnia
309 | 308,Female,52,Accountant,6.5,7,45,7,Overweight,130/85,72,6000,Insomnia
310 | 309,Female,52,Accountant,6.6,7,45,7,Overweight,130/85,72,6000,Insomnia
311 | 310,Female,52,Accountant,6.6,7,45,7,Overweight,130/85,72,6000,Insomnia
312 | 311,Female,52,Accountant,6.6,7,45,7,Overweight,130/85,72,6000,Insomnia
313 | 312,Female,52,Accountant,6.6,7,45,7,Overweight,130/85,72,6000,Insomnia
314 | 313,Female,52,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
315 | 314,Female,52,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
316 | 315,Female,52,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
317 | 316,Female,53,Engineer,8.3,9,30,3,Normal,125/80,65,5000,Insomnia
318 | 317,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
319 | 318,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
320 | 319,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
321 | 320,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
322 | 321,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
323 | 322,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
324 | 323,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
325 | 324,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
326 | 325,Female,53,Engineer,8.3,9,30,3,Normal,125/80,65,5000,None
327 | 326,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
328 | 327,Female,53,Engineer,8.3,9,30,3,Normal,125/80,65,5000,None
329 | 328,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
330 | 329,Female,53,Engineer,8.3,9,30,3,Normal,125/80,65,5000,None
331 | 330,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
332 | 331,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
333 | 332,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
334 | 333,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
335 | 334,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
336 | 335,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
337 | 336,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
338 | 337,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
339 | 338,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None
340 | 339,Female,54,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None
341 | 340,Female,55,Nurse,8.1,9,75,4,Overweight,140/95,72,5000,Sleep Apnea
342 | 341,Female,55,Nurse,8.1,9,75,4,Overweight,140/95,72,5000,Sleep Apnea
343 | 342,Female,56,Doctor,8.2,9,90,3,Normal Weight,118/75,65,10000,None
344 | 343,Female,56,Doctor,8.2,9,90,3,Normal Weight,118/75,65,10000,None
345 | 344,Female,57,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,None
346 | 345,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
347 | 346,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
348 | 347,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
349 | 348,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
350 | 349,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
351 | 350,Female,57,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
352 | 351,Female,57,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
353 | 352,Female,57,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
354 | 353,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
355 | 354,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
356 | 355,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
357 | 356,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
358 | 357,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
359 | 358,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
360 | 359,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,None
361 | 360,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,None
362 | 361,Female,59,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
363 | 362,Female,59,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
364 | 363,Female,59,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
365 | 364,Female,59,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
366 | 365,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
367 | 366,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
368 | 367,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
369 | 368,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
370 | 369,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371 | 370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372 | 371,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
373 | 372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
374 | 373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
375 | 374,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
--------------------------------------------------------------------------------
/models/sleep_disorder_predictor/model.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | import pickle
3 | import pandas as pd # Import pandas to handle DataFrames
4 | import numpy as np
5 | import warnings
6 | warnings.filterwarnings("ignore")
7 |
8 | # Load the model and the scaler
9 | model_path = 'models/sleep_disorder_predictor/saved_models/Model_Prediction.sav'
10 | preprocessor_path = 'models/sleep_disorder_predictor/saved_models/preprocessor.sav'
11 |
12 | # Load the pre-trained model and scaler using pickle
13 | loaded_model = pickle.load(open(model_path, 'rb'))
14 | preprocessor = pickle.load(open(preprocessor_path, 'rb'))
15 |
16 | # Define the prediction function
17 | def disease_get_prediction(Age, Sleep_Duration,
18 | Heart_Rate, Daily_Steps,
19 | Systolic, Diastolic, Occupation, Quality_of_Sleep, Gender,
20 | Physical_Activity_Level, Stress_Level, BMI_Category):
21 |     # Build a single-row DataFrame using the exact column names the preprocessor expects
22 |     features = pd.DataFrame({
23 |         'Age': [int(Age)],
24 |         'Sleep Duration': [float(Sleep_Duration)],
25 |         'Heart Rate': [int(Heart_Rate)],
26 |         'Daily Steps': [int(Daily_Steps)],
27 |         'Systolic': [float(Systolic)],
28 |         'Diastolic': [float(Diastolic)],
29 |         'Occupation': [Occupation],
30 |         'Quality of Sleep': [int(Quality_of_Sleep)],
31 |         'Gender': [Gender],
32 |         'Physical Activity Level': [int(Physical_Activity_Level)],
33 |         'Stress Level': [int(Stress_Level)],
34 |         'BMI Category': [BMI_Category]
35 |     })
36 |
37 | # Apply the preprocessor (make sure it expects a DataFrame)
38 | preprocessed_data = preprocessor.transform(features)
39 |
40 | # Make prediction
41 | prediction = loaded_model.predict(preprocessed_data)
42 |
43 | return prediction
44 |
--------------------------------------------------------------------------------
/models/sleep_disorder_predictor/predict.py:
--------------------------------------------------------------------------------
1 | from models.sleep_disorder_predictor.model import disease_get_prediction
2 |
3 | def get_prediction(Age, Sleep_Duration,
4 | Heart_Rate, Daily_Steps,
5 |                        Systolic, Diastolic, Occupation, Quality_of_Sleep, Gender,
6 | Physical_Activity_Level, Stress_Level, BMI_Category):
7 |
8 | prediction = disease_get_prediction(Age, Sleep_Duration,
9 | Heart_Rate, Daily_Steps,
10 |                        Systolic, Diastolic, Occupation, Quality_of_Sleep, Gender,
11 | Physical_Activity_Level, Stress_Level, BMI_Category)
12 |
13 | message = ""
14 |
15 |     # Provide a message based on the predicted class (first element of the returned array)
16 |     if prediction[0] == 0:
17 |         message = "Insomnia"
18 |     elif prediction[0] == 1:
19 |         message = "No disorder"
20 |     elif prediction[0] == 2:
21 |         message = "Sleep Apnea"
22 |     else:
23 |         message = "Invalid details."
24 |
25 | return message+"\n\nRecommendation - To prevent sleep disorders, maintain a balanced lifestyle with regular exercise, a healthy diet, and stress management. Stick to a consistent sleep schedule, limit caffeine and alcohol, and create a relaxing bedtime routine."
--------------------------------------------------------------------------------
/models/sleep_disorder_predictor/saved_models/Model_Prediction.sav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/sleep_disorder_predictor/saved_models/Model_Prediction.sav
--------------------------------------------------------------------------------
/models/sleep_disorder_predictor/saved_models/preprocessor.sav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/sleep_disorder_predictor/saved_models/preprocessor.sav
--------------------------------------------------------------------------------
/models/stress_level_detect/model.py:
--------------------------------------------------------------------------------
1 | from joblib import load
2 |
3 | # Load the trained Random Forest model
4 | model = load('models/stress_level_detect/saved_models/random_forest_model.joblib')
5 |
6 | def stress_level_prediction(age, freq_no_purpose, freq_distracted, restless, worry_level, difficulty_concentrating, compare_to_successful_people, feelings_about_comparisons, freq_seeking_validation, freq_feeling_depressed, interest_fluctuation, sleep_issues):
7 | # Feature extraction
8 | features = [
9 | float(age),
10 | int(freq_no_purpose),
11 | int(freq_distracted),
12 | int(restless),
13 | int(worry_level),
14 | int(difficulty_concentrating),
15 | int(compare_to_successful_people),
16 | int(feelings_about_comparisons),
17 | int(freq_seeking_validation),
18 | int(freq_feeling_depressed),
19 | int(interest_fluctuation),
20 | int(sleep_issues)
21 | ]
22 |
23 | prediction = model.predict([features])[0]
24 |
25 | return prediction
26 |
27 |
--------------------------------------------------------------------------------
/models/stress_level_detect/predict.py:
--------------------------------------------------------------------------------
1 | from models.stress_level_detect.model import stress_level_prediction
2 |
3 | def get_prediction(age, freq_no_purpose, freq_distracted, restless, worry_level, difficulty_concentrating, compare_to_successful_people, feelings_about_comparisons, freq_seeking_validation, freq_feeling_depressed, interest_fluctuation, sleep_issues):
4 |
5 | prediction = stress_level_prediction(age, freq_no_purpose, freq_distracted, restless, worry_level, difficulty_concentrating, compare_to_successful_people, feelings_about_comparisons, freq_seeking_validation, freq_feeling_depressed, interest_fluctuation, sleep_issues)
6 |
7 | advice = ""
8 |
9 | # Provide advice based on the prediction value
10 | if prediction < 1.5:
11 | advice = "You are experiencing mild stress. Keep maintaining a balanced lifestyle, and consider engaging in activities that bring you joy and relaxation."
12 | elif 1.5 <= prediction < 3.5:
13 | advice = "You have a moderate stress level. It's important to take breaks and practice stress-relief techniques like mindfulness, walking, cycling, music or exercise."
14 | else:
15 | advice = "You are experiencing high stress levels. Consider reaching out to a mental health professional or practicing stress management techniques to help cope."
16 |
17 | return advice
--------------------------------------------------------------------------------
/models/stress_level_detect/saved_models/random_forest_model.joblib:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/stress_level_detect/saved_models/random_forest_model.joblib
--------------------------------------------------------------------------------
/models/text_sumarization/predict.py:
--------------------------------------------------------------------------------
1 | from transformers import pipeline
2 | import streamlit as st
3 |
4 | @st.cache_resource(show_spinner=True) # Cache the model loading for faster performance
5 | def load_summarizer():
6 | """Load and cache the text summarization pipeline model."""
7 | return pipeline("summarization", model="t5-small")
8 |
9 | def generate_summary(text: str) -> str:
10 | """Generate a summary for the given input text."""
11 | summarizer = load_summarizer()
12 | summary = summarizer(text, max_length=150, min_length=30, do_sample=False)
13 | return summary[0]["summary_text"]
14 |
--------------------------------------------------------------------------------
/models/translator_app/README.md:
--------------------------------------------------------------------------------
1 | # 🌐 Real-Time Language Translator
2 |
3 | A real-time language translation app built using Streamlit, Google Translate, and speech-to-text technology. This app allows users to speak in one language and get real-time translations in another, along with text-to-speech output for the translated text.
4 |
5 | ## Features
6 |
7 | - **Speech Recognition:** Capture spoken input using a microphone.
8 | - **Real-Time Translation:** Translate the captured speech into a chosen language.
9 | - **Text-to-Speech:** Listen to the translated text in the target language.
10 | - **Multiple Languages Supported:** Including English, Hindi, Tamil, Telugu, Marathi, Bengali, and more.
11 |
--------------------------------------------------------------------------------
/models/translator_app/assets/styles.css:
--------------------------------------------------------------------------------
1 | body {background-color: #0d1117; color: #c9d1d9;}
2 | .main {padding: 20px;}
3 | h1 {color: #58a6ff;}
4 | .info {font-size: 18px; color: #58a6ff; animation: glow 1s infinite;}
5 | .success {font-size: 18px; color: #34d058;}
6 | .stButton > button {
7 | background-color: #238636;
8 | color: white;
9 | border-radius: 12px;
10 | padding: 10px 30px;
11 | font-weight: bold;
12 | font-size: 16px;
13 | cursor: pointer;
14 | transition: background-color 0.3s ease;
15 | }
16 | .stButton > button:hover {
17 | background-color: #2ea043;
18 | }
19 |
20 | @keyframes glow {
21 | 0% {box-shadow: 0 0 5px #58a6ff;}
22 | 50% {box-shadow: 0 0 20px #58a6ff;}
23 | 100% {box-shadow: 0 0 5px #58a6ff;}
24 | }
25 |
--------------------------------------------------------------------------------
/models/translator_app/translation.py:
--------------------------------------------------------------------------------
1 | import speech_recognition as sr
2 | from googletrans import Translator
3 | from gtts import gTTS
4 | import pygame
5 | import streamlit as st
6 |
7 | # Initialize recognizer and translator
8 | recognizer = sr.Recognizer()
9 | translator = Translator()
10 |
11 | # Function to capture and translate speech
12 | def capture_and_translate(source_lang, target_lang):
13 | # Check for available audio input devices
14 | mic_list = sr.Microphone.list_microphone_names()
15 |
16 | if not mic_list:
17 | st.error("⚠️ No microphone found. Please connect a microphone and restart the app.")
18 | return None
19 |
20 | selected_mic_index = st.selectbox("Select a microphone", range(len(mic_list)), format_func=lambda x: mic_list[x])
21 |
22 | with sr.Microphone(device_index=selected_mic_index) as source:
23 | st.info("🎙️ Listening... Speak now.")
24 |
25 | recognizer.adjust_for_ambient_noise(source, duration=1)
26 | recognizer.energy_threshold = 200
27 |
28 | try:
29 | # Capture speech
30 | audio = recognizer.listen(source, timeout=15, phrase_time_limit=15)
31 | st.success("🔄 Processing...")
32 |
33 | # Recognize speech
34 | text = recognizer.recognize_google(audio, language=source_lang)
35 | st.write(f"🗣️ Original ({source_lang}): {text}")
36 |
37 | # Translate speech
38 | translation = translator.translate(text, src=source_lang, dest=target_lang)
39 | st.write(f"🔊 Translated ({target_lang}): {translation.text}")
40 |
41 | # Convert translation to speech
42 | tts = gTTS(text=translation.text, lang=target_lang)
43 | audio_file = "translated_audio.mp3"
44 | tts.save(audio_file)
45 |
46 | # Play the audio
47 | pygame.mixer.init()
48 | pygame.mixer.music.load(audio_file)
49 | pygame.mixer.music.play()
50 |
51 | st.audio(audio_file)
52 |
53 | while pygame.mixer.music.get_busy():
54 | pygame.time.Clock().tick(10)
55 |
56 | pygame.mixer.music.stop()
57 | pygame.mixer.quit()
58 |
59 | return audio_file
60 |
61 | except sr.WaitTimeoutError:
62 | st.error("⚠️ No speech detected. Try speaking louder.")
63 | except sr.UnknownValueError:
64 | st.error("⚠️ Could not recognize speech.")
65 | except Exception as e:
66 | st.error(f"⚠️ Error: {str(e)}")
67 | return None
68 |
--------------------------------------------------------------------------------
/models/translator_app/utils.py:
--------------------------------------------------------------------------------
1 | # Available languages dictionary
2 | LANGUAGES = {
3 | 'English': 'en',
4 | 'Hindi': 'hi',
5 | 'Tamil': 'ta',
6 | 'Telugu': 'te',
7 | 'Marathi': 'mr',
8 | 'Bengali': 'bn',
9 | 'Gujarati': 'gu',
10 | 'Kannada': 'kn',
11 | 'Malayalam': 'ml',
12 | 'Punjabi': 'pa',
13 | 'Urdu': 'ur'
14 | }
15 |
--------------------------------------------------------------------------------
/packages.txt:
--------------------------------------------------------------------------------
1 | portaudio19-dev
--------------------------------------------------------------------------------
/page_handler.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | import importlib.util
3 | import json
4 | from form_handler import FormHandler
5 |
6 |
7 | # Utility to dynamically import modules
8 | def load_module_from_path(module_name, file_path):
9 | spec = importlib.util.spec_from_file_location(module_name, file_path)
10 | module = importlib.util.module_from_spec(spec)
11 | spec.loader.exec_module(module)
12 | return module
13 |
14 |
15 | class PageHandler:
16 | def __init__(self, config_file_path):
17 | # Load the page configuration from JSON
18 | with open(config_file_path, "r") as f:
19 | self.pages = json.load(f)
20 |
21 | def render_page(self, page_name: str):
22 | # Check if the requested page exists in the JSON config
23 | if page_name not in self.pages:
24 | st.error("Page not found!")
25 | return
26 |
27 | page = self.pages[page_name]
28 | page_title = page.get("page_title", "Untitled Page")
29 | page_icon = page.get("page_icon", "📄") # Default to a generic icon
30 | model_predict_file_path = page.get("model_predict_file_path")
31 | form_config_path = page.get("form_config_path")
32 | tabs = page.get("tabs", [])
33 | # Set Streamlit's page config with the title and icon
34 | st.set_page_config(page_title=page_title, page_icon=page_icon)
35 |
36 | # Dynamically load the model prediction file
37 | model_module = load_module_from_path(
38 | f"{page_name}_model", model_predict_file_path
39 | )
40 | model_function = getattr(
41 | model_module, "get_prediction", None
42 | ) # or relevant model function
43 |
44 | # Create the tabs for the page
45 | tab_objects = st.tabs([tab["name"] for tab in tabs])
46 |
47 | # Iterate through the tabs to render them
48 | for i, tab in enumerate(tabs):
49 | with tab_objects[i]:
50 | if tab["type"] == "form":
51 | self.render_form(tab["form_name"], model_function, form_config_path)
52 | elif tab["type"] == "model_details":
53 |                 self.render_model_details(model_module, tab)
54 |
55 | def render_form(self, form_name: str, model_function, form_config_path: str):
56 | form_handler = FormHandler(
57 | name=form_name,
58 | button_label="Predict",
59 | model=model_function,
60 | config_path=form_config_path,
61 | )
62 |
63 | # Render the form on the Streamlit page
64 | form_handler.render()
65 |
66 |     def render_model_details(self, model_module, tab):
67 | # Dynamically load and call the model details function
68 | model_details_function = getattr(model_module, "model_details", None)
69 |
70 |         # Show the problem statement
71 | st.subheader("Problem Statement")
72 | st.write(tab["problem_statement"])
73 |
74 |         # Show the model description
75 | st.subheader("Model Description")
76 | st.write(tab["description"])
77 |
78 | if model_details_function:
79 | metrics, prediction_plot, error_plot, performance_plot = model_details_function().evaluate()
80 |
81 |             st.subheader(f"Model Test R² Score: {metrics['Test_R2']:.2f}")
82 |
83 |             # Show the training and testing R² scores
84 | st.subheader(f"Scores: Training: {metrics['Train_R2']:.2f}, Testing: {metrics['Test_R2']:.2f}")
85 |
86 |             # Display each plot when available; clear_figure=True releases the
87 |             # figure after rendering to avoid conflicts between successive plots
88 |             if prediction_plot is not None:
89 |                 st.subheader("Model Prediction Plot")
90 |                 st.pyplot(prediction_plot, clear_figure=True)
91 |             if error_plot is not None:
92 |                 st.subheader("Error Plot")
93 |                 st.pyplot(error_plot, clear_figure=True)
94 |             if performance_plot is not None:
95 |                 st.subheader("Model Performance Plot")
96 |                 st.pyplot(performance_plot, clear_figure=True)
97 |
--------------------------------------------------------------------------------
/pages/Business_Performance_Forecasting.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | page_handler = PageHandler("pages/pages.json")
4 | page_handler.render_page("Business Performance Forecasting")
5 |
--------------------------------------------------------------------------------
/pages/Credit_Card_Fraud_Estimator.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | page_handler = PageHandler("pages/pages.json")
4 | page_handler.render_page("Credit Card Fraud Estimator")
5 |
--------------------------------------------------------------------------------
/pages/Customer_Income_Estimator.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | page_handler = PageHandler("pages/pages.json")
4 | page_handler.render_page("Customer Income Estimator")
5 |
--------------------------------------------------------------------------------
/pages/Gold_Price_Predictor.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | page_handler = PageHandler("pages/pages.json")
4 | page_handler.render_page("Gold Price Predictor")
5 |
--------------------------------------------------------------------------------
/pages/House_Price_Estimator.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | page_handler = PageHandler("pages/pages.json")
4 | page_handler.render_page("House Price Estimator")
5 |
--------------------------------------------------------------------------------
/pages/Insurance_Cost_Predictor.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | # Initialize the page handler with the path to the pages configuration file
4 | page_handler = PageHandler("pages/pages.json")
5 |
6 | # Render the page for Insurance Cost Predictor
7 | page_handler.render_page("Insurance Cost Predictor")
8 |
--------------------------------------------------------------------------------
/pages/Loan_Eligibility_Estimator.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | page_handler = PageHandler("pages/pages.json")
4 | page_handler.render_page("Loan Eligibility Estimator")
5 |
--------------------------------------------------------------------------------
/pages/PDF_Malware_Detection.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | from models.PDF_malware_detection.predict import predict_malware
3 | import tempfile
4 | import os
5 |
6 | st.title("Malware Detection for PDF Files")
7 |
8 | # Form for uploading the PDF file
9 | uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")
10 |
11 | # Create a submit button
12 | submit_button = st.button("Submit for Malware Detection")
13 |
14 | if uploaded_file is not None and submit_button:
15 | st.info("Processing file... Please wait.")
16 |
17 | # Create a temporary file
18 | with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
19 | tmp_file.write(uploaded_file.getvalue())
20 | tmp_file_path = tmp_file.name
21 |
22 | try:
23 | # Pass the temporary file path to the predict_malware function
24 | result = predict_malware(tmp_file_path)
25 |
26 | if result == 1:
27 | st.error("Malicious PDF detected!")
28 | else:
29 | st.success("The PDF is clean!")
30 |
31 | except Exception as e:
32 | st.error(f"An error occurred during processing: {str(e)}")
33 |
34 | finally:
35 | # Clean up the temporary file
36 | os.unlink(tmp_file_path)
37 |
38 | # Display some information about the uploaded file
39 | st.subheader("File Information:")
40 | st.json({
41 | "Filename": uploaded_file.name,
42 | "File size": f"{uploaded_file.size} bytes",
43 | "File type": uploaded_file.type
44 | })
45 |
46 | elif submit_button and uploaded_file is None:
47 | st.warning("Please upload a PDF file before submitting.")
--------------------------------------------------------------------------------
/pages/Parkinson_Disease_Detector.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | page_handler = PageHandler("pages/pages.json")
4 | page_handler.render_page("Parkinson Disease Detector")
--------------------------------------------------------------------------------
/pages/Sleep_Disorder_Predictor.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | page_handler = PageHandler("pages/pages.json")
4 | page_handler.render_page("Sleep Disorder Predictor")
--------------------------------------------------------------------------------
/pages/Stress_Level_Detector.py:
--------------------------------------------------------------------------------
1 | from page_handler import PageHandler
2 |
3 | page_handler = PageHandler("pages/pages.json")
4 | page_handler.render_page("Stress Level Detector")
5 |
--------------------------------------------------------------------------------
/pages/Text Summarizer.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | from models.text_sumarization.predict import generate_summary
3 |
4 | st.title("Text Summarization Tool")
5 |
6 | st.write("Enter the text you'd like to summarize (minimum 50 words).")
7 |
8 | user_input = st.text_area("Input Text", height=250)
9 |
10 | # A button to initiate the summarization process
11 | if st.button("Summarize"):
12 | if len(user_input.split()) < 50:
13 | st.warning("Please enter at least 50 words for summarization.")
14 | else:
15 | # Show a spinner while the summarization is being processed
16 | with st.spinner("Summarizing..."):
17 | summary = generate_summary(user_input) # Call the function from predict.py
18 | st.subheader("Summary:")
19 | st.code(summary, language="text", wrap_lines=True)
20 |
--------------------------------------------------------------------------------
/pages/Translator.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | from models.translator_app.translation import capture_and_translate
3 | from models.translator_app.utils import LANGUAGES
4 | import time
5 | import os
6 |
7 | # Load custom CSS
8 | def load_css():
9 | with open("models/translator_app/assets/styles.css") as f:
10 | st.markdown(f"<style>{f.read()}</style>", unsafe_allow_html=True)
11 |
12 | # UI Structure
13 | def main():
14 | st.title("🌐 Real-Time Language Translator")
15 | st.markdown("Translate spoken language into other languages in real-time with a sleek experience.")
16 |
17 | load_css() # Load custom styling
18 |
19 | # Language selection
20 | source_lang_name = st.selectbox("🌍 Select Source Language", list(LANGUAGES.keys()))
21 | target_lang_name = st.selectbox("🔄 Select Target Language", list(LANGUAGES.keys()))
22 |
23 | source_lang = LANGUAGES[source_lang_name]
24 | target_lang = LANGUAGES[target_lang_name]
25 |
26 | # Button to start listening
27 | if st.button("🎤 Start Listening", key="listen_button"):
28 | audio_file = capture_and_translate(source_lang, target_lang)
29 | if audio_file:
30 | time.sleep(1) # Ensure pygame cleanup
31 | try:
32 | os.remove(audio_file)
33 | except Exception as e:
34 | st.error(f"⚠️ Error while deleting the file: {str(e)}")
35 |
36 | if __name__ == "__main__":
37 | main()
38 |
--------------------------------------------------------------------------------
/pages/pages.json:
--------------------------------------------------------------------------------
1 | {
2 | "House Price Estimator": {
3 | "title": "House Price Estimator",
4 | "page_title": "House Price Estimator",
5 | "page_icon": "\ud83c\udfe0",
6 | "model_predict_file_path": "models/house_price/predict.py",
7 | "model_function": "get_prediction",
8 | "model_detail_function": "model_details",
9 | "form_config_path": "form_configs/house_price.json",
10 | "tabs": [
11 | {
12 | "name": "Estimator",
13 | "type": "form",
14 | "form_name": "House Price Form"
15 | },
16 | {
17 | "name": "Model Details",
18 | "type": "model_details",
19 | "problem_statement": "The model predicts house prices based on input features such as location, size, and number of rooms.",
20 | "description": "This model uses a linear regression algorithm to estimate house prices from the provided features, utilizing historical data to draw predictions."
21 | }
22 | ]
23 | },
24 | "Credit Card Fraud Estimator": {
25 | "title": "Credit Card Fraud Estimator",
26 | "page_title": "Credit Card Fraud Estimator",
27 | "page_icon": "\ud83d\udcb0",
28 | "model_predict_file_path": "models/credit_card_fraud/predict.py",
29 | "model_function": "get_prediction",
30 | "form_config_path": "form_configs/credit_card_fraud.json",
31 | "model_detail_function": "model_details",
32 | "tabs": [
33 | {
34 | "name": "Estimator",
35 | "type": "form",
36 | "form_name": "Credit Card Fraud Estimator"
37 | },
38 | {
39 | "name": "Model Details",
40 | "type": "model_details",
41 | "problem_statement": "The model predicts whether a credit card transaction is fraudulent based on input features such as transaction amount, transaction type, and customer profile.",
42 | "description": "This model uses a Support Vector Machine (SVM) to classify credit card transactions as fraudulent or non-fraudulent. It leverages the SVM's ability to create an optimal decision boundary, learning from historical transaction data with both normal and fraudulent transactions to make accurate predictions. The model is particularly useful in identifying complex patterns and borderline cases in high-dimensional data."
43 | }
44 | ]
45 | },
46 | "Loan Eligibility Estimator": {
47 | "title": "Loan Eligibility Estimator",
48 | "page_title": "Loan Eligibility Estimator",
49 | "page_icon": "\ud83d\udcb0",
50 | "model_predict_file_path": "models/loan_eligibility/predict.py",
51 | "model_function": "get_prediction",
52 | "form_config_path": "form_configs/loan_eligibility.json",
53 | "model_detail_function": "model_details",
54 | "tabs": [
55 | {
56 | "name": "Estimator",
57 | "type": "form",
58 | "form_name": "Loan Eligibility Form"
59 | },
60 | {
61 | "name": "Model Details",
62 | "type": "model_details",
63 | "problem_statement": "This model determines whether a user is eligible for a loan based on several financial and personal input factors.",
64 | "description": "It leverages a decision tree classification algorithm to predict loan eligibility, making use of past customer data and lending patterns."
65 | }
66 | ]
67 | },
68 | "Stress Level Detector": {
69 | "title": "Stress Level Detector",
70 | "page_title": "Stress Level Detector",
71 | "page_icon": "\u2691",
72 | "model_predict_file_path": "models/stress_level_detect/predict.py",
73 | "model_function": "get_prediction",
74 | "model_detail_function": "model_details",
75 | "form_config_path": "form_configs/stress_detection.json",
76 | "tabs": [
77 | {
78 | "name": "Stress Level Estimator",
79 | "type": "form",
80 | "form_name": "Stress Detection Form"
81 | },
82 | {
83 | "name": "Model Details",
84 | "type": "model_details",
85 | "problem_statement": "The model assesses the stress level of an individual based on biometric and behavioral data.",
86 | "description": "This model uses a support vector machine (SVM) to classify stress levels into different categories (low, medium, high) based on physiological indicators."
87 | }
88 | ]
89 | },
90 | "Parkinson Disease Detector": {
91 | "title": "Parkinson Disease Detector",
92 | "page_title": "Parkinson Disease Detector",
93 | "model_predict_file_path": "models/parkinson_disease_detector/parkinson_predict.py",
94 | "model_function": "get_prediction",
95 | "model_detail_function": "model_details",
96 | "form_config_path": "form_configs/parkinson_detection.json",
97 | "tabs": [
98 | {
99 | "name": "Parkinson's Disease Detector",
100 | "type": "form",
101 | "form_name": "Parkinson Detection Form"
102 | },
103 | {
104 | "name": "Model Details",
105 | "type": "model_details",
106 | "problem_statement": "The model aims to detect early signs of Parkinson's disease from voice and movement data.",
107 | "description": "Using a Random Forest classifier, the model predicts whether an individual is likely to have Parkinson's based on voice recordings and motor function metrics."
108 | }
109 | ]
110 | },
111 | "Customer Income Estimator": {
112 | "title": "Customer Income Estimator",
113 | "page_title": "Customer Income Estimator",
114 | "page_icon": "\ud83d\udcb0",
115 | "model_predict_file_path": "models/customer_income/predict.py",
116 | "model_function": "get_prediction",
117 | "model_detail_function": "model_details",
118 | "form_config_path": "form_configs/customer_income.json",
119 | "tabs": [
120 | {
121 | "name": "Estimator",
122 | "type": "form",
123 | "form_name": "Customer Income Estimation Form"
124 | },
125 | {
126 | "name": "Model Details",
127 | "type": "model_details",
128 | "problem_statement": "The model predicts the income of a customer based on various demographic and financial features.",
129 | "description": "This model uses Random Forest Regression to estimate the income of a customer based on input features such as age, education, employment status etc."
130 | }
131 | ]
132 | },
133 | "Gold Price Predictor": {
134 | "title": "Gold Price Predictor",
135 | "page_title": "Gold Price Predictor",
136 | "page_icon": "\ud83d\udcb0",
137 | "model_predict_file_path": "models/gold_price_prediction/predict.py",
138 | "model_function": "get_prediction",
139 | "model_detail_function": "model_details",
140 | "form_config_path": "form_configs/gold_price_prediction.json",
141 | "tabs": [
142 | {
143 | "name": "Gold Price Form",
144 | "type": "form",
145 | "form_name": "Gold Price Form"
146 | },
147 | {
148 | "name": "Model Details",
149 | "type": "model_details",
150 | "problem_statement": "The Gold Price Predictor leverages financial metrics and machine learning algorithms to forecast the price of gold (GLD). Gold prices are influenced by various economic factors, and this tool aims to provide accurate predictions based on historical data.",
151 | "description": "The dataset used for this model contains daily financial data, including stock market indices, commodity prices, and currency exchange rates. The goal is to predict the gold price (GLD) using features such as the S&P 500 Index (SPX), crude oil price (USO), silver price (SLV), and the EUR/USD exchange rate."
152 | }
153 | ]
154 | },
155 | "Sleep Disorder Predictor": {
156 | "title": "Sleep Disorder Predictor",
157 | "page_title": "Sleep Disorder Predictor",
158 | "model_predict_file_path": "models/sleep_disorder_predictor/predict.py",
159 | "model_function": "get_prediction",
160 | "model_detail_function": "model_details",
161 | "form_config_path": "form_configs/sleep_prediction.json",
162 | "tabs": [
163 | {
164 | "name": "Sleep Disorder Predictor",
165 | "type": "form",
166 | "form_name": "Sleep Prediction Form"
167 | },
168 | {
169 | "name": "Model Details",
170 | "type": "model_details",
171 |
172 | "problem_statement": "The model aims to predict the likelihood of an individual having a sleep disorder based on lifestyle, sleep quality, and health metrics.",
173 | "description": "Using an XGBoost classifier, the model predicts whether an individual is likely to have a sleep disorder based on features such as sleep duration, stress level, physical activity level, cardiovascular health metrics, and demographic information."
174 | }
175 | ]
176 | },
177 |
178 | "Malware_Detection": {
179 | "title": "PDF Malware Detection",
180 | "page_title": "PDF Malware Detection",
181 | "page_icon": "\ud83d\udd12",
182 | "model_predict_file_path": "models/pdf_malware_detection/predict.py",
183 | "model_function": "get_prediction",
184 | "model_detail_function": "model_details",
185 | "form_config_path": "form_configs/pdf_malware_detection.json",
186 | "tabs": [
187 | {
188 | "name": "Malware Detection Form",
189 | "type": "form",
190 | "form_name": "Malware Detection Form"
191 | },
192 | {
193 | "name": "Model Details",
194 | "type": "model_details"
195 | }
196 | ]
197 | },
198 |
199 | "Insurance Cost Predictor": {
200 | "title": "Insurance Cost Predictor",
201 | "page_title": "Insurance Cost Predictor",
202 | "page_icon": "\ud83d\udd12",
203 | "model_predict_file_path": "models/insurance_cost_predictor/predict.py",
204 | "model_function": "get_prediction",
205 | "model_detail_function": "model_details",
206 | "form_config_path": "form_configs/insurance_cost_predictor.json",
207 | "tabs": [
208 | {
209 | "name": "Insurance Cost Form",
210 | "type": "form",
211 | "form_name": "Insurance Cost Form"
212 | },
213 | {
214 | "name": "Model Details",
215 | "type": "model_details",
216 | "problem_statement": "The Insurance Cost Predictor estimates the insurance cost based on various personal factors such as age, BMI, number of children, smoker status, and region. By using machine learning, this tool provides accurate predictions to help users plan their insurance costs more effectively.",
217 | "description": "This model uses a dataset containing demographic and health-related factors to predict the cost of insurance. The features include age, sex, BMI, children, smoker status, and region, with predictions made using the Random Forest algorithm for accurate results. Ensemble techniques like XGBoost will also be used to further enhance the prediction accuracy."
218 | }
219 | ]
220 | },
221 | "Business Performance Forecasting": {
222 | "title": "Business Performance Forecasting",
223 | "page_title": "Business Performance Forecasting",
224 | "page_icon": "\ud83c\udf3e",
225 | "model_predict_file_path": "models/business_performance_forecasting/predict.py",
226 | "model_function": "get_prediction",
227 | "model_detail_function": "model_details",
228 | "form_config_path": "form_configs/business_performance_forecasting.json",
229 | "tabs": [
230 | {
231 | "name": "Business Forecast Form",
232 | "type": "form",
233 | "form_name": "Business Forecast Form"
234 | },
235 | {
236 | "name": "Model Details",
237 | "type": "model_details",
238 | "problem_statement": "The Business Performance Forecasting model predicts future profits based on R&D spend, administration costs, marketing spend, and state. By utilizing machine learning, this tool assists businesses in making informed decisions about resource allocation.",
239 | "description": "This model employs a dataset with features including R&D spend, administration costs, marketing spend, and geographic location to forecast profits. The predictions are generated using regression techniques, ensuring accuracy and reliability for business strategy planning."
240 | }
241 | ]
242 | }
243 |
244 | }
245 |
246 |
247 |
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
1 | # Predictive Calc
2 |
3 | ## Overview
4 | **Predictive Calc** is an open-source project that provides a flexible collection of machine learning models designed to predict a wide variety of outcomes. Built with **Python** and **Streamlit**, the project offers an intuitive web interface, enabling users to easily interact with the models. The primary goal of the project is to streamline the integration of machine learning models with custom forms, allowing users to build their own prediction calculators tailored to specific use cases.
5 |
6 | ## Current Status
7 | The project is under active development with several machine learning models already implemented for various prediction tasks. The architecture is designed for dynamic configuration using JSON files, which map model parameters, inputs, and features. This design ensures new models can be seamlessly added or updated with minimal modification to the core codebase.
8 |
9 | The project has been successfully tested in local environments, and current efforts are focused on enhancing integration, optimizing deployment, and improving scalability for production-ready applications.
10 |
11 | ## How to Contribute
12 | 1. Review existing issues and contribute towards resolving them.
13 | 2. Create new issues to discuss ideas, suggest features, or report bugs.
14 | 3. Fork the repository and create a new branch for your contribution.
15 | 4. Implement your changes and submit a pull request with a clear description.
16 | 5. Further details can be found in the [contributing.md](contributing.md) file.
17 |
18 | ## Setup Instructions
19 | 1. Fork or clone the repository.
20 | 2. Create and activate a virtual environment:
21 | ```powershell
22 | python -m venv .venv
23 | .venv\Scripts\Activate
24 | ```
25 | 3. Install the necessary dependencies:
26 | ```powershell
27 | pip install -r requirements.txt
28 | ```
29 | 4. Run the Streamlit application:
30 | ```powershell
31 | streamlit run app.py
32 | ```
33 | ## Docker Setup Instructions
34 |
35 | 1. Install [Docker](https://docs.docker.com/get-docker/) on your machine.
36 | 2. Windows users: install [WSL](https://learn.microsoft.com/en-us/windows/wsl/install/) (Ubuntu-22.04).
37 | 3. Run the application using Docker Compose:
38 | ```powershell
39 | docker-compose up
40 | ```
41 | 4. To stop the application:
42 | ```powershell
43 | docker-compose down
44 | ```
45 |
46 | ## Our Valuable Contributors
47 | [](https://github.com/yashasvini121/predictive-calc/graphs/contributors)
48 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | # Requirements.txt for the Streamlit app
2 | # Jupyter notebook-related packages and their dependencies are excluded because
3 | # this app does not need to open or interact with Jupyter notebooks.
4 |
5 | altair==5.4.1
6 | anyio==4.5.0
7 | argon2-cffi==23.1.0
8 | argon2-cffi-bindings==21.2.0
9 | arrow==1.3.0
10 | asttokens==2.4.1
11 | async-lru==2.0.4
12 | attrs==24.2.0
13 | babel==2.16.0
14 | beautifulsoup4==4.12.3
15 | bleach==6.1.0
16 | blinker==1.8.2
17 | cachetools==5.5.0
18 | certifi==2024.8.30
19 | cffi==1.17.1
20 | chardet==3.0.4
21 | charset-normalizer==3.3.2
22 | click==8.1.7
23 | cmdstanpy==1.2.4
24 | colorama==0.4.6
25 | contourpy==1.3.0
26 | cycler==0.12.1
27 | debugpy==1.8.5
28 | decorator==5.1.1
29 | defusedxml==0.7.1
30 | et-xmlfile==1.1.0
31 | executing==2.1.0
32 | fastjsonschema==2.20.0
33 | fonttools==4.53.1
34 | fqdn==1.5.1
35 | gitdb==4.0.11
36 | GitPython==3.1.43
37 | googletrans==4.0.0rc1
38 | gTTS==2.5.3
39 | h11==0.9.0
40 | h2==3.2.0
41 | holidays==0.57
42 | hpack==3.0.0
43 | hstspreload==2024.10.1
44 | httpcore==0.9.1
45 | httpx==0.13.3
46 | hyperframe==5.2.0
47 | idna==2.10
48 | imbalanced-learn==0.12.4
49 | imblearn==0.0
50 | importlib_resources==6.4.5
51 | iniconfig==2.0.0
52 | Jinja2==3.1.4
53 | joblib==1.4.2
54 | json5==0.9.25
55 | jsonpointer==3.0.0
56 | jsonschema==4.23.0
57 | jsonschema-specifications==2023.12.1
58 | kiwisolver==1.4.7
59 | markdown-it-py==3.0.0
60 | MarkupSafe==2.1.5
61 | matplotlib==3.9.2
62 | matplotlib-inline==0.1.7
63 | mdurl==0.1.2
64 | mistune==3.0.2
65 | narwhals==1.8.1
66 | numpy
67 | openpyxl==3.1.5
68 | overrides==7.7.0
69 | packaging==24.1
70 | pandas==2.2.2
71 | pandocfilters==1.5.1
72 | parso==0.8.4
73 | patsy==0.5.6
74 | pillow==10.4.0
75 | platformdirs==4.3.6
76 | plotly==5.24.1
77 | pluggy==1.5.0
78 | prometheus_client==0.20.0
79 | prompt_toolkit==3.0.47
80 | prophet==1.1.6
81 | protobuf==4.25.5
82 | psutil==6.0.0
83 | pure_eval==0.2.3
84 | pyarrow==17.0.0
85 | PyAudio==0.2.14
86 | pycparser==2.22
87 | pydeck==0.9.1
88 | pygame==2.6.1
89 | Pygments==2.18.0
90 | PyMuPDF==1.24.11
91 | pyparsing==3.1.4
92 | PyPDF2==3.0.1
93 | pdfid==1.1.3
94 | pytest==8.3.3
95 | pytest-mock==3.14.0
96 | python-dateutil==2.9.0.post0
97 | python-json-logger==2.0.7
98 | pytz==2024.2
99 | PyYAML==6.0.2
100 | pyzmq==26.2.0
101 | referencing==0.35.1
102 | requests==2.32.3
103 | rfc3339-validator==0.1.4
104 | rfc3986==1.5.0
105 | rfc3986-validator==0.1.1
106 | rich==13.8.1
107 | rpds-py==0.20.0
108 | scikit-learn==1.5.2
109 | scipy==1.14.1
110 | seaborn==0.13.2
111 | Send2Trash==1.8.3
112 | setuptools==75.1.0
113 | six==1.16.0
114 | smmap==5.0.1
115 | sniffio==1.3.1
116 | soupsieve==2.6
117 | SpeechRecognition==3.10.4
118 | stack-data==0.6.3
119 | stanio==0.5.1
120 | statsmodels==0.14.3
121 | streamlit==1.38.0
122 | tenacity==8.5.0
123 | terminado==0.18.1
124 | threadpoolctl==3.5.0
125 | tinycss2==1.3.0
126 | toml==0.10.2
127 | tornado==6.4.1
128 | tqdm==4.66.5
129 | traitlets==5.14.3
130 | types-python-dateutil==2.9.0.20240906
131 | typing_extensions==4.12.2
132 | tzdata==2024.1
133 | uri-template==1.3.0
134 | urllib3==2.2.3
135 | watchdog==4.0.2
136 | wcwidth==0.2.13
137 | webcolors==24.8.0
138 | webencodings==0.5.1
139 | websocket-client==1.8.0
140 | xgboost==2.1.1
141 | transformers==4.45.2
142 | tf_keras==2.17.0
143 |
--------------------------------------------------------------------------------
/todo.md:
--------------------------------------------------------------------------------
1 | ## To Do
2 | 1. Add docs
3 | 2. Add GitHub Actions
4 | 3. Add logging
5 | 4. Add tests
6 | 5. Add graphs
7 | 6. Improve the UI/UX
8 | 7. Complete the Loan Eligibility Calculator
9 | 8. Add more models
10 |
11 |
12 | ## To Fix
13 | 1. Plots
--------------------------------------------------------------------------------