├── .dockerignore ├── .github ├── ISSUE_TEMPLATE │ ├── add-new-model.md │ ├── bug_report.md │ └── feature_request.md ├── PULL_REQUEST_TEMPLATE.md ├── pull_request.md └── workflows │ └── validate-pr.yaml ├── .gitignore ├── App.py ├── Dockerfile ├── LICENSE ├── assets └── images │ └── machine_learning.png ├── contributing.md ├── dev-requirements.txt ├── docker-compose.debug.yml ├── docker-compose.yml ├── docs ├── project-structure.md └── tutorial.md ├── form_configs ├── business_performance_forecasting.json ├── credit_card_fraud.json ├── customer_income.json ├── gold_price_prediction.json ├── house_price.json ├── insurance_cost_predictor.json ├── loan_eligibility.json ├── parkinson_detection.json ├── sleep_prediction.json └── stress_detection.json ├── form_handler.py ├── machine-learning.gif ├── models ├── PDF_malware_detection │ ├── Data │ │ └── PDFMalware2022.csv │ ├── model.ipynb │ ├── pdf_extraction.py │ ├── predict.py │ └── saved_models │ │ └── random_forest_model.pkl ├── business_performance_forecasting │ ├── data │ │ └── 50_Startups.csv │ ├── model.py │ ├── notebooks │ │ └── business_performance_forecasting.ipynb │ ├── predict.py │ └── saved_models │ │ ├── evaluation_results.pkl │ │ ├── model.pkl │ │ └── scaler.pkl ├── credit_card_fraud │ ├── data │ │ └── creditcardcsvpresent.csv │ ├── model.py │ ├── modelEvaluation.py │ ├── predict.py │ └── saved_models │ │ └── creditCardFraud_svc_model.pkl ├── customer_income │ ├── data │ │ └── train.csv │ ├── model.py │ ├── notebooks │ │ └── customer_income.ipynb │ ├── predict.py │ └── saved_models │ │ ├── CImodel.pkl │ │ ├── CIscaler.pkl │ │ └── feature_names.pkl ├── gold_price_prediction │ ├── data │ │ └── gold_price_data.csv │ ├── model.py │ ├── notebooks │ │ └── Gold_Price_Prediction.ipynb │ ├── predict.py │ └── saved_models │ │ └── random_forest_model.joblib ├── house_price │ ├── ImprovedModel.py │ ├── ModelEvaluation.py │ ├── data │ │ └── housing.csv │ ├── model.py │ ├── predict.py │ └── saved_models │ │ ├── 
model_01.pkl │ │ ├── model_02.pkl │ │ ├── scaler_01.pkl │ │ └── scaler_02.pkl ├── insurance_cost_predictor │ ├── data │ │ └── insurance.csv │ ├── model.py │ ├── notebooks │ │ └── ins.ipynb │ ├── predict.py │ └── saved_models │ │ └── insurance_model.pkl ├── loan_eligibility │ ├── model.py │ └── predict.py ├── parkinson_disease_detector │ ├── data │ │ └── dataset.csv │ ├── notebooks │ │ └── Parkinson's_Disease.ipynb │ ├── parkinson_model.py │ ├── parkinson_predict.py │ └── saved_models │ │ ├── MinMaxScaler.sav │ │ └── Model_Prediction.sav ├── sleep_disorder_predictor │ ├── data │ │ └── dataset.csv │ ├── model.py │ ├── notebooks │ │ └── sleep-disorder.ipynb │ ├── predict.py │ └── saved_models │ │ ├── Model_Prediction.sav │ │ └── preprocessor.sav ├── stress_level_detect │ ├── model.py │ ├── notebooks │ │ └── stress_level_detection.ipynb │ ├── predict.py │ └── saved_models │ │ └── random_forest_model.joblib ├── text_sumarization │ └── predict.py └── translator_app │ ├── README.md │ ├── assets │ └── styles.css │ ├── translation.py │ └── utils.py ├── packages.txt ├── page_handler.py ├── pages ├── Business_Performance_Forecasting.py ├── Credit_Card_Fraud_Estimator.py ├── Customer_Income_Estimator.py ├── Gold_Price_Predictor.py ├── House_Price_Estimator.py ├── Insurance_Cost_Predictor.py ├── Loan_Eligibility_Estimator.py ├── PDF_Malware_Detection.py ├── Parkinson_Disease_Detector.py ├── Sleep_Disorder_Predictor.py ├── Stress_Level_Detector.py ├── Text Summarizer.py ├── Translator.py └── pages.json ├── readme.md ├── requirements.txt └── todo.md /.dockerignore: -------------------------------------------------------------------------------- 1 | **/__pycache__ 2 | **/.venv 3 | **/.classpath 4 | **/.dockerignore 5 | **/.env 6 | **/.git 7 | **/.gitignore 8 | **/.project 9 | **/.settings 10 | **/.toolstarget 11 | **/.vs 12 | **/.vscode 13 | **/*.*proj.user 14 | **/*.dbmdl 15 | **/*.jfm 16 | **/bin 17 | **/charts 18 | **/docker-compose* 19 | **/compose* 20 | **/Dockerfile* 21 | 
**/node_modules 22 | **/npm-debug.log 23 | **/obj 24 | **/secrets.dev.yaml 25 | **/values.dev.yaml 26 | LICENSE 27 | README.md 28 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/add-new-model.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Add New Model 3 | about: Add a new model to the project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | 🔍 **Problem Description**: 11 | 12 | 13 | 🧠 **Model Description**: 14 | 15 | 16 | ⏲️ **Estimated Time for Completion**: 17 | 18 | 19 | 🎯 **Expected Outcome**: 20 | 21 | 22 | 📄 **Additional Context**: 23 | 24 | 25 | **To be Mentioned while taking the issue**: 26 | - What is your participant role? 27 | 28 | 29 | **Note:** 30 | - Please review the project documentation and ensure your code aligns with the project structure. 31 | - Please ensure that either the `predict.py` file includes a properly implemented `model_details()` function or the notebook contains this function to print a detailed model report. The model will not be accepted without this function in place, as it is essential for generating the necessary model details. 32 | - Prefer using a new branch to resolve the issue, as it helps keep the main branch stable and makes it easier to manage and review your changes. 33 | - Strictly use the pull request template provided in the repository to create a pull request. 
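For reference, here is a minimal sketch of what such a `model_details()` function might look like; the model name and metric values below are purely illustrative and should be replaced with your model's real report:

```python
def model_details():
    """Print a detailed report for the trained model and return the details."""
    # Illustrative values only -- substitute your model's actual name and metrics.
    details = {
        "model_name": "RandomForestClassifier",
        "accuracy": 0.93,
        "precision": 0.91,
        "recall": 0.90,
    }
    print("Model Details Report")
    print("-" * 20)
    for key, value in details.items():
        print(f"{key}: {value}")
    return details
```

Returning the dictionary in addition to printing it keeps the report easy to reuse in notebooks or tests.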
-------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | 🐞 **Describe the Bug**: 11 | 12 | 13 | 🔁 **To Reproduce**: 14 | 19 | 20 | 💡 **Expected Behavior**: 21 | 22 | 23 | 🖥️ **Device Information**: 24 | 29 | 30 | 📸 **Screenshots**: 31 | 32 | 33 | 📄 **Additional Context**: 34 | 35 | 36 | **To be Mentioned while taking the issue**: 37 | - What is your participant role? 38 | 39 | 40 | **Note:** 41 | - Please review the project documentation and ensure your code aligns with the project structure. If applicable, consider adding a `model_details` function for additional insights. 42 | - Prefer using a new branch to resolve the issue, as it helps keep the main branch stable and makes it easier to manage and review your changes. -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | 🌟 **Is your feature request related to a problem?**: 11 | 12 | 13 | 💡 **Describe the solution you'd like**: 14 | 15 | 16 | 🔀 **Describe alternatives considered**: 17 | 18 | 19 | 📄 **Additional Context**: 20 | 21 | 22 | **To be Mentioned while taking the issue**: 23 | - What is your participant role? 24 | 25 | 26 | **Note:** 27 | - Please review the project documentation and ensure your code aligns with the project structure. 28 | - Please ensure that either the `predict.py` file includes a properly implemented `model_details()` function or the notebook contains this function to print a detailed model report. 
The model will not be accepted without this function in place, as it is essential for generating the necessary model details. 29 | - Prefer using a new branch to resolve the issue, as it helps keep the main branch stable and makes it easier to manage and review your changes. 30 | - Strictly use the pull request template provided in the repository to create a pull request. -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | ### Description 2 | 3 | 4 | 5 | ### Issue Resolved 6 | 7 | 8 | 9 | ### Changes Made 10 | 11 | 12 | 13 | ### Screenshots or Videos 14 | 15 | 16 | 17 | ### Additional Details 18 | 19 | 20 | 21 | 22 | ### Checklist 23 | 24 | - [ ] My code follows the current [project structure](https://github.com/yashasvini121/predictive-calc/blob/master/docs/project-structure.md) 25 | - [ ] I have thoroughly reviewed and updated the `requirements.txt` file to include any new packages 26 | - [ ] The `predict.py` file includes a properly implemented `model_details()` function, or the notebook contains this function to print a detailed model report. **The model will not be accepted without this function**, as it is essential for generating the necessary model details. 27 | - [ ] I have added relevant tests (if necessary). 28 | - [ ] I have added comments in the code where needed. 29 | - [ ] This PR is submitted under **Hacktoberfest**. 30 | - [ ] This PR is submitted under **GirlScript Summer of Code (GSSoC-Extd)**. 
-------------------------------------------------------------------------------- /.github/pull_request.md: -------------------------------------------------------------------------------- 1 | ### Description 2 | 3 | 4 | 5 | ### Issue Resolved 6 | 7 | 8 | 9 | ### Changes Made 10 | 11 | 12 | 13 | ### Screenshots or Videos 14 | 15 | 16 | 17 | ### Additional Details 18 | 19 | 20 | 21 | 22 | ### Checklist 23 | 24 | - [ ] My code follows the current [project structure](https://github.com/yashasvini121/predictive-calc/blob/master/docs/project-structure.md) 25 | - [ ] I have thoroughly reviewed and updated the `requirements.txt` file to include any new packages 26 | - [ ] The `predict.py` file includes a properly implemented `model_details()` function, or the notebook contains this function to print a detailed model report. **The model will not be accepted without this function**, as it is essential for generating the necessary model details. 27 | - [ ] I have added relevant tests (if necessary). 28 | - [ ] I have added comments in the code where needed. 29 | - [ ] This PR is submitted under **Hacktoberfest**. 30 | - [ ] This PR is submitted under **GirlScript Summer of Code (GSSoC-Extd)**. 
-------------------------------------------------------------------------------- /.github/workflows/validate-pr.yaml: -------------------------------------------------------------------------------- 1 | name: Validate PR 2 | 3 | on: 4 | pull_request: 5 | branches: [ master ] 6 | types: [opened, synchronize, reopened, ready_for_review] 7 | 8 | jobs: 9 | build: 10 | 11 | runs-on: ubuntu-latest 12 | 13 | strategy: 14 | matrix: 15 | python-version: [3.12] 16 | 17 | steps: 18 | - name: Checkout repository 19 | uses: actions/checkout@v3 20 | 21 | - name: Install Ubuntu packages 22 | run: | 23 | sudo apt-get update 24 | sudo xargs -a packages.txt apt-get install -y 25 | shell: bash 26 | 27 | - name: Set up Python ${{ matrix.python-version }} 28 | uses: actions/setup-python@v4 29 | with: 30 | python-version: ${{ matrix.python-version }} 31 | architecture: 'x64' 32 | 33 | - name: Install Python dependencies 34 | run: | 35 | python -m pip install --upgrade pip 36 | pip install -r requirements.txt 37 | 38 | - name: Validate dependencies with pip-check 39 | run: | 40 | pip install pip-check 41 | pip-check 42 | continue-on-error: false # Fail the workflow if dependencies are invalid 43 | 44 | - name: Test Streamlit App 45 | run: | 46 | pip install streamlit 47 | streamlit run App.py --server.headless true --browser.gatherUsageStats false & 48 | sleep 10 # Wait for the app to start 49 | curl --retry 5 --retry-delay 5 http://localhost:8501 50 | env: 51 | STREAMLIT_SERVER_HEADLESS: true 52 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | *.log 3 | -------------------------------------------------------------------------------- /App.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | 3 | st.set_page_config(page_title="Predictive Calc - Machine Learning Models", 
page_icon="🤖") 4 | 5 | st.title("Welcome to Predictive Calc!") 6 | 7 | st.markdown(""" 8 | ## Explore Cutting-Edge Machine Learning Models 9 | **Predictive Calc** offers a powerful suite of machine learning models designed to assist you in making informed decisions. Whether it's predicting house prices, determining loan eligibility, or evaluating health risks, we have you covered. 10 | """) 11 | 12 | st.markdown(""" 13 | ## Why Choose Predictive Calc? """) 14 | features = [ 15 | { 16 | "title": "Accurate Predictions", 17 | "icon": "🔍", 18 | "description": "Harness cutting-edge machine learning algorithms that provide reliable and precise predictions." 19 | }, 20 | { 21 | "title": "User-Friendly Interface", 22 | "icon": "💻", 23 | "description": "Enjoy a seamless, intuitive experience with models designed for practical applications across various domains." 24 | }, 25 | { 26 | "title": "Comprehensive Calculators", 27 | "icon": "📊", 28 | "description": "Access a diverse set of models for financial analysis, health assessments, security checks, and more, all in one place." 29 | }, 30 | { 31 | "title": "Health & Financial Insights", 32 | "icon": "🏥💰", 33 | "description": "From estimating house prices and checking loan eligibility to evaluating health risks like Parkinson’s and stress levels, Predictive Calc offers essential tools for everyday decision-making." 34 | }, 35 | { 36 | "title": "Enhanced Document Analysis & Language Tools", 37 | "icon": "📄🌐", 38 | "description": "With a built-in text summarizer and translator, streamline your reading experience and break language barriers effortlessly. PDF Malware Detection also helps keep your documents safe." 39 | } 40 | ] 41 | 42 | # Display features in a structured card format 43 | for feature in features: 44 | st.markdown( 45 | f""" 46 |
<div style="display: flex; 47 | align-items: center; 48 | background-color: #f9f9f9; 49 | border: 1px solid #ddd; 50 | border-radius: 8px; 51 | padding: 15px; 52 | margin-bottom: 10px;"> 53 | <div style="font-size: 2em; 54 | margin-right: 15px;"> 55 | 56 | {feature["icon"]} 57 | </div> 58 | <div> 59 | <strong>{feature["title"]}</strong> 60 | <p>{feature["description"]}</p> 61 | </div> </div>
62 | """, 63 | unsafe_allow_html=True 64 | ) 65 | 66 | st.markdown("---") 67 | st.markdown("## Available Calculators") 68 | 69 | # Calculator information in a structured format 70 | calculators = [ 71 | { 72 | "name": "Income Estimator", 73 | "description": "Estimate the annual income based on socio-economic and demographic information.", 74 | "details": """ 75 | This calculator uses demographic and socio-economic variables to predict income level, providing insights into income patterns. 76 | """ 77 | }, 78 | { 79 | "name": "Gold Price Predictor", 80 | "description": "Predict future gold prices using various financial metrics.", 81 | "details": """ 82 | ### Introduction 83 | The Gold Price Predictor leverages financial metrics and machine learning algorithms to forecast the price of gold (GLD). Gold prices are influenced by various economic factors, and this tool aims to provide accurate predictions based on historical data. 84 | 85 | ### Gold Price Dataset 86 | The dataset used for this model contains daily financial data, including stock market indices, commodity prices, and currency exchange rates. The goal is to predict the gold price (GLD) using features such as the S&P 500 Index (SPX), crude oil price (USO), silver price (SLV), and the EUR/USD exchange rate. 87 | 88 | ### Additional Variable Information 89 | - **SPX**: The S&P 500 index value, which tracks the performance of 500 large companies listed on stock exchanges in the United States. 90 | - **USO**: The price of United States Oil Fund (USO), which reflects crude oil prices. 91 | - **SLV**: The price of iShares Silver Trust (SLV), which reflects silver prices. 92 | - **EUR/USD**: The Euro-to-U.S. Dollar exchange rate, which indicates the strength of the euro relative to the U.S. dollar. 93 | - **GLD**: The price of SPDR Gold Shares (GLD), which is the target variable representing gold prices. 
94 | """ 95 | }, 96 | { 97 | "name": "House Price Estimator", 98 | "description": "Predict the price of a house based on various features.", 99 | "details": """ 100 | Using historical and current market data, this tool predicts the house price based on features like location, size, and amenities. 101 | """ 102 | }, 103 | { 104 | "name": "Loan Eligibility", 105 | "description": "Check your eligibility for various types of loans based on your financial profile.", 106 | "details": """ 107 | This calculator assesses loan eligibility by analyzing credit scores, income, and other relevant financial details. 108 | """ 109 | }, 110 | { 111 | "name": "Parkinson's Disease", 112 | "description": "Assess your risk of Parkinson's Disease with advanced ML algorithms.", 113 | "details": """ 114 | ### Introduction 115 | Parkinson's disease (PD) is a progressive neurodegenerative disorder that primarily affects movement. It often starts with subtle symptoms such as tremors, stiffness, and slow movement. 116 | 117 | ### Oxford Parkinson's Disease Detection Dataset (UCI ML Repository) 118 | The dataset contains biomedical voice measurements from 31 people, 23 of whom have Parkinson's disease (PD). The main goal is to differentiate between healthy individuals and those with PD using the "status" column, where 0 indicates healthy and 1 indicates PD. 119 | 120 | ### Additional Variable Information 121 | - **MDVP_Fo(Hz)**: Average vocal fundamental frequency. 122 | - **MDVP_Fhi(Hz)**: Maximum vocal fundamental frequency. 123 | - **MDVP_Flo(Hz)**: Minimum vocal fundamental frequency. 124 | - **MDVP_Jitter(%)**, **MDVP_Jitter(Abs)**, **MDVP_RAP**, **MDVP_PPQ**, **Jitter_DDP**: Measures of variation in fundamental frequency. 125 | - **MDVP_Shimmer**, **MDVP_Shimmer(dB)**, **Shimmer_APQ3**, **Shimmer_APQ5**, **MDVP_APQ**, **Shimmer_DDA**: Measures of variation in amplitude. 126 | - **NHR**, **HNR**: Noise-to-tonal ratio measures in the voice. 
127 | - **status**: Health status of the subject (1 - Parkinson's, 0 - healthy). 128 | - **RPDE**, **D2**: Nonlinear dynamical complexity measures. 129 | - **DFA**: Signal fractal scaling exponent. 130 | - **spread1**, **spread2**, **PPE**: Nonlinear measures of fundamental frequency variation. 131 | """ 132 | }, 133 | { 134 | "name": "Sleep Disorder Prediction", 135 | "description": "Assess your risk of developing a sleep disorder using advanced ML algorithms.", 136 | "details": """ 137 | ### Introduction 138 | Sleep disorders can have a significant impact on an individual's overall health and well-being. These disorders often result from a combination of poor sleep habits, lifestyle factors, stress, and underlying medical conditions. 139 | 140 | ### Sleep Health and Lifestyle Dataset 141 | The dataset consists of sleep, lifestyle, and health metrics collected from 400 individuals. The main goal is to predict the likelihood of an individual having a sleep disorder using the "Sleep Disorder" column, which contains categorical values indicating the presence or absence of a sleep disorder. 142 | 143 | """ 144 | }, 145 | { 146 | "name": "PDF Malware Detector", 147 | "description": "Identify and alert users about potential malware in PDF files.", 148 | "details": """ 149 | ### Overview 150 | The PDF Malware Detector scans uploaded PDF files for malicious content, ensuring user safety and data protection. 151 | 152 | ### Key Features 153 | - **File Upload**: Simple drag-and-drop interface for easy file submission. 154 | - **Malware Detection**: Comprehensive analysis to detect harmful elements within PDFs. 155 | - **File Size Limit**: Supports files up to 200MB. 156 | 157 | ### Use Cases 158 | Perfect for users needing to verify the integrity of PDF documents before opening or sharing. 
159 | """ 160 | }, 161 | { 162 | "name": "Stress Level Detector", 163 | "description": "Analyze your mental stress levels based on social media interactions.", 164 | "details": """ 165 | The model uses text analysis on social media data to identify signs of stress, helping users understand their mental health patterns. 166 | """ 167 | }, 168 | { 169 | "name": "Text Summarizer", 170 | "description": "Save time with concise, professional summaries of lengthy texts.", 171 | "details": """ 172 | Generate quick and comprehensive summaries of lengthy documents, ideal for students, researchers, and professionals. 173 | """ 174 | }, 175 | { 176 | "name": "Real-Time Language Translator", 177 | "description": "Translate spoken language into other languages instantly for seamless communication.", 178 | "details": """ 179 | ### Overview 180 | The Real-Time Language Translator uses advanced speech recognition and NLP to provide immediate translations between languages, enhancing communication in diverse settings. 181 | 182 | ### Key Features 183 | - **Instant Translation**: Real-time spoken language translation. 184 | - **Multiple Languages**: Supports a variety of source and target languages. 185 | - **User-Friendly Interface**: Easy to navigate for all users. 186 | 187 | ### Use Cases 188 | Ideal for travel, business meetings, and language learning, breaking down language barriers effortlessly. 189 | """ 190 | }, 191 | { 192 | "name": "Business Performance Forecaster", 193 | "description": "Forecast business profits based on various investment areas for better financial planning and budget allocation.", 194 | "details": """ 195 | ### Overview 196 | The Business Performance Forecaster predicts company profit based on investment in R&D, administration, and marketing, using machine learning to analyze investment patterns and optimize budget allocation. 197 | 198 | ### Key Features 199 | - **Profit Prediction**: Provides an estimated profit based on investment data. 
200 | - **Investment Analysis**: Evaluates how different spending areas impact overall profit. 201 | - **Multi-Input Support**: Accounts for multiple variables like R&D, administration, and marketing expenses. 202 | 203 | ### Use Cases 204 | Useful for companies looking to plan budgets, assess the impact of investments, and improve decision-making processes in financial forecasting. 205 | """ 206 | } 207 | ] 208 | 209 | # Define shades of blue for calculators 210 | blue_shades = [ 211 | "#D1E8E2", # Light Blue 212 | "#A0D6E0", # Soft Blue 213 | "#7FB3E8", # Sky Blue 214 | ] 215 | 216 | # Display calculators in a table layout with two columns per row 217 | for i in range(0, len(calculators), 2): 218 | cols = st.columns(2) 219 | for j, col in enumerate(cols): 220 | if i + j < len(calculators): 221 | calc = calculators[i + j] 222 | # Use modulo to cycle through the blue shades 223 | color_index = (i + j) % len(blue_shades) 224 | color = blue_shades[color_index] 225 | with col: 226 | # Styled container for heading and description with different blue shades 227 | st.markdown( 228 | f""" 229 |
<div style="background-color: {color}; 230 | border-radius: 8px; 231 | padding: 15px; 232 | margin-bottom: 10px; 233 | min-height: 120px; 234 | "> 235 | <h4>{calc['name']}</h4> 236 | <p>{calc['description']}</p> 237 | </div>
238 | """, 239 | unsafe_allow_html=True 240 | ) 241 | # More Info expander 242 | with st.expander("More Info"): 243 | st.write(calc["details"]) 244 | st.markdown("---") 245 | 246 | st.markdown("## Get Started Today!") 247 | st.markdown("Explore our calculators and take control of your predictive analytics journey!") 248 | 249 | st.write("Developed with ❤️ by Yashasvini Sharma | [Github](https://www.github.com/yashasvini121) | [LinkedIn](https://www.linkedin.com/in/yashasvini121/)") 250 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | # app/Dockerfile 2 | 3 | FROM python:3.10-slim 4 | 5 | WORKDIR /app 6 | 7 | RUN apt-get update && apt-get install -y \ 8 | build-essential \ 9 | portaudio19-dev \ 10 | curl \ 11 | software-properties-common \ 12 | git \ 13 | && rm -rf /var/lib/apt/lists/* 14 | 15 | # RUN git clone https://github.com/streamlit/streamlit-example.git . 16 | 17 | COPY requirements.txt /app/ 18 | 19 | RUN pip3 install -r requirements.txt 20 | 21 | COPY . 
/app 22 | 23 | EXPOSE 8501 24 | 25 | HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health 26 | 27 | ENTRYPOINT ["streamlit", "run", "App.py"] -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Yashasvini Sharma 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /assets/images/machine_learning.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/assets/images/machine_learning.png -------------------------------------------------------------------------------- /contributing.md: -------------------------------------------------------------------------------- 1 | # How to Contribute: 2 | 1. Select an area of interest from the sections below. 3 | 2. Fork the repository and create a new branch for your contribution. 4 | 3. Implement your changes and submit a pull request with a clear description. 5 | 4. You can also create issues to discuss new ideas, suggest features, or report bugs. 6 | 5. Alternatively, review existing issues and contribute towards resolving them. 7 | 8 | ### Frontend Development (UI/UX Enhancements) 9 | - Help improve the design, responsiveness, and user experience of the web interface. 10 | - Key areas for enhancement include form layouts, interaction feedback, accessibility features, and mobile responsiveness. 11 | 12 | ### Machine Learning Contributions 13 | - Expand the scope of the project by adding new machine learning models for different prediction use cases. 14 | - **Notebook Contributions**: Share your model via a Jupyter notebook under the `models//notebooks/` directory. 15 | - **Full Model Integration**: Submit fully integrated models with optimized parameters, preprocessing steps, and final outputs. 16 | - You can also contribute by optimizing existing models, tuning hyperparameters, or improving dataset handling for better performance. 17 | 18 | ### Backend Development & System Integration 19 | - Help integrate new or existing machine learning models into the application’s backend using Python APIs. 
20 | - Enhance the system's performance, develop API endpoints, and improve data handling capabilities for larger datasets. 21 | 22 | ### Documentation & Tutorials 23 | - Improve the project's documentation to help new contributors understand the structure and flow of the application. 24 | - Create and share tutorials or example use cases on building and integrating custom models into the system. 25 | 26 | ### Testing & Deployment 27 | - Contribute towards testing the application, prefer using pytest etc. 28 | 29 | ### Logging & Monitoring 30 | - Implement logging in the project 31 | -------------------------------------------------------------------------------- /dev-requirements.txt: -------------------------------------------------------------------------------- 1 | # Development dependencies for the project. 2 | # Includes Jupyter and other tools used during development, but not required for the Streamlit app in production. 3 | 4 | jupyterlab 5 | notebook 6 | 7 | # jupyterlab==4.2.5 8 | # jupyterlab_pygments==0.3.0 9 | # jupyterlab_server==2.27.3 10 | # jupyterlab_widgets==3.0.13 -------------------------------------------------------------------------------- /docker-compose.debug.yml: -------------------------------------------------------------------------------- 1 | version: '3.4' 2 | 3 | services: 4 | dockerupdate: 5 | image: dockerupdate 6 | build: 7 | context: . 8 | dockerfile: ./Dockerfile 9 | command: ["sh", "-c", "pip install debugpy -t /tmp && python /tmp/debugpy --wait-for-client --listen 0.0.0.0:5678 App.py "] 10 | ports: 11 | - 5678:5678 12 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: '3' 2 | 3 | services: 4 | webapp: 5 | build: 6 | context: . 
7 | dockerfile: Dockerfile # Path to the Dockerfile 8 | ports: 9 | - "8501:8501" # Expose the container's port 8501 to the host 10 | environment: 11 | - PYTHONUNBUFFERED=1 # Ensures logs are immediately flushed to stdout 12 | healthcheck: 13 | test: ["CMD", "curl", "--fail", "http://localhost:8501/_stcore/health"] 14 | interval: 30s 15 | timeout: 10s 16 | retries: 3 17 | volumes: 18 | - .:/app # Mount the current directory to /app in the container -------------------------------------------------------------------------------- /docs/project-structure.md: -------------------------------------------------------------------------------- 1 | # Project Structure 2 | 3 | The directory layout of **Predictive Calc** is organized in a way that separates concerns between model development, frontend interaction, and configuration management. Below is a breakdown of the key folders and files: 4 | 5 | ``` 6 | predictive-calc/ 7 | ├── app.py # Main entry point for the Streamlit web app 8 | ├── docs/ # Documentation files 9 | │ ├── project-structure.md # Directory layout of the repository 10 | │ ├── tutorial.md # Steps to integrate a new machine learning model into the repository 11 | ├── form_configs/ # Configuration files for the forms 12 | │ ├── house_price.json # JSON configuration for house price model form input fields 13 | │ ├── loan_eligibility.json # JSON configuration for loan eligibility model form input fields 14 | │ ├── ... 
15 | ├── models/ # Folder containing machine learning models 16 | │ ├── house_price/ # Example model directory (for house price prediction) 17 | │ │ ├── data/ # Datasets used by the models 18 | │ │ ├── notebooks/ # Jupyter notebooks for dataset exploration and model training 19 | │ │ │ ├── house_price.ipynb 20 | │ │ ├── saved_models/ # Serialized (pickled) model files and scalers 21 | │ │ │ ├── model.pkl # Trained model for predictions 22 | │ │ │ ├── scaler.pkl # Scaler for data normalization 23 | │ │ ├── model.py # Code to define and train the model 24 | │ │ ├── predict.py # Code to make predictions using the trained model. Contain get_prediction() function 25 | │ ├── modelEvaluation.py # Model evaluation class generates plots and metrics 26 | │ ├── ... 27 | ├── pages/ # Streamlit pages representing different calculators 28 | │ ├── pages.json # Configuration file for managing page details and settings 29 | │ ├── House_Price_Estimator.py 30 | │ ├── Loan_Eligibility_Estimator.py 31 | │ ├── ... 32 | ├── assets/ 33 | │ ├── images/ # Image assets used in the Streamlit app 34 | │ │ ├── machine-learning.png 35 | │ │ ├── machine-learning.gif 36 | │ │ ├── ... 
37 | ├── form_handler.py # Class to handle dynamic form generation based on JSON configs 38 | ├── page_handler.py # Class to manage the page rendering logic and handles model predictions 39 | ├── requirements.txt # List of Python dependencies required for the Streamlit App 40 | ├── dev-requirements.txt # Development dependencies, including Jupyter and tools for local use, excluding those needed for production Streamlit 41 | ├── packages.txt # List of Ubuntu packages required for Streamlit App 42 | ├── Dockerfile # Instructions for building a Docker Image 43 | ├── docker-compose.yml # To set up a docker container for streamlit 44 | ├── docker-compose.debug.yml # To debug inside the docker container using Debugpy 45 | ├── readme.md # Overview of the project and setup instructions 46 | ``` 47 | 48 | ### Key Components 49 | 50 | #### 1. `app.py` 51 | This is the entry point for the **Streamlit** application. It initializes the app and renders the home page, and loads the model calculators having their pages in the `pages/` directory. 52 | 53 | #### 2. `page_handler.py` 54 | The page_handler.py file manages the rendering of pages in the Predictive Calc application by reading configurations from the pages.json file. It dynamically loads page titles, icons, and model paths, ensuring a smooth user experience. The class integrates model prediction logic and utilizes the `FormHandler` to generate dynamic forms based on the specified configurations. It also manages multiple tabs, enhancing functionality and allowing for easy updates or additions of new models while maintaining a cohesive interface. 55 | 56 | #### 3. `form_handler.py` 57 | This script dynamically generates the input forms based on the JSON configuration files. It maps user inputs to the model’s expected parameters and passes the data to the prediction logic. 58 | 59 | #### 4. 
`models/` 60 | Each model gets its own folder within the `models/` directory, which contains all necessary files for that particular model. This includes: 61 | - **notebooks/**: Jupyter notebooks for model training and experimentation. 62 | - **model.py**: Code defining and training the final chosen machine learning model. 63 | - **predict.py**: Code to load the trained model and make predictions based on user input. 64 | - **saved_models/**: Directory where the trained model (`model.pkl`) and any preprocessing objects like scalers (`scaler.pkl`) are stored. 65 | - **data/**: Raw datasets used for training the model. 66 | - **modelEvaluation.py**: Scripts for model evaluation and reporting. 67 | 68 | #### 5. `pages/` 69 | Each model has a corresponding Streamlit page in the `pages/` folder. This page handles the frontend logic, rendering the forms for user input and displaying the prediction results. For example, the `House_Price_Estimator.py` page contains the interface for the house price prediction model. 70 | 71 | #### 6. `pages/pages.json` 72 | This file manages the configuration for all pages in the application, defining attributes such as titles, icons, model paths, and tab configurations for each calculator. This centralizes the configuration for easier updates and management of multiple pages. 73 | 74 | #### 7. `form_configs/` 75 | The `form_configs/` folder contains JSON configuration files that define the input fields required by each model. These JSON files dictate how the forms are dynamically generated by `form_handler.py`. For example, the `house_price.json` file specifies the input fields (e.g., square footage, number of bedrooms, etc.) needed for the house price prediction model. 76 | 77 | #### 8. `docs/` 78 | Contains the project's documentation. This is where you can find information about how the system is structured and how to contribute to the project, most notably the `tutorial.md` and `project-structure.md` files. 79 | 80 | #### 9.
`requirements.txt`, `dev-requirements.txt`, `packages.txt` 81 | - `requirements.txt` contains the Python dependencies required to run the Streamlit application. 82 | - `dev-requirements.txt` includes additional dependencies for development purposes, such as Jupyter notebooks and other tools. 83 | - `packages.txt` lists the Ubuntu packages required for the Streamlit application to run correctly. 84 | 85 | #### 10. `Dockerfile`, `docker-compose.yml`, `docker-compose.debug.yml` 86 | - `Dockerfile` contains the instructions for building a Docker image that can run the Streamlit application. 87 | - `docker-compose.yml` sets up a Docker container for the Streamlit application. 88 | - `docker-compose.debug.yml` allows debugging inside the Docker container using debugpy. 89 | 90 | #### 11. `assets/` 91 | This folder contains all the assets used in the main project. 92 | -------------------------------------------------------------------------------- /docs/tutorial.md: -------------------------------------------------------------------------------- 1 | ## How to Integrate Your Model 2 | 3 | To integrate a new machine learning model into **Predictive Calc**, follow these steps: 4 | 5 | ### 1. Create Your Model Directory 6 | Navigate to the `models/` directory and create a new folder for your model. The folder should be named based on the problem your model addresses (e.g., `models/loan_eligibility/`). 7 | 8 | Inside your folder, you’ll need the following structure: 9 | ``` 10 | models/ 11 | ├── loan_eligibility/ 12 | │ ├── data/ # Folder for storing the dataset used for training 13 | │ ├── notebooks/ # Jupyter notebooks for training and experimentation 14 | │ ├── saved_models/ # Folder for saving the trained model and any scalers 15 | │ ├── model.py # Script to define and train your model 16 | │ ├── predict.py # Script to load the trained model and make predictions 17 | │ ├── modelEvaluation.py # (Optional) Script for model evaluation and testing 18 | ``` 19 | 20 | ### 2.
Train Your Model 21 | Train your model in a Jupyter notebook and save the final trained model as a pickle file (`model.pkl`) inside the `saved_models/` folder. If your model requires preprocessing steps like scaling or encoding, save these objects as well (e.g., `scaler.pkl`). 22 | 23 | ### 3. Define the Prediction Logic 24 | In `predict.py`, load your saved model and write the logic for making predictions. This file should accept input data (coming from the web form) and return the prediction result. 25 | 26 | ### 4. Configure the Input Form 27 | In the `form_configs/` folder, create a new JSON configuration file (e.g., `loan_eligibility.json`). This file defines the fields that will appear in the input form and maps them to the prediction model’s expected input parameters. 28 | 29 | ### 5. Configure the Page Settings 30 | In the `pages/pages.json` file, add an entry for your new model. This configuration file manages the settings for all pages in the application, including titles, icons, model paths, and tab configurations. 31 | 32 | ### 6. Add a Streamlit Page 33 | - In the `pages/` directory, create a new Python file for the web interface (e.g., `Loan_Eligibility_Estimator.py`). This file will call the `page_handler.py` script to render the form and display the prediction results. 34 | - The page name on the sidebar will be the same as the file name. 35 | 36 | ### 7. Update the Main App 37 | In `App.py`, update the list of available pages. 38 | 39 | ### 8. Update and Test the Dependencies 40 | - If you've added or updated any dependencies (packages/modules/libraries) for your model, update the `requirements.txt` file accordingly. 41 | - After updating, test the entire project to ensure there are no version conflicts between packages. This helps maintain a stable and reproducible environment for all contributors.
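Steps 2 and 3 reduce to a pickle round trip: the notebook persists the trained estimator under `saved_models/`, and `predict.py` reloads it and exposes a single prediction entry point for the form data. The sketch below illustrates that contract with a stand-in model; `TinyModel`, its toy eligibility rule, and the `get_prediction()` argument names are illustrative assumptions, not names mandated by the project.

```python
import os
import pickle
import tempfile

class TinyModel:
    """Stand-in for a trained estimator (illustrative, not a real project model)."""
    def predict(self, rows):
        # Toy rule: eligible when income covers at least 20% of the loan amount
        return ["Eligible" if income >= 0.2 * loan else "Not Eligible"
                for income, loan in rows]

# Step 2: train in a notebook, then persist the final model under saved_models/
saved_dir = tempfile.mkdtemp()          # stands in for models/<name>/saved_models/
model_path = os.path.join(saved_dir, "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(TinyModel(), f)

# Step 3: predict.py loads the pickle and maps form inputs to the feature order
def get_prediction(income, loan_amount):
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return model.predict([(income, loan_amount)])[0]

print(get_prediction(income=50000, loan_amount=20000))  # Eligible
```

A real `predict.py` would replace `TinyModel` with the estimator trained in the notebook and also load any saved preprocessing objects (e.g., `scaler.pkl`) before calling `predict()`.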
-------------------------------------------------------------------------------- /form_configs/business_performance_forecasting.json: -------------------------------------------------------------------------------- 1 | { 2 | "Business Forecast Form": { 3 | "R&D Spend": { 4 | "type": "number", 5 | "min_value": 0.0, 6 | "default_value": 100000.0, 7 | "step": 1000.0, 8 | "field_name": "RnD_Spend" 9 | }, 10 | "Administration": { 11 | "type": "number", 12 | "min_value": 0.0, 13 | "default_value": 50000.0, 14 | "step": 1000.0, 15 | "field_name": "Administration" 16 | }, 17 | "Marketing Spend": { 18 | "type": "number", 19 | "min_value": 0.0, 20 | "default_value": 100000.0, 21 | "step": 1000.0, 22 | "field_name": "Marketing_Spend" 23 | }, 24 | "State": { 25 | "type": "dropdown", 26 | "options": ["New York", "California", "Florida"], 27 | "default_value": "New York", 28 | "field_name": "State" 29 | } 30 | } 31 | } 32 | -------------------------------------------------------------------------------- /form_configs/credit_card_fraud.json: -------------------------------------------------------------------------------- 1 | { 2 | "Credit Card Fraud Estimator": { 3 | "Average Amount per Transaction per Day": { 4 | "type": "number", 5 | "min_value": 0, 6 | "max_value": 100000, 7 | "default_value": 100, 8 | "step": 100, 9 | "field_name": "avg_amount_per_day" 10 | }, 11 | "Transaction Amount": { 12 | "type": "number", 13 | "min_value": 0, 14 | "max_value": 100000, 15 | "default_value": 3000, 16 | "step": 100, 17 | "field_name": "transaction_amount" 18 | }, 19 | "Is Declined": { 20 | "type": "dropdown", 21 | "options": [ 22 | "Yes", 23 | "No" 24 | ], 25 | "default_value": "No", 26 | "field_name": "Is_declined" 27 | }, 28 | "Total Number of Declines per Day": { 29 | "type": "number", 30 | "min_value": 0, 31 | "max_value": 100, 32 | "default_value": 0, 33 | "step": 1, 34 | "field_name": "no_of_declines_per_day" 35 | }, 36 | "Is Foreign Transaction": { 37 | "type": "dropdown", 38 | 
"options": [ 39 | "Yes", 40 | "No" 41 | ], 42 | "default_value": "No", 43 | "field_name": "Is_Foreign_transaction" 44 | }, 45 | "Is High-Risk Country": { 46 | "type": "dropdown", 47 | "options": [ 48 | "Yes", 49 | "No" 50 | ], 51 | "default_value": "No", 52 | "field_name": "Is_High_Risk_country" 53 | }, 54 | "Daily Chargeback Average Amount": { 55 | "type": "number", 56 | "min_value": 0, 57 | "max_value": 10000, 58 | "default_value": 0, 59 | "step": 100, 60 | "field_name": "Daily_chargeback_avg_amt" 61 | }, 62 | "6-Month Average Chargeback Amount": { 63 | "type": "number", 64 | "min_value": 0, 65 | "max_value": 10000, 66 | "default_value": 0, 67 | "step": 100, 68 | "field_name": "six_month_avg_chbk_amt" 69 | }, 70 | "6-Month Chargeback Frequency": { 71 | "type": "number", 72 | "min_value": 0, 73 | "max_value": 100, 74 | "default_value": 0, 75 | "step": 1, 76 | "field_name": "six_month_chbk_freq" 77 | } 78 | } 79 | } -------------------------------------------------------------------------------- /form_configs/customer_income.json: -------------------------------------------------------------------------------- 1 | { 2 | "Customer Income Estimation Form": { 3 | "Age": { 4 | "field_name": "age", 5 | "type": "number", 6 | "min_value": 18, 7 | "max_value": 100, 8 | "default_value": 30, 9 | "step": 1 10 | }, 11 | "Workclass": { 12 | "field_name": "workclass", 13 | "type": "dropdown", 14 | "options": ["Private", "State-gov", "Self-emp-not-inc", "Federal-gov", 15 | "Local-gov", "Self-emp-inc", "Without-pay"], 16 | "default_value": "Private" 17 | }, 18 | "Financial Weight (fnlwgt)": { 19 | "field_name": "fnlwgt", 20 | "type": "number", 21 | "min_value": 0, 22 | "max_value": 1000000, 23 | "default_value": 100000, 24 | "step": 1000 25 | }, 26 | "Education Level": { 27 | "field_name": "education", 28 | "type": "dropdown", 29 | "options": ["Doctorate", "12th", "Bachelors", "7th-8th", "Some-college", 30 | "HS-grad", "9th", "10th", "11th", "Masters", "Preschool", 31 | "5th-6th", 
"Prof-school", "Assoc-voc", "Assoc-acdm", "1st-4th"], 32 | "default_value": "Bachelors" 33 | }, 34 | "Marital Status": { 35 | "field_name": "marital_status", 36 | "type": "dropdown", 37 | "options": ["Divorced", "Never-married", "Married-civ-spouse", "Widowed", 38 | "Separated", "Married-spouse-absent", "Married-AF-spouse"], 39 | "default_value": "Never-married" 40 | }, 41 | "Occupation": { 42 | "field_name": "occupation", 43 | "type": "dropdown", 44 | "options": ["Exec-managerial", "Other-service", "Transport-moving", 45 | "Adm-clerical", "Machine-op-inspct", "Sales", "Handlers-cleaners", 46 | "Farming-fishing", "Protective-serv", "Prof-specialty", 47 | "Craft-repair", "Tech-support", "Priv-house-serv", "Armed-Forces"], 48 | "default_value": "Exec-managerial" 49 | }, 50 | "Relationship": { 51 | "field_name": "relationship", 52 | "type": "dropdown", 53 | "options": ["Not-in-family", "Own-child", "Husband", "Wife", "Unmarried", 54 | "Other-relative"], 55 | "default_value": "Unmarried" 56 | }, 57 | "Race": { 58 | "field_name": "race", 59 | "type": "dropdown", 60 | "options": ["White", "Black", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other"], 61 | "default_value": "White" 62 | }, 63 | "Sex": { 64 | "field_name": "sex", 65 | "type": "dropdown", 66 | "options": ["Male", "Female"], 67 | "default_value": "Male" 68 | }, 69 | "Capital Gain": { 70 | "field_name": "capital_gain", 71 | "type": "number", 72 | "min_value": 0, 73 | "max_value": 100000, 74 | "default_value": 0, 75 | "step": 1000 76 | }, 77 | "Capital Loss": { 78 | "field_name": "capital_loss", 79 | "type": "number", 80 | "min_value": 0, 81 | "max_value": 100000, 82 | "default_value": 0, 83 | "step": 1000 84 | }, 85 | "Hours per Week": { 86 | "field_name": "hours_per_week", 87 | "type": "number", 88 | "min_value": 0, 89 | "max_value": 100, 90 | "default_value": 25, 91 | "step": 1 92 | }, 93 | "Native Country": { 94 | "field_name": "native_country", 95 | "type": "dropdown", 96 | "options": ["United-States",
"Japan", "South", "Portugal", "Italy", "Mexico", 97 | "Ecuador", "England", "Philippines", "China", "Germany", 98 | "Dominican-Republic", "Jamaica", "Vietnam", "Thailand", 99 | "Puerto-Rico", "Cuba", "India", "Cambodia", "Yugoslavia", "Iran", 100 | "El-Salvador", "Poland", "Greece", "Ireland", "Canada", 101 | "Guatemala", "Scotland", "Columbia", "Outlying-US(Guam-USVI-etc)", 102 | "Haiti", "Peru", "Nicaragua", "Trinadad&Tobago", "Laos", "Taiwan", 103 | "France", "Hungary", "Honduras", "Hong", "Holand-Netherlands"], 104 | "default_value": "United-States" 105 | } 106 | } 107 | } -------------------------------------------------------------------------------- /form_configs/gold_price_prediction.json: -------------------------------------------------------------------------------- 1 | { 2 | "Gold Price Form": { 3 | "SPX": { 4 | "type": "number", 5 | "min_value": 0.0, 6 | "default_value": 1447.16, 7 | "step": 1.0, 8 | "field_name": "spx" 9 | }, 10 | "USO": { 11 | "type": "number", 12 | "min_value": 0.0, 13 | "default_value": 78.47, 14 | "step": 0.01, 15 | "field_name": "uso" 16 | }, 17 | "SLV": { 18 | "type": "number", 19 | "min_value": 0.0, 20 | "default_value": 15.18, 21 | "step": 0.01, 22 | "field_name": "slv" 23 | }, 24 | "EUR/USD": { 25 | "type": "number", 26 | "min_value": 0.0, 27 | "default_value": 1.47, 28 | "step": 0.01, 29 | "field_name": "eur_usd" 30 | } 31 | } 32 | } 33 | -------------------------------------------------------------------------------- /form_configs/house_price.json: -------------------------------------------------------------------------------- 1 | { 2 | "House Price Form": { 3 | "Area (in square feet)": { 4 | "type": "number", 5 | "min_value": 500, 6 | "max_value": 10000, 7 | "default_value": 1000, 8 | "step": 100, 9 | "field_name": "area" 10 | }, 11 | "Near Main Road": { 12 | "type": "dropdown", 13 | "options": ["Yes", "No"], 14 | "default_value": "No", 15 | "field_name": "mainroad" 16 | }, 17 | "Guest Room": { 18 | "type": "dropdown", 19 |
"options": ["Yes", "No"], 20 | "default_value": "No", 21 | "field_name": "guestroom" 22 | }, 23 | "Basement": { 24 | "type": "dropdown", 25 | "options": ["Yes", "No"], 26 | "default_value": "No", 27 | "field_name": "basement" 28 | }, 29 | "Hot Water Heating": { 30 | "type": "dropdown", 31 | "options": ["Yes", "No"], 32 | "default_value": "No", 33 | "field_name": "hotwaterheating" 34 | }, 35 | "Air Conditioning": { 36 | "type": "dropdown", 37 | "options": ["Yes", "No"], 38 | "default_value": "No", 39 | "field_name": "airconditioning" 40 | }, 41 | "Preferred Area": { 42 | "type": "dropdown", 43 | "options": ["Yes", "No"], 44 | "default_value": "No", 45 | "field_name": "prefarea" 46 | }, 47 | "Number of Bedrooms": { 48 | "type": "number", 49 | "min_value": 0, 50 | "max_value": 6, 51 | "default_value": 3, 52 | "step": 1, 53 | "field_name": "bedrooms" 54 | }, 55 | "Number of Bathrooms": { 56 | "type": "number", 57 | "min_value": 0, 58 | "max_value": 4, 59 | "default_value": 2, 60 | "step": 1, 61 | "field_name": "bathrooms" 62 | }, 63 | "Number of Stories": { 64 | "type": "number", 65 | "min_value": 1, 66 | "max_value": 4, 67 | "default_value": 2, 68 | "step": 1, 69 | "field_name": "stories" 70 | }, 71 | "Parking Spaces": { 72 | "type": "number", 73 | "min_value": 0, 74 | "max_value": 3, 75 | "default_value": 1, 76 | "step": 1, 77 | "field_name": "parking" 78 | }, 79 | "Furnishing Status": { 80 | "type": "dropdown", 81 | "field_name": "furnishingstatus", 82 | "options": ["semi-furnished", "unfurnished", "furnished"], 83 | "default_value": "semi-furnished" 84 | } 85 | } 86 | } 87 | -------------------------------------------------------------------------------- /form_configs/insurance_cost_predictor.json: -------------------------------------------------------------------------------- 1 | { 2 | "Insurance Cost Form": { 3 | "Age": { 4 | "type": "slider", 5 | "min_value": 0, 6 | "default_value": 30, 7 | "step": 1, 8 | "field_name": "age" 9 | }, 10 | "Sex": { 11 | "type": 
"dropdown", 12 | "options": ["Male", "Female"], 13 | "default_value": "Male", 14 | "field_name": "sex" 15 | }, 16 | "BMI": { 17 | "type": "number", 18 | "min_value": 0.0, 19 | "default_value": 25.0, 20 | "step": 0.1, 21 | "field_name": "bmi" 22 | }, 23 | "Children": { 24 | "type": "number", 25 | "min_value": 0, 26 | "default_value": 0, 27 | "step": 1, 28 | "field_name": "children" 29 | }, 30 | "Smoker": { 31 | "type": "dropdown", 32 | "options": ["Yes", "No"], 33 | "default_value": "No", 34 | "field_name": "smoker" 35 | }, 36 | "Region": { 37 | "type": "dropdown", 38 | "options": ["Southeast", "Southwest", "Northeast", "Northwest"], 39 | "default_value": "Southeast", 40 | "field_name": "region" 41 | } 42 | } 43 | } 44 | -------------------------------------------------------------------------------- /form_configs/loan_eligibility.json: -------------------------------------------------------------------------------- 1 | { 2 | "Loan Eligibility Form": { 3 | "Income": { 4 | "field_name": "income", 5 | "type": "number", 6 | "min_value": 10000, 7 | "max_value": 1000000, 8 | "default_value": 50000, 9 | "step": 5000 10 | }, 11 | "Loan Amount": { 12 | "field_name": "loan_amount", 13 | "type": "number", 14 | "min_value": 5000, 15 | "max_value": 500000, 16 | "default_value": 20000, 17 | "step": 1000 18 | }, 19 | "Credit Score": { 20 | "field_name": "credit_score", 21 | "type": "range", 22 | "min_value": 300, 23 | "max_value": 850, 24 | "default_value": [600, 700] 25 | } 26 | } 27 | } -------------------------------------------------------------------------------- /form_configs/parkinson_detection.json: -------------------------------------------------------------------------------- 1 | { 2 | "Parkinson Detection Form": { 3 | "MDVP_Fo_Hz": { 4 | "field_name": "MDVP_Fo_Hz", 5 | "type": "float", 6 | "min_value": 88.00000, 7 | "max_value": 260.00000, 8 | "default_value": 88.00000, 9 | "step": 0.000001 10 | }, 11 | "MDVP_Fhi_Hz": { 12 | "field_name": "MDVP_Fhi_Hz", 13 | "type": 
"float", 14 | "min_value": 102.00000, 15 | "max_value": 592.00000, 16 | "default_value": 102.00000, 17 | "step": 0.000001 18 | }, 19 | "MDVP_Flo_Hz": { 20 | "field_name": "MDVP_Flo_Hz", 21 | "type": "float", 22 | "min_value": 65.00000, 23 | "max_value": 240.00000, 24 | "default_value": 65.00000, 25 | "step": 0.000001 26 | }, 27 | "MDVP_Jitter_percent": { 28 | "field_name": "MDVP_Jitter_percent", 29 | "type": "float", 30 | "min_value": 0.00100, 31 | "max_value": 0.03300, 32 | "default_value": 0.00100, 33 | "step": 0.000001 34 | }, 35 | "MDVP_Jitter_Abs": { 36 | "field_name": "MDVP_Jitter_Abs", 37 | "type": "float", 38 | "min_value": 0.000020, 39 | "max_value": 0.000200, 40 | "default_value": 0.000020, 41 | "step": 0.000001 42 | }, 43 | "MDVP_RAP": { 44 | "field_name": "MDVP_RAP", 45 | "type": "float", 46 | "min_value": 0.000600, 47 | "max_value": 0.020000, 48 | "default_value": 0.000600, 49 | "step": 0.000001 50 | }, 51 | "MDVP_PPQ": { 52 | "field_name": "MDVP_PPQ", 53 | "type": "float", 54 | "min_value": 0.000900, 55 | "max_value": 0.020000, 56 | "default_value": 0.000900, 57 | "step": 0.000001 58 | }, 59 | "Jitter_DDP": { 60 | "field_name": "Jitter_DDP", 61 | "type": "float", 62 | "min_value": 0.00200, 63 | "max_value": 0.06500, 64 | "default_value": 0.00200, 65 | "step": 0.000001 66 | }, 67 | "MDVP_Shimmer": { 68 | "field_name": "MDVP_Shimmer", 69 | "type": "float", 70 | "min_value": 0.00900, 71 | "max_value": 0.12000, 72 | "default_value": 0.00900, 73 | "step": 0.000001 74 | }, 75 | "MDVP_Shimmer_dB": { 76 | "field_name": "MDVP_Shimmer_dB", 77 | "type": "float", 78 | "min_value": 0.08500, 79 | "max_value": 1.30200, 80 | "default_value": 0.08500, 81 | "step": 0.000001 82 | }, 83 | "Shimmer_APQ3": { 84 | "field_name": "Shimmer_APQ3", 85 | "type": "float", 86 | "min_value": 0.00400, 87 | "max_value": 0.05600, 88 | "default_value": 0.00400, 89 | "step": 0.000001 90 | }, 91 | "Shimmer_APQ5": { 92 | "field_name": "Shimmer_APQ5", 93 | "type": "float", 94 | "min_value": 
0.00500, 95 | "max_value": 0.08000, 96 | "default_value": 0.00500, 97 | "step": 0.000001 98 | }, 99 | "MDVP_APQ": { 100 | "field_name": "MDVP_APQ", 101 | "type": "float", 102 | "min_value": 0.00700, 103 | "max_value": 0.14000, 104 | "default_value": 0.00700, 105 | "step": 0.000001 106 | }, 107 | "Shimmer_DDA": { 108 | "field_name": "Shimmer_DDA", 109 | "type": "float", 110 | "min_value": 0.01300, 111 | "max_value": 0.17000, 112 | "default_value": 0.01300, 113 | "step": 0.000001 114 | }, 115 | "NHR": { 116 | "field_name": "NHR", 117 | "type": "float", 118 | "min_value": 0.000600, 119 | "max_value": 0.310000, 120 | "default_value": 0.000600, 121 | "step": 0.000001 122 | }, 123 | "HNR": { 124 | "field_name": "HNR", 125 | "type": "float", 126 | "min_value": 8.00000, 127 | "max_value": 33.00000, 128 | "default_value": 8.00000, 129 | "step": 0.000001 130 | }, 131 | "RPDE": { 132 | "field_name": "RPDE", 133 | "type": "float", 134 | "min_value": 0.25000, 135 | "max_value": 0.68000, 136 | "default_value": 0.25000, 137 | "step": 0.000001 138 | }, 139 | "DFA": { 140 | "field_name": "DFA", 141 | "type": "float", 142 | "min_value": 0.57000, 143 | "max_value": 0.82000, 144 | "default_value": 0.57000, 145 | "step": 0.000001 146 | }, 147 | "Spread1": { 148 | "field_name": "spread1", 149 | "type": "float", 150 | "min_value": -7.00000, 151 | "max_value": -2.00000, 152 | "default_value": -7.00000, 153 | "step": 0.01 154 | }, 155 | "Spread2": { 156 | "field_name": "spread2", 157 | "type": "float", 158 | "min_value": 0.00600, 159 | "max_value": 0.45000, 160 | "default_value": 0.00600, 161 | "step": 0.000001 162 | }, 163 | "D2": { 164 | "field_name": "D2", 165 | "type": "float", 166 | "min_value": 1.42000, 167 | "max_value": 3.67000, 168 | "default_value": 1.42000, 169 | "step": 0.000001 170 | }, 171 | "PPE": { 172 | "field_name": "PPE", 173 | "type": "float", 174 | "min_value": 0.04000, 175 | "max_value": 0.50000, 176 | "default_value": 0.04000, 177 | "step": 0.000001 178 | } 179 | } 
180 | } 181 | -------------------------------------------------------------------------------- /form_configs/sleep_prediction.json: -------------------------------------------------------------------------------- 1 | { 2 | "Sleep Prediction Form": { 3 | "Age": { 4 | "field_name": "Age", 5 | "type": "number", 6 | "min_value": 0, 7 | "max_value": 120, 8 | "default_value": 25, 9 | "step": 1 10 | }, 11 | "Sleep_Duration": { 12 | "field_name": "Sleep_Duration", 13 | "type": "float", 14 | "min_value": 0.0, 15 | "max_value": 24.0, 16 | "default_value": 8.0, 17 | "step": 0.1 18 | }, 19 | "Heart_Rate": { 20 | "field_name": "Heart_Rate", 21 | "type": "number", 22 | "min_value": 10, 23 | "max_value": 200, 24 | "default_value": 72, 25 | "step": 1 26 | }, 27 | "Daily_Steps": { 28 | "field_name": "Daily_Steps", 29 | "type": "number", 30 | "min_value": 0, 31 | "max_value": 1000000, 32 | "default_value": 0, 33 | "step": 10 34 | }, 35 | "Systolic": { 36 | "field_name": "Systolic", 37 | "type": "float", 38 | "min_value": 0.0, 39 | "max_value": 250.0, 40 | "default_value": 120.0, 41 | "step": 0.1 42 | }, 43 | "Diastolic": { 44 | "field_name": "Diastolic", 45 | "type": "float", 46 | "min_value": 0.0, 47 | "max_value": 250.0, 48 | "default_value": 80.0, 49 | "step": 0.1 50 | }, 51 | "Occupation": { 52 | "field_name": "Occupation", 53 | "type": "dropdown", 54 | "options": [ 55 | "Software Engineer" ,"Doctor" ,"Sales Representative","Teacher" ,"Nurse", 56 | "Engineer" ,"Accountant" ,"Scientist", "Lawyer" ,"Salesperson" ,"Manager" 57 | ], 58 | "default_value": "Software Engineer" 59 | }, 60 | "Quality_of_Sleep": { 61 | "field_name": "Quality_of_Sleep", 62 | "type": "number", 63 | "min_value": 0, 64 | "max_value": 10, 65 | "default_value": 0, 66 | "step": 1 67 | }, 68 | "Gender": { 69 | "field_name": "Gender", 70 | "type": "dropdown", 71 | "options": [ 72 | "Male", 73 | "Female" 74 | ], 75 | "default_value": "Male" 76 | 77 | }, 78 | "Physical_Activity_Level": { 79 | "field_name": 
"Physical_Activity_Level", 80 | "type": "number", 81 | "min_value": 0, 82 | "max_value": 200, 83 | "default_value": 0, 84 | "step": 1 85 | }, 86 | "Stress_Level": { 87 | "field_name": "Stress_Level", 88 | "type": "number", 89 | "min_value": 0, 90 | "max_value": 10, 91 | "default_value": 0, 92 | "step": 1 93 | }, 94 | "BMI_Category": { 95 | "field_name": "BMI_Category", 96 | "type": "dropdown", 97 | "options": [ 98 | "Normal Weight", 99 | "Obese", 100 | "Overweight" 101 | ], 102 | "default_value": "Normal Weight" 103 | } 104 | } 105 | } -------------------------------------------------------------------------------- /form_configs/stress_detection.json: -------------------------------------------------------------------------------- 1 | { 2 | "Stress Detection Form": { 3 | "Age": { 4 | "field_name": "age", 5 | "type": "number", 6 | "min_value": 0, 7 | "max_value": 100, 8 | "default_value": 25, 9 | "step": 1 10 | }, 11 | "Frequency of Using Social Media Without Purpose": { 12 | "field_name": "freq_no_purpose", 13 | "type": "slider", 14 | "min_value": 1, 15 | "max_value": 5, 16 | "default_value": 3, 17 | "step": 1 18 | }, 19 | "Frequency of Feeling Distracted": { 20 | "field_name": "freq_distracted", 21 | "type": "slider", 22 | "min_value": 1, 23 | "max_value": 5, 24 | "default_value": 3, 25 | "step": 1 26 | }, 27 | "Restlessness Level": { 28 | "field_name": "restless", 29 | "type": "slider", 30 | "min_value": 1, 31 | "max_value": 5, 32 | "default_value": 3, 33 | "step": 1 34 | }, 35 | "Worry Level": { 36 | "field_name": "worry_level", 37 | "type": "slider", 38 | "min_value": 1, 39 | "max_value": 5, 40 | "default_value": 3, 41 | "step": 1 42 | }, 43 | "Difficulty Concentrating": { 44 | "field_name": "difficulty_concentrating", 45 | "type": "slider", 46 | "min_value": 1, 47 | "max_value": 5, 48 | "default_value": 3, 49 | "step": 1 50 | }, 51 | "Comparison to Successful People": { 52 | "field_name": "compare_to_successful_people", 53 | "type": "slider", 54 | "min_value": 
1, 55 | "max_value": 5, 56 | "default_value": 3, 57 | "step": 1 58 | }, 59 | "Feelings About Comparisons": { 60 | "field_name": "feelings_about_comparisons", 61 | "type": "slider", 62 | "min_value": 1, 63 | "max_value": 5, 64 | "default_value": 3, 65 | "step": 1 66 | }, 67 | "Frequency of Seeking Validation": { 68 | "field_name": "freq_seeking_validation", 69 | "type": "slider", 70 | "min_value": 1, 71 | "max_value": 5, 72 | "default_value": 3, 73 | "step": 1 74 | }, 75 | "Frequency of Feeling Depressed": { 76 | "field_name": "freq_feeling_depressed", 77 | "type": "slider", 78 | "min_value": 1, 79 | "max_value": 5, 80 | "default_value": 3, 81 | "step": 1 82 | }, 83 | "Interest Fluctuation": { 84 | "field_name": "interest_fluctuation", 85 | "type": "slider", 86 | "min_value": 1, 87 | "max_value": 5, 88 | "default_value": 3, 89 | "step": 1 90 | }, 91 | "Sleep Issues": { 92 | "field_name": "sleep_issues", 93 | "type": "slider", 94 | "min_value": 1, 95 | "max_value": 5, 96 | "default_value": 3, 97 | "step": 1 98 | } 99 | } 100 | } 101 | -------------------------------------------------------------------------------- /form_handler.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import json 3 | from typing import Dict, Any 4 | 5 | 6 | class FormHandler: 7 | """ 8 | A class to handle rendering a form in Streamlit based on a configuration file. 9 | 10 | Attributes: 11 | name (str): The name of the form to be rendered. 12 | button_label (str): The label for the submit button. 13 | model (callable): The model function to be called with form data. 14 | config_path (str): Path to the configuration JSON file. 
15 | 16 | Structure of the configuration JSON file: 17 | { 18 | "Form Name": { 19 | "field_label": { 20 | "field_name": "field_name", 21 | "type": "field_type", 22 | "default_value": "default_value", 23 | "min_value": min_value, 24 | "max_value": max_value, 25 | "step": step, 26 | "options": ["option1", "option2"], 27 | }, 28 | ... 29 | } 30 | } 31 | """ 32 | 33 | def __init__( 34 | self, name: str, button_label: str, model: callable, config_path: str 35 | ) -> None: 36 | """ 37 | Initializes the FormHandler with the provided parameters. 38 | 39 | Parameters: 40 | name (str): The name of the form to be rendered. 41 | button_label (str): The label for the submit button. 42 | model (callable): The model function to be called with form data. 43 | config_path (str): Path to the configuration JSON file. 44 | """ 45 | self.name = name 46 | self.button_label = button_label 47 | self.model = model 48 | self.config_path = config_path 49 | self.fields = self.load_fields_from_config() 50 | 51 | def load_fields_from_config(self) -> Dict[str, Dict[str, Any]]: 52 | """ 53 | Loads form fields from the configuration JSON file. 54 | 55 | Returns: 56 | Dict[str, Dict[str, Any]]: A dictionary of form fields and their attributes. 57 | """ 58 | 59 | # Handle a missing configuration file or malformed JSON gracefully 60 | try: 61 | with open(self.config_path, "r") as f: 62 | config = json.load(f) 63 | return config.get(self.name, {}) 64 | except FileNotFoundError: 65 | st.error(f"Configuration file not found: {self.config_path}") 66 | return {} 67 | except json.JSONDecodeError: 68 | st.error(f"Error parsing the configuration file: {self.config_path}") 69 | return {} 70 | 71 | 72 | def render(self) -> None: 73 | """ 74 | Renders the form in the Streamlit application. 75 | 76 | This method collects user input and, upon form submission, calls the specified model 77 | with the collected data mapped to the appropriate field names from the config file.
78 | """ 79 | # Dictionary to hold form data 80 | form_data: Dict[str, Any] = {} 81 | 82 | # Loop over the fields in the form 83 | for label, attributes in self.fields.items(): 84 | field_type = attributes.get("type") 85 | field_name = attributes.get( 86 | "field_name", label 87 | ) # Use field_name from the config 88 | 89 | #Handle different types of input fields 90 | if field_type == "number": 91 | form_data[field_name] = st.number_input( 92 | label, 93 | value=attributes.get("default_value"), 94 | min_value=attributes.get("min_value"), 95 | max_value=attributes.get("max_value"), 96 | step=attributes.get("step", 1), 97 | ) 98 | 99 | elif field_type == "float": # New case for float values 100 | form_data[field_name] = st.number_input( 101 | label, 102 | value=attributes.get("default_value"), 103 | min_value=attributes.get("min_value"), 104 | max_value=attributes.get("max_value"), 105 | step=attributes.get("step"), 106 | format="%.6f" # format to 6 decimal places 107 | ) 108 | 109 | elif field_type == "dropdown": 110 | form_data[field_name] = st.selectbox( 111 | label, 112 | options=attributes.get("options"), 113 | index=attributes.get("options").index( 114 | attributes.get("default_value") 115 | ), 116 | ) 117 | elif field_type == "range": 118 | form_data[field_name] = st.slider( 119 | label, 120 | min_value=attributes.get("min_value"), 121 | max_value=attributes.get("max_value"), 122 | value=tuple(attributes.get("default_value")), # type: ignore 123 | ) 124 | elif field_type == "multiselect": 125 | form_data[field_name] = st.multiselect( 126 | label, 127 | options=attributes.get("options"), 128 | default=attributes.get("default_value"), 129 | ) 130 | elif field_type == "text": 131 | form_data[field_name] = st.text_input( 132 | label, value=attributes.get("default_value") 133 | ) 134 | elif field_type == "checkbox": 135 | form_data[field_name] = st.checkbox( 136 | label, value=attributes.get("default_value", False) 137 | ) 138 | elif field_type == "radio": 139 | 
form_data[field_name] = st.radio( 140 | label, 141 | options=attributes.get("options"), 142 | index=attributes.get("options").index( 143 | attributes.get("default_value") 144 | ), 145 | ) 146 | elif field_type == "slider": 147 | form_data[field_name] = st.slider( 148 | label, 149 | min_value=attributes.get("min_value"), 150 | max_value=attributes.get("max_value"), 151 | value=attributes.get("default_value"), 152 | ) 153 | elif field_type == "date": 154 | form_data[field_name] = st.date_input(label) 155 | elif field_type == "time": 156 | form_data[field_name] = st.time_input(label) 157 | elif field_type == "file": 158 | form_data[field_name] = st.file_uploader(label) 159 | elif field_type == "password": 160 | form_data[field_name] = st.text_input(label, type="password") 161 | elif field_type == "image": 162 | form_data[field_name] = st.file_uploader(label, type=["png", "jpg", "jpeg"])  # st.image only displays; an uploader collects the image input 163 | else: 164 | st.warning(f"Unknown field type: {field_type}") 165 | 166 | # Submit button 167 | if st.button(self.button_label): 168 | # Call the model with the form data 169 | result = self.model(**form_data) 170 | 171 | # Display the result 172 | st.success(f"Result: {result}") 173 | -------------------------------------------------------------------------------- /machine-learning.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/machine-learning.gif -------------------------------------------------------------------------------- /models/PDF_malware_detection/pdf_extraction.py: -------------------------------------------------------------------------------- 1 | from pdfid import pdfid 2 | import fitz 3 | from os.path import exists 4 | import sys 5 | 6 | # Function to convert bytes to kilobytes 7 | def bytes_to_kb(num_bytes): 8 | return num_bytes / 1024 9 | 10 | # Main function to extract features from the PDF file 11 | def extract_pdf_features(pdf_file): 12 | # Checking if the file
exists 13 | if not exists(pdf_file): 14 | print(f"File {pdf_file} not found") 15 | return None 16 | 17 | features = {'FileName': pdf_file} 18 | 19 | # Open the PDF 20 | pdf = fitz.open(pdf_file) 21 | 22 | # Extract basic PDF metadata using PyMuPDF 23 | try: 24 | features['Pages'] = pdf.page_count 25 | except Exception: 26 | features['Pages'] = -1 27 | 28 | try: 29 | features['XrefLength'] = pdf.xref_length() 30 | except Exception: 31 | features['XrefLength'] = -1 32 | 33 | try: 34 | features['TitleCharacters'] = len(pdf.metadata.get('title', '')) 35 | except Exception: 36 | features['TitleCharacters'] = -1 37 | 38 | features['isEncrypted'] = 1 if pdf.is_encrypted else 0 39 | 40 | # Extract image-related features 41 | images_count = 0 42 | for i in range(pdf.page_count): 43 | images_count += len(pdf.get_page_images(i)) 44 | 45 | features['Images'] = images_count 46 | 47 | # Extract embedded file details 48 | emb_count = pdf.embfile_count() 49 | emb_size_sum = 0 50 | if emb_count != 0: 51 | try: 52 | for i in range(emb_count): 53 | emb_size_sum += pdf.embfile_info(i)["size"] # embfile_info returns a dict; sum the file sizes 54 | except Exception: 55 | features['EmbeddedFiles'] = -1 56 | else: 57 | features['EmbeddedFiles'] = emb_size_sum / emb_count 58 | else: 59 | features['EmbeddedFiles'] = 0 60 | 61 | # Extract presence of text in the PDF 62 | text = 0 63 | for page in pdf: 64 | if len(page.get_text().split()): 65 | text = 1 66 | break 67 | features['Text'] = text 68 | 69 | # Close the PDF after processing 70 | pdf.close() 71 | 72 | # Extract additional PDF features using pdfid 73 | try: 74 | options = pdfid.get_fake_options() 75 | options.scan = True 76 | options.json = True 77 | list_of_dict = pdfid.PDFiDMain([pdf_file], options) 78 | pdf_features = list_of_dict['reports'][0] 79 | del pdf_features['version'] 80 | 81 | # Rename features to correspond to dataset names 82 | diff_in_feature_name = { 83 | 'header': 'Header', 84 | 'obj': 'Obj', 85 | 'endobj': 'Endobj', 86 | 'stream': 'Stream', 87 | 'endstream': 'Endstream', 88 | 'xref': 'Xref', 89 | 'trailer':
'Trailer', 90 | 'startxref': 'StartXref', 91 | '/Page': 'PageNo', 92 | '/Encrypt': 'Encrypt', 93 | '/ObjStm': 'ObjStm', 94 | '/JS': 'JS', 95 | '/JavaScript': 'JavaScript', 96 | '/AA': 'AA', 97 | '/OpenAction': 'OpenAction', 98 | '/AcroForm': 'AcroForm', 99 | '/JBIG2Decode': 'JBIG2Decode', 100 | '/RichMedia': 'RichMedia', 101 | '/Launch': 'Launch', 102 | '/EmbeddedFile': 'EmbeddedFile', 103 | '/XFA': 'XFA', 104 | '/Colors > 2^24': 'Colors' 105 | } 106 | 107 | for curr_name, new_name in diff_in_feature_name.items(): 108 | pdf_features[new_name] = features.pop(curr_name, -1) 109 | 110 | features.update(pdf_features) 111 | except Exception as e: 112 | print(f"Error extracting pdfid features: {e}") 113 | features.update({ 114 | 'Header': '-1', 115 | 'Obj': -1, 116 | 'Endobj': -1, 117 | 'Stream': -1, 118 | 'Endstream': -1, 119 | 'Xref': -1, 120 | 'Trailer': -1, 121 | 'StartXref': -1, 122 | 'PageNo': -1, 123 | 'Encrypt': -1, 124 | 'ObjStm': -1, 125 | 'JS': -1, 126 | 'JavaScript': -1, 127 | 'AA': -1, 128 | 'OpenAction': -1, 129 | 'AcroForm': -1, 130 | 'JBIG2Decode': -1, 131 | 'RichMedia': -1, 132 | 'Launch': -1, 133 | 'EmbeddedFile': -1, 134 | 'XFA': -1, 135 | 'Colors': -1 136 | }) 137 | 138 | return features 139 | 140 | # If run as a script 141 | if __name__ == '__main__': 142 | if len(sys.argv) != 2: 143 | print("Usage: python pdf_extraction.py <pdf_file>") 144 | sys.exit(1) 145 | 146 | pdf_file = sys.argv[1] 147 | features = extract_pdf_features(pdf_file) 148 | 149 | if features: 150 | print(features) 151 | -------------------------------------------------------------------------------- /models/PDF_malware_detection/predict.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import json 3 | import joblib 4 | import os 5 | # from pdf_extraction import extract_pdf_features 6 | 7 | from pdfid import pdfid 8 | import fitz 9 | from os.path import exists 10 | import sys 11 | 12 | # Function to convert bytes to kilobytes
13 | def bytes_to_kb(bytes): 14 | return bytes / 1024 15 | 16 | # Main function to extract features from the PDF file 17 | def extract_pdf_features(pdf_file): 18 | # Checking if the file exists 19 | if not exists(pdf_file): 20 | print(f"File {pdf_file} not found") 21 | return None 22 | 23 | features = {'FileName': pdf_file} 24 | 25 | # Open the PDF 26 | pdf = fitz.open(pdf_file) 27 | 28 | # Extract basic PDF metadata using PyMuPDF 29 | try: 30 | features['Pages'] = pdf.page_count 31 | except Exception: 32 | features['Pages'] = -1 33 | 34 | try: 35 | features['XrefLength'] = pdf.xref_length() 36 | except Exception: 37 | features['XrefLength'] = -1 38 | 39 | try: 40 | features['TitleCharacters'] = len(pdf.metadata.get('title', '')) 41 | except Exception: 42 | features['TitleCharacters'] = -1 43 | 44 | features['isEncrypted'] = 1 if pdf.is_encrypted else 0 45 | 46 | # Extract image-related features 47 | images_count = 0 48 | for i in range(pdf.page_count): 49 | images_count += len(pdf.get_page_images(i)) 50 | 51 | features['Images'] = images_count 52 | 53 | # Extract embedded file details 54 | emb_count = pdf.embfile_count() 55 | emb_size_sum = 0 56 | if emb_count != 0: 57 | try: 58 | for i in range(emb_count): 59 | emb_size_sum += pdf.embfile_info(i)["size"] # embfile_info returns a dict; sum the file sizes 60 | except Exception: 61 | features['EmbeddedFiles'] = -1 62 | else: 63 | features['EmbeddedFiles'] = emb_size_sum / emb_count 64 | else: 65 | features['EmbeddedFiles'] = 0 66 | 67 | # Extract presence of text in the PDF 68 | text = 0 69 | for page in pdf: 70 | if len(page.get_text().split()): 71 | text = 1 72 | break 73 | features['Text'] = text 74 | 75 | # Close the PDF after processing 76 | pdf.close() 77 | 78 | # Extract additional PDF features using pdfid 79 | try: 80 | options = pdfid.get_fake_options() 81 | options.scan = True 82 | options.json = True 83 | list_of_dict = pdfid.PDFiDMain([pdf_file], options) 84 | pdf_features = list_of_dict['reports'][0] 85 | del pdf_features['version'] 86 | 87 | # Rename features to correspond to dataset names 88 |
diff_in_feature_name = { 89 | 'header': 'Header', 90 | 'obj': 'Obj', 91 | 'endobj': 'Endobj', 92 | 'stream': 'Stream', 93 | 'endstream': 'Endstream', 94 | 'xref': 'Xref', 95 | 'trailer': 'Trailer', 96 | 'startxref': 'StartXref', 97 | '/Page': 'PageNo', 98 | '/Encrypt': 'Encrypt', 99 | '/ObjStm': 'ObjStm', 100 | '/JS': 'JS', 101 | '/JavaScript': 'JavaScript', 102 | '/AA': 'AA', 103 | '/OpenAction': 'OpenAction', 104 | '/AcroForm': 'AcroForm', 105 | '/JBIG2Decode': 'JBIG2Decode', 106 | '/RichMedia': 'RichMedia', 107 | '/Launch': 'Launch', 108 | '/EmbeddedFile': 'EmbeddedFile', 109 | '/XFA': 'XFA', 110 | '/Colors > 2^24': 'Colors' 111 | } 112 | 113 | for curr_name, new_name in diff_in_feature_name.items(): 114 | pdf_features[new_name] = features.pop(curr_name, -1) 115 | 116 | features.update(pdf_features) 117 | except Exception as e: 118 | print(f"Error extracting pdfid features: {e}") 119 | features.update({ 120 | 'Header': '-1', 121 | 'Obj': -1, 122 | 'Endobj': -1, 123 | 'Stream': -1, 124 | 'Endstream': -1, 125 | 'Xref': -1, 126 | 'Trailer': -1, 127 | 'StartXref': -1, 128 | 'PageNo': -1, 129 | 'Encrypt': -1, 130 | 'ObjStm': -1, 131 | 'JS': -1, 132 | 'JavaScript': -1, 133 | 'AA': -1, 134 | 'OpenAction': -1, 135 | 'AcroForm': -1, 136 | 'JBIG2Decode': -1, 137 | 'RichMedia': -1, 138 | 'Launch': -1, 139 | 'EmbeddedFile': -1, 140 | 'XFA': -1, 141 | 'Colors': -1 142 | }) 143 | 144 | return features 145 | # Function to extract features from the PDF file using pdf_feature_extraction.py 146 | def extract_features(pdf_file): 147 | # command = f'python pdf_feature_extraction.py "{pdf_file}"' 148 | # result = subprocess.run(command, shell=True, capture_output=True, text=True) 149 | 150 | # if result.returncode != 0: 151 | # raise ValueError(f"Error in feature extraction: {result.stderr}") 152 | 153 | # # Parse the output JSON string to a dictionary 154 | # features = json.loads(result.stdout) 155 | features = extract_pdf_features(pdf_file) 156 | return features 157 | 158 | def 
header_to_numeric(header): 159 | if isinstance(header, str) and header.startswith('%PDF-'): 160 | return float(header.split('-')[1]) # Extract the version number 161 | return 0 162 | 163 | # Function to predict if the PDF contains malware 164 | def predict_malware(pdf_file, model_path = os.path.join(os.path.dirname(__file__), 'saved_models', 'random_forest_model.pkl')): 165 | # Extract features 166 | features = extract_features(pdf_file) 167 | print(features) 168 | 169 | # Load pre-trained model (replace with the actual path of your model) 170 | model = joblib.load(model_path) 171 | 172 | # Select the required features for prediction 173 | feature_vector = [ 174 | header_to_numeric(features.get('header', '')), 175 | features.get('obj',0), 176 | features.get('endobj',0), 177 | features.get('stream',0), 178 | features.get('endstream',0), 179 | features.get('xref',0), 180 | features.get('trailer',0), 181 | features.get('startxref',0), 182 | features.get('/Page', 0), 183 | features.get('/Encrypt', 0), 184 | features.get('/ObjStm', 0), # match the pdfid key, which keeps its leading slash 185 | features.get('/JS',0), 186 | features.get('/JavaScript',0), 187 | features.get('/AA',0), 188 | features.get('/OpenAction',0), 189 | features.get('/AcroForm',0), 190 | features.get('/JBIG2Decode',0), 191 | features.get('/RichMedia',0), 192 | features.get('/Launch',0), 193 | features.get('/EmbeddedFile', 0), 194 | features.get('/XFA',0), 195 | features.get('/Colors',0), 196 | # Add more features as required by the model 197 | ] 198 | 199 | # Predict malware (assuming binary classification: 0 = benign, 1 = malicious) 200 | prediction = model.predict([feature_vector]) 201 | 202 | if prediction[0] == 1: 203 | print("The PDF contains malware.") 204 | else: 205 | print("The PDF is clean.") 206 | return prediction[0] 207 | 208 | # Example usage 209 | if __name__ == "__main__": 210 | pdf_file = r"C:\Users\agraw\Downloads\DAA_UNIT-II_BinarySearch_Notes.pdf" # Replace with the actual PDF file path 211 | # model_path = "malware_model.pkl" # Replace with the actual path of
the trained model 212 | predict_malware(pdf_file) 213 | -------------------------------------------------------------------------------- /models/PDF_malware_detection/saved_models/random_forest_model.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/PDF_malware_detection/saved_models/random_forest_model.pkl -------------------------------------------------------------------------------- /models/business_performance_forecasting/data/50_Startups.csv: -------------------------------------------------------------------------------- 1 | R&D Spend,Administration,Marketing Spend,State,Profit 2 | 165349.2,136897.8,471784.1,New York,192261.83 3 | 162597.7,151377.59,443898.53,California,191792.06 4 | 153441.51,101145.55,407934.54,Florida,191050.39 5 | 144372.41,118671.85,383199.62,New York,182901.99 6 | 142107.34,91391.77,366168.42,Florida,166187.94 7 | 131876.9,99814.71,362861.36,New York,156991.12 8 | 134615.46,147198.87,127716.82,California,156122.51 9 | 130298.13,145530.06,323876.68,Florida,155752.6 10 | 120542.52,148718.95,311613.29,New York,152211.77 11 | 123334.88,108679.17,304981.62,California,149759.96 12 | 101913.08,110594.11,229160.95,Florida,146121.95 13 | 100671.96,91790.61,249744.55,California,144259.4 14 | 93863.75,127320.38,249839.44,Florida,141585.52 15 | 91992.39,135495.07,252664.93,California,134307.35 16 | 119943.24,156547.42,256512.92,Florida,132602.65 17 | 114523.61,122616.84,261776.23,New York,129917.04 18 | 78013.11,121597.55,264346.06,California,126992.93 19 | 94657.16,145077.58,282574.31,New York,125370.37 20 | 91749.16,114175.79,294919.57,Florida,124266.9 21 | 86419.7,153514.11,0,New York,122776.86 22 | 76253.86,113867.3,298664.47,California,118474.03 23 | 78389.47,153773.43,299737.29,New York,111313.02 24 | 73994.56,122782.75,303319.26,Florida,110352.25 25 | 
67532.53,105751.03,304768.73,Florida,108733.99 26 | 77044.01,99281.34,140574.81,New York,108552.04 27 | 64664.71,139553.16,137962.62,California,107404.34 28 | 75328.87,144135.98,134050.07,Florida,105733.54 29 | 72107.6,127864.55,353183.81,New York,105008.31 30 | 66051.52,182645.56,118148.2,Florida,103282.38 31 | 65605.48,153032.06,107138.38,New York,101004.64 32 | 61994.48,115641.28,91131.24,Florida,99937.59 33 | 61136.38,152701.92,88218.23,New York,97483.56 34 | 63408.86,129219.61,46085.25,California,97427.84 35 | 55493.95,103057.49,214634.81,Florida,96778.92 36 | 46426.07,157693.92,210797.67,California,96712.8 37 | 46014.02,85047.44,205517.64,New York,96479.51 38 | 28663.76,127056.21,201126.82,Florida,90708.19 39 | 44069.95,51283.14,197029.42,California,89949.14 40 | 20229.59,65947.93,185265.1,New York,81229.06 41 | 38558.51,82982.09,174999.3,California,81005.76 42 | 28754.33,118546.05,172795.67,California,78239.91 43 | 27892.92,84710.77,164470.71,Florida,77798.83 44 | 23640.93,96189.63,148001.11,California,71498.49 45 | 15505.73,127382.3,35534.17,New York,69758.98 46 | 22177.74,154806.14,28334.72,California,65200.33 47 | 1000.23,124153.04,1903.93,New York,64926.08 48 | 1315.46,115816.21,297114.46,Florida,49490.75 49 | 0,135426.92,0,California,42559.73 50 | 542.05,51743.15,0,New York,35673.41 51 | 0,116983.8,45173.06,California,14681.4 -------------------------------------------------------------------------------- /models/business_performance_forecasting/model.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import os 3 | model_path = os.path.join(os.path.dirname(__file__), 'saved_models', 'model.pkl') 4 | scaler_path = os.path.join(os.path.dirname(__file__), 'saved_models', 'scaler.pkl') 5 | 6 | 7 | # Load the saved model and scaler 8 | def load_model_and_scaler(): 9 | with open(model_path, 'rb') as model_file: 10 | model = pickle.load(model_file) 11 | with open(scaler_path, 'rb') as scaler_file: 12 | 
scaler = pickle.load(scaler_file) 13 | 14 | return model, scaler 15 | -------------------------------------------------------------------------------- /models/business_performance_forecasting/notebooks/business_performance_forecasting.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "CazISR8X_HUG" 7 | }, 8 | "source": [ 9 | "# Multiple Linear Regression" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "pOyqYHTk_Q57" 16 | }, 17 | "source": [ 18 | "## Importing the libraries" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": { 25 | "id": "T_YHJjnD_Tja" 26 | }, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np\n", 30 | "import matplotlib.pyplot as plt\n", 31 | "import pandas as pd\n", 32 | "import pickle" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "id": "vgC61-ah_WIz" 39 | }, 40 | "source": [ 41 | "## Importing the dataset" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": { 48 | "id": "UrxyEKGn_ez7" 49 | }, 50 | "outputs": [], 51 | "source": [ 52 | "dataset = pd.read_csv('50_Startups.csv')\n", 53 | "X = dataset.iloc[:, :-1].values\n", 54 | "y = dataset.iloc[:, -1].values" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": null, 60 | "metadata": { 61 | "colab": { 62 | "base_uri": "https://localhost:8080/", 63 | "height": 874 64 | }, 65 | "id": "GOB3QhV9B5kD", 66 | "outputId": "905a7bca-1889-4d04-920f-5f3ed8211585" 67 | }, 68 | "outputs": [ 69 | { 70 | "name": "stdout", 71 | "output_type": "stream", 72 | "text": [ 73 | "[[165349.2 136897.8 471784.1 'New York']\n", 74 | " [162597.7 151377.59 443898.53 'California']\n", 75 | " [153441.51 101145.55 407934.54 'Florida']\n", 76 | " [144372.41 118671.85 383199.62 'New York']\n", 77 | " [142107.34 91391.77 366168.42 
'Florida']\n", 78 | " [131876.9 99814.71 362861.36 'New York']\n", 79 | " [134615.46 147198.87 127716.82 'California']\n", 80 | " [130298.13 145530.06 323876.68 'Florida']\n", 81 | " [120542.52 148718.95 311613.29 'New York']\n", 82 | " [123334.88 108679.17 304981.62 'California']\n", 83 | " [101913.08 110594.11 229160.95 'Florida']\n", 84 | " [100671.96 91790.61 249744.55 'California']\n", 85 | " [93863.75 127320.38 249839.44 'Florida']\n", 86 | " [91992.39 135495.07 252664.93 'California']\n", 87 | " [119943.24 156547.42 256512.92 'Florida']\n", 88 | " [114523.61 122616.84 261776.23 'New York']\n", 89 | " [78013.11 121597.55 264346.06 'California']\n", 90 | " [94657.16 145077.58 282574.31 'New York']\n", 91 | " [91749.16 114175.79 294919.57 'Florida']\n", 92 | " [86419.7 153514.11 0.0 'New York']\n", 93 | " [76253.86 113867.3 298664.47 'California']\n", 94 | " [78389.47 153773.43 299737.29 'New York']\n", 95 | " [73994.56 122782.75 303319.26 'Florida']\n", 96 | " [67532.53 105751.03 304768.73 'Florida']\n", 97 | " [77044.01 99281.34 140574.81 'New York']\n", 98 | " [64664.71 139553.16 137962.62 'California']\n", 99 | " [75328.87 144135.98 134050.07 'Florida']\n", 100 | " [72107.6 127864.55 353183.81 'New York']\n", 101 | " [66051.52 182645.56 118148.2 'Florida']\n", 102 | " [65605.48 153032.06 107138.38 'New York']\n", 103 | " [61994.48 115641.28 91131.24 'Florida']\n", 104 | " [61136.38 152701.92 88218.23 'New York']\n", 105 | " [63408.86 129219.61 46085.25 'California']\n", 106 | " [55493.95 103057.49 214634.81 'Florida']\n", 107 | " [46426.07 157693.92 210797.67 'California']\n", 108 | " [46014.02 85047.44 205517.64 'New York']\n", 109 | " [28663.76 127056.21 201126.82 'Florida']\n", 110 | " [44069.95 51283.14 197029.42 'California']\n", 111 | " [20229.59 65947.93 185265.1 'New York']\n", 112 | " [38558.51 82982.09 174999.3 'California']\n", 113 | " [28754.33 118546.05 172795.67 'California']\n", 114 | " [27892.92 84710.77 164470.71 'Florida']\n", 115 | " 
[23640.93 96189.63 148001.11 'California']\n", 116 | " [15505.73 127382.3 35534.17 'New York']\n", 117 | " [22177.74 154806.14 28334.72 'California']\n", 118 | " [1000.23 124153.04 1903.93 'New York']\n", 119 | " [1315.46 115816.21 297114.46 'Florida']\n", 120 | " [0.0 135426.92 0.0 'California']\n", 121 | " [542.05 51743.15 0.0 'New York']\n", 122 | " [0.0 116983.8 45173.06 'California']]\n" 123 | ] 124 | } 125 | ], 126 | "source": [ 127 | "print(X)" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": { 133 | "id": "VadrvE7s_lS9" 134 | }, 135 | "source": [ 136 | "## Encoding categorical data" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": { 143 | "id": "wV3fD1mbAvsh" 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "from sklearn.compose import ColumnTransformer\n", 148 | "from sklearn.preprocessing import OneHotEncoder\n", 149 | "ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')\n", 150 | "X = np.array(ct.fit_transform(X))" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": { 157 | "colab": { 158 | "base_uri": "https://localhost:8080/", 159 | "height": 874 160 | }, 161 | "id": "4ym3HdYeCGYG", 162 | "outputId": "9bd9e71a-bae0-45cb-fa26-9a0d480bb560" 163 | }, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "[[0.0 0.0 1.0 165349.2 136897.8 471784.1]\n", 170 | " [1.0 0.0 0.0 162597.7 151377.59 443898.53]\n", 171 | " [0.0 1.0 0.0 153441.51 101145.55 407934.54]\n", 172 | " [0.0 0.0 1.0 144372.41 118671.85 383199.62]\n", 173 | " [0.0 1.0 0.0 142107.34 91391.77 366168.42]\n", 174 | " [0.0 0.0 1.0 131876.9 99814.71 362861.36]\n", 175 | " [1.0 0.0 0.0 134615.46 147198.87 127716.82]\n", 176 | " [0.0 1.0 0.0 130298.13 145530.06 323876.68]\n", 177 | " [0.0 0.0 1.0 120542.52 148718.95 311613.29]\n", 178 | " [1.0 0.0 0.0 123334.88 108679.17 
304981.62]\n", 179 | " [0.0 1.0 0.0 101913.08 110594.11 229160.95]\n", 180 | " [1.0 0.0 0.0 100671.96 91790.61 249744.55]\n", 181 | " [0.0 1.0 0.0 93863.75 127320.38 249839.44]\n", 182 | " [1.0 0.0 0.0 91992.39 135495.07 252664.93]\n", 183 | " [0.0 1.0 0.0 119943.24 156547.42 256512.92]\n", 184 | " [0.0 0.0 1.0 114523.61 122616.84 261776.23]\n", 185 | " [1.0 0.0 0.0 78013.11 121597.55 264346.06]\n", 186 | " [0.0 0.0 1.0 94657.16 145077.58 282574.31]\n", 187 | " [0.0 1.0 0.0 91749.16 114175.79 294919.57]\n", 188 | " [0.0 0.0 1.0 86419.7 153514.11 0.0]\n", 189 | " [1.0 0.0 0.0 76253.86 113867.3 298664.47]\n", 190 | " [0.0 0.0 1.0 78389.47 153773.43 299737.29]\n", 191 | " [0.0 1.0 0.0 73994.56 122782.75 303319.26]\n", 192 | " [0.0 1.0 0.0 67532.53 105751.03 304768.73]\n", 193 | " [0.0 0.0 1.0 77044.01 99281.34 140574.81]\n", 194 | " [1.0 0.0 0.0 64664.71 139553.16 137962.62]\n", 195 | " [0.0 1.0 0.0 75328.87 144135.98 134050.07]\n", 196 | " [0.0 0.0 1.0 72107.6 127864.55 353183.81]\n", 197 | " [0.0 1.0 0.0 66051.52 182645.56 118148.2]\n", 198 | " [0.0 0.0 1.0 65605.48 153032.06 107138.38]\n", 199 | " [0.0 1.0 0.0 61994.48 115641.28 91131.24]\n", 200 | " [0.0 0.0 1.0 61136.38 152701.92 88218.23]\n", 201 | " [1.0 0.0 0.0 63408.86 129219.61 46085.25]\n", 202 | " [0.0 1.0 0.0 55493.95 103057.49 214634.81]\n", 203 | " [1.0 0.0 0.0 46426.07 157693.92 210797.67]\n", 204 | " [0.0 0.0 1.0 46014.02 85047.44 205517.64]\n", 205 | " [0.0 1.0 0.0 28663.76 127056.21 201126.82]\n", 206 | " [1.0 0.0 0.0 44069.95 51283.14 197029.42]\n", 207 | " [0.0 0.0 1.0 20229.59 65947.93 185265.1]\n", 208 | " [1.0 0.0 0.0 38558.51 82982.09 174999.3]\n", 209 | " [1.0 0.0 0.0 28754.33 118546.05 172795.67]\n", 210 | " [0.0 1.0 0.0 27892.92 84710.77 164470.71]\n", 211 | " [1.0 0.0 0.0 23640.93 96189.63 148001.11]\n", 212 | " [0.0 0.0 1.0 15505.73 127382.3 35534.17]\n", 213 | " [1.0 0.0 0.0 22177.74 154806.14 28334.72]\n", 214 | " [0.0 0.0 1.0 1000.23 124153.04 1903.93]\n", 215 | " [0.0 1.0 0.0 1315.46 
115816.21 297114.46]\n", 216 | " [1.0 0.0 0.0 0.0 135426.92 0.0]\n", 217 | " [0.0 0.0 1.0 542.05 51743.15 0.0]\n", 218 | " [1.0 0.0 0.0 0.0 116983.8 45173.06]]\n" 219 | ] 220 | } 221 | ], 222 | "source": [ 223 | "print(X)" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": { 229 | "id": "WemVnqgeA70k" 230 | }, 231 | "source": [ 232 | "## Splitting the dataset into the Training set and Test set" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": { 239 | "id": "Kb_v_ae-A-20" 240 | }, 241 | "outputs": [], 242 | "source": [ 243 | "from sklearn.model_selection import train_test_split\n", 244 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": { 250 | "id": "k-McZVsQBINc" 251 | }, 252 | "source": [ 253 | "## Training the Multiple Linear Regression model on the Training set" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "metadata": { 260 | "colab": { 261 | "base_uri": "https://localhost:8080/", 262 | "height": 34 263 | }, 264 | "id": "ywPjx0L1BMiD", 265 | "outputId": "3417c2b0-6871-423c-a81f-643e35ae9f3e" 266 | }, 267 | "outputs": [ 268 | { 269 | "data": { 270 | "text/plain": [ 271 | "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)" 272 | ] 273 | }, 274 | "execution_count": 7, 275 | "metadata": { 276 | "tags": [] 277 | }, 278 | "output_type": "execute_result" 279 | } 280 | ], 281 | "source": [ 282 | "from sklearn.linear_model import LinearRegression\n", 283 | "regressor = LinearRegression()\n", 284 | "regressor.fit(X_train, y_train)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": { 290 | "id": "xNkXL1YQBiBT" 291 | }, 292 | "source": [ 293 | "## Predicting the Test set results" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": { 300 | 
"colab": { 301 | "base_uri": "https://localhost:8080/", 302 | "height": 188 303 | }, 304 | "id": "TQKmwvtdBkyb", 305 | "outputId": "72da0067-f2e3-48d3-fae7-86ddbf597e5e" 306 | }, 307 | "outputs": [ 308 | { 309 | "name": "stdout", 310 | "output_type": "stream", 311 | "text": [ 312 | "[[103015.2 103282.38]\n", 313 | " [132582.28 144259.4 ]\n", 314 | " [132447.74 146121.95]\n", 315 | " [ 71976.1 77798.83]\n", 316 | " [178537.48 191050.39]\n", 317 | " [116161.24 105008.31]\n", 318 | " [ 67851.69 81229.06]\n", 319 | " [ 98791.73 97483.56]\n", 320 | " [113969.44 110352.25]\n", 321 | " [167921.07 166187.94]]\n" 322 | ] 323 | } 324 | ], 325 | "source": [ 326 | "y_pred = regressor.predict(X_test)\n", 327 | "np.set_printoptions(precision=2)\n", 328 | "print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": { 334 | "id": "MC-XRwjE6x6M" 335 | }, 336 | "source": [ 337 | "# Saving the model\n" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": { 344 | "id": "HaEuLbtg_76M" 345 | }, 346 | "outputs": [], 347 | "source": [ 348 | "import os\n", 349 | "model_path = os.path.abspath(\"model.pkl\")\n", 350 | "scaler_path = os.path.abspath(\"scaler.pkl\")\n", 351 | "\n", 352 | "# Save the model and preprocessing objects\n", 353 | "with open(model_path, 'wb') as model_file:\n", 354 | " pickle.dump(regressor, model_file)\n", 355 | "\n", 356 | "with open(scaler_path, 'wb') as scaler_file:\n", 357 | " pickle.dump(ct, scaler_file)\n", 358 | "\n", 359 | "print(f\"Model saved at: {model_path}\")\n", 360 | "print(f\"Preprocessor saved at: {scaler_path}\")" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": { 366 | "id": "UwurUG9r63EK" 367 | }, 368 | "source": [ 369 | "# Model Evaluation" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": null, 375 | "metadata": { 376 | "id": "wmPoacS7eWMt" 377 | }, 378
| "outputs": [], 379 | "source": [ 380 | "from sklearn.metrics import r2_score\n", 381 | "\n", 382 | "def model_evaluation(train_X, train_Y, test_X, test_Y, output_file=\"evaluation_results.pkl\"):\n", 383 | "    # Calculate the R^2 score on both splits\n", 384 | "    train_r2 = r2_score(train_Y, regressor.predict(train_X))\n", 385 | "    test_r2 = r2_score(test_Y, regressor.predict(test_X))\n", 386 | "\n", 387 | "    # Package results\n", 388 | "    results = {\n", 389 | "        \"Train_R2\": train_r2,\n", 390 | "        \"Test_R2\": test_r2\n", 391 | "    }\n", 392 | "\n", 393 | "    # Save results to a pickle file\n", 394 | "    with open(output_file, \"wb\") as f:\n", 395 | "        pickle.dump(results, f)\n", 396 | "\n", 397 | "    print(f\"Evaluation data saved to {output_file}\")\n", 398 | "\n", 399 | "# Run this function once to generate the evaluation file\n", 400 | "model_evaluation(X_train, y_train, X_test, y_test)" 401 | ] 402 | } 403 | ], 404 | "metadata": { 405 | "colab": { 406 | "provenance": [] 407 | }, 408 | "kernelspec": { 409 | "display_name": "Python 3", 410 | "name": "python3" 411 | } 412 | }, 413 | "nbformat": 4, 414 | "nbformat_minor": 0 415 | } 416 | -------------------------------------------------------------------------------- /models/business_performance_forecasting/predict.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import pandas as pd 4 | import seaborn as sns 5 | import matplotlib.pyplot as plt 6 | import pickle 7 | from models.business_performance_forecasting.model import load_model_and_scaler # Import the function from model.py 8 | 9 | # Define the prediction function 10 | def get_prediction(RnD_Spend, Administration, Marketing_Spend, State): 11 | # Load the model and scaler 12 | model, scaler = load_model_and_scaler() 13 | # Prepare input features as a NumPy array 14 | input_data = np.array([[RnD_Spend, Administration, Marketing_Spend, State]]) 15 | 16 | # Apply the scaler 17 | scaled_data =
scaler.transform(input_data) 18 | scaled_data = scaled_data.astype(float) 19 | 20 | # Make prediction using the loaded model 21 | prediction = model.predict(scaled_data) 22 | 23 | return prediction[0] # Return the predicted profit 24 | 25 | 26 | class ModelEvaluation: 27 | def __init__(self): 28 | metrics_file= os.path.join(os.path.dirname(__file__), 'saved_models', 'evaluation_results.pkl') 29 | # Load evaluation metrics from a pickle file 30 | with open(metrics_file, "rb") as f: 31 | self.metrics = pickle.load(f) 32 | print("Loaded metrics:", self.metrics) 33 | def evaluate(self): 34 | metrics = self.metrics 35 | return metrics, None, None, None 36 | 37 | def model_details(): 38 | evaluator = ModelEvaluation() 39 | return evaluator 40 | 41 | -------------------------------------------------------------------------------- /models/business_performance_forecasting/saved_models/evaluation_results.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/business_performance_forecasting/saved_models/evaluation_results.pkl -------------------------------------------------------------------------------- /models/business_performance_forecasting/saved_models/model.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/business_performance_forecasting/saved_models/model.pkl -------------------------------------------------------------------------------- /models/business_performance_forecasting/saved_models/scaler.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/business_performance_forecasting/saved_models/scaler.pkl 
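The load-and-predict flow in `predict.py` above relies on one contract: the regressor and the `ColumnTransformer` pickled at training time are reloaded unchanged at serving time. A minimal sketch of that pickle round-trip, using an invented `TinyLinearModel` stand-in (its class name, weights, and the in-memory buffer are illustrative assumptions, not the project's real `model.pkl`/`scaler.pkl` artifacts):

```python
import io
import pickle

class TinyLinearModel:
    """Hypothetical stand-in for the pickled regressor: y = w . x + b."""
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        # Same call shape the app uses: model.predict(scaled_data)
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias
                for row in rows]

# "Training time": serialize the model (to a buffer here, instead of model.pkl)
buffer = io.BytesIO()
pickle.dump(TinyLinearModel([0.8, 0.05, 0.03], 50_000.0), buffer)

# "Serving time": reload and predict, mirroring load_model_and_scaler()
buffer.seek(0)
model = pickle.load(buffer)
profit = model.predict([[165_349.2, 136_897.8, 471_784.1]])[0]  # one input row
```

The same round-trip applies to the preprocessor: whatever object transformed the training matrix must be pickled and reloaded alongside the model, which is why `model.pkl` and `scaler.pkl` are saved and loaded as a pair.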
-------------------------------------------------------------------------------- /models/credit_card_fraud/model.py: -------------------------------------------------------------------------------- 1 | # importing libraries 2 | from sklearn.model_selection import train_test_split 3 | from sklearn.svm import SVC 4 | import pandas as pd 5 | import warnings 6 | from sklearn.metrics import accuracy_score, classification_report, confusion_matrix 7 | 8 | import pickle 9 | from models.credit_card_fraud.modelEvaluation import ModelEvaluation 10 | warnings.filterwarnings("ignore") 11 | 12 | 13 | # reading dataset 14 | data = pd.read_csv("models/credit_card_fraud/data/creditcardcsvpresent.csv") 15 | df = data.copy(deep=True) 16 | 17 | # df.info() 18 | 19 | # Drop 'Transaction date' (all of its values are null) 20 | # along with the 'Merchant_id' identifier 21 | df = df.drop(columns=['Merchant_id', 'Transaction date']) 22 | 23 | 24 | # encoding for qualitative variables 25 | code = { 26 | "N": 0, 27 | "Y": 1 } 28 | 29 | for obj in df.select_dtypes("object"): 30 | df[obj] = df[obj].map(code) 31 | 32 | # Target and Feature Identification 33 | target = "isFradulent" 34 | features = [col for col in df.columns if col != target] 35 | 36 | X = df[features] # Create a DataFrame for the features 37 | y = df[target] # Create a Series for the target 38 | 39 | 40 | # Split the dataset 41 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) 42 | 43 | # Train SVM Classifier 44 | svm_model = SVC(kernel='rbf', class_weight='balanced', random_state=42) # RBF kernel (default) is good for non-linear problems 45 | svm_model.fit(X_train, y_train) 46 | 47 | # Make predictions 48 | y_pred = svm_model.predict(X_test) 49 | 50 | # Function to prepare input data into a DataFrame 51 | def prepare_input_data( 52 | avg_amount_per_day, 53 | transaction_amount, 54 | Is_declined, 55 | no_of_declines_per_day, 56 | Is_Foreign_transaction, 57 | Is_High_Risk_country,
58 | Daily_chargeback_avg_amt, 59 | six_month_avg_chbk_amt, 60 | six_month_chbk_freq, 61 | ): 62 | # Create a DataFrame with the input data 63 | input_data = { 64 | "Average Amount/transaction/day": [avg_amount_per_day], 65 | "Transaction_amount": [transaction_amount], 66 | "Is declined": [Is_declined], 67 | "Total Number of declines/day": [no_of_declines_per_day], 68 | "isForeignTransaction": [Is_Foreign_transaction], 69 | "isHighRiskCountry": [Is_High_Risk_country], 70 | "Daily_chargeback_avg_amt": [Daily_chargeback_avg_amt], 71 | "6_month_avg_chbk_amt": [six_month_avg_chbk_amt], 72 | "6-month_chbk_freq": [six_month_chbk_freq], 73 | } 74 | 75 | return pd.DataFrame(input_data) 76 | 77 | def get_prediction( 78 | avg_amount_per_day, 79 | transaction_amount, 80 | Is_declined, 81 | no_of_declines_per_day, 82 | Is_Foreign_transaction, 83 | Is_High_Risk_country, 84 | Daily_chargeback_avg_amt, 85 | six_month_avg_chbk_amt, 86 | six_month_chbk_freq, 87 | ): 88 | # Prepare the input data 89 | input_df = prepare_input_data( 90 | avg_amount_per_day, 91 | transaction_amount, 92 | Is_declined, 93 | no_of_declines_per_day, 94 | Is_Foreign_transaction, 95 | Is_High_Risk_country, 96 | Daily_chargeback_avg_amt, 97 | six_month_avg_chbk_amt, 98 | six_month_chbk_freq, 99 | ) 100 | # Predict using the trained SVM classifier 101 | predicted_value = svm_model.predict(input_df) 102 | 103 | # Return "Fraud" if fraud (1), else "Not a Fraud" 104 | return "Fraud" if predicted_value[0] == 1 else "Not a Fraud" 105 | 106 | 107 | # Function to save the model 108 | def save_model(): 109 | # Save the trained SVM model 110 | model_filename = 'creditCardFraud_svc_model.pkl' 111 | with open(model_filename, 'wb') as file: 112 | pickle.dump(svm_model, file) 113 | 114 | # Function to build the model evaluator 115 | def get_evaluator(): 116 | evaluator = ModelEvaluation(svm_model, X_train, y_train, X_test, y_test) 117 | return evaluator 118 | 119 | # save_model() 120 |
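The Y/N encoding that `model.py` applies with `df[obj].map(code)` can be illustrated without pandas. The column names below come from the dataset; the sample rows are hypothetical:

```python
# Same mapping model.py uses for the qualitative Y/N columns
code = {"N": 0, "Y": 1}

# Hypothetical values for two of the dataset's object-typed columns
columns = {
    "Is declined": ["N", "Y", "N"],
    "isForeignTransaction": ["Y", "Y", "N"],
}

# Per-column equivalent of: df[obj] = df[obj].map(code)
encoded = {name: [code[value] for value in values]
           for name, values in columns.items()}
```

After this step every feature column is numeric, which is what the SVC's `fit` call requires.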
-------------------------------------------------------------------------------- /models/credit_card_fraud/modelEvaluation.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import matplotlib.pyplot as plt 4 | import seaborn as sns 5 | from sklearn.metrics import accuracy_score, confusion_matrix 6 | 7 | 8 | class ModelEvaluation: 9 | def __init__(self, model, train_X, train_Y, test_X, test_Y): 10 | self.model = model 11 | self.train_X = train_X 12 | self.train_Y = train_Y 13 | self.test_X = test_X 14 | self.test_Y = test_Y 15 | self.evaluation_matrix = pd.DataFrame( # regression-style template kept for interface parity; unused by this classifier 16 | np.zeros([1, 8]), 17 | columns=[ 18 | "Train-R2", 19 | "Test-R2", 20 | "Train-RSS", 21 | "Test-RSS", 22 | "Train-MSE", 23 | "Test-MSE", 24 | "Train-RMSE", 25 | "Test-RMSE", 26 | ], 27 | ) 28 | high_card = train_X.columns[train_X.nunique() >= 50] 29 | # Fall back to all columns so the choice cannot fail on an empty set 30 | self.random_column = np.random.choice(high_card if len(high_card) else train_X.columns) 31 | 32 | def evaluate(self): 33 | pred_train = self.model.predict(self.train_X) 34 | pred_test = self.model.predict(self.test_X) 35 | 36 | self.update_evaluation_matrix(pred_train, pred_test) 37 | metrics = self.get_metrics() 38 | prediction_plot = self.plot_predictions(pred_train) 39 | error_plot = self.plot_error_terms(pred_train) 40 | 41 | # adding performance graph of the model 42 | performance_plot = self.plot_performance_graph() 43 | 44 | return metrics, prediction_plot, error_plot, performance_plot 45 | 46 | def get_metrics(self): 47 | """Return a dictionary of evaluation metrics for easy integration.""" 48 | pred_train = self.model.predict(self.train_X) 49 | pred_test = self.model.predict(self.test_X) 50 | # Note: this is a classifier, so the "R2" keys actually hold accuracy scores 51 | metrics = { 52 | "Train_R2": accuracy_score(self.train_Y, pred_train), 53 | "Test_R2": accuracy_score(self.test_Y, pred_test), 54 | "Train_RSS": np.sum(np.square(self.train_Y - pred_train)), # = number of train misclassifications 55 | "Test_RSS": np.sum(np.square(self.test_Y - pred_test)) # = number of test misclassifications 56 | } 57 | return metrics 58 | 59 | def
plot_predictions(self, pred_train): 60 | # Predict on test data 61 | pred_test = self.model.predict(self.test_X) 62 | 63 | # Calculate confusion matrix 64 | cm = confusion_matrix(self.test_Y, pred_test) 65 | 66 | # Plot confusion matrix 67 | fig, ax = plt.subplots(figsize=(10, 6)) 68 | sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax) 69 | 70 | ax.set_title("Confusion Matrix") 71 | ax.set_xlabel("Predicted Labels") 72 | ax.set_ylabel("True Labels") 73 | 74 | plt.tight_layout() 75 | return fig 76 | 77 | def update_evaluation_matrix(self, pred_train, pred_test): 78 | return # no-op: the regression-style matrix does not apply to this classifier 79 | 80 | # making a separate function for plotting error terms 81 | def plot_error_terms(self, pred_train): 82 | fig, axes = plt.subplots(figsize=(15, 6)) 83 | 84 | # Plotting error distribution 85 | sns.histplot(self.train_Y - pred_train, bins=30, kde=True, ax=axes) 86 | axes.set_title("Error Terms Distribution") 87 | axes.set_xlabel("Errors") 88 | 89 | plt.tight_layout() 90 | return fig # return the figure created here 91 | 92 | def plot_performance_graph(self): 93 | # Note: mirrors plot_predictions; both render the test confusion matrix 94 | pred_test = self.model.predict(self.test_X) 95 | 96 | # Calculate confusion matrix 97 | cm = confusion_matrix(self.test_Y, pred_test) 98 | 99 | # Plot confusion matrix 100 | fig, ax = plt.subplots(figsize=(10, 6)) 101 | sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax) 102 | 103 | ax.set_title("Confusion Matrix") 104 | ax.set_xlabel("Predicted Labels") 105 | ax.set_ylabel("True Labels") 106 | 107 | plt.tight_layout() 108 | return fig 109 | -------------------------------------------------------------------------------- /models/credit_card_fraud/predict.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import pickle 3 | from models.credit_card_fraud.model import get_evaluator 4 | 5 | 6 | def load_model(model_path): 7 | """ Load the trained SVC model from the specified path.
""" 8 | with open(model_path, 'rb') as file: 9 | return pickle.load(file) 10 | 11 | 12 | def prepare_input_data( 13 | avg_amount_per_day, 14 | transaction_amount, 15 | Is_declined, 16 | no_of_declines_per_day, 17 | Is_Foreign_transaction, 18 | Is_High_Risk_country, 19 | Daily_chargeback_avg_amt, 20 | six_month_avg_chbk_amt, 21 | six_month_chbk_freq, 22 | ): 23 | # Create a DataFrame with the input data 24 | input_data = { 25 | "Average Amount/transaction/day": [avg_amount_per_day], 26 | "Transaction_amount": [transaction_amount], 27 | "Is declined": [Is_declined], 28 | "Total Number of declines/day": [no_of_declines_per_day], 29 | "isForeignTransaction": [Is_Foreign_transaction], 30 | "isHighRiskCountry": [Is_High_Risk_country], 31 | "Daily_chargeback_avg_amt": [Daily_chargeback_avg_amt], 32 | "6_month_avg_chbk_amt": [six_month_avg_chbk_amt], 33 | "6-month_chbk_freq": [six_month_chbk_freq], 34 | } 35 | 36 | return pd.DataFrame(input_data) 37 | 38 | 39 | def get_prediction( 40 | avg_amount_per_day, 41 | transaction_amount, 42 | Is_declined, 43 | no_of_declines_per_day, 44 | Is_Foreign_transaction, 45 | Is_High_Risk_country, 46 | Daily_chargeback_avg_amt, 47 | six_month_avg_chbk_amt, 48 | six_month_chbk_freq, 49 | ): 50 | 51 | # Convert "no" to 0 and "yes" to 1 for the relevant fields 52 | Is_declined = 0 if Is_declined.lower() == "no" else 1 53 | Is_Foreign_transaction = 0 if Is_Foreign_transaction.lower() == "no" else 1 54 | Is_High_Risk_country = 0 if Is_High_Risk_country.lower() == "no" else 1 55 | 56 | # Prepare the input data 57 | input_df = prepare_input_data( 58 | avg_amount_per_day, 59 | transaction_amount, 60 | Is_declined, 61 | no_of_declines_per_day, 62 | Is_Foreign_transaction, 63 | Is_High_Risk_country, 64 | Daily_chargeback_avg_amt, 65 | six_month_avg_chbk_amt, 66 | six_month_chbk_freq, 67 | ) 68 | # print(input_df.values) 69 | # Load the model 70 | svm_model = load_model("models/credit_card_fraud/saved_models/creditCardFraud_svc_model.pkl") 71 | 72 | 
# Predict with the loaded SVC model 73 | predicted_value = svm_model.predict(input_df) 74 | 75 | # Return "Fraud" if fraud (1), else "Not a Fraud" 76 | return "Fraud" if predicted_value[0] == 1 else "Not a Fraud" 77 | 78 | def model_details(): 79 | """Returns model evaluation details.""" 80 | return get_evaluator() 81 | -------------------------------------------------------------------------------- /models/credit_card_fraud/saved_models/creditCardFraud_svc_model.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/credit_card_fraud/saved_models/creditCardFraud_svc_model.pkl -------------------------------------------------------------------------------- /models/customer_income/model.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import numpy as np 3 | 4 | 5 | def load_model(): 6 | with open('models/customer_income/saved_models/CImodel.pkl', 'rb') as model_file: 7 | model = pickle.load(model_file) 8 | with open('models/customer_income/saved_models/CIscaler.pkl', 'rb') as scaler_file: 9 | scaler = pickle.load(scaler_file) 10 | return model, scaler 11 | 12 | 13 | def predict(features): 14 | model, scaler = load_model() 15 | features_array = np.array(features).reshape(1, -1) 16 | scaled_features = scaler.transform(features_array) 17 | prediction = model.predict(scaled_features) 18 | return prediction 19 | 20 | 21 | -------------------------------------------------------------------------------- /models/customer_income/predict.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import pickle 3 | from models.customer_income.model import predict 4 | 5 | def load_feature_names(): # loads the list of training columns saved during preprocessing 6 | with open('models/customer_income/saved_models/feature_names.pkl', 'rb') as feature_file: 7 | feature_names = pickle.load(feature_file) 8 | return feature_names 9 | 10 | def get_prediction(age, workclass, fnlwgt, education, marital_status, relationship, occupation, sex, race, capital_gain, capital_loss, hours_per_week, native_country): 11 | input_dict = { 12 | 'age': age, 13 | 'fnlwgt': fnlwgt, 14 | 'capital-gain': capital_gain, 15 | 'capital-loss': capital_loss, 16 | 'hours-per-week': hours_per_week, 17 | 'workclass_' + workclass: 1, 18 | 'education_' + education: 1, 19 | 'marital-status_' + marital_status: 1, 20 | 'relationship_' + relationship: 1, 21 | 'occupation_' + occupation: 1, 22 | 'sex_' + sex: 1, 23 | 'race_' + race: 1, 24 | 'native-country_' + native_country: 1, 25 | } 26 | 27 | feature_names = load_feature_names() 28 | 29 | input_df = pd.DataFrame(0, index=[0], columns=feature_names) 30 | 31 | for key, value in input_dict.items(): 32 | if key in input_df.columns: 33 | input_df[key] = value 34 | else: 35 | print(f"Warning: {key} not found in feature columns.") 36 | 37 | 38 | result = predict(input_df) 39 | 40 | if result[0] == 1: # predict() returns a NumPy array; index its first element 41 | return "The person earns more than $50,000 per year." 42 | else: 43 | return "The person earns less than or equal to $50,000 per year."
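The zero-row-then-flip pattern used by get_prediction() above, where a row of zeros is built over the saved training columns and the matching one-hot keys are switched on, can be seen in isolation. The feature list below is a hypothetical stand-in for the contents of feature_names.pkl:

```python
import pandas as pd

# Hypothetical stand-in for the columns stored in feature_names.pkl
feature_names = [
    "age", "fnlwgt", "hours-per-week",
    "workclass_Private", "sex_Male", "sex_Female",
]

# One-hot keys are built by concatenating the prefix with the raw input value
input_dict = {
    "age": 39,
    "fnlwgt": 77516,
    "hours-per-week": 40,
    "workclass_" + "Private": 1,
    "sex_" + "Male": 1,
}

# A zero-filled single row over every training column...
row = pd.DataFrame(0, index=[0], columns=feature_names)

# ...then copy in only the keys that exist in the training columns
for key, value in input_dict.items():
    if key in row.columns:
        row[key] = value
```

This guarantees the model always receives the exact column set (and order) it was trained on, with unseen categories silently left at zero.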
44 | -------------------------------------------------------------------------------- /models/customer_income/saved_models/CImodel.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/customer_income/saved_models/CImodel.pkl -------------------------------------------------------------------------------- /models/customer_income/saved_models/CIscaler.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/customer_income/saved_models/CIscaler.pkl -------------------------------------------------------------------------------- /models/customer_income/saved_models/feature_names.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/customer_income/saved_models/feature_names.pkl -------------------------------------------------------------------------------- /models/gold_price_prediction/model.py: -------------------------------------------------------------------------------- 1 | from joblib import load 2 | 3 | # Load the trained model for gold price prediction 4 | model = load('models/gold_price_prediction/saved_models/random_forest_model.joblib') 5 | 6 | def gold_price_prediction(spx, uso, slv, eur_usd): 7 | # Feature extraction 8 | features = [ 9 | float(spx), 10 | float(uso), 11 | float(slv), 12 | float(eur_usd) 13 | ] 14 | 15 | # Predict the gold price (GLD) 16 | prediction = model.predict([features])[0] 17 | 18 | return prediction 19 | -------------------------------------------------------------------------------- /models/gold_price_prediction/predict.py: -------------------------------------------------------------------------------- 1 | 
from models.gold_price_prediction.model import gold_price_prediction 2 | 3 | def get_prediction(spx, uso, slv, eur_usd): 4 | # Call the function that makes the prediction using input features 5 | return gold_price_prediction(spx, uso, slv, eur_usd) 6 | -------------------------------------------------------------------------------- /models/gold_price_prediction/saved_models/random_forest_model.joblib: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/gold_price_prediction/saved_models/random_forest_model.joblib -------------------------------------------------------------------------------- /models/house_price/ImprovedModel.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | from sklearn.model_selection import train_test_split, cross_val_score 3 | from sklearn.preprocessing import StandardScaler, OneHotEncoder 4 | from sklearn.compose import ColumnTransformer 5 | from sklearn.pipeline import Pipeline 6 | from sklearn.feature_selection import RFE 7 | from sklearn.linear_model import LinearRegression 8 | from sklearn.ensemble import RandomForestRegressor 9 | import warnings 10 | import pickle 11 | from .ModelEvaluation import ModelEvaluation 12 | import os 13 | import logging 14 | import streamlit as st 15 | import numpy as np 16 | warnings.filterwarnings("ignore") 17 | 18 | # Define the directory for logs 19 | log_directory = 'models/house_price/logs' 20 | os.makedirs(log_directory, exist_ok=True) # Create the directory if it doesn't exist 21 | 22 | # Set up logging 23 | log_file = os.path.join(log_directory, 'model_training.log') 24 | logging.basicConfig( 25 | filename=log_file, 26 | level=logging.INFO, 27 | format='%(asctime)s - %(levelname)s - %(message)s' 28 | ) 29 | 30 | df = pd.read_csv("models/house_price/data/housing.csv") 31 | original_df = 
df.copy(deep=True) 32 | 33 | # Target and Feature Identification 34 | target = "price" 35 | features = [col for col in df.columns if col != target] 36 | 37 | # Separates numerical and categorical features based on unique values 38 | nu = df[features].nunique() 39 | numerical_features = [col for col in features if nu[col] > 16] 40 | categorical_features = [col for col in features if nu[col] <= 16] 41 | 42 | # Removing outliers using IQR 43 | def remove_outliers(df, numerical_features): 44 | for feature in numerical_features: 45 | Q1 = df[feature].quantile(0.25) 46 | Q3 = df[feature].quantile(0.75) 47 | IQR = Q3 - Q1 48 | df = df[(df[feature] >= (Q1 - 1.5 * IQR)) & (df[feature] <= (Q3 + 1.5 * IQR))] 49 | return df.reset_index(drop=True) 50 | 51 | 52 | # Handling missing values 53 | def handle_missing_values(df): 54 | null_summary = df.isnull().sum() 55 | null_percentage = (null_summary / df.shape[0]) * 100 56 | return pd.DataFrame( 57 | {"Total Null Values": null_summary, "Percentage": null_percentage} 58 | ).sort_values(by="Percentage", ascending=False) 59 | 60 | 61 | # Removes outliers from numerical features 62 | df = remove_outliers(df, numerical_features) 63 | 64 | # Filters categorical features without missing values 65 | null_value_summary = handle_missing_values(df) 66 | valid_categorical_features = [ 67 | col 68 | for col in categorical_features 69 | if col not in null_value_summary[null_value_summary["Percentage"] != 0].index 70 | ] 71 | 72 | # Encoding categorical features 73 | def encode_categorical_features(df, categorical_features): 74 | for feature in categorical_features: 75 | # Binary encoding for features with 2 unique values 76 | if df[feature].nunique() == 2: 77 | df[feature] = pd.get_dummies(df[feature], drop_first=True, prefix=feature) 78 | # Dummy encoding for features with more than 2 unique values 79 | elif 2 < df[feature].nunique() <= 16: 80 | df = pd.concat( 81 | [ 82 | df.drop([feature], axis=1), 83 | pd.get_dummies(df[feature], 
drop_first=True, prefix=feature), 84 | ], 85 | axis=1, 86 | ) 87 | return df 88 | 89 | df = encode_categorical_features(df, valid_categorical_features) 90 | 91 | # Renames columns to avoid invalid characters 92 | df.columns = [col.replace("-", "_").replace(" ", "_") for col in df.columns] 93 | 94 | # Splitting the data into training & testing sets 95 | X = df.drop([target], axis=1) 96 | Y = df[target] 97 | Train_X, Test_X, Train_Y, Test_Y = train_test_split( 98 | X, Y, train_size=0.8, test_size=0.2, random_state=100 99 | ) 100 | 101 | # Feature Scaling (Standardization) 102 | std = StandardScaler() 103 | Train_X_std = pd.DataFrame(std.fit_transform(Train_X), columns=X.columns) 104 | Test_X_std = pd.DataFrame(std.transform(Test_X), columns=X.columns) 105 | 106 | #Random Forest Algorithm 107 | rf_model = RandomForestRegressor(random_state=42, n_estimators=200, max_depth=8, min_samples_split=12) 108 | rf_model.fit(Train_X_std, Train_Y) 109 | 110 | 111 | pred_train = rf_model.predict(Train_X_std) 112 | pred_test = rf_model.predict(Test_X_std) 113 | 114 | # Calculate RMSE for train and test sets 115 | # train_rmse = np.sqrt(mean_squared_error(Train_Y, pred_train)) 116 | # test_rmse = np.sqrt(mean_squared_error(Test_Y, pred_test)) 117 | 118 | 119 | def prepare_input_data( 120 | area, 121 | mainroad, 122 | guestroom, 123 | basement, 124 | hotwaterheating, 125 | airconditioning, 126 | prefarea, 127 | additional_bedrooms, 128 | bathrooms, 129 | stories, 130 | parking, 131 | furnishingstatus, 132 | ): 133 | # Creates a dictionary for the input features 134 | input_data = { 135 | "area": [area], 136 | "mainroad": True if mainroad == "Yes" else False, 137 | "guestroom": True if guestroom == "Yes" else False, 138 | "basement": True if basement == "Yes" else False, 139 | "hotwaterheating": True if hotwaterheating == "Yes" else False, 140 | "airconditioning": True if airconditioning == "Yes" else False, 141 | "prefarea": True if prefarea == "Yes" else False, 142 | "bedrooms_2": 
additional_bedrooms == 2, 143 | "bedrooms_3": additional_bedrooms == 3, 144 | "bedrooms_4": additional_bedrooms == 4, 145 | "bedrooms_5": additional_bedrooms == 5, 146 | "bedrooms_6": additional_bedrooms == 6, 147 | "bathrooms_2": bathrooms == 2, 148 | "bathrooms_3": bathrooms == 3, 149 | "bathrooms_4": bathrooms == 4, 150 | "stories_2": stories == 2, 151 | "stories_3": stories == 3, 152 | "stories_4": stories == 4, 153 | "parking_1": parking == 1, 154 | "parking_2": parking == 2, 155 | "parking_3": parking == 3, 156 | "furnishingstatus_semi_furnished": furnishingstatus == "semi_furnished", 157 | "furnishingstatus_unfurnished": furnishingstatus == "unfurnished", 158 | } 159 | 160 | return pd.DataFrame(input_data) 161 | 162 | # Note: this function is kept because predict.py builds the same input layout 163 | 164 | 165 | ### Final Endpoint ### 166 | def get_predicted(area=0, mainroad=False, guestroom=False, basement=False, hotwaterheating=False, 167 | airconditioning=False, prefarea=False, bedrooms=0, bathrooms=2, stories=1, parking=1, 168 | furnishingstatus="semi_furnished"): 169 | 170 | input_df = prepare_input_data(area, mainroad, guestroom, basement, hotwaterheating, airconditioning, prefarea, 171 | bedrooms, bathrooms, stories, parking, furnishingstatus) 172 | 173 | input_std = pd.DataFrame(std.transform(input_df), columns=input_df.columns) 174 | predicted_price = rf_model.predict(input_std) 175 | return round(predicted_price[0], 2) 176 | 177 | def save_model(): 178 | # todo: Ask the user for the model name, and warn that the model will be overwritten 179 | # Repo-root-relative path, matching where predict.py loads the model from 180 | with open("models/house_price/saved_models/model_02.pkl", "wb") as file: 181 | pickle.dump(rf_model, file) 182 | 183 | 184 | def save_scaler(): 185 | with open("models/house_price/saved_models/scaler_02.pkl", "wb") as file: 186 | pickle.dump(std, file) 187 | 188 | 189 | def get_evaluator(): 190 | evaluator = ModelEvaluation(rf_model, Train_X_std, Train_Y, Test_X_std, Test_Y) 191 | return evaluator 192 | 193 | if __name__ == "__main__": 194 |
save_model() 195 | save_scaler() 196 | # model_evaluation() 197 | -------------------------------------------------------------------------------- /models/house_price/ModelEvaluation.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import matplotlib.pyplot as plt 4 | import seaborn as sns 5 | from sklearn.metrics import r2_score, mean_squared_error 6 | 7 | 8 | class ModelEvaluation: 9 | def __init__(self, model, train_X, train_Y, test_X, test_Y): 10 | self.model = model 11 | self.train_X = train_X 12 | self.train_Y = train_Y 13 | self.test_X = test_X 14 | self.test_Y = test_Y 15 | self.evaluation_matrix = pd.DataFrame( 16 | np.zeros([1, 8]), 17 | columns=[ 18 | "Train-R2", 19 | "Test-R2", 20 | "Train-RSS", 21 | "Test-RSS", 22 | "Train-MSE", 23 | "Test-MSE", 24 | "Train-RMSE", 25 | "Test-RMSE", 26 | ], 27 | ) 28 | self.random_column = np.random.choice( 29 | train_X.columns[train_X.nunique() >= 50], 1, replace=False 30 | )[0] 31 | 32 | def evaluate(self): 33 | pred_train = self.model.predict(self.train_X) 34 | pred_test = self.model.predict(self.test_X) 35 | 36 | self.update_evaluation_matrix(pred_train, pred_test) 37 | metrics = self.get_metrics() 38 | prediction_plot = self.plot_predictions(pred_train) 39 | error_plot = self.plot_error_terms(pred_train) 40 | 41 | #adding performance graph of the model 42 | performance_plot = self.plot_performance_graph() 43 | 44 | return metrics, prediction_plot, error_plot, performance_plot 45 | 46 | def get_metrics(self): 47 | """Return a dictionary of evaluation metrics for easy integration.""" 48 | pred_train = self.model.predict(self.train_X) 49 | pred_test = self.model.predict(self.test_X) 50 | 51 | metrics = { 52 | "Train_R2": r2_score(self.train_Y, pred_train), 53 | "Test_R2": r2_score(self.test_Y, pred_test), 54 | "Train_RSS": np.sum(np.square(self.train_Y - pred_train)), 55 | "Test_RSS": np.sum(np.square(self.test_Y - pred_test)), 56 | 
"Train_MSE": mean_squared_error(self.train_Y, pred_train), 57 | "Test_MSE": mean_squared_error(self.test_Y, pred_test), 58 | "Train_RMSE": np.sqrt(mean_squared_error(self.train_Y, pred_train)), 59 | "Test_RMSE": np.sqrt(mean_squared_error(self.test_Y, pred_test)), 60 | } 61 | return metrics 62 | 63 | def plot_predictions(self, pred_train): 64 | fig, axes = plt.subplots(figsize=(15, 6)) 65 | 66 | # Plotting actual vs predicted 67 | axes.scatter(self.train_Y, pred_train, alpha=0.6) 68 | axes.plot( 69 | [self.train_Y.min(), self.train_Y.max()], 70 | [self.train_Y.min(), self.train_Y.max()], 71 | "r--", 72 | ) 73 | axes.set_title("Actual vs Predicted Prices") 74 | axes.set_xlabel("Actual Price") 75 | axes.set_ylabel("Predicted Price") 76 | 77 | plt.legend() 78 | plt.grid() 79 | plt.tight_layout() 80 | 81 | return fig #returning figure the is created here 82 | 83 | def update_evaluation_matrix(self, pred_train, pred_test): 84 | self.evaluation_matrix.loc[0] = [ 85 | r2_score(self.train_Y, pred_train), 86 | r2_score(self.test_Y, pred_test), 87 | np.sum(np.square(self.train_Y - pred_train)), 88 | np.sum(np.square(self.test_Y - pred_test)), 89 | mean_squared_error(self.train_Y, pred_train), 90 | mean_squared_error(self.test_Y, pred_test), 91 | np.sqrt(mean_squared_error(self.train_Y, pred_train)), 92 | np.sqrt(mean_squared_error(self.test_Y, pred_test)), 93 | ] 94 | 95 | #making a separate function for plotting error terms 96 | def plot_error_terms(self, pred_train): 97 | fig, axes = plt.subplots( figsize=(15, 6)) 98 | 99 | # Plotting error distribution 100 | sns.histplot(self.train_Y - pred_train, bins=30, kde=True, ax=axes) 101 | axes.set_title("Error Terms Distribution") 102 | axes.set_xlabel("Errors") 103 | 104 | plt.tight_layout() 105 | return fig #returning figure the is created here 106 | 107 | def plot_performance_graph(self): 108 | metrics = self.get_metrics() 109 | performance_data = { 110 | "Metric": ["Train RMSE", "Test RMSE"], 111 | "Value": 
[metrics["Train_RMSE"], metrics["Test_RMSE"]], 112 | } 113 | performance_df = pd.DataFrame(performance_data) 114 | 115 | fig, axes = plt.subplots( figsize=(15, 6)) 116 | sns.barplot(x="Metric", y="Value", data=performance_df) 117 | axes.set_title("Model Performance Comparison") 118 | axes.set_ylabel("RMSE") 119 | 120 | plt.tight_layout() 121 | return fig #returning figure the is created here 122 | -------------------------------------------------------------------------------- /models/house_price/model.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | from sklearn.model_selection import train_test_split 3 | from sklearn.preprocessing import StandardScaler 4 | from sklearn.feature_selection import RFE 5 | from sklearn.linear_model import LinearRegression 6 | import warnings 7 | import pickle 8 | from .ModelEvaluation import ModelEvaluation 9 | import os 10 | import logging 11 | warnings.filterwarnings("ignore") 12 | 13 | # Define the directory for logs 14 | log_directory = 'models/house_price/logs' 15 | os.makedirs(log_directory, exist_ok=True) # Create the directory if it doesn't exist 16 | 17 | # Set up logging 18 | log_file = os.path.join(log_directory, 'model_training.log') 19 | logging.basicConfig( 20 | filename=log_file, 21 | level=logging.INFO, 22 | format='%(asctime)s - %(levelname)s - %(message)s' 23 | ) 24 | 25 | df = pd.read_csv("models/house_price/data/housing.csv") 26 | original_df = df.copy(deep=True) 27 | 28 | # Target and Feature Identification 29 | target = "price" 30 | features = [col for col in df.columns if col != target] 31 | 32 | # Separates numerical and categorical features based on unique values 33 | nu = df[features].nunique() 34 | numerical_features = [col for col in features if nu[col] > 16] 35 | categorical_features = [col for col in features if nu[col] <= 16] 36 | 37 | # Removing outliers using IQR 38 | def remove_outliers(df, numerical_features): 39 | for feature in 
numerical_features: 40 | Q1 = df[feature].quantile(0.25) 41 | Q3 = df[feature].quantile(0.75) 42 | IQR = Q3 - Q1 43 | df = df[(df[feature] >= (Q1 - 1.5 * IQR)) & (df[feature] <= (Q3 + 1.5 * IQR))] 44 | return df.reset_index(drop=True) 45 | 46 | 47 | # Handling missing values 48 | def handle_missing_values(df): 49 | null_summary = df.isnull().sum() 50 | null_percentage = (null_summary / df.shape[0]) * 100 51 | return pd.DataFrame( 52 | {"Total Null Values": null_summary, "Percentage": null_percentage} 53 | ).sort_values(by="Percentage", ascending=False) 54 | 55 | 56 | # Removes outliers from numerical features 57 | df = remove_outliers(df, numerical_features) 58 | 59 | # Filters categorical features without missing values 60 | null_value_summary = handle_missing_values(df) 61 | valid_categorical_features = [ 62 | col 63 | for col in categorical_features 64 | if col not in null_value_summary[null_value_summary["Percentage"] != 0].index 65 | ] 66 | 67 | # Encoding categorical features 68 | def encode_categorical_features(df, categorical_features): 69 | for feature in categorical_features: 70 | # Binary encoding for features with 2 unique values 71 | if df[feature].nunique() == 2: 72 | df[feature] = pd.get_dummies(df[feature], drop_first=True, prefix=feature) 73 | # Dummy encoding for features with more than 2 unique values 74 | elif 2 < df[feature].nunique() <= 16: 75 | df = pd.concat( 76 | [ 77 | df.drop([feature], axis=1), 78 | pd.get_dummies(df[feature], drop_first=True, prefix=feature), 79 | ], 80 | axis=1, 81 | ) 82 | return df 83 | 84 | df = encode_categorical_features(df, valid_categorical_features) 85 | 86 | # Renames columns to avoid invalid characters 87 | df.columns = [col.replace("-", "_").replace(" ", "_") for col in df.columns] 88 | 89 | # Splitting the data into training & testing sets 90 | X = df.drop([target], axis=1) 91 | Y = df[target] 92 | Train_X, Test_X, Train_Y, Test_Y = train_test_split( 93 | X, Y, train_size=0.8, test_size=0.2, 
random_state=100 94 | ) 95 | 96 | # Feature Scaling (Standardization) 97 | std = StandardScaler() 98 | Train_X_std = pd.DataFrame(std.fit_transform(Train_X), columns=X.columns) 99 | Test_X_std = pd.DataFrame(std.transform(Test_X), columns=X.columns) 100 | 101 | # Multiple Linear Regression with sklearn 102 | MLR = LinearRegression().fit(Train_X_std, Train_Y) 103 | pred_train = MLR.predict(Train_X_std) 104 | pred_test = MLR.predict(Test_X_std) 105 | 106 | # Calculate RMSE for train and test sets 107 | # train_rmse = np.sqrt(mean_squared_error(Train_Y, pred_train)) 108 | # test_rmse = np.sqrt(mean_squared_error(Test_Y, pred_test)) 109 | 110 | 111 | def prepare_input_data( 112 | area, 113 | mainroad, 114 | guestroom, 115 | basement, 116 | hotwaterheating, 117 | airconditioning, 118 | prefarea, 119 | additional_bedrooms, 120 | bathrooms, 121 | stories, 122 | parking, 123 | furnishingstatus, 124 | ): 125 | # Creates a dictionary for the input features 126 | input_data = { 127 | "area": [area], 128 | "mainroad": True if mainroad == "Yes" else False, 129 | "guestroom": True if guestroom == "Yes" else False, 130 | "basement": True if basement == "Yes" else False, 131 | "hotwaterheating": True if hotwaterheating == "Yes" else False, 132 | "airconditioning": True if airconditioning == "Yes" else False, 133 | "prefarea": True if prefarea == "Yes" else False, 134 | "bedrooms_2": additional_bedrooms == 2, 135 | "bedrooms_3": additional_bedrooms == 3, 136 | "bedrooms_4": additional_bedrooms == 4, 137 | "bedrooms_5": additional_bedrooms == 5, 138 | "bedrooms_6": additional_bedrooms == 6, 139 | "bathrooms_2": bathrooms == 2, 140 | "bathrooms_3": bathrooms == 3, 141 | "bathrooms_4": bathrooms == 4, 142 | "stories_2": stories == 2, 143 | "stories_3": stories == 3, 144 | "stories_4": stories == 4, 145 | "parking_1": parking == 1, 146 | "parking_2": parking == 2, 147 | "parking_3": parking == 3, 148 | "furnishingstatus_semi_furnished": furnishingstatus == "semi_furnished", 149 | 
"furnishingstatus_unfurnished": furnishingstatus == "unfurnished", 150 | } 151 | 152 | return pd.DataFrame(input_data) 153 | 154 | # Note: Not removing this fxn because of the warning in predict.py file 155 | 156 | 157 | ### Final Endpoint ### 158 | # Predicts the price of a house based on the input features 159 | def get_prediction( 160 | area=0, 161 | mainroad=False, 162 | guestroom=False, 163 | basement=False, 164 | hotwaterheating=False, 165 | airconditioning=False, 166 | prefarea=False, 167 | bedrooms=0, 168 | bathrooms=2, 169 | stories=1, 170 | parking=1, 171 | furnishingstatus="semi_furnished", 172 | ): 173 | # Modifying the input data to match the model's input format 174 | input_df = prepare_input_data( 175 | area, 176 | mainroad, 177 | guestroom, 178 | basement, 179 | hotwaterheating, 180 | airconditioning, 181 | prefarea, 182 | bedrooms, 183 | bathrooms, 184 | stories, 185 | parking, 186 | furnishingstatus, 187 | ) 188 | 189 | # Standardizes the input data 190 | input_std = pd.DataFrame(std.transform(input_df), columns=input_df.columns) 191 | 192 | # Predicts the price 193 | predicted_price = MLR.predict(input_std) 194 | 195 | return round(predicted_price[0], 2) 196 | 197 | 198 | def save_model(): 199 | # todo: Ask the user for the model name, and warn that the model will be overwritten 200 | 201 | with open("models/house_price/saved_models/model_01.pkl", "wb") as file: 202 | pickle.dump(MLR, file) 203 | 204 | 205 | def save_scaler(): 206 | with open("models/house_price/saved_models/scaler_01.pkl", "wb") as file: 207 | pickle.dump(std, file) 208 | 209 | 210 | def get_evaluator(): 211 | evaluator = ModelEvaluation(MLR, Train_X_std, Train_Y, Test_X_std, Test_Y) 212 | return evaluator 213 | 214 | # if __name__ == "__main__": 215 | # save_model() 216 | # save_scaler() 217 | # model_evaluation() 218 | -------------------------------------------------------------------------------- /models/house_price/predict.py: 
-------------------------------------------------------------------------------- 1 | import pickle 2 | import pandas as pd 3 | # from models.house_price.model import get_evaluator 4 | from models.house_price.ImprovedModel import get_evaluator 5 | 6 | """ 7 | Predict.py file: 8 | Contains the following functions: 9 | - load_model: Loads a model from a pickle file. 10 | - prepare_input_data: Prepares the input data for the model. 11 | - [IMPORTANT] get_prediction: Predicts the price of a house based on the input features. 12 | - test_house_price_prediction: Tests the house price prediction model. 13 | - [IMPORTANT] model_details: Returns the details of the model. 14 | """ 15 | 16 | 17 | def load_model(filepath): 18 | """Loads a model from the given pickle file path.""" 19 | with open(filepath, "rb") as file: 20 | model = pickle.load(file) 21 | return model 22 | 23 | 24 | def prepare_input_data( 25 | area, 26 | mainroad, 27 | guestroom, 28 | basement, 29 | hotwaterheating, 30 | airconditioning, 31 | prefarea, 32 | additional_bedrooms, 33 | bathrooms, 34 | stories, 35 | parking, 36 | furnishingstatus, 37 | ): 38 | """ 39 | Prepares the input data for the model by converting user inputs into a 40 | structured DataFrame format. 
41 | """ 42 | input_data = { 43 | "area": [area], 44 | "mainroad": mainroad == "Yes", 45 | "guestroom": guestroom == "Yes", 46 | "basement": basement == "Yes", 47 | "hotwaterheating": hotwaterheating == "Yes", 48 | "airconditioning": airconditioning == "Yes", 49 | "prefarea": prefarea == "Yes", 50 | "bedrooms_2": additional_bedrooms == 2, 51 | "bedrooms_3": additional_bedrooms == 3, 52 | "bedrooms_4": additional_bedrooms == 4, 53 | "bedrooms_5": additional_bedrooms == 5, 54 | "bedrooms_6": additional_bedrooms == 6, 55 | "bathrooms_2": bathrooms == 2, 56 | "bathrooms_3": bathrooms == 3, 57 | "bathrooms_4": bathrooms == 4, 58 | "stories_2": stories == 2, 59 | "stories_3": stories == 3, 60 | "stories_4": stories == 4, 61 | "parking_1": parking == 1, 62 | "parking_2": parking == 2, 63 | "parking_3": parking == 3, 64 | "furnishingstatus_semi_furnished": furnishingstatus == "semi_furnished", 65 | "furnishingstatus_unfurnished": furnishingstatus == "unfurnished", 66 | } 67 | 68 | return pd.DataFrame(input_data) 69 | 70 | 71 | def get_prediction( 72 | area=0, 73 | mainroad=False, 74 | guestroom=False, 75 | basement=False, 76 | hotwaterheating=False, 77 | airconditioning=False, 78 | prefarea=False, 79 | bedrooms=0, 80 | bathrooms=2, 81 | stories=1, 82 | parking=1, 83 | furnishingstatus="semi_furnished", 84 | ): 85 | """ 86 | Predicts the house price based on the input features. 87 | Returns the predicted house price rounded to two decimal places. 
88 | """ 89 | # Prepare input data 90 | input_df = prepare_input_data( 91 | area, 92 | mainroad, 93 | guestroom, 94 | basement, 95 | hotwaterheating, 96 | airconditioning, 97 | prefarea, 98 | bedrooms, 99 | bathrooms, 100 | stories, 101 | parking, 102 | furnishingstatus, 103 | ) 104 | 105 | # Load the model and the scaler 106 | model = load_model("models/house_price/saved_models/model_02.pkl") 107 | scaler = load_model("models/house_price/saved_models/scaler_02.pkl") 108 | 109 | # Scale the input data 110 | input_scaled = scaler.transform(input_df) 111 | scaled_df = pd.DataFrame(input_scaled, columns=scaler.get_feature_names_out()) 112 | 113 | # Predict the house price 114 | predicted_price = model.predict(scaled_df) 115 | 116 | return round(predicted_price[0], 2) 117 | 118 | 119 | def test_house_price_prediction(): 120 | """Test function to predict a sample house price.""" 121 | # Sample inputs 122 | sample_input = { 123 | "area": 3000, 124 | "mainroad": "Yes", 125 | "guestroom": "No", 126 | "basement": "Yes", 127 | "hotwaterheating": "No", 128 | "airconditioning": "Yes", 129 | "prefarea": "Yes", 130 | "bedrooms": 2, 131 | "bathrooms": 3, 132 | "stories": 2, 133 | "parking": 2, 134 | "furnishingstatus": "semi_furnished", 135 | } 136 | 137 | predicted_price = get_prediction(**sample_input) 138 | 139 | print("Predicted House Price: Rs.", predicted_price) 140 | 141 | 142 | def model_details(): 143 | """Returns model evaluation details.""" 144 | return get_evaluator() 145 | -------------------------------------------------------------------------------- /models/house_price/saved_models/model_01.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/house_price/saved_models/model_01.pkl -------------------------------------------------------------------------------- /models/house_price/saved_models/model_02.pkl: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/house_price/saved_models/model_02.pkl -------------------------------------------------------------------------------- /models/house_price/saved_models/scaler_01.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/house_price/saved_models/scaler_01.pkl -------------------------------------------------------------------------------- /models/house_price/saved_models/scaler_02.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/house_price/saved_models/scaler_02.pkl -------------------------------------------------------------------------------- /models/insurance_cost_predictor/model.py: -------------------------------------------------------------------------------- 1 | from joblib import load 2 | 3 | # Load the trained model for insurance cost prediction 4 | model = load("models/insurance_cost_predictor/saved_models/insurance_model.pkl") 5 | 6 | def insurance_cost_prediction(age, sex, bmi, children, smoker, region): 7 | # Feature extraction and conversions 8 | sex_value = 0 if sex.lower() == 'male' else 1 # 0 for male, 1 for female 9 | smoker_value = 0 if smoker.lower() == 'yes' else 1 # 0 for smoker, 1 for non-smoker 10 | region_dict = {'southeast': 0, 'southwest': 1, 'northeast': 2, 'northwest': 3} 11 | region_value = region_dict.get(region.lower(), -1) # Convert region to numerical value 12 | 13 | # Prepare features for prediction 14 | features = [ 15 | float(age), 16 | float(sex_value), 17 | float(bmi), 18 | int(children), 19 | float(smoker_value), 20 | float(region_value) 21 | ] 22 | 
23 | # Predict the insurance cost (charges) 24 | prediction = model.predict([features])[0] 25 | 26 | return prediction 27 | -------------------------------------------------------------------------------- /models/insurance_cost_predictor/predict.py: -------------------------------------------------------------------------------- 1 | from models.insurance_cost_predictor.model import insurance_cost_prediction 2 | 3 | def get_prediction(age, sex, bmi, children, smoker, region): 4 | # Call the function that makes the insurance cost prediction using input features 5 | return insurance_cost_prediction(age, sex, bmi, children, smoker, region) 6 | -------------------------------------------------------------------------------- /models/insurance_cost_predictor/saved_models/insurance_model.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/insurance_cost_predictor/saved_models/insurance_model.pkl -------------------------------------------------------------------------------- /models/loan_eligibility/model.py: -------------------------------------------------------------------------------- 1 | def loan_eligibility(income, loan_amount, credit_score): 2 | # Placeholder logic for loan eligibility prediction; loan_amount is not used yet 3 | # Accept credit_score as either a bare number or a single-element list from the form 4 | score = credit_score[0] if isinstance(credit_score, (list, tuple)) else credit_score 5 | if income > 50000 and score > 700: 6 | return "Loan approved" 7 | else: 8 | return "Loan denied" 9 | -------------------------------------------------------------------------------- /models/loan_eligibility/predict.py: -------------------------------------------------------------------------------- 1 | from models.loan_eligibility.model import loan_eligibility 2 | 3 | def get_prediction(income, loan_amount, credit_score): 4 | return loan_eligibility(income, loan_amount, credit_score) -------------------------------------------------------------------------------- /models/parkinson_disease_detector/parkinson_model.py: 
-------------------------------------------------------------------------------- 1 | import pickle 2 | import numpy as np 3 | import warnings 4 | warnings.filterwarnings("ignore") 5 | 6 | # Paths to the saved model and scaler 7 | model_path = 'models/parkinson_disease_detector/saved_models/Model_Prediction.sav' 8 | scaler_path = 'models/parkinson_disease_detector/saved_models/MinMaxScaler.sav' 9 | 10 | # Load the pre-trained model and scaler using pickle 11 | with open(model_path, 'rb') as model_file: 12 | loaded_model = pickle.load(model_file) 13 | with open(scaler_path, 'rb') as scaler_file: 14 | scaler = pickle.load(scaler_file) 15 | 16 | # Define the prediction function 17 | def disease_get_prediction(MDVP_Fo_Hz, MDVP_Fhi_Hz, MDVP_Flo_Hz, 18 | MDVP_Jitter_percent, MDVP_Jitter_Abs, 19 | MDVP_RAP, MDVP_PPQ, Jitter_DDP, 20 | MDVP_Shimmer, MDVP_Shimmer_dB, 21 | Shimmer_APQ3, Shimmer_APQ5, 22 | MDVP_APQ, Shimmer_DDA, NHR, 23 | HNR, RPDE, DFA, spread1, 24 | spread2, D2, PPE): 25 | features = np.array([[ 26 | float(MDVP_Fo_Hz), float(MDVP_Fhi_Hz), float(MDVP_Flo_Hz), 27 | float(MDVP_Jitter_percent), float(MDVP_Jitter_Abs), 28 | float(MDVP_RAP), float(MDVP_PPQ), float(Jitter_DDP), 29 | float(MDVP_Shimmer), float(MDVP_Shimmer_dB), 30 | float(Shimmer_APQ3), float(Shimmer_APQ5), 31 | float(MDVP_APQ), float(Shimmer_DDA), 32 | float(NHR), float(HNR), 33 | float(RPDE), float(DFA), 34 | float(spread1), float(spread2), 35 | float(D2), float(PPE) 36 | ]]) 37 | 38 | # Apply the scaler 39 | scaled_data = scaler.transform(features) 40 | 41 | # Make prediction 42 | prediction = loaded_model.predict(scaled_data) 43 | 44 | return prediction 45 | -------------------------------------------------------------------------------- /models/parkinson_disease_detector/parkinson_predict.py: -------------------------------------------------------------------------------- 1 | from models.parkinson_disease_detector.parkinson_model import disease_get_prediction 2 | 3 | def get_prediction(MDVP_Fo_Hz, MDVP_Fhi_Hz, MDVP_Flo_Hz, MDVP_Jitter_percent, 
MDVP_Jitter_Abs, MDVP_RAP, MDVP_PPQ, Jitter_DDP, MDVP_Shimmer, MDVP_Shimmer_dB, Shimmer_APQ3, Shimmer_APQ5, MDVP_APQ, Shimmer_DDA, NHR, HNR, RPDE, DFA, spread1, spread2, D2, PPE): 4 | 5 | prediction = disease_get_prediction(MDVP_Fo_Hz, MDVP_Fhi_Hz, MDVP_Flo_Hz, MDVP_Jitter_percent, MDVP_Jitter_Abs, MDVP_RAP, MDVP_PPQ, Jitter_DDP, MDVP_Shimmer, MDVP_Shimmer_dB, Shimmer_APQ3, Shimmer_APQ5, MDVP_APQ, Shimmer_DDA, NHR, HNR, RPDE, DFA, spread1, spread2, D2, PPE) 6 | 7 | message = "" 8 | 9 | # Provide a message based on the first (and only) element of the prediction array 10 | if prediction[0] == 1: 11 | message = "The prediction indicates you may have Parkinson's Disease. Please consult a doctor." 12 | elif prediction[0] == 0: 13 | message = "The prediction indicates you are healthy." 14 | else: 15 | message = "Invalid details." 16 | 17 | return message -------------------------------------------------------------------------------- /models/parkinson_disease_detector/saved_models/MinMaxScaler.sav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/parkinson_disease_detector/saved_models/MinMaxScaler.sav -------------------------------------------------------------------------------- /models/parkinson_disease_detector/saved_models/Model_Prediction.sav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/parkinson_disease_detector/saved_models/Model_Prediction.sav -------------------------------------------------------------------------------- /models/sleep_disorder_predictor/data/dataset.csv: -------------------------------------------------------------------------------- 1 | Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder 
2 | 1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,None 3 | 2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,None 4 | 3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,None 5 | 4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea 6 | 5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea 7 | 6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia 8 | 7,Male,29,Teacher,6.3,6,40,7,Obese,140/90,82,3500,Insomnia 9 | 8,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None 10 | 9,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None 11 | 10,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None 12 | 11,Male,29,Doctor,6.1,6,30,8,Normal,120/80,70,8000,None 13 | 12,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None 14 | 13,Male,29,Doctor,6.1,6,30,8,Normal,120/80,70,8000,None 15 | 14,Male,29,Doctor,6,6,30,8,Normal,120/80,70,8000,None 16 | 15,Male,29,Doctor,6,6,30,8,Normal,120/80,70,8000,None 17 | 16,Male,29,Doctor,6,6,30,8,Normal,120/80,70,8000,None 18 | 17,Female,29,Nurse,6.5,5,40,7,Normal Weight,132/87,80,4000,Sleep Apnea 19 | 18,Male,29,Doctor,6,6,30,8,Normal,120/80,70,8000,Sleep Apnea 20 | 19,Female,29,Nurse,6.5,5,40,7,Normal Weight,132/87,80,4000,Insomnia 21 | 20,Male,30,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None 22 | 21,Male,30,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 23 | 22,Male,30,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 24 | 23,Male,30,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 25 | 24,Male,30,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 26 | 25,Male,30,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None 27 | 26,Male,30,Doctor,7.9,7,75,6,Normal,120/80,70,8000,None 28 | 27,Male,30,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None 29 | 28,Male,30,Doctor,7.9,7,75,6,Normal,120/80,70,8000,None 30 | 29,Male,30,Doctor,7.9,7,75,6,Normal,120/80,70,8000,None 31 | 30,Male,30,Doctor,7.9,7,75,6,Normal,120/80,70,8000,None 32 | 31,Female,30,Nurse,6.4,5,35,7,Normal Weight,130/86,78,4100,Sleep Apnea 33 | 
32,Female,30,Nurse,6.4,5,35,7,Normal Weight,130/86,78,4100,Insomnia 34 | 33,Female,31,Nurse,7.9,8,75,4,Normal Weight,117/76,69,6800,None 35 | 34,Male,31,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None 36 | 35,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 37 | 36,Male,31,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None 38 | 37,Male,31,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None 39 | 38,Male,31,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None 40 | 39,Male,31,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None 41 | 40,Male,31,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None 42 | 41,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 43 | 42,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 44 | 43,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 45 | 44,Male,31,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None 46 | 45,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 47 | 46,Male,31,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None 48 | 47,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 49 | 48,Male,31,Doctor,7.8,7,75,6,Normal,120/80,70,8000,None 50 | 49,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 51 | 50,Male,31,Doctor,7.7,7,75,6,Normal,120/80,70,8000,Sleep Apnea 52 | 51,Male,32,Engineer,7.5,8,45,3,Normal,120/80,70,8000,None 53 | 52,Male,32,Engineer,7.5,8,45,3,Normal,120/80,70,8000,None 54 | 53,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None 55 | 54,Male,32,Doctor,7.6,7,75,6,Normal,120/80,70,8000,None 56 | 55,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None 57 | 56,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None 58 | 57,Male,32,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 59 | 58,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None 60 | 59,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None 61 | 60,Male,32,Doctor,7.7,7,75,6,Normal,120/80,70,8000,None 62 | 61,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None 63 | 62,Male,32,Doctor,6,6,30,8,Normal,125/80,72,5000,None 64 | 63,Male,32,Doctor,6.2,6,30,8,Normal,125/80,72,5000,None 65 | 
64,Male,32,Doctor,6.2,6,30,8,Normal,125/80,72,5000,None 66 | 65,Male,32,Doctor,6.2,6,30,8,Normal,125/80,72,5000,None 67 | 66,Male,32,Doctor,6.2,6,30,8,Normal,125/80,72,5000,None 68 | 67,Male,32,Accountant,7.2,8,50,6,Normal Weight,118/76,68,7000,None 69 | 68,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,Insomnia 70 | 69,Female,33,Scientist,6.2,6,50,6,Overweight,128/85,76,5500,None 71 | 70,Female,33,Scientist,6.2,6,50,6,Overweight,128/85,76,5500,None 72 | 71,Male,33,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None 73 | 72,Male,33,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None 74 | 73,Male,33,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None 75 | 74,Male,33,Doctor,6.1,6,30,8,Normal,125/80,72,5000,None 76 | 75,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None 77 | 76,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None 78 | 77,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None 79 | 78,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None 80 | 79,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None 81 | 80,Male,33,Doctor,6,6,30,8,Normal,125/80,72,5000,None 82 | 81,Female,34,Scientist,5.8,4,32,8,Overweight,131/86,81,5200,Sleep Apnea 83 | 82,Female,34,Scientist,5.8,4,32,8,Overweight,131/86,81,5200,Sleep Apnea 84 | 83,Male,35,Teacher,6.7,7,40,5,Overweight,128/84,70,5600,None 85 | 84,Male,35,Teacher,6.7,7,40,5,Overweight,128/84,70,5600,None 86 | 85,Male,35,Software Engineer,7.5,8,60,5,Normal Weight,120/80,70,8000,None 87 | 86,Female,35,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 88 | 87,Male,35,Engineer,7.2,8,60,4,Normal,125/80,65,5000,None 89 | 88,Male,35,Engineer,7.2,8,60,4,Normal,125/80,65,5000,None 90 | 89,Male,35,Engineer,7.3,8,60,4,Normal,125/80,65,5000,None 91 | 90,Male,35,Engineer,7.3,8,60,4,Normal,125/80,65,5000,None 92 | 91,Male,35,Engineer,7.3,8,60,4,Normal,125/80,65,5000,None 93 | 92,Male,35,Engineer,7.3,8,60,4,Normal,125/80,65,5000,None 94 | 93,Male,35,Software Engineer,7.5,8,60,5,Normal Weight,120/80,70,8000,None 95 | 
94,Male,35,Lawyer,7.4,7,60,5,Obese,135/88,84,3300,Sleep Apnea 96 | 95,Female,36,Accountant,7.2,8,60,4,Normal,115/75,68,7000,Insomnia 97 | 96,Female,36,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 98 | 97,Female,36,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 99 | 98,Female,36,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 100 | 99,Female,36,Teacher,7.1,8,60,4,Normal,115/75,68,7000,None 101 | 100,Female,36,Teacher,7.1,8,60,4,Normal,115/75,68,7000,None 102 | 101,Female,36,Teacher,7.2,8,60,4,Normal,115/75,68,7000,None 103 | 102,Female,36,Teacher,7.2,8,60,4,Normal,115/75,68,7000,None 104 | 103,Female,36,Teacher,7.2,8,60,4,Normal,115/75,68,7000,None 105 | 104,Male,36,Teacher,6.6,5,35,7,Overweight,129/84,74,4800,Sleep Apnea 106 | 105,Female,36,Teacher,7.2,8,60,4,Normal,115/75,68,7000,Sleep Apnea 107 | 106,Male,36,Teacher,6.6,5,35,7,Overweight,129/84,74,4800,Insomnia 108 | 107,Female,37,Nurse,6.1,6,42,6,Overweight,126/83,77,4200,None 109 | 108,Male,37,Engineer,7.8,8,70,4,Normal Weight,120/80,68,7000,None 110 | 109,Male,37,Engineer,7.8,8,70,4,Normal Weight,120/80,68,7000,None 111 | 110,Male,37,Lawyer,7.4,8,60,5,Normal,130/85,68,8000,None 112 | 111,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 113 | 112,Male,37,Lawyer,7.4,8,60,5,Normal,130/85,68,8000,None 114 | 113,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 115 | 114,Male,37,Lawyer,7.4,8,60,5,Normal,130/85,68,8000,None 116 | 115,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 117 | 116,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 118 | 117,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 119 | 118,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 120 | 119,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 121 | 120,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 122 | 121,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 123 | 122,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 124 | 
123,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 125 | 124,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 126 | 125,Female,37,Accountant,7.2,8,60,4,Normal,115/75,68,7000,None 127 | 126,Female,37,Nurse,7.5,8,60,4,Normal Weight,120/80,70,8000,None 128 | 127,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None 129 | 128,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 130 | 129,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None 131 | 130,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None 132 | 131,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 133 | 132,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None 134 | 133,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None 135 | 134,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 136 | 135,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None 137 | 136,Male,38,Lawyer,7.3,8,60,5,Normal,130/85,68,8000,None 138 | 137,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 139 | 138,Male,38,Lawyer,7.1,8,60,5,Normal,130/85,68,8000,None 140 | 139,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 141 | 140,Male,38,Lawyer,7.1,8,60,5,Normal,130/85,68,8000,None 142 | 141,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 143 | 142,Male,38,Lawyer,7.1,8,60,5,Normal,130/85,68,8000,None 144 | 143,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 145 | 144,Female,38,Accountant,7.1,8,60,4,Normal,115/75,68,7000,None 146 | 145,Male,38,Lawyer,7.1,8,60,5,Normal,130/85,68,8000,Sleep Apnea 147 | 146,Female,38,Lawyer,7.4,7,60,5,Obese,135/88,84,3300,Sleep Apnea 148 | 147,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,Insomnia 149 | 148,Male,39,Engineer,6.5,5,40,7,Overweight,132/87,80,4000,Insomnia 150 | 149,Female,39,Lawyer,6.9,7,50,6,Normal Weight,128/85,75,5500,None 151 | 150,Female,39,Accountant,8,9,80,3,Normal Weight,115/78,67,7500,None 152 | 151,Female,39,Accountant,8,9,80,3,Normal Weight,115/78,67,7500,None 153 | 
152,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 154 | 153,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 155 | 154,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 156 | 155,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 157 | 156,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 158 | 157,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 159 | 158,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 160 | 159,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 161 | 160,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 162 | 161,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,None 163 | 162,Female,40,Accountant,7.2,8,55,6,Normal Weight,119/77,73,7300,None 164 | 163,Female,40,Accountant,7.2,8,55,6,Normal Weight,119/77,73,7300,None 165 | 164,Male,40,Lawyer,7.9,8,90,5,Normal,130/85,68,8000,None 166 | 165,Male,40,Lawyer,7.9,8,90,5,Normal,130/85,68,8000,None 167 | 166,Male,41,Lawyer,7.6,8,90,5,Normal,130/85,70,8000,Insomnia 168 | 167,Male,41,Engineer,7.3,8,70,6,Normal Weight,121/79,72,6200,None 169 | 168,Male,41,Lawyer,7.1,7,55,6,Overweight,125/82,72,6000,None 170 | 169,Male,41,Lawyer,7.1,7,55,6,Overweight,125/82,72,6000,None 171 | 170,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None 172 | 171,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None 173 | 172,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None 174 | 173,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None 175 | 174,Male,41,Lawyer,7.7,8,90,5,Normal,130/85,70,8000,None 176 | 175,Male,41,Lawyer,7.6,8,90,5,Normal,130/85,70,8000,None 177 | 176,Male,41,Lawyer,7.6,8,90,5,Normal,130/85,70,8000,None 178 | 177,Male,41,Lawyer,7.6,8,90,5,Normal,130/85,70,8000,None 179 | 178,Male,42,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 180 | 179,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None 181 | 180,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None 182 | 181,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None 183 | 
182,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None 184 | 183,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None 185 | 184,Male,42,Lawyer,7.8,8,90,5,Normal,130/85,70,8000,None 186 | 185,Female,42,Teacher,6.8,6,45,7,Overweight,130/85,78,5000,Sleep Apnea 187 | 186,Female,42,Teacher,6.8,6,45,7,Overweight,130/85,78,5000,Sleep Apnea 188 | 187,Female,43,Teacher,6.7,7,45,4,Overweight,135/90,65,6000,Insomnia 189 | 188,Male,43,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 190 | 189,Female,43,Teacher,6.7,7,45,4,Overweight,135/90,65,6000,Insomnia 191 | 190,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 192 | 191,Female,43,Teacher,6.7,7,45,4,Overweight,135/90,65,6000,Insomnia 193 | 192,Male,43,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia 194 | 193,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 195 | 194,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 196 | 195,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 197 | 196,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 198 | 197,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 199 | 198,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 200 | 199,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 201 | 200,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 202 | 201,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia 203 | 202,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,Insomnia 204 | 203,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,Insomnia 205 | 204,Male,43,Engineer,6.9,6,47,7,Normal Weight,117/76,69,6800,None 206 | 205,Male,43,Engineer,7.6,8,75,4,Overweight,122/80,68,6800,None 207 | 206,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None 208 | 207,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None 209 | 208,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None 210 | 
209,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None 211 | 210,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None 212 | 211,Male,43,Engineer,7.7,8,90,5,Normal,130/85,70,8000,None 213 | 212,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None 214 | 213,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None 215 | 214,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None 216 | 215,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None 217 | 216,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None 218 | 217,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None 219 | 218,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,None 220 | 219,Male,43,Engineer,7.8,8,90,5,Normal,130/85,70,8000,Sleep Apnea 221 | 220,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Sleep Apnea 222 | 221,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 223 | 222,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia 224 | 223,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 225 | 224,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia 226 | 225,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 227 | 226,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 228 | 227,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 229 | 228,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 230 | 229,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 231 | 230,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 232 | 231,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 233 | 232,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 234 | 233,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 235 | 234,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 236 | 235,Female,44,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 237 | 236,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 
238 | 237,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia 239 | 238,Female,44,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia 240 | 239,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 241 | 240,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia 242 | 241,Female,44,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia 243 | 242,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 244 | 243,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,Insomnia 245 | 244,Female,44,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia 246 | 245,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 247 | 246,Female,44,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia 248 | 247,Male,44,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia 249 | 248,Male,44,Engineer,6.8,7,45,7,Overweight,130/85,78,5000,Insomnia 250 | 249,Male,44,Salesperson,6.4,6,45,7,Overweight,130/85,72,6000,None 251 | 250,Male,44,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,None 252 | 251,Female,45,Teacher,6.8,7,30,6,Overweight,135/90,65,6000,Insomnia 253 | 252,Female,45,Teacher,6.8,7,30,6,Overweight,135/90,65,6000,Insomnia 254 | 253,Female,45,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia 255 | 254,Female,45,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia 256 | 255,Female,45,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia 257 | 256,Female,45,Teacher,6.5,7,45,4,Overweight,135/90,65,6000,Insomnia 258 | 257,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 259 | 258,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 260 | 259,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 261 | 260,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 262 | 261,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,Insomnia 263 | 262,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,None 264 | 263,Female,45,Teacher,6.6,7,45,4,Overweight,135/90,65,6000,None 
265 | 264,Female,45,Manager,6.9,7,55,5,Overweight,125/82,75,5500,None 266 | 265,Male,48,Doctor,7.3,7,65,5,Obese,142/92,83,3500,Insomnia 267 | 266,Female,48,Nurse,5.9,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 268 | 267,Male,48,Doctor,7.3,7,65,5,Obese,142/92,83,3500,Insomnia 269 | 268,Female,49,Nurse,6.2,6,90,8,Overweight,140/95,75,10000,None 270 | 269,Female,49,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 271 | 270,Female,49,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 272 | 271,Female,49,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 273 | 272,Female,49,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 274 | 273,Female,49,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 275 | 274,Female,49,Nurse,6.2,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 276 | 275,Female,49,Nurse,6.2,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 277 | 276,Female,49,Nurse,6.2,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 278 | 277,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea 279 | 278,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea 280 | 279,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Insomnia 281 | 280,Female,50,Engineer,8.3,9,30,3,Normal,125/80,65,5000,None 282 | 281,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,None 283 | 282,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 284 | 283,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 285 | 284,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 286 | 285,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 287 | 286,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 288 | 287,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 289 | 288,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 290 | 289,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 291 | 290,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 292 | 
291,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 293 | 292,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 294 | 293,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 295 | 294,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 296 | 295,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 297 | 296,Female,50,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 298 | 297,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 299 | 298,Female,50,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 300 | 299,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 301 | 300,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 302 | 301,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 303 | 302,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 304 | 303,Female,51,Nurse,7.1,7,55,6,Normal Weight,125/82,72,6000,None 305 | 304,Female,51,Nurse,6,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 306 | 305,Female,51,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 307 | 306,Female,51,Nurse,6.1,6,90,8,Overweight,140/95,75,10000,Sleep Apnea 308 | 307,Female,52,Accountant,6.5,7,45,7,Overweight,130/85,72,6000,Insomnia 309 | 308,Female,52,Accountant,6.5,7,45,7,Overweight,130/85,72,6000,Insomnia 310 | 309,Female,52,Accountant,6.6,7,45,7,Overweight,130/85,72,6000,Insomnia 311 | 310,Female,52,Accountant,6.6,7,45,7,Overweight,130/85,72,6000,Insomnia 312 | 311,Female,52,Accountant,6.6,7,45,7,Overweight,130/85,72,6000,Insomnia 313 | 312,Female,52,Accountant,6.6,7,45,7,Overweight,130/85,72,6000,Insomnia 314 | 313,Female,52,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 315 | 314,Female,52,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 316 | 315,Female,52,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 317 | 316,Female,53,Engineer,8.3,9,30,3,Normal,125/80,65,5000,Insomnia 318 | 317,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 319 | 
318,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 320 | 319,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 321 | 320,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 322 | 321,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 323 | 322,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 324 | 323,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 325 | 324,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 326 | 325,Female,53,Engineer,8.3,9,30,3,Normal,125/80,65,5000,None 327 | 326,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 328 | 327,Female,53,Engineer,8.3,9,30,3,Normal,125/80,65,5000,None 329 | 328,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 330 | 329,Female,53,Engineer,8.3,9,30,3,Normal,125/80,65,5000,None 331 | 330,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 332 | 331,Female,53,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 333 | 332,Female,53,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 334 | 333,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 335 | 334,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 336 | 335,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 337 | 336,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 338 | 337,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 339 | 338,Female,54,Engineer,8.4,9,30,3,Normal,125/80,65,5000,None 340 | 339,Female,54,Engineer,8.5,9,30,3,Normal,125/80,65,5000,None 341 | 340,Female,55,Nurse,8.1,9,75,4,Overweight,140/95,72,5000,Sleep Apnea 342 | 341,Female,55,Nurse,8.1,9,75,4,Overweight,140/95,72,5000,Sleep Apnea 343 | 342,Female,56,Doctor,8.2,9,90,3,Normal Weight,118/75,65,10000,None 344 | 343,Female,56,Doctor,8.2,9,90,3,Normal Weight,118/75,65,10000,None 345 | 344,Female,57,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,None 346 | 345,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 347 | 346,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 348 | 
347,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 349 | 348,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 350 | 349,Female,57,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 351 | 350,Female,57,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 352 | 351,Female,57,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 353 | 352,Female,57,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 354 | 353,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 355 | 354,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 356 | 355,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 357 | 356,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 358 | 357,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 359 | 358,Female,58,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 360 | 359,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,None 361 | 360,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,None 362 | 361,Female,59,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 363 | 362,Female,59,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 364 | 363,Female,59,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 365 | 364,Female,59,Nurse,8.2,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 366 | 365,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 367 | 366,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 368 | 367,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 369 | 368,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 370 | 369,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 371 | 370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 372 | 371,Female,59,Nurse,8,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 373 | 372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 374 | 373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea 375 | 
374,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea -------------------------------------------------------------------------------- /models/sleep_disorder_predictor/model.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import pandas as pd 3 | import warnings 4 | warnings.filterwarnings("ignore") 5 | 6 | # Paths to the pre-trained model and preprocessor 7 | model_path = 'models/sleep_disorder_predictor/saved_models/Model_Prediction.sav' 8 | preprocessor_path = 'models/sleep_disorder_predictor/saved_models/preprocessor.sav' 9 | 10 | # Load the pre-trained model and preprocessor, closing the files once loaded 11 | with open(model_path, 'rb') as model_file: 12 | loaded_model = pickle.load(model_file) 13 | with open(preprocessor_path, 'rb') as preprocessor_file: 14 | preprocessor = pickle.load(preprocessor_file) 15 | 16 | # Define the prediction function 17 | def disease_get_prediction(Age, Sleep_Duration, 18 | Heart_Rate, Daily_Steps, 19 | Systolic, Diastolic, Occupation, Quality_of_Sleep, Gender, 20 | Physical_Activity_Level, Stress_Level, BMI_Category): 21 | # Build a single-row DataFrame; the column names must match those used during training 22 | features = pd.DataFrame({ 23 | 'Age': [int(Age)], 24 | 'Sleep Duration': [float(Sleep_Duration)], 25 | 'Heart Rate': [int(Heart_Rate)], 26 | 'Daily Steps': [int(Daily_Steps)], 27 | 'Systolic': [float(Systolic)], 28 | 'Diastolic': [float(Diastolic)], 29 | 'Occupation': [Occupation], 30 | 'Quality of Sleep': [int(Quality_of_Sleep)], 31 | 'Gender': [Gender], 32 | 'Physical Activity Level': [int(Physical_Activity_Level)], 33 | 'Stress Level': [int(Stress_Level)], 34 | 'BMI Category': [BMI_Category] 35 | }) 36 | 37 | # Apply the fitted preprocessor (it expects a DataFrame) 38 | preprocessed_data = preprocessor.transform(features) 39 | 40 | # Make the prediction 41 | prediction = loaded_model.predict(preprocessed_data) 42 | 43 | return prediction 44 | -------------------------------------------------------------------------------- /models/sleep_disorder_predictor/predict.py: -------------------------------------------------------------------------------- 1 | from models.sleep_disorder_predictor.model import disease_get_prediction 2 | 3 | def get_prediction(Age, Sleep_Duration, 4 | Heart_Rate, Daily_Steps, 5 | Systolic, Diastolic, Occupation, Quality_of_Sleep, Gender, 6 | Physical_Activity_Level, Stress_Level, BMI_Category): 7 | 8 | prediction = disease_get_prediction(Age, Sleep_Duration, 9 | Heart_Rate, Daily_Steps, 10 | Systolic, Diastolic, Occupation, Quality_of_Sleep, Gender, 11 | Physical_Activity_Level, Stress_Level, BMI_Category) 12 | 13 | # Map the predicted class to a human-readable label 14 | if prediction[0] == 0: 15 | message = "Insomnia" 16 | elif prediction[0] == 1: 17 | message = "No disorder" 18 | elif prediction[0] == 2: 19 | message = "Sleep Apnea" 20 | else: 21 | message = "Invalid details." 22 | 23 | return message + "\n\nRecommendation: To prevent sleep disorders, maintain a balanced lifestyle with regular exercise, a healthy diet, and stress management. Stick to a consistent sleep schedule, limit caffeine and alcohol, and create a relaxing bedtime routine."
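In the dataset above, blood pressure is stored as a single "systolic/diastolic" string (e.g. `140/95`), while `get_prediction` expects `Systolic` and `Diastolic` as two separate numeric arguments, so a caller has to split that field first. A minimal sketch of such a helper (the function name is hypothetical, not part of this repository):

```python
# Hypothetical helper (not part of this repo): split the dataset's
# "systolic/diastolic" blood-pressure string into the two numeric values
# that get_prediction() expects as separate Systolic / Diastolic arguments.
def split_blood_pressure(bp: str) -> tuple[float, float]:
    systolic, diastolic = bp.split("/")
    return float(systolic), float(diastolic)

systolic, diastolic = split_blood_pressure("140/95")
print(systolic, diastolic)  # 140.0 95.0
```

The two returned values can then be forwarded, together with the other form fields, to `get_prediction(...)`.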
-------------------------------------------------------------------------------- /models/sleep_disorder_predictor/saved_models/Model_Prediction.sav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/sleep_disorder_predictor/saved_models/Model_Prediction.sav -------------------------------------------------------------------------------- /models/sleep_disorder_predictor/saved_models/preprocessor.sav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/sleep_disorder_predictor/saved_models/preprocessor.sav -------------------------------------------------------------------------------- /models/stress_level_detect/model.py: -------------------------------------------------------------------------------- 1 | from joblib import load 2 | 3 | # Load the trained Random Forest model 4 | model = load('models/stress_level_detect/saved_models/random_forest_model.joblib') 5 | 6 | def stress_level_prediction(age, freq_no_purpose, freq_distracted, restless, worry_level, difficulty_concentrating, compare_to_successful_people, feelings_about_comparisons, freq_seeking_validation, freq_feeling_depressed, interest_fluctuation, sleep_issues): 7 | # Feature extraction 8 | features = [ 9 | float(age), 10 | int(freq_no_purpose), 11 | int(freq_distracted), 12 | int(restless), 13 | int(worry_level), 14 | int(difficulty_concentrating), 15 | int(compare_to_successful_people), 16 | int(feelings_about_comparisons), 17 | int(freq_seeking_validation), 18 | int(freq_feeling_depressed), 19 | int(interest_fluctuation), 20 | int(sleep_issues) 21 | ] 22 | 23 | prediction = model.predict([features])[0] 24 | 25 | return prediction 26 | 27 | -------------------------------------------------------------------------------- 
/models/stress_level_detect/predict.py: -------------------------------------------------------------------------------- 1 | from models.stress_level_detect.model import stress_level_prediction 2 | 3 | def get_prediction(age, freq_no_purpose, freq_distracted, restless, worry_level, difficulty_concentrating, compare_to_successful_people, feelings_about_comparisons, freq_seeking_validation, freq_feeling_depressed, interest_fluctuation, sleep_issues): 4 | 5 | prediction = stress_level_prediction(age, freq_no_purpose, freq_distracted, restless, worry_level, difficulty_concentrating, compare_to_successful_people, feelings_about_comparisons, freq_seeking_validation, freq_feeling_depressed, interest_fluctuation, sleep_issues) 6 | 7 | advice = "" 8 | 9 | # Provide advice based on the prediction value 10 | if prediction < 1.5: 11 | advice = "You are experiencing mild stress. Keep maintaining a balanced lifestyle, and consider engaging in activities that bring you joy and relaxation." 12 | elif 1.5 <= prediction < 3.5: 13 | advice = "You have a moderate stress level. It's important to take breaks and practice stress-relief techniques like mindfulness, walking, cycling, music or exercise." 14 | else: 15 | advice = "You are experiencing high stress levels. Consider reaching out to a mental health professional or practicing stress management techniques to help cope." 
16 | 17 | return advice -------------------------------------------------------------------------------- /models/stress_level_detect/saved_models/random_forest_model.joblib: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashasvini121/predictive-calc/89c505348817797e7f4962e21f6e67daaf0c6ad1/models/stress_level_detect/saved_models/random_forest_model.joblib -------------------------------------------------------------------------------- /models/text_sumarization/predict.py: -------------------------------------------------------------------------------- 1 | from transformers import pipeline 2 | import streamlit as st 3 | 4 | @st.cache_resource(show_spinner=True) # Cache the model loading for faster performance 5 | def load_summarizer(): 6 | """Load and cache the text summarization pipeline model.""" 7 | return pipeline("summarization", model="t5-small") 8 | 9 | def generate_summary(text: str) -> str: 10 | """Generate a summary for the given input text.""" 11 | summarizer = load_summarizer() 12 | summary = summarizer(text, max_length=150, min_length=30, do_sample=False) 13 | return summary[0]["summary_text"] 14 | -------------------------------------------------------------------------------- /models/translator_app/README.md: -------------------------------------------------------------------------------- 1 | # 🌐 Real-Time Language Translator 2 | 3 | A real-time language translation app built using Streamlit, Google Translate, and speech-to-text technology. This app allows users to speak in one language and get real-time translations in another, along with text-to-speech output for the translated text. 4 | 5 | ## Features 6 | 7 | - **Speech Recognition:** Capture spoken input using a microphone. 8 | - **Real-Time Translation:** Translate the captured speech into a chosen language. 9 | - **Text-to-Speech:** Listen to the translated text in the target language. 
10 | - **Multiple Languages Supported:** Including English, Hindi, Tamil, Telugu, Marathi, Bengali, and more. 11 | -------------------------------------------------------------------------------- /models/translator_app/assets/styles.css: -------------------------------------------------------------------------------- 1 | body {background-color: #0d1117; color: #c9d1d9;} 2 | .main {padding: 20px;} 3 | h1 {color: #58a6ff;} 4 | .info {font-size: 18px; color: #58a6ff; animation: glow 1s infinite;} 5 | .success {font-size: 18px; color: #34d058;} 6 | .stButton > button { 7 | background-color: #238636; 8 | color: white; 9 | border-radius: 12px; 10 | padding: 10px 30px; 11 | font-weight: bold; 12 | font-size: 16px; 13 | cursor: pointer; 14 | transition: background-color 0.3s ease; 15 | } 16 | .stButton > button:hover { 17 | background-color: #2ea043; 18 | } 19 | 20 | @keyframes glow { 21 | 0% {box-shadow: 0 0 5px #58a6ff;} 22 | 50% {box-shadow: 0 0 20px #58a6ff;} 23 | 100% {box-shadow: 0 0 5px #58a6ff;} 24 | } 25 | -------------------------------------------------------------------------------- /models/translator_app/translation.py: -------------------------------------------------------------------------------- 1 | import speech_recognition as sr 2 | from googletrans import Translator 3 | from gtts import gTTS 4 | import pygame 5 | import streamlit as st 6 | 7 | # Initialize recognizer and translator 8 | recognizer = sr.Recognizer() 9 | translator = Translator() 10 | 11 | # Function to capture and translate speech 12 | def capture_and_translate(source_lang, target_lang): 13 | # Check for available audio input devices 14 | mic_list = sr.Microphone.list_microphone_names() 15 | 16 | if not mic_list: 17 | st.error("⚠️ No microphone found. 
Please connect a microphone and restart the app.") 18 | return None 19 | 20 | selected_mic_index = st.selectbox("Select a microphone", range(len(mic_list)), format_func=lambda x: mic_list[x]) 21 | 22 | with sr.Microphone(device_index=selected_mic_index) as source: 23 | st.info("🎙️ Listening... Speak now.") 24 | 25 | recognizer.adjust_for_ambient_noise(source, duration=1) 26 | recognizer.energy_threshold = 200 27 | 28 | try: 29 | # Capture speech 30 | audio = recognizer.listen(source, timeout=15, phrase_time_limit=15) 31 | st.success("🔄 Processing...") 32 | 33 | # Recognize speech 34 | text = recognizer.recognize_google(audio, language=source_lang) 35 | st.write(f"🗣️ Original ({source_lang}): {text}") 36 | 37 | # Translate speech 38 | translation = translator.translate(text, src=source_lang, dest=target_lang) 39 | st.write(f"🔊 Translated ({target_lang}): {translation.text}") 40 | 41 | # Convert translation to speech 42 | tts = gTTS(text=translation.text, lang=target_lang) 43 | audio_file = "translated_audio.mp3" 44 | tts.save(audio_file) 45 | 46 | # Play the audio 47 | pygame.mixer.init() 48 | pygame.mixer.music.load(audio_file) 49 | pygame.mixer.music.play() 50 | 51 | st.audio(audio_file) 52 | 53 | while pygame.mixer.music.get_busy(): 54 | pygame.time.Clock().tick(10) 55 | 56 | pygame.mixer.music.stop() 57 | pygame.mixer.quit() 58 | 59 | return audio_file 60 | 61 | except sr.WaitTimeoutError: 62 | st.error("⚠️ No speech detected. 
Try speaking louder.") 63 | except sr.UnknownValueError: 64 | st.error("⚠️ Could not recognize speech.") 65 | except Exception as e: 66 | st.error(f"⚠️ Error: {str(e)}") 67 | return None 68 | -------------------------------------------------------------------------------- /models/translator_app/utils.py: -------------------------------------------------------------------------------- 1 | # Available languages dictionary 2 | LANGUAGES = { 3 | 'English': 'en', 4 | 'Hindi': 'hi', 5 | 'Tamil': 'ta', 6 | 'Telugu': 'te', 7 | 'Marathi': 'mr', 8 | 'Bengali': 'bn', 9 | 'Gujarati': 'gu', 10 | 'Kannada': 'kn', 11 | 'Malayalam': 'ml', 12 | 'Punjabi': 'pa', 13 | 'Urdu': 'ur' 14 | } 15 | -------------------------------------------------------------------------------- /packages.txt: -------------------------------------------------------------------------------- 1 | portaudio19-dev -------------------------------------------------------------------------------- /page_handler.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import importlib.util 3 | import json 4 | from form_handler import FormHandler 5 | 6 | 7 | # Utility to dynamically import modules 8 | def load_module_from_path(module_name, file_path): 9 | spec = importlib.util.spec_from_file_location(module_name, file_path) 10 | module = importlib.util.module_from_spec(spec) 11 | spec.loader.exec_module(module) 12 | return module 13 | 14 | 15 | class PageHandler: 16 | def __init__(self, config_file_path): 17 | # Load the page configuration from JSON 18 | with open(config_file_path, "r") as f: 19 | self.pages = json.load(f) 20 | 21 | def render_page(self, page_name: str): 22 | # Check if the requested page exists in the JSON config 23 | if page_name not in self.pages: 24 | st.error("Page not found!") 25 | return 26 | 27 | page = self.pages[page_name] 28 | page_title = page.get("page_title", "Untitled Page") 29 | page_icon = page.get("page_icon", "📄") # Default 
to a generic icon 30 | model_predict_file_path = page.get("model_predict_file_path") 31 | form_config_path = page.get("form_config_path") 32 | tabs = page.get("tabs", []) 33 | # Set Streamlit's page config with the title and icon 34 | st.set_page_config(page_title=page_title, page_icon=page_icon) 35 | 36 | # Dynamically load the model prediction file 37 | model_module = load_module_from_path( 38 | f"{page_name}_model", model_predict_file_path 39 | ) 40 | model_function = getattr( 41 | model_module, "get_prediction", None 42 | ) # or relevant model function 43 | 44 | # Create the tabs for the page 45 | tab_objects = st.tabs([tab["name"] for tab in tabs]) 46 | 47 | # Iterate through the tabs to render them 48 | for i, tab in enumerate(tabs): 49 | with tab_objects[i]: 50 | if tab["type"] == "form": 51 | self.render_form(tab["form_name"], model_function, form_config_path) 52 | elif tab["type"] == "model_details": 53 | self.render_model_details(model_module, tab) 54 | 55 | def render_form(self, form_name: str, model_function, form_config_path: str): 56 | form_handler = FormHandler( 57 | name=form_name, 58 | button_label="Predict", 59 | model=model_function, 60 | config_path=form_config_path, 61 | ) 62 | 63 | # Render the form on the Streamlit page 64 | form_handler.render() 65 | 66 | def render_model_details(self, model_module, tab): 67 | # Dynamically load and call the model details function 68 | model_details_function = getattr(model_module, "model_details", None) 69 | 70 | # Show the problem statement 71 | st.subheader("Problem Statement") 72 | st.write(tab["problem_statement"]) 73 | 74 | # Show the model description 75 | st.subheader("Model Description") 76 | st.write(tab["description"]) 77 | 78 | if model_details_function: 79 | metrics, prediction_plot, error_plot, performance_plot = model_details_function().evaluate() 80 | 81 | st.subheader(f"Model Accuracy: {metrics['Test_R2']:.2f}") 82 | 83 | # Show the training and testing R2 scores 84 | st.subheader(f"Scores: Training: {metrics['Train_R2']:.2f}, Testing: {metrics['Test_R2']:.2f}") 85 | 86 | # Display the scatter plot for predicted vs actual values; 87 | # clear_figure=True releases each figure after rendering to avoid conflicts 88 | if prediction_plot is not None: 89 | st.subheader("Model Prediction Plot") 90 | st.pyplot(prediction_plot, clear_figure=True) 91 | if error_plot is not None: 92 | st.subheader("Error Plot") 93 | st.pyplot(error_plot, clear_figure=True) 94 | if performance_plot is not None: 95 | st.subheader("Model Performance Plot") 96 | st.pyplot(performance_plot, clear_figure=True) 97 | -------------------------------------------------------------------------------- /pages/Business_Performance_Forecasting.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | page_handler = PageHandler("pages/pages.json") 4 | page_handler.render_page("Business Performance Forecasting") 5 | -------------------------------------------------------------------------------- /pages/Credit_Card_Fraud_Estimator.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | page_handler = PageHandler("pages/pages.json") 4 | page_handler.render_page("Credit Card Fraud Estimator") 5 | -------------------------------------------------------------------------------- /pages/Customer_Income_Estimator.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | page_handler = PageHandler("pages/pages.json") 4 | page_handler.render_page("Customer Income Estimator") 5 | -------------------------------------------------------------------------------- /pages/Gold_Price_Predictor.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | page_handler = PageHandler("pages/pages.json") 4 |
page_handler.render_page("Gold Price Predictor") 5 | -------------------------------------------------------------------------------- /pages/House_Price_Estimator.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | page_handler = PageHandler("pages/pages.json") 4 | page_handler.render_page("House Price Estimator") 5 | -------------------------------------------------------------------------------- /pages/Insurance_Cost_Predictor.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | # Initialize the page handler with the path to the pages configuration file 4 | page_handler = PageHandler("pages/pages.json") 5 | 6 | # Render the page for Insurance Cost Predictor 7 | page_handler.render_page("Insurance Cost Predictor") 8 | -------------------------------------------------------------------------------- /pages/Loan_Eligibility_Estimator.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | page_handler = PageHandler("pages/pages.json") 4 | page_handler.render_page("Loan Eligibility Estimator") 5 | -------------------------------------------------------------------------------- /pages/PDF_Malware_Detection.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | from models.PDF_malware_detection.predict import predict_malware 3 | import tempfile 4 | import os 5 | 6 | st.title("Malware Detection for PDF Files") 7 | 8 | # Form for uploading the PDF file 9 | uploaded_file = st.file_uploader("Upload a PDF file", type="pdf") 10 | 11 | # Create a submit button 12 | submit_button = st.button("Submit for Malware Detection") 13 | 14 | if uploaded_file is not None and submit_button: 15 | st.info("Processing file... 
Please wait.") 16 | 17 | # Create a temporary file 18 | with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file: 19 | tmp_file.write(uploaded_file.getvalue()) 20 | tmp_file_path = tmp_file.name 21 | 22 | try: 23 | # Pass the temporary file path to the predict_malware function 24 | result = predict_malware(tmp_file_path) 25 | 26 | if result == 1: 27 | st.error("Malicious PDF detected!") 28 | else: 29 | st.success("The PDF is clean!") 30 | 31 | except Exception as e: 32 | st.error(f"An error occurred during processing: {str(e)}") 33 | 34 | finally: 35 | # Clean up the temporary file 36 | os.unlink(tmp_file_path) 37 | 38 | # Display some information about the uploaded file 39 | st.subheader("File Information:") 40 | st.json({ 41 | "Filename": uploaded_file.name, 42 | "File size": f"{uploaded_file.size} bytes", 43 | "File type": uploaded_file.type 44 | }) 45 | 46 | elif submit_button and uploaded_file is None: 47 | st.warning("Please upload a PDF file before submitting.") -------------------------------------------------------------------------------- /pages/Parkinson_Disease_Detector.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | page_handler = PageHandler("pages/pages.json") 4 | page_handler.render_page("Parkinson Disease Detector") -------------------------------------------------------------------------------- /pages/Sleep_Disorder_Predictor.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | page_handler = PageHandler("pages/pages.json") 4 | page_handler.render_page("Sleep Disorder Predictor") -------------------------------------------------------------------------------- /pages/Stress_Level_Detector.py: -------------------------------------------------------------------------------- 1 | from page_handler import PageHandler 2 | 3 | page_handler = PageHandler("pages/pages.json") 4 
| page_handler.render_page("Stress Level Detector") 5 | -------------------------------------------------------------------------------- /pages/Text Summarizer.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | from models.text_sumarization.predict import generate_summary 3 | 4 | st.title("Text Summarization Tool") 5 | 6 | st.write("Enter the text you'd like to summarize (minimum 50 words).") 7 | 8 | user_input = st.text_area("Input Text", height=250) 9 | 10 | # A button to initiate the summarization process 11 | if st.button("Summarize"): 12 | if len(user_input.split()) < 50: 13 | st.warning("Please enter at least 50 words for summarization.") 14 | else: 15 | # Show a spinner while the summarization is being processed 16 | with st.spinner("Summarizing..."): 17 | summary = generate_summary(user_input) # Call the function from predict.py 18 | st.subheader("Summary:") 19 | st.code(summary, language="text", wrap_lines=True) 20 | -------------------------------------------------------------------------------- /pages/Translator.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | from models.translator_app.translation import capture_and_translate 3 | from models.translator_app.utils import LANGUAGES 4 | import time 5 | import os 6 | 7 | # Load custom CSS by injecting the stylesheet into the page 8 | def load_css(): 9 | with open("models/translator_app/assets/styles.css") as f: 10 | st.markdown(f"<style>{f.read()}</style>", unsafe_allow_html=True) 11 | 12 | # UI Structure 13 | def main(): 14 | st.title("🌐 Real-Time Language Translator") 15 | st.markdown("Translate spoken language into other languages in real-time with a sleek experience.") 16 | 17 | load_css() # Load custom styling 18 | 19 | # Language selection 20 | source_lang_name = st.selectbox("🌍 Select Source Language", list(LANGUAGES.keys())) 21 | target_lang_name = st.selectbox("🔄 Select Target Language", list(LANGUAGES.keys())) 22 | 23 | source_lang = 
LANGUAGES[source_lang_name] 24 | target_lang = LANGUAGES[target_lang_name] 25 | 26 | # Button to start listening 27 | if st.button("🎤 Start Listening", key="listen_button"): 28 | audio_file = capture_and_translate(source_lang, target_lang) 29 | if audio_file: 30 | time.sleep(1) # Ensure pygame cleanup 31 | try: 32 | os.remove(audio_file) 33 | except Exception as e: 34 | st.error(f"⚠️ Error while deleting the file: {str(e)}") 35 | 36 | if __name__ == "__main__": 37 | main() 38 | -------------------------------------------------------------------------------- /pages/pages.json: -------------------------------------------------------------------------------- 1 | { 2 | "House Price Estimator": { 3 | "title": "House Price Estimator", 4 | "page_title": "House Price Estimator", 5 | "page_icon": "\ud83c\udfe0", 6 | "model_predict_file_path": "models/house_price/predict.py", 7 | "model_function": "get_prediction", 8 | "model_detail_function": "model_details", 9 | "form_config_path": "form_configs/house_price.json", 10 | "tabs": [ 11 | { 12 | "name": "Estimator", 13 | "type": "form", 14 | "form_name": "House Price Form" 15 | }, 16 | { 17 | "name": "Model Details", 18 | "type": "model_details", 19 | "problem_statement": "The model predicts house prices based on input features such as location, size, and number of rooms.", 20 | "description": "This model uses a linear regression algorithm to estimate house prices from the provided features, utilizing historical data to draw predictions." 
21 | } 22 | ] 23 | }, 24 | "Credit Card Fraud Estimator": { 25 | "title": "Credit Card Fraud Estimator", 26 | "page_title": "Credit Card Fraud Estimator", 27 | "page_icon": "\ud83d\udcb0", 28 | "model_predict_file_path": "models/credit_card_fraud/predict.py", 29 | "model_function": "get_prediction", 30 | "form_config_path": "form_configs/credit_card_fraud.json", 31 | "model_detail_function": "model_details", 32 | "tabs": [ 33 | { 34 | "name": "Estimator", 35 | "type": "form", 36 | "form_name": "Credit Card Fraud Estimator" 37 | }, 38 | { 39 | "name": "Model Details", 40 | "type": "model_details", 41 | "problem_statement": "The model predicts whether a credit card transaction is fraudulent based on input features such as transaction amount, transaction type, and customer profile.", 42 | "description": "This model uses a Support Vector Machine (SVM) to classify credit card transactions as fraudulent or non-fraudulent. It leverages the SVM's ability to create an optimal decision boundary, learning from historical transaction data with both normal and fraudulent transactions to make accurate predictions. The model is particularly useful in identifying complex patterns and borderline cases in high-dimensional data." 
43 | } 44 | ] 45 | }, 46 | "Loan Eligibility Estimator": { 47 | "title": "Loan Eligibility Estimator", 48 | "page_title": "Loan Eligibility Estimator", 49 | "page_icon": "\ud83d\udcb0", 50 | "model_predict_file_path": "models/loan_eligibility/predict.py", 51 | "model_function": "get_prediction", 52 | "form_config_path": "form_configs/loan_eligibility.json", 53 | "model_detail_function": "model_details", 54 | "tabs": [ 55 | { 56 | "name": "Estimator", 57 | "type": "form", 58 | "form_name": "Loan Eligibility Form" 59 | }, 60 | { 61 | "name": "Model Details", 62 | "type": "model_details", 63 | "problem_statement": "This model determines whether a user is eligible for a loan based on several financial and personal input factors.", 64 | "description": "It leverages a decision tree classification algorithm to predict loan eligibility, making use of past customer data and lending patterns." 65 | } 66 | ] 67 | }, 68 | "Stress Level Detector": { 69 | "title": "Stress Level Detector", 70 | "page_title": "Stress Level Detector", 71 | "page_icon": "\u2691", 72 | "model_predict_file_path": "models/stress_level_detect/predict.py", 73 | "model_function": "get_prediction", 74 | "model_detail_function": "model_details", 75 | "form_config_path": "form_configs/stress_detection.json", 76 | "tabs": [ 77 | { 78 | "name": "Stress Level Estimator", 79 | "type": "form", 80 | "form_name": "Stress Detection Form" 81 | }, 82 | { 83 | "name": "Model Details", 84 | "type": "model_details", 85 | "problem_statement": "The model assesses the stress level of an individual based on biometric and behavioral data.", 86 | "description": "This model uses a Random Forest model to classify stress levels into different categories (low, medium, high) based on behavioral and lifestyle indicators."
87 | } 88 | ] 89 | }, 90 | "Parkinson Disease Detector": { 91 | "title": "Parkinson Disease Detector", 92 | "page_title": "Parkinson Disease Detector", 93 | "model_predict_file_path": "models/parkinson_disease_detector/parkinson_predict.py", 94 | "model_function": "get_prediction", 95 | "model_detail_function": "model_details", 96 | "form_config_path": "form_configs/parkinson_detection.json", 97 | "tabs": [ 98 | { 99 | "name": "Parkinson's Disease Detector", 100 | "type": "form", 101 | "form_name": "Parkinson Detection Form" 102 | }, 103 | { 104 | "name": "Model Details", 105 | "type": "model_details", 106 | "problem_statement": "The model aims to detect early signs of Parkinson's disease from voice and movement data.", 107 | "description": "Using a Random Forest classifier, the model predicts whether an individual is likely to have Parkinson's based on voice recordings and motor function metrics." 108 | } 109 | ] 110 | }, 111 | "Customer Income Estimator": { 112 | "title": "Customer Income Estimator", 113 | "page_title": "Customer Income Estimator", 114 | "page_icon": "\ud83d\udcb0", 115 | "model_predict_file_path": "models/customer_income/predict.py", 116 | "model_function": "get_prediction", 117 | "model_detail_function": "model_details", 118 | "form_config_path": "form_configs/customer_income.json", 119 | "tabs": [ 120 | { 121 | "name": "Estimator", 122 | "type": "form", 123 | "form_name": "Customer Income Estimation Form" 124 | }, 125 | { 126 | "name": "Model Details", 127 | "type": "model_details", 128 | "problem_statement": "The model predicts the income of a customer based on various demographic and financial features.", 129 | "description": "This model uses Random Forest Regression to estimate the income of a customer based on input features such as age, education, employment status, etc." 
130 | } 131 | ] 132 | }, 133 | "Gold Price Predictor": { 134 | "title": "Gold Price Predictor", 135 | "page_title": "Gold Price Predictor", 136 | "page_icon": "\ud83d\udcb0", 137 | "model_predict_file_path": "models/gold_price_prediction/predict.py", 138 | "model_function": "get_prediction", 139 | "model_detail_function": "model_details", 140 | "form_config_path": "form_configs/gold_price_prediction.json", 141 | "tabs": [ 142 | { 143 | "name": "Gold Price Form", 144 | "type": "form", 145 | "form_name": "Gold Price Form" 146 | }, 147 | { 148 | "name": "Model Details", 149 | "type": "model_details", 150 | "problem_statement": "The Gold Price Predictor leverages financial metrics and machine learning algorithms to forecast the price of gold (GLD). Gold prices are influenced by various economic factors, and this tool aims to provide accurate predictions based on historical data.", 151 | "description": "The dataset used for this model contains daily financial data, including stock market indices, commodity prices, and currency exchange rates. The goal is to predict the gold price (GLD) using features such as the S&P 500 Index (SPX), crude oil price (USO), silver price (SLV), and the EUR/USD exchange rate." 
152 | } 153 | ] 154 | }, 155 | "Sleep Disorder Predictor": { 156 | "title": "Sleep Disorder Predictor", 157 | "page_title": "Sleep Disorder Predictor", 158 | "model_predict_file_path": "models/sleep_disorder_predictor/predict.py", 159 | "model_function": "get_prediction", 160 | "model_detail_function": "model_details", 161 | "form_config_path": "form_configs/sleep_prediction.json", 162 | "tabs": [ 163 | { 164 | "name": "Sleep Disorder Predictor", 165 | "type": "form", 166 | "form_name": "Sleep Prediction Form" 167 | }, 168 | { 169 | "name": "Model Details", 170 | "type": "model_details", 171 | 172 | "problem_statement": "The model aims to predict the likelihood of an individual having a sleep disorder based on lifestyle, sleep quality, and health metrics.", 173 | "description": "Using an XGBoost classifier, the model predicts whether an individual is likely to have a sleep disorder based on features such as sleep duration, stress level, physical activity level, cardiovascular health metrics, and demographic information." 
174 | } 175 | ] 176 | }, 177 | 178 | "Malware_Detection": { 179 | "title": "PDF Malware Detection", 180 | "page_title": "PDF Malware Detection", 181 | "page_icon": "\ud83d\udd12", 182 | "model_predict_file_path": "models/PDF_malware_detection/predict.py", 183 | "model_function": "get_prediction", 184 | "model_detail_function": "model_details", 185 | "form_config_path": "form_configs/pdf_malware_detection.json", 186 | "tabs": [ 187 | { 188 | "name": "Malware Detection Form", 189 | "type": "form", 190 | "form_name": "Malware Detection Form" 191 | }, 192 | { 193 | "name": "Model Details", 194 | "type": "model_details" 195 | } 196 | ] 197 | }, 198 | 199 | "Insurance Cost Predictor": { 200 | "title": "Insurance Cost Predictor", 201 | "page_title": "Insurance Cost Predictor", 202 | "page_icon": "\ud83d\udd12", 203 | "model_predict_file_path": "models/insurance_cost_predictor/predict.py", 204 | "model_function": "get_prediction", 205 | "model_detail_function": "model_details", 206 | "form_config_path": "form_configs/insurance_cost_predictor.json", 207 | "tabs": [ 208 | { 209 | "name": "Insurance Cost Form", 210 | "type": "form", 211 | "form_name": "Insurance Cost Form" 212 | }, 213 | { 214 | "name": "Model Details", 215 | "type": "model_details", 216 | "problem_statement": "The Insurance Cost Predictor estimates the insurance cost based on various personal factors such as age, BMI, number of children, smoker status, and region. By using machine learning, this tool provides accurate predictions to help users plan their insurance costs more effectively.", 217 | "description": "This model uses a dataset containing demographic and health-related factors to predict the cost of insurance. The features include age, sex, BMI, children, smoker status, and region, with predictions made using the Random Forest algorithm for accurate results. Ensemble techniques like XGBoost will also be used to further enhance the prediction accuracy." 
218 | } 219 | ] 220 | }, 221 | "Business Performance Forecasting": { 222 | "title": "Business Performance Forecasting", 223 | "page_title": "Business Performance Forecasting", 224 | "page_icon": "\ud83c\udf3e", 225 | "model_predict_file_path": "models/business_performance_forecasting/predict.py", 226 | "model_function": "get_prediction", 227 | "model_detail_function": "model_details", 228 | "form_config_path": "form_configs/business_performance_forecasting.json", 229 | "tabs": [ 230 | { 231 | "name": "Business Forecast Form", 232 | "type": "form", 233 | "form_name": "Business Forecast Form" 234 | }, 235 | { 236 | "name": "Model Details", 237 | "type": "model_details", 238 | "problem_statement": "The Business Performance Forecasting model predicts future profits based on R&D spend, administration costs, marketing spend, and state. By utilizing machine learning, this tool assists businesses in making informed decisions about resource allocation.", 239 | "description": "This model employs a dataset with features including R&D spend, administration costs, marketing spend, and geographic location to forecast profits. The predictions are generated using regression techniques, ensuring accuracy and reliability for business strategy planning." 240 | } 241 | ] 242 | } 243 | 244 | } 245 | 246 | 247 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # Predictive Calc 2 | 3 | ## Overview 4 | **Predictive Calc** is an open-source project that provides a flexible collection of machine learning models designed to predict a wide variety of outcomes. Built with **Python** and **Streamlit**, the project offers an intuitive web interface, enabling users to easily interact with the models. 
The primary goal of the project is to streamline the integration of machine learning models with custom forms, allowing users to build their own prediction calculators tailored to specific use cases. 5 | 6 | ## Current Status 7 | The project is under active development with several machine learning models already implemented for various prediction tasks. The architecture is designed for dynamic configuration using JSON files, which map model parameters, inputs, and features. This design ensures new models can be seamlessly added or updated with minimal modification to the core codebase. 8 | 9 | The project has been successfully tested in local environments, and current efforts are focused on enhancing integration, optimizing deployment, and improving scalability for production-ready applications. 10 | 11 | ## How to Contribute 12 | 1. Review existing issues and contribute towards resolving them. 13 | 2. Open new issues to discuss ideas, suggest features, or report bugs. 14 | 3. Fork the repository and create a new branch for your contribution. 15 | 4. Implement your changes and submit a pull request with a clear description. 16 | 5. Further details can be found in the [contributing.md](contributing.md) file. 17 | 18 | ## Setup Instructions 19 | 1. Fork or clone the repository. 20 | 2. Create and activate a virtual environment: 21 | ```powershell 22 | python -m venv .venv 23 | .venv\Scripts\Activate 24 | ``` 25 | 3. Install the necessary dependencies: 26 | ```powershell 27 | pip install -r requirements.txt 28 | ``` 29 | 4. Run the Streamlit application: 30 | ```powershell 31 | streamlit run App.py 32 | ``` 33 | ## Docker Setup Instructions 34 | 35 | 1. Install [Docker](https://docs.docker.com/get-docker/) on your machine. 36 | 2. Windows users: install [WSL](https://learn.microsoft.com/en-us/windows/wsl/install/) (Ubuntu-22.04). 37 | 3. 
Run the application using Docker Compose: 38 | ```powershell 39 | docker-compose up 40 | ``` 41 | 4. To stop the application: 42 | ```powershell 43 | docker-compose down 44 | ``` 45 | 46 | ## Our Valuable Contributors 47 | [![Contributors](https://contrib.rocks/image?repo=yashasvini121/predictive-calc)](https://github.com/yashasvini121/predictive-calc/graphs/contributors) 48 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # Requirements.txt for the Streamlit app 2 | # Jupyter notebook-related packages and their dependencies are excluded because 3 | # this app does not need to open or interact with Jupyter notebooks. 4 | 5 | altair==5.4.1 6 | anyio==4.5.0 7 | argon2-cffi==23.1.0 8 | argon2-cffi-bindings==21.2.0 9 | arrow==1.3.0 10 | asttokens==2.4.1 11 | async-lru==2.0.4 12 | attrs==24.2.0 13 | babel==2.16.0 14 | beautifulsoup4==4.12.3 15 | bleach==6.1.0 16 | blinker==1.8.2 17 | cachetools==5.5.0 18 | certifi==2024.8.30 19 | cffi==1.17.1 20 | chardet==3.0.4 21 | charset-normalizer==3.3.2 22 | click==8.1.7 23 | cmdstanpy==1.2.4 24 | colorama==0.4.6 25 | contourpy==1.3.0 26 | cycler==0.12.1 27 | debugpy==1.8.5 28 | decorator==5.1.1 29 | defusedxml==0.7.1 30 | et-xmlfile==1.1.0 31 | executing==2.1.0 32 | fastjsonschema==2.20.0 33 | fonttools==4.53.1 34 | fqdn==1.5.1 35 | gitdb==4.0.11 36 | GitPython==3.1.43 37 | googletrans==4.0.0rc1 38 | gTTS==2.5.3 39 | h11==0.9.0 40 | h2==3.2.0 41 | holidays==0.57 42 | hpack==3.0.0 43 | hstspreload==2024.10.1 44 | httpcore==0.9.1 45 | httpx==0.13.3 46 | hyperframe==5.2.0 47 | idna==2.10 48 | imbalanced-learn==0.12.4 49 | imblearn==0.0 50 | importlib_resources==6.4.5 51 | iniconfig==2.0.0 52 | Jinja2==3.1.4 53 | joblib==1.4.2 54 | json5==0.9.25 55 | jsonpointer==3.0.0 56 | jsonschema==4.23.0 57 | jsonschema-specifications==2023.12.1 58 | kiwisolver==1.4.7 59 | markdown-it-py==3.0.0 60 | 
MarkupSafe==2.1.5 61 | matplotlib==3.9.2 62 | matplotlib-inline==0.1.7 63 | mdurl==0.1.2 64 | mistune==3.0.2 65 | narwhals==1.8.1 66 | numpy 67 | openpyxl==3.1.5 68 | overrides==7.7.0 69 | packaging==24.1 70 | pandas==2.2.2 71 | pandocfilters==1.5.1 72 | parso==0.8.4 73 | patsy==0.5.6 74 | pillow==10.4.0 75 | platformdirs==4.3.6 76 | plotly==5.24.1 77 | pluggy==1.5.0 78 | prometheus_client==0.20.0 79 | prompt_toolkit==3.0.47 80 | prophet==1.1.6 81 | protobuf==4.25.5 82 | psutil==6.0.0 83 | pure_eval==0.2.3 84 | pyarrow==17.0.0 85 | PyAudio==0.2.14 86 | pycparser==2.22 87 | pydeck==0.9.1 88 | pygame==2.6.1 89 | Pygments==2.18.0 90 | PyMuPDF==1.24.11 91 | pyparsing==3.1.4 92 | PyPDF2==3.0.1 93 | pdfid==1.1.3 94 | pytest==8.3.3 95 | pytest-mock==3.14.0 96 | python-dateutil==2.9.0.post0 97 | python-json-logger==2.0.7 98 | pytz==2024.2 99 | PyYAML==6.0.2 100 | pyzmq==26.2.0 101 | referencing==0.35.1 102 | requests==2.32.3 103 | rfc3339-validator==0.1.4 104 | rfc3986==1.5.0 105 | rfc3986-validator==0.1.1 106 | rich==13.8.1 107 | rpds-py==0.20.0 108 | scikit-learn==1.5.2 109 | scipy==1.14.1 110 | seaborn==0.13.2 111 | Send2Trash==1.8.3 112 | setuptools==75.1.0 113 | six==1.16.0 114 | smmap==5.0.1 115 | sniffio==1.3.1 116 | soupsieve==2.6 117 | SpeechRecognition==3.10.4 118 | stack-data==0.6.3 119 | stanio==0.5.1 120 | statsmodels==0.14.3 121 | streamlit==1.38.0 122 | tenacity==8.5.0 123 | terminado==0.18.1 124 | threadpoolctl==3.5.0 125 | tinycss2==1.3.0 126 | toml==0.10.2 127 | tornado==6.4.1 128 | tqdm==4.66.5 129 | traitlets==5.14.3 130 | types-python-dateutil==2.9.0.20240906 131 | typing_extensions==4.12.2 132 | tzdata==2024.1 133 | uri-template==1.3.0 134 | urllib3==2.2.3 135 | watchdog==4.0.2 136 | wcwidth==0.2.13 137 | webcolors==24.8.0 138 | webencodings==0.5.1 139 | websocket-client==1.8.0 140 | xgboost==2.1.1 141 | transformers==4.45.2 142 | tf_keras==2.17.0 143 | -------------------------------------------------------------------------------- /todo.md: 
-------------------------------------------------------------------------------- 1 | ## To Do 2 | 1. Add docs 3 | 2. Add GitHub Actions 4 | 3. Add logging 5 | 4. Add tests 6 | 5. Add graphs 7 | 6. Improve the UI/UX 8 | 7. Complete the Loan Eligibility Calculator 9 | 8. Add more models 10 | 11 | 12 | ## To Fix 13 | 1. Plots --------------------------------------------------------------------------------
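Each entry in `pages.json` above pairs a predict-module path (`model_predict_file_path`) with a function name (`model_function`), which is what lets new models be added without touching the core codebase. A minimal sketch of how such a config can be resolved at runtime with `importlib` (the helper names here are hypothetical illustrations, not the actual `page_handler.py` implementation):

```python
import importlib.util
import json
from pathlib import Path


def load_page_config(config_path: str, page_name: str) -> dict:
    """Return the pages.json entry for a single page."""
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)[page_name]


def resolve_predictor(cfg: dict):
    """Import the module at model_predict_file_path and return the
    callable named by model_function."""
    path = Path(cfg["model_predict_file_path"])
    # Load the predict module directly from its file path, so models
    # can live anywhere under models/ without being packages.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, cfg["model_function"])
```

A page script could then do something like `resolve_predictor(load_page_config("pages/pages.json", "Gold Price Predictor"))` and pass the validated form inputs to the returned function.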