├── .gitignore
├── Week 4
│   ├── 3. Try to make GUI for your algorithm with Gradio or streamlit
│   │   └── Readme.md
│   ├── 1. Make the telegram bot that get us bitcoin price for specific time
│   │   ├── Readme.md
│   │   └── BitCoin Bot.ipynb
│   └── 2. Implement an ML algorithm for predict etherium coin
│       ├── Readme.md
│       └── ethereum-price-prediction.ipynb
├── Week 1
│   ├── 3. Fill Features Part Of LinkedIn (Add your GitHub, Kaggle, Resume[PDF], ...)
│   │   └── Readme.md
│   ├── 1. Learn Git and GitHub with add and management on github
│   │   └── Readme.md
│   ├── 5. Learn Markdown Deeper
│   │   ├── Readme.md
│   │   └── Comprehensive Markdown Commands.ipynb
│   ├── 4. Search About Kaggle And NoteBook Part
│   │   └── README.md
│   └── 2. Add your previous notebook on Kaggle
│       └── README.md
├── Week 3
│   ├── 1. Write LinkedIn post about Finance libraries and Pros and Cons
│   │   └── Readme.md
│   ├── 3. Try to extract telegram datas (Data Collection)
│   │   └── Readme.md
│   ├── 2. Analysis Data that extract from one of best libraries
│   │   ├── Readme.md
│   │   └── Healthcare.ipynb
│   └── 4. Learn about Streamlit and Gradio Libraries
│       └── Readme.md
├── Week 2
│   ├── 1. Search about time series forecasting parameters and features
│   │   └── Readme.md
│   ├── 4. Increase Your Connections on LinkedIn
│   │   └── Readme.md
│   ├── 2. Search about telegram API and features that we can get from that
│   │   └── Readme.md
│   └── 3. Find python finance data collector libraries and make list for that
│       └── Readme.md
├── LICENSE
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
--------------------------------------------------------------------------------
/Week 4/3. 
Try to make GUI for your algorithm with Gradio or streamlit/Readme.md:
--------------------------------------------------------------------------------
# Make a GUI for Your Algorithm with Gradio or Streamlit

## Link to Dashboard:
https://ethereum-price-prediction.streamlit.app/

--------------------------------------------------------------------------------
/Week 1/3. Fill Features Part Of LinkedIn (Add your GitHub, Kaggle, Resume[PDF], ...)/Readme.md:
--------------------------------------------------------------------------------
# LinkedIn Profile Update

I'm excited to share that I've updated my LinkedIn profile based on the skills and knowledge I've gained so far! 🎉

I would be thrilled if you could follow me on LinkedIn to stay connected and see my progress in the field of Data Science.

You can find my profile here: [Amin Gholami on LinkedIn](https://www.linkedin.com/in/amiingholami)

Thank you for your support! 🙏
--------------------------------------------------------------------------------
/Week 3/1. Write LinkedIn post about Finance libraries and Pros and Cons/Readme.md:
--------------------------------------------------------------------------------
# Write LinkedIn Post about Finance Libraries and Pros and Cons

For this task, I've decided to write an article on LinkedIn that dives into the world of financial libraries in Python. The article explores various libraries, examining their unique features, advantages, and disadvantages.

In this comprehensive overview, you'll find insights on popular libraries like **Pandas**, **NumPy**, **SciPy**, and **QuantLib**, among others. I discuss how each library can enhance your financial analysis and modeling, along with the trade-offs to consider when choosing the right tools for your projects.

Curious to learn more? 
Check out the full article here: [Financial Libraries in Python: Pros and Cons](https://www.linkedin.com/pulse/financial-libraries-python-pros-cons-amin-gholami-xclof) 8 | 9 | -------------------------------------------------------------------------------- /Week 2/1. Search about time series forecasting parameters and features/Readme.md: -------------------------------------------------------------------------------- 1 | ## Time Series Forecasting: Parameters and Features 2 | 3 | As part of my internship, I was assigned a task to **search about time series forecasting parameters and features**. 4 | 5 | For this, I studied the notebook available at the following link: 6 | 7 | [Intro to Time Series Forecasting Notebook](https://github.com/AmiinGholami/MyInternship/blob/main/Week%202/1.%20Search%20about%20time%20series%20forecasting%20parameters%20and%20features/intro-to-time-series-forecasting.ipynb) 8 | 9 | The notebook provides a comprehensive introduction to time series forecasting, covering the key parameters and features involved in this technique. This research helped me understand essential concepts like trend, seasonality, autocorrelation, and the importance of features such as lagged values and rolling statistics in building effective time series models. 10 | 11 | Feel free to check it out! 12 | -------------------------------------------------------------------------------- /Week 2/4. Increase Your Connections on LinkedIn/Readme.md: -------------------------------------------------------------------------------- 1 | # 4. Increase Your Connections on LinkedIn 2 | 3 | As part of this task, I focused on growing my professional network on LinkedIn, specifically targeting connections in the fields of Data Science, Data Analysis, and related areas. 4 | 5 | I already had a well-established LinkedIn profile, which I updated with my most recent experiences, skills, and accomplishments to better reflect my journey into Data Science. 
After enhancing my profile, I started actively reaching out to professionals and experts in these fields, aiming to build meaningful connections. 6 | 7 | By connecting with individuals who share my interests, I’m expanding my knowledge and staying up-to-date with the latest trends, opportunities, and discussions in Data Science and Analytics. This step has been instrumental in positioning myself within a growing community of like-minded professionals. 8 | 9 | ## LinkedIn Profile: 10 | Follow me on LinkedIn to see my latest updates and be part of my network! 11 | 12 | You can find my profile here: [Amin Gholami on LinkedIn](https://www.linkedin.com/in/amiingholami) 13 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Amin Gholami 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# MyInternship
In this repository, I will document my learning journey in data science. You’ll find resources, insights, and projects that reflect the knowledge and skills I acquire along the way. I aim to create a comprehensive collection that not only helps me track my progress but also serves as a resource for others interested in the field.

# Intern

WEEK 1 :

- [x] Learn Git and GitHub, including adding and managing repositories on GitHub
- [x] Add your previous notebooks on Kaggle
- [x] Fill in the Features section of LinkedIn (add your GitHub, Kaggle, resume [PDF], ...)
- [x] Search about Kaggle and its Notebook section
- [x] Learn Markdown deeper

Note: Make a Google Doc and add a link in README.md (Knowledge Repository)

Note: for each task you must make a notebook

Time : 22 Mehr 1403

---

WEEK 2 :

- [x] Search about time series forecasting parameters and features
- [x] Search about the Telegram API and the features we can get from it
- [x] Find Python finance data-collection libraries and make a list of them
- [x] Increase your connections on LinkedIn

Note: Make a Google Doc and add a link in README.md (Knowledge Repository)
Note: for each task you must make a notebook

Time : 30 Mehr 1403

---

WEEK 3 :

- [x] Write a LinkedIn post about finance libraries and their pros and cons
- [x] Analyze data extracted with one of the best libraries
- [x] Try to extract Telegram data (data collection)
- [x] Learn about the Streamlit and Gradio libraries

Note: Make a Google Doc and add a link in README.md (Knowledge Repository)
Note: for each task you must make a notebook

Time : 9 Aban 1403

---

WEEK 4 :

- [x] Make a Telegram bot that gets us the Bitcoin price for a specific time (with BotFather and a finance library)
- [x] Implement an ML algorithm to predict the Ethereum coin price (you can use an ML algorithm like SVM or DT, or a DL algorithm like LSTM; try to train the algorithm and focus on hyperparameters)
- [x] Try to make a GUI for your algorithm with Gradio or Streamlit

Note: Make a Google Doc and add a link in README.md (Knowledge Repository)
Note: for each task you must make a notebook

Time : 5 Dey 1403

--------------------------------------------------------------------------------
/Week 1/1. 
Learn Git and GitHub with add and management on github/Readme.md: -------------------------------------------------------------------------------- 1 | # Welcome to My Git and GitHub Learning Journey! 2 | 3 | Hello, fellow learners! 👋 4 | 5 | I’m excited to share that I have delved into the world of **Git** and **GitHub**! Through my studies, I have explored a multitude of concepts, commands, and best practices that are essential for anyone looking to enhance their skills in version control and collaborative coding. 6 | 7 | ## What You Will Find Here 8 | 9 | In this repository, I have organized my learnings into **13 comprehensive sections** that cover a wide range of topics related to Git and GitHub. Each section is crafted to be clear and easy to understand, making it suitable for both beginners and those looking to refine their skills. 10 | 11 | ### Topics Covered 12 | The sections include, but are not limited to: 13 | 14 | 1. **Introduction to Git**: Understanding what Git is and why version control is important. 15 | 2. **Setting Up Git**: Installation and configuration essentials. 16 | 3. **First Steps with Git**: Creating repositories and making your first commits. 17 | 4. **Tracking Changes**: How to monitor your progress and understand the state of your projects. 18 | 5. **Branching and Merging**: Learning about branches and how to effectively merge them. 19 | 6. **Working with Remote Repositories**: Collaborating with others using GitHub. 20 | 7. **Managing Conflicts**: Strategies for resolving conflicts that arise during collaboration. 21 | 8. **Tagging and Releases**: Understanding versioning and managing releases. 22 | 9. **Debugging with Git**: Tools and techniques for tracking down issues. 23 | 10. **Advanced Git Commands**: Exploring powerful commands for experienced users. 24 | 11. **Collaboration with GitHub**: Understanding forks, pull requests, and code licensing. 25 | 12. 
**Git GUI Tools**: An overview of graphical tools that enhance the Git experience. 26 | 13. **Conclusion and Best Practices**: Summarizing key concepts and providing tips for effective version control. 27 | 28 | ## Join the Journey! 29 | 30 | I invite you to explore the content I have compiled here. Whether you're a seasoned developer or just starting your coding journey, I believe you will find valuable insights that can enhance your understanding and usage of Git and GitHub. 31 | 32 | ### Contributing 33 | 34 | This repository is open for contributions! If you have additional insights, resources, or corrections, feel free to fork this repository and submit a pull request. Let’s collaborate and learn together! 35 | 36 | Thank you for visiting my repository. Happy coding! 🚀 37 | 38 | --- 39 | -------------------------------------------------------------------------------- /Week 1/5. Learn Markdown Deeper/Readme.md: -------------------------------------------------------------------------------- 1 | # Markdown Commands Reference 2 | 3 | Welcome to the Markdown Commands Reference! 📜 4 | 5 | In this document, you'll find a comprehensive list of Markdown commands along with a brief explanation of each. Markdown is a lightweight markup language that allows you to format text easily and efficiently. This reference will help you get familiar with the syntax and usage of Markdown. 6 | 7 | ## Table of Contents 8 | 9 | 1. **Headings** 10 | 2. **Text Formatting** 11 | 3. **Lists** 12 | 4. **Links** 13 | 5. **Images** 14 | 6. **Blockquotes** 15 | 7. **Code** 16 | 8. **Horizontal Rules** 17 | 9. **Tables** 18 | 10. **Footnotes** 19 | 11. **Task Lists** 20 | 12. **HTML Elements** 21 | 22 | --- 23 | 24 | ## 1. Headings 25 | 26 | Markdown provides six levels of headings, which are created using the `#` symbol. 
27 | 28 | - `# Heading 1` 29 | - `## Heading 2` 30 | - `### Heading 3` 31 | - `#### Heading 4` 32 | - `##### Heading 5` 33 | - `###### Heading 6` 34 | 35 | ### Purpose 36 | Headings help structure your document and make it easier to read. 37 | 38 | --- 39 | 40 | ## 2. Text Formatting 41 | 42 | ### Bold and Italics 43 | - `**Bold Text**` or `__Bold Text__` → **Bold Text** 44 | - `*Italic Text*` or `_Italic Text_` → *Italic Text* 45 | 46 | ### Strikethrough 47 | - `~~Strikethrough~~` → ~~Strikethrough~~ 48 | 49 | ### Purpose 50 | These formats emphasize text and can highlight important information. 51 | 52 | --- 53 | 54 | ## 3. Lists 55 | 56 | ### Unordered Lists 57 | - Use `*`, `+`, or `-` to create bullet points. 58 | 59 | Example: 60 | - Item 1 61 | - Item 2 62 | - Item 3 63 | 64 | ### Ordered Lists 65 | - Use numbers followed by a period. 66 | 67 | Example: 68 | 1. First item 69 | 2. Second item 70 | 3. Third item 71 | 72 | ### Purpose 73 | Lists organize information and make it easier to digest. 74 | 75 | --- 76 | 77 | ## 4. Links 78 | 79 | - `[Link Text](URL)` → [Link Text](https://www.example.com) 80 | 81 | ### Purpose 82 | Links allow you to connect to external resources or pages. 83 | 84 | --- 85 | 86 | ## 5. Images 87 | 88 | - `![Alt Text](Image URL)` → ![Alt Text](https://www.example.com/image.jpg) 89 | 90 | ### Purpose 91 | Images enhance visual appeal and provide context to your text. 92 | 93 | --- 94 | 95 | ## 6. Blockquotes 96 | 97 | - Use `>` to create a blockquote. 98 | 99 | Example: 100 | > This is a blockquote. 101 | 102 | ### Purpose 103 | Blockquotes highlight important quotes or excerpts. 104 | 105 | --- 106 | 107 | ## 7. Code 108 | 109 | ### Inline Code 110 | - Use backticks `` ` `` for inline code. 111 | 112 | Example: `code here` 113 | 114 | ### Code Blocks 115 | - Use triple backticks (```) for multiline code blocks. 116 | 117 | ### Purpose 118 | Code formatting helps present code snippets clearly. 119 | 120 | --- 121 | 122 | ## 8. 
Horizontal Rules 123 | 124 | - Use `---`, `***`, or `___` to create horizontal lines. 125 | 126 | Example: 127 | --- 128 | 129 | ### Purpose 130 | Horizontal rules visually separate sections in your document. 131 | 132 | --- 133 | 134 | ## 9. Tables 135 | 136 | - Use pipes `|` and dashes `-` to create tables. 137 | 138 | Example: 139 | | Header 1 | Header 2 | 140 | |----------|----------| 141 | | Row 1 | Row 2 | 142 | | Row 3 | Row 4 | 143 | 144 | ### Purpose 145 | Tables organize data in a structured format. 146 | 147 | --- 148 | 149 | ## 10. Footnotes 150 | 151 | - Use `[^1]` to create a footnote reference and define it at the bottom of the document. 152 | 153 | Example: 154 | This is a text with a footnote[^1]. 155 | 156 | [^1]: This is the footnote text. 157 | 158 | ### Purpose 159 | Footnotes provide additional information without cluttering the main text. 160 | 161 | --- 162 | 163 | ## 11. Task Lists 164 | 165 | - Use `- [ ]` for unchecked tasks and `- [x]` for checked tasks. 166 | 167 | Example: 168 | - [x] Task 1 169 | - [ ] Task 2 170 | - [ ] Task 3 171 | 172 | ### Purpose 173 | Task lists help track progress on tasks or projects. 174 | 175 | --- 176 | 177 | ## 12. HTML Elements 178 | 179 | - Markdown supports HTML syntax for more complex formatting. 180 | 181 | Example: 182 | ```html 183 |
<div>
  <p>This is an HTML element.</p>
</div>
```

### Purpose

HTML elements expand the capabilities of Markdown for advanced users.


## Conclusion

Feel free to explore and use these commands to enhance your documents with Markdown. This reference aims to help you become proficient in Markdown formatting, making your writing clearer and more engaging.

Happy writing! ✍️
--------------------------------------------------------------------------------
/Week 4/1. Make the telegram bot that get us bitcoin price for specific time/Readme.md:
--------------------------------------------------------------------------------
# **Project Report: Bitcoin Price Telegram Bot**

## **Introduction**
This report documents the development of a Telegram bot named **AmiinBitcoinPriceBot**. The bot retrieves the price of Bitcoin for a specific date using the **yFinance library** and interacts with users via Telegram's bot platform.

---

## **Objectives**
The primary goal was to create a bot that:
1. Allows users to input a specific date.
2. Fetches and displays the closing price of Bitcoin (in USD) for the given date.
3. Provides an easy-to-use interface via Telegram.

---

## **Development Process**

### **1. Bot Creation in Telegram**
1. The bot was registered using **BotFather** in Telegram.
2. The following command sequence was used:
   - `/start` to initiate BotFather.
   - `/newbot` to create a new bot.
3. The bot was given the name **Bitcoin Price Tracker** and the username **AmiinBitcoinPriceBot**.
4. A unique **access token** was generated for the bot.

---

### **2. Environment Setup**
1. Python was chosen as the programming language due to its extensive libraries and ease of integration.
2. The following Python libraries were installed:
   ```bash
   pip install pyTelegramBotAPI yfinance
   ```
3. 
The development environment was configured on a local machine.

### **3. Implementation**

The implementation was divided into distinct steps:

#### Bot Initialization

The bot was initialized using the telebot library with the token provided by BotFather:

```python
import telebot

TOKEN = 'your-telegram-bot-token'
bot = telebot.TeleBot(TOKEN)
```

#### Fetching Bitcoin Price

The yFinance library was used to retrieve historical Bitcoin prices. Note that yfinance treats the `end` date as exclusive, so the request must cover one extra day:

```python
from datetime import datetime, timedelta

import yfinance as yf

def get_bitcoin_price(date):
    btc = yf.Ticker("BTC-USD")
    # `end` is exclusive, so ask for the day after the requested date
    next_day = (datetime.strptime(date, '%Y-%m-%d') + timedelta(days=1)).strftime('%Y-%m-%d')
    historical = btc.history(start=date, end=next_day)
    if historical.empty:
        return f"No price data found for {date}."
    price = historical['Close'].iloc[0]
    return f"Bitcoin price on {date} was ${price:.2f}"
```

#### Handling User Commands

The bot was programmed to handle the /price command and user-provided dates:

```python
@bot.message_handler(commands=['price'])
def send_welcome(message):
    bot.reply_to(message, "Please enter the date in YYYY-MM-DD format:")

@bot.message_handler(func=lambda message: True)
def fetch_price(message):
    date = message.text.strip()
    try:
        datetime.strptime(date, '%Y-%m-%d')  # Validate date format
        price_message = get_bitcoin_price(date)
        bot.reply_to(message, price_message)
    except ValueError:
        bot.reply_to(message, "Invalid date format. Please use YYYY-MM-DD.")
```

#### Running the Bot

The bot was activated using the following script:

```python
print("Bot is running...")
bot.polling()
```

### **4. Testing**
1. The bot was tested in Telegram by sending the /price command.
2. Various dates were entered, and the bot successfully fetched Bitcoin prices.
3. Invalid inputs (e.g., an incorrect date format) were handled gracefully.

## Results
- The bot, AmiinBitcoinPriceBot, is fully operational and responds accurately to user requests.
- It provides the closing price of Bitcoin for any valid date input in the YYYY-MM-DD format.

## Challenges and Resolutions
1. **Date Validation**
   - Challenge: Ensuring the input date format is correct.
   - Resolution: Used Python’s `datetime.strptime` for validation.
2. **Data Retrieval for Non-Trading Days**
   - Challenge: Some dates (e.g., weekends) had no trading data.
   - Resolution: Instructed users to select valid trading dates.
3. **API Response Time**
   - Challenge: Occasionally, the API took longer than expected to respond.
   - Resolution: Implemented a user-friendly error message.

## Future Improvements
1. Deploy the bot on a cloud server (e.g., Heroku or AWS) for 24/7 availability.
2. Add functionality for live Bitcoin price tracking.
3. Implement multi-language support for a broader audience.
4. Enhance error handling for API issues.

## Conclusion

The AmiinBitcoinPriceBot successfully fulfills its purpose of providing historical Bitcoin prices. Its user-friendly interaction and reliable data fetching make it a valuable tool for cryptocurrency enthusiasts.

For further development or inquiries, feel free to contact me!
--------------------------------------------------------------------------------
/Week 4/2. Implement an ML algorithm for predict etherium coin/Readme.md:
--------------------------------------------------------------------------------
# Ethereum Price Prediction using Machine Learning

This project focuses on predicting Ethereum prices using machine learning techniques. The dataset contains historical Ethereum price data, and we implement a robust model to forecast future prices. This repository includes the entire pipeline, from data preprocessing to model evaluation and optimization.

---

## Table of Contents
1. [Introduction](#introduction)
2. [Dataset Details](#dataset-details)
3. 
[Project Steps](#project-steps) 11 | 4. [Model Performance](#model-performance) 12 | 5. [Key Features](#key-features) 13 | 6. [How to Run](#how-to-run) 14 | 7. [Future Work](#future-work) 15 | 8. [Acknowledgments](#acknowledgments) 16 | 17 | --- 18 | 19 | ## Introduction 20 | 21 | Predicting cryptocurrency prices is a challenging task due to high volatility and dynamic trends. In this project, we utilize supervised machine learning algorithms, including Decision Trees (DT), Support Vector Machines (SVM), and Long Short-Term Memory networks (LSTM), to create an optimized model capable of forecasting Ethereum prices. 22 | 23 | --- 24 | 25 | ## Dataset Details 26 | 27 | - **Source:** Historical Ethereum price data. 28 | - **Features:** 29 | - `date`: Timestamp of the record. 30 | - `open`: Opening price of Ethereum. 31 | - `high`: Highest price of Ethereum for the day. 32 | - `low`: Lowest price of Ethereum for the day. 33 | - `close`: Closing price of Ethereum. 34 | - `volume`: Trading volume. 35 | 36 | --- 37 | 38 | ## Project Steps 39 | 40 | ### 1. **Data Preprocessing** 41 | - Handled missing values and formatted date/time fields. 42 | - Converted `Timestamp` data to numerical format to address errors. 43 | - Normalized features for better performance. 44 | 45 | ### 2. **Exploratory Data Analysis (EDA)** 46 | - Visualized price trends using line plots. 47 | - Analyzed distributions using histograms and boxplots. 48 | - Checked correlations between features. 49 | 50 | ### 3. **Feature Engineering** 51 | - Derived new features like daily returns and moving averages. 52 | - Selected relevant features to reduce noise. 53 | 54 | ### 4. **Model Implementation** 55 | - Implemented multiple algorithms: 56 | - **Decision Tree (DT):** Initial implementation to understand data behavior. 57 | - **SVM:** Enhanced model performance with kernel optimizations. 58 | - **LSTM:** Focused on temporal dependencies in data. 59 | 60 | ### 5. 
**Model Optimization** 61 | - Tuned hyperparameters using Grid Search and Cross-Validation. 62 | - Applied early stopping to prevent overfitting. 63 | 64 | ### 6. **Evaluation Metrics** 65 | - **Mean Absolute Error (MAE):** 1.39 66 | - **Mean Squared Error (MSE):** 16.53 67 | - **R-squared (R²):** 0.9997 68 | 69 | --- 70 | 71 | ## Model Performance 72 | 73 | Our final model achieved outstanding performance: 74 | - **Optimized Model Metrics:** 75 | - Mean Absolute Error: `1.39` 76 | - Mean Squared Error: `16.53` 77 | - R-squared: `0.9997` 78 | 79 | This performance indicates high accuracy in predicting Ethereum prices. However, additional testing on unseen datasets is recommended to validate generalization. 80 | 81 | --- 82 | 83 | ## Key Features 84 | 85 | 1. **Comprehensive Pipeline:** From raw data to optimized predictions. 86 | 2. **Multiple Algorithms:** Comparison of ML techniques (DT, SVM) and DL (LSTM). 87 | 3. **Hyperparameter Tuning:** Focused on improving model precision. 88 | 4. **Visualizations:** Clear EDA plots for insights. 89 | 5. **Reproducible Workflow:** Step-by-step instructions provided. 90 | 91 | --- 92 | 93 | ## How to Run 94 | 95 | 1. Clone the repository: 96 | ```bash 97 | git clone https://github.com/yourusername/ethereum-price-prediction.git 98 | ``` 99 | 100 | 2. Install dependencies: 101 | ```bash 102 | pip install -r requirements.txt 103 | ``` 104 | 3. Download the dataset: 105 | Place your dataset file (ETH_day.csv) in the project directory. 106 | 107 | 4. Run the main script: 108 | python main.py 109 | 110 | 5. View predictions: 111 | Results will be saved in the output folder. 112 | 113 | ## Future Work 114 | 1. Expand Dataset: Use larger datasets with more features. 115 | 2. Advanced Models: Explore ensemble techniques and transformer architectures. 116 | 3. Real-Time Predictions: Integrate with live price feeds for dynamic updates. 117 | 4. Web Deployment: Build an interactive interface for user predictions. 
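The derived features named in the feature-engineering step (daily returns and moving averages) can be sketched in plain Python. This is an illustrative sketch only: the window size and sample prices are assumptions, not values taken from the project notebook.

```python
# Illustrative sketch of the derived features from the feature-engineering
# step: daily returns and a trailing moving average. The window size and
# the sample closing prices are assumptions for demonstration.

def daily_returns(closes):
    """Fractional change between consecutive closing prices."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]

def moving_average(closes, window=7):
    """Trailing simple moving average; the first window-1 slots are None."""
    out = [None] * (window - 1)
    for i in range(window - 1, len(closes)):
        out.append(sum(closes[i - window + 1 : i + 1]) / window)
    return out

closes = [100.0, 102.0, 101.0, 103.0]
print(moving_average(closes, 2))  # → [None, 101.0, 101.5, 102.0]
```

Both features leave the early part of the series undefined (no return for the first day, no average until a full window has passed), which is why the preprocessing step drops or masks those rows before training.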
118 | 119 | ## Acknowledgments 120 | 121 | Special thanks to the contributors and the open-source community for datasets and tools. 122 | -------------------------------------------------------------------------------- /Week 1/5. Learn Markdown Deeper/Comprehensive Markdown Commands.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "5da00ed5-84ab-4e28-9077-02620163cb07", 6 | "metadata": {}, 7 | "source": [ 8 | "# Comprehensive Markdown Commands" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "19ed4c6a-d9ca-4d5d-9170-bc638e1393da", 14 | "metadata": {}, 15 | "source": [ 16 | "\n", 17 | "## 1. Headings\n", 18 | "\t\t# Heading 1 to ###### Heading 6: Six levels of headings.\n", 19 | "\n", 20 | "\t•\t# Heading 1\n", 21 | " Creates a top-level heading (H1).\n", 22 | "\t•\t## Heading 2\n", 23 | " Creates a second-level heading (H2).\n", 24 | "\t•\t### Heading 3\n", 25 | " Creates a third-level heading (H3).\n", 26 | "\t•\t#### Heading 4\n", 27 | " Creates a fourth-level heading (H4).\n", 28 | "\t•\t##### Heading 5\n", 29 | " Creates a fifth-level heading (H5).\n", 30 | "\t•\t###### Heading 6\n", 31 | " Creates a sixth-level heading (H6).\n", 32 | "\n", 33 | "## 2. Emphasis\n", 34 | "\t•\t*italic* or _italic_: Italic text.\n", 35 | "\t•\t**bold** or __bold__: Bold text.\n", 36 | "\t•\t***bold and italic***: Bold and italic text.\n", 37 | "\n", 38 | "## 3. Lists\n", 39 | "\t•\tUnordered Lists:\n", 40 | "\t•\t* Item\n", 41 | "\t•\t- Item\n", 42 | "\t•\t+ Item\n", 43 | "\t•\tOrdered Lists:\n", 44 | "\t•\t1. Item\n", 45 | "\t•\t2. Item\n", 46 | "\n", 47 | "## 4. Links\n", 48 | "\t•\t[Link Text](URL): Creates a hyperlink.\n", 49 | "\n", 50 | "## 5. Images\n", 51 | "\t•\t![Alt Text](Image URL): Embeds an image.\n", 52 | "\n", 53 | "## 6. Blockquotes\n", 54 | "\t•\t> Quote: Renders a block quote.\n", 55 | "\n", 56 | "## 7. 
Code\n", 57 | "\t•\tInline Code: `code`: Renders inline code.\n", 58 | "\t•\tCode Block:\n", 59 | " ```: Renders a block of code.\n", 60 | "\n", 61 | "## 8. Horizontal Rule\n", 62 | "\t•\t---, ***, or ___: Creates a horizontal line.\n", 63 | "\n", 64 | "## 9. Tables\n", 65 | "\t\t\n", 66 | "| Header 1 | Header 2 |\n", 67 | "|----------|----------|\n", 68 | "| Row 1 | Row 2 |\n", 69 | "\n", 70 | "Creates a simple table layout.\n", 71 | "\n", 72 | "## 10. Strikethrough\n", 73 | "\t•\t~~strikethrough~~: Renders strikethrough text.\n", 74 | "\n", 75 | "## 11. Footnotes\n", 76 | "\t•\tText[^1] and [^1]: Footnote text here.: Creates a footnote.\n", 77 | "\n", 78 | "## 12. Task Lists\n", 79 | "\n", 80 | "- [ ] Task 1 \n", 81 | "- [x] Task 2 \n", 82 | "Creates a checklist.\n", 83 | "\n", 84 | "## 13. Definition Lists\n", 85 | "Term \n", 86 | ": Definition\n", 87 | "Creates definition lists (not standard in all Markdown flavors).\n", 88 | "\n", 89 | "## 14. Automatic Links\n", 90 | "\t•\t: Automatically turns a URL into a link.\n", 91 | "\n", 92 | "## 15. Subscript and Superscript\n", 93 | "\t•\tSubscript: H~2~O (may not be supported everywhere).\n", 94 | "\t•\tSuperscript: x^2 (may not be supported everywhere).\n", 95 | "\n", 96 | "## 16. Syntax Highlighting\n", 97 | "\t•\tSome platforms allow you to specify a language for syntax highlighting in code blocks, such as:\n", 98 | " ```python\n", 99 | " def function():\n", 100 | " pass\n", 101 | "\n", 102 | "## 17. Custom HTML\n", 103 | "\t•\tYou can use raw HTML for more complex formatting, such as
or tags.\n", 104 | "\n", 105 | "## 18. LaTeX\n", 106 | "\t•\tSome Markdown flavors support LaTeX for mathematical expressions, such as:\n", 107 | " $$ E = mc^2 $$\n", 108 | "\n", 109 | "## 19. Emoji\n", 110 | "\t•\t:smile: or :heart:: Renders emojis in some platforms (like GitHub).\n", 111 | "\n", 112 | "## 20. Attributes\n", 113 | "\t•\tSome Markdown implementations (like GitHub Flavored Markdown) support adding attributes to elements, such as:\n", 114 | " ### My Heading {#custom-id}\n", 115 | "\n", 116 | "\n", 117 | "## 21. Comments\n", 118 | "\t•\t: Adds comments that will not be rendered in the output.\n", 119 | "\n", 120 | "\n", 121 | "## Conclusion\n", 122 | "This more comprehensive list covers the standard Markdown commands and some additional features that may be available in certain Markdown flavors. Depending on the platform (like GitHub, Reddit, or documentation sites), you might encounter variations or extensions that provide even more functionality. Always check the specific Markdown documentation for the platform you are using for any unique features or syntaxes!\n", 123 | "\n", 124 | "\n", 125 | "\n" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "id": "1b79b9fd-cf4a-4131-b912-f145f8eab8c1", 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [] 135 | } 136 | ], 137 | "metadata": { 138 | "kernelspec": { 139 | "display_name": "Python 3 (ipykernel)", 140 | "language": "python", 141 | "name": "python3" 142 | }, 143 | "language_info": { 144 | "codemirror_mode": { 145 | "name": "ipython", 146 | "version": 3 147 | }, 148 | "file_extension": ".py", 149 | "mimetype": "text/x-python", 150 | "name": "python", 151 | "nbconvert_exporter": "python", 152 | "pygments_lexer": "ipython3", 153 | "version": "3.11.7" 154 | } 155 | }, 156 | "nbformat": 4, 157 | "nbformat_minor": 5 158 | } 159 | -------------------------------------------------------------------------------- /Week 3/3. 
Try to extract telegram datas (Data Collection)/Readme.md: -------------------------------------------------------------------------------- 1 | # Task Report: Try to Extract Telegram Data (Data Collection) 2 | 3 | In this task, I aimed to explore methods for extracting data from Telegram, a popular messaging platform. The objective was to collect data from Telegram through its API, utilizing libraries in Python that support Telegram data extraction. This document details the steps I took, the tools I explored, and the challenges encountered. 4 | 5 | ## Table of Contents 6 | 1. [Introduction](#introduction) 7 | 2. [Steps for Data Extraction from Telegram](#steps-for-data-extraction-from-telegram) 8 | - [Step 1: Accessing the Telegram API](#step-1-accessing-the-telegram-api) 9 | - [Step 2: Selecting the Right Library](#step-2-selecting-the-right-library) 10 | - [Step 3: Extracting Data from Telegram](#step-3-extracting-data-from-telegram) 11 | - [Step 4: Analyzing and Storing Extracted Data](#step-4-analyzing-and-storing-extracted-data) 12 | 3. [Challenges and Troubleshooting](#challenges-and-troubleshooting) 13 | 4. [Conclusion](#conclusion) 14 | 15 | ## Introduction 16 | This task was centered around data collection from Telegram using Python libraries and Telegram's API. The main objective was to gain hands-on experience with accessing the API, configuring data extraction processes, and understanding the kind of data accessible via Telegram. 17 | 18 | ## Steps for Data Extraction from Telegram 19 | 20 | ### Step 1: Accessing the Telegram API 21 | To access Telegram's data, the first step involves acquiring access to the Telegram API. Telegram provides a comprehensive API that allows developers to interact with user data, chat histories, groups, and channels. 22 | 23 | 1. **Creating a Telegram Developer Account**: 24 | To use the Telegram API, I created a developer account on Telegram by visiting [https://my.telegram.org](https://my.telegram.org). 25 | 2. 
**Obtaining API Keys**: 26 | Once registered as a developer, I generated the API ID and API hash. These credentials are essential for authenticating and making requests to the Telegram servers. 27 | 28 | ### Step 2: Selecting the Right Library 29 | Python offers several libraries specifically designed to interact with Telegram’s API. For this task, I researched and experimented with the following libraries: 30 | 31 | - **Telethon**: A popular asynchronous Python library for accessing Telegram's API, ideal for working with larger data sets. 32 | - **python-telegram-bot**: A robust and well-documented library used primarily for bot development, but also supports data retrieval to some extent. 33 | 34 | After reviewing the documentation, I chose **Telethon** for this task due to its comprehensive support for data extraction and asynchronous functionality. 35 | 36 | #### Installing Telethon 37 | To start, I installed the Telethon library using pip: 38 | 39 | ```bash 40 | pip install telethon 41 | ``` 42 | 43 | ### Step 3: Extracting Data from Telegram 44 | 45 | After setting up the API access and installing the library, I proceeded to extract data from Telegram. The following were the key actions performed in this step: 46 | 47 | #### 1. Establishing a Connection: 48 | Using the API credentials (API ID and API hash), I established a connection to Telegram’s servers via Telethon: 49 | 50 | ```python 51 | 52 | from telethon import TelegramClient 53 | 54 | api_id = 'YOUR_API_ID' 55 | api_hash = 'YOUR_API_HASH' 56 | 57 | client = TelegramClient('session_name', api_id, api_hash) 58 | 59 | ``` 60 | 61 | #### 2. Fetching Data: 62 | With the connection established, I explored fetching data such as messages, group member lists, and file attachments. 
Here’s an example of how I retrieved messages from a specific chat: 63 | 64 | ```python 65 | async with client: 66 |     async for message in client.iter_messages('chat_or_channel_name'): 67 |         print(message.sender_id, message.text) 68 | ``` 69 | 70 | #### 3. Saving Data: 71 | I experimented with saving the extracted data in various formats such as JSON and CSV for further analysis. For instance, saving messages in JSON format allows easy data manipulation and querying. 72 | 73 | 74 | ### Step 4: Analyzing and Storing Extracted Data 75 | Once the data was extracted, the next step was to analyze and store it. For instance, with extracted messages, I performed some basic data analysis, such as identifying the frequency of messages, common keywords, and peak activity times in the chat. Here’s an example of how I saved the data in a JSON file: 76 | 77 | ```python 78 | import json 79 | 80 | data = [{"sender_id": message.sender_id, "text": message.text} for message in messages]  # 'messages' was collected earlier with client.iter_messages 81 | with open("messages.json", "w") as outfile: 82 |     json.dump(data, outfile) 83 | ``` 84 | ## Challenges and Troubleshooting 85 | 86 | While attempting to run the code and retrieve data, I encountered an error related to API access. The API returned an error message, indicating that there might be restrictions or a configuration issue. To troubleshoot, I referred to the Telegram API Documentation and carefully reviewed my authentication process. 87 | 88 | - **API Errors**: At times, errors in authentication can result from incorrect API IDs or hash keys. Double-checking these values in the Telethon initialization solved some issues. 89 | - **Access Restrictions**: Some channels and groups have restricted access, which prevents data extraction. In these cases, user permission or bot authorization might be required. 90 | 91 | ## Conclusion 92 | 93 | Through this task, I gained a comprehensive understanding of using the Telegram API and the Telethon library to extract data. 
While the initial configuration posed some challenges, the process provided insights into the data Telegram offers and its potential applications in analysis. 94 | 95 | In future tasks, I plan to delve deeper into data analysis with Telegram data, explore more complex data processing, and experiment with storing data in databases. 96 | -------------------------------------------------------------------------------- /Week 2/2. Search about telegram API and features that we can get from that/Readme.md: -------------------------------------------------------------------------------- 1 | # Research on Telegram API and Available Features 2 | 3 | As part of this task, I conducted an extensive search on the **Telegram API** and the features that developers can access through it. The official Telegram website serves as the most reliable and informative resource for understanding how the APIs work, and I gathered all my findings directly from there. Below is a detailed breakdown of what I learned about Telegram's developer resources. 4 | 5 | ## Overview of Telegram APIs 6 | 7 | Telegram offers four main APIs, each catering to different developer needs: 8 | 9 | ### 1. **Bot API** 10 | The **Bot API** enables developers to create bots that use Telegram messages as their interface. These bots can interact with users, send messages, and respond based on predefined commands. Telegram Bots are unique accounts that do not require a phone number to set up, and they serve as an interface for running code on your server. 11 | 12 | Using the **Bot API**, developers do not need to worry about encryption protocols or the internal workings of Telegram's communication system. Instead, Telegram’s intermediary server handles all the encryption and data transmission. Developers communicate with Telegram servers through a simplified **HTTPS interface**.
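Because the Bot API is exposed as plain HTTPS, every method call is just a request to a URL of the form `https://api.telegram.org/bot<token>/<methodName>`. A minimal sketch of how such request URLs are formed (the token and parameters below are placeholders; this is illustrative, not official client code):

```python
# Build Bot API request URLs -- every Bot API method is a plain HTTPS endpoint.
# The token used below is a placeholder, not a real credential.
from urllib.parse import urlencode

def bot_api_url(token: str, method: str, **params) -> str:
    """Return the HTTPS URL for a Bot API method call (GET-style parameters)."""
    base = f"https://api.telegram.org/bot{token}/{method}"
    return f"{base}?{urlencode(params)}" if params else base

# An HTTPS request to this URL would ask the bot to send a message:
print(bot_api_url("123456:PLACEHOLDER", "sendMessage", chat_id=42, text="hello"))
```

Any HTTP client (or a wrapper library like `python-telegram-bot`) can then issue the request; the server replies with JSON.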
13 | 14 | Developers can also leverage the **Payments API** as part of the Bot API, allowing them to accept payments from Telegram users globally. 15 | 16 | ### 2. **Telegram API (MTProto)** 17 | The **Telegram API** allows developers to create their own custom Telegram clients. It is an open-source solution available to all developers who wish to build on the Telegram platform. This API supports creating full-fledged applications that can mimic or extend Telegram's functionality. You can also explore the source code of existing Telegram clients for a deeper understanding of how the platform operates. 18 | 19 | #### Key Features of the Telegram API: 20 | - **User Authorization**: Manage user phone number registration and login processes. 21 | - **Two-factor Authentication**: Secure user access by implementing two-factor authentication (2FA). 22 | - **End-to-End Encryption**: Ensures secure messaging between users. 23 | - **File Upload and Download**: Efficiently handle large data transfers. 24 | - **Pagination**: Retrieve data from large sets of objects (such as messages or contacts). 25 | - **Channel and Group Management**: Manage different types of groups (basic, supergroups, gigagroups) and their features, such as polls, reactions, and scheduled messages. 26 | - **Security**: Work with secret chats and voice/video calls, all with end-to-end encryption. 27 | 28 | ### 3. **TDLib (Telegram Database Library)** 29 | **TDLib** is a developer library that simplifies the process of building custom Telegram apps. It abstracts away the complexities of network implementation, encryption, and data storage, making it easier to focus on the UI and features. 30 | 31 | TDLib is designed for performance, security, and ease of use. It works on a variety of platforms, including Android, iOS, Windows, macOS, and Linux. Developers can use it with virtually any programming language since it is open-source. 
32 | 33 | #### Key Benefits of TDLib: 34 | - **Cross-Platform Support**: Build apps for any major platform. 35 | - **Performance**: Optimized for fast, secure, and efficient app development. 36 | - **Security**: Provides built-in encryption and handles all network protocols. 37 | 38 | ### 4. **Gateway API** 39 | Telegram also offers the **Gateway API**, which is especially useful for businesses, apps, and websites. This API allows them to send verification codes through Telegram, providing an alternative to traditional SMS verification. 40 | 41 | By using the Gateway API, businesses can significantly reduce costs while benefiting from the security and speed Telegram offers. Telegram’s vast network of over 950 million users ensures fast and reliable delivery of verification codes. 42 | 43 | ## Telegram Widgets 44 | Telegram allows developers to add **Widgets** to their websites. These widgets can enhance user engagement by allowing visitors to interact with your Telegram channels or bots directly from the webpage. 45 | 46 | ## Developer Contributions (Designers and Animators) 47 | Telegram also welcomes contributions from designers and animators. Developers can create **Animated Stickers** or **Custom Themes** for the Telegram platform, further enriching the user experience. 48 | 49 | ## Features Recap and Important Insights 50 | 51 | - **Bot API**: Used for creating interactive bots without the need for a phone number. 52 | - **Telegram API**: Allows building fully customized Telegram clients, managing channels, and handling various user interactions. 53 | - **TDLib**: Simplifies Telegram client development by managing encryption, data storage, and network communications. 54 | - **Gateway API**: Provides a cost-effective and secure alternative to SMS for verification codes. 55 | 56 | These APIs are available for free and open to all developers interested in extending or customizing Telegram’s functionality. 
57 | 58 | --- 59 | 60 | ## Detailed Telegram API Documentation 61 | 62 | Here is a summary of the original documentation from the **Telegram API** that outlines the major functionalities: 63 | 64 | > **Telegram APIs** 65 | > 66 | > We offer three kinds of APIs for developers. The **Bot API** allows you to easily create programs that use Telegram messages for an interface. The **Telegram API** and **TDLib** allow you to build your own customized Telegram clients. You are welcome to use both APIs free of charge. Lastly, the **Gateway API** allows any business, app, or website to send verification codes through Telegram instead of traditional SMS. 67 | > 68 | > You can also add Telegram Widgets to your website. 69 | > 70 | > Designers are welcome to create **Animated Stickers** or **Custom Themes** for Telegram. 71 | 72 | For a more in-depth look into each of these APIs and to access detailed guides, developers can refer to the official documentation on Telegram's site. These APIs provide incredible flexibility and allow businesses and developers to leverage the full power of Telegram in their applications and services. 73 | 74 | --- 75 | 76 | By exploring these resources, I gained a strong understanding of the **Telegram API** and how developers can use it to build sophisticated bots, clients, and services. Telegram's APIs are highly flexible and open, giving developers the tools they need to integrate the platform into their applications seamlessly. 77 | 78 | -------------------------------------------------------------------------------- /Week 4/1. 
Make the telegram bot that get us bitcoin price for specific time/BitCoin Bot.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 2, 6 | "id": "b2aa354f", 7 | "metadata": {}, 8 | "outputs": [ 9 | { 10 | "name": "stdout", 11 | "output_type": "stream", 12 | "text": [ 13 | "Defaulting to user installation because normal site-packages is not writeable\n", 14 | "Requirement already satisfied: pyTelegramBotAPI in /Users/amiin/Library/Python/3.9/lib/python/site-packages (4.25.0)\n", 15 | "Requirement already satisfied: yfinance in /Users/amiin/Library/Python/3.9/lib/python/site-packages (0.2.51)\n", 16 | "Requirement already satisfied: requests in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from pyTelegramBotAPI) (2.32.3)\n", 17 | "Requirement already satisfied: pandas>=1.3.0 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from yfinance) (2.2.3)\n", 18 | "Requirement already satisfied: numpy>=1.16.5 in /Library/Python/3.9/site-packages (from yfinance) (2.0.1)\n", 19 | "Requirement already satisfied: multitasking>=0.0.7 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from yfinance) (0.0.11)\n", 20 | "Requirement already satisfied: lxml>=4.9.1 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from yfinance) (5.2.2)\n", 21 | "Requirement already satisfied: platformdirs>=2.0.0 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from yfinance) (4.3.6)\n", 22 | "Requirement already satisfied: pytz>=2022.5 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from yfinance) (2024.2)\n", 23 | "Requirement already satisfied: frozendict>=2.3.4 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from yfinance) (2.4.6)\n", 24 | "Requirement already satisfied: peewee>=3.16.2 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from yfinance) (3.17.8)\n", 25 | "Requirement already satisfied: 
beautifulsoup4>=4.11.1 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from yfinance) (4.12.3)\n", 26 | "Requirement already satisfied: html5lib>=1.1 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from yfinance) (1.1)\n", 27 | "Requirement already satisfied: soupsieve>1.2 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from beautifulsoup4>=4.11.1->yfinance) (2.5)\n", 28 | "Requirement already satisfied: six>=1.9 in /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/site-packages (from html5lib>=1.1->yfinance) (1.15.0)\n", 29 | "Requirement already satisfied: webencodings in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from html5lib>=1.1->yfinance) (0.5.1)\n", 30 | "Requirement already satisfied: python-dateutil>=2.8.2 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from pandas>=1.3.0->yfinance) (2.9.0.post0)\n", 31 | "Requirement already satisfied: tzdata>=2022.7 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from pandas>=1.3.0->yfinance) (2024.2)\n", 32 | "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from requests->pyTelegramBotAPI) (3.3.2)\n", 33 | "Requirement already satisfied: idna<4,>=2.5 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from requests->pyTelegramBotAPI) (3.7)\n", 34 | "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from requests->pyTelegramBotAPI) (2.2.2)\n", 35 | "Requirement already satisfied: certifi>=2017.4.17 in /Users/amiin/Library/Python/3.9/lib/python/site-packages (from requests->pyTelegramBotAPI) (2024.6.2)\n", 36 | "Note: you may need to restart the kernel to use updated packages.\n" 37 | ] 38 | } 39 | ], 40 | "source": [ 41 | "pip install pyTelegramBotAPI yfinance" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 3, 47 | "id": 
"f469ea27", 48 | "metadata": {}, 49 | "outputs": [ 50 | { 51 | "name": "stderr", 52 | "output_type": "stream", 53 | "text": [ 54 | "/Users/amiin/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020\n", 55 | " warnings.warn(\n" 56 | ] 57 | } 58 | ], 59 | "source": [ 60 | "import telebot\n", 61 | "import yfinance as yf\n", 62 | "from datetime import datetime, timedelta" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 4, 68 | "id": "759dee33", 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "TOKEN = 'YOUR_BOT_TOKEN'  # real token redacted: never commit bot tokens to a public repo\n", 73 | "bot = telebot.TeleBot(TOKEN)" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 5, 79 | "id": "a8ebc396", 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "def get_bitcoin_price(date):\n", 84 | " try:\n", 85 | " # Fetch historical Bitcoin data; yfinance's `end` is exclusive, so add one day\n", 86 | " btc = yf.Ticker(\"BTC-USD\")\n", 87 | " historical = btc.history(start=date, end=datetime.strptime(date, '%Y-%m-%d') + timedelta(days=1))\n", 88 | " price = historical['Close'].iloc[0]\n", 89 | " return f\"Bitcoin price on {date} was ${price:.2f}\"\n", 90 | " except Exception:\n", 91 | " return \"Error: Unable to fetch data for the specified date.\"" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 6, 97 | "id": "6dd32253", 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "@bot.message_handler(commands=['price'])\n", 102 | "def send_welcome(message):\n", 103 | " bot.reply_to(message, \"Please enter the date in YYYY-MM-DD format:\")" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 7, 109 | "id": "0c873b79", 110 | "metadata": {}, 111 | "outputs": [], 112 | "source": [ 113 | "@bot.message_handler(func=lambda message: True)\n", 114 | "def fetch_price(message):\n", 115 | " date = message.text.strip()\n", 116 | 
try:\n", 117 | " datetime.strptime(date, '%Y-%m-%d') # validate the date format\n", 118 | " price_message = get_bitcoin_price(date)\n", 119 | " bot.reply_to(message, price_message)\n", 120 | " except ValueError:\n", 121 | " bot.reply_to(message, \"Invalid date format. Please use YYYY-MM-DD.\")" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "id": "22ec77d6", 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "Bot is running...\n" 135 | ] 136 | } 137 | ], 138 | "source": [ 139 | "print(\"Bot is running...\")\n", 140 | "bot.polling()" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "id": "b3151a1a", 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [] 150 | } 151 | ], 152 | "metadata": { 153 | "kernelspec": { 154 | "display_name": "Python 3", 155 | "language": "python", 156 | "name": "python3" 157 | }, 158 | "language_info": { 159 | "codemirror_mode": { 160 | "name": "ipython", 161 | "version": 3 162 | }, 163 | "file_extension": ".py", 164 | "mimetype": "text/x-python", 165 | "name": "python", 166 | "nbconvert_exporter": "python", 167 | "pygments_lexer": "ipython3", 168 | "version": "3.9.6" 169 | } 170 | }, 171 | "nbformat": 4, 172 | "nbformat_minor": 5 173 | } 174 | -------------------------------------------------------------------------------- /Week 3/2. Analysis Data that extract from one of best libraries/Readme.md: -------------------------------------------------------------------------------- 1 | # Analysis of Data Extracted from One of the Best Libraries 2 | 3 | 4 | # Stroke Prediction Project 🚀 5 | 6 | Welcome to the **Stroke Prediction Project**! This is a collaborative effort by a team of passionate data scientists who decided to come together and tackle a meaningful and ambitious project aimed at enhancing healthcare. 
7 | 8 | --- 9 | 10 | ### 📍 **Our Mission** 11 | Our goal is to develop an AI model that can accurately predict the risk of stroke. Initially, this project was intended as a hands-on data science practice, but once the idea was sparked, it quickly evolved into something far more impactful. We're driven to create a tool that can be integrated into the medical field, providing valuable insights and possibly saving lives. 12 | 13 | --- 14 | 15 | ## 🔍 **Project Overview** 16 | 17 | Our approach to the **Stroke Prediction Project** includes several phases: 18 | 19 | 1. **Phase 1**: Use the current [Kaggle dataset](https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset) to build a strong initial model. This dataset offers key patient health metrics for preliminary testing. 20 | 21 | 2. **Phase 2**: Introduce real-world data, enhancing the model’s predictive power. We aim to incorporate patient data that reflects broader, more diverse demographics. 22 | 23 | 3. **Phase 3**: Expand to incorporate brain scan images through **image processing techniques**. By analyzing scans, we aspire to add a new dimension to the model, boosting both accuracy and reliability. 24 | 25 | Our endgame is to contribute something truly meaningful to healthcare: an advanced tool capable of providing early stroke risk assessment. 26 | 27 | --- 28 | 29 | ## 👥 **Meet the Team** 30 | 31 | Our team brings together diverse talents and backgrounds. Here’s a look at the members and the skills they bring to the table: 32 | 33 | - **Amin Gholami** - [GitHub](https://github.com/AmiinGholami) 34 | *Project Lead & Data Visualization Specialist* 35 | Amin handles project management and data visualization. With a knack for uncovering insights from complex data, Amin’s visualizations bring clarity to our analysis. 
36 | 37 | - **Samaneh Tanhapour** 38 | *Medical Expert & Data Preprocessing Specialist* 39 | Samaneh is our team’s medical expert, and her background in healthcare gives our project a unique edge. Her insight into medical data ensures our preprocessing aligns with clinical standards. 40 | 41 | - **Atabak Rezqi** - [GitHub](https://github.com/databak) 42 | *Deep Learning Architect* 43 | Atabak is skilled in advanced deep learning models. He will lead the effort to create a powerful neural network capable of handling complex data, from numerical to image data. 44 | 45 | - **Samane Najarian** - [GitHub](https://github.com/SamaneNajarian) 46 | *Data Preprocessing & Statistical Modeling Specialist* 47 | Samane focuses on data cleaning and preprocessing, ensuring our dataset is clean, complete, and ready for analysis. 48 | 49 | - **Mahdie Mirzaie** - [GitHub](https://github.com/Mahdiyeh-Mirzaei) 50 | *Model Evaluation & Machine Learning Expert* 51 | Mahdie works on evaluating and fine-tuning classic machine learning models, ensuring we achieve the highest accuracy possible with traditional algorithms. 52 | 53 | --- 54 | 55 | ## 🔄 **Project Workflow** 56 | 57 | Each team member plays a specific role, allowing us to divide tasks efficiently. Here’s our workflow, with task allocation for each project phase: 58 | 59 | ### Project Steps: 60 | 61 | 1. **Data Preprocessing, Cleaning, and Handling Missing Values** 62 | **Assigned to**: Samaneh Tanhapour & Samane Najarian 63 | This stage is crucial for ensuring the dataset is primed for analysis. We clean, filter, and handle any missing data, aiming for a reliable dataset. 64 | 65 | 2. **Exploratory Data Analysis (EDA) and Data Visualization** 66 | **Assigned to**: Amin Gholami 67 | Amin will create compelling visualizations to reveal patterns and insights that may not be immediately obvious from the raw data. 68 | 69 | 3. 
**Feature Engineering** 70 | **Assigned to**: Samaneh Tanhapour 71 | Feature engineering helps us highlight the most important factors that contribute to stroke risk, refining the dataset for optimal model performance. 72 | 73 | 4. **Modeling and Evaluation with Classic Algorithms** 74 | **Assigned to**: Mahdie Mirzaie 75 | This step includes testing multiple machine learning algorithms, comparing their performance, and fine-tuning parameters. 76 | 77 | 5. **Deep Learning Modeling** 78 | **Assigned to**: Atabak Rezqi 79 | Atabak will focus on building and training deep learning models. These models, though complex, may uncover patterns that simpler algorithms miss. 80 | 81 | 6. **Statistical Modeling** 82 | **Assigned to**: Atabak Rezqi & Samane Najarian 83 | This involves using statistical methods to create predictive models and compare results with other models. 84 | 85 | 7. **Deployment and Presentation** 86 | **Assigned to**: Amin Gholami 87 | Once the model is ready, Amin will lead the deployment, making the model accessible for testing and presentation. 88 | 89 | --- 90 | 91 | ## 🌟 **The Vision Ahead** 92 | 93 | While our initial dataset and models are steps forward, our vision is to ultimately implement brain scan analysis and integrate real-world patient data. This could make our stroke risk prediction model a vital asset in healthcare, potentially assisting doctors in early diagnosis and intervention. 
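The classic-modeling step in the workflow above (step 4) can be sketched with scikit-learn. Synthetic data stands in here for the preprocessed stroke dataset, so this is an illustrative baseline under assumed inputs, not the team's actual pipeline:

```python
# Minimal sketch of "modeling and evaluation with classic algorithms":
# hold out a test split, fit a baseline classifier, report accuracy.
# Synthetic data stands in for the cleaned stroke dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))                   # 5 stand-in numeric health metrics
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy binary "stroke risk" label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```

In the real project, accuracy alone would be a poor metric for an imbalanced stroke dataset; recall, precision, and ROC-AUC would matter more.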
94 | 95 | --- 96 | 97 | ## 🧩 **Tech Stack** 98 | 99 | - **Data Processing**: Python, Pandas, NumPy 100 | - **Visualization**: Matplotlib, Seaborn 101 | - **Machine Learning**: Scikit-Learn 102 | - **Deep Learning**: TensorFlow, Keras 103 | - **Deployment**: Streamlit (for initial prototyping) 104 | 105 | --- 106 | 107 | ## 📅 **Project Timeline** 108 | 109 | | Stage | Description | Status | 110 | |-----------------------------|----------------------------------------------------------------|-------------| 111 | | Initial Data Collection | Obtaining and preparing the initial dataset | Complete | 112 | | Data Preprocessing | Cleaning and preparing data | In Progress | 113 | | Exploratory Data Analysis | Visualization and understanding of data trends | Pending | 114 | | Feature Engineering | Highlighting key features for analysis | Pending | 115 | | Model Training | Implementing ML models | Pending | 116 | | Model Evaluation | Testing and comparing model performances | Pending | 117 | | Deep Learning Integration | Introducing complex deep learning models | Pending | 118 | | Deployment | Finalizing and deploying model for testing | Pending | 119 | 120 | --- 121 | 122 | ## 📢 **Stay Updated** 123 | 124 | Follow our progress on this journey towards creating an impactful healthcare tool. We’ll keep this README updated with our latest findings, code snippets, and model results. This is only the beginning, and we’re excited to see where this project will take us! 125 | 126 | --- 127 | 128 | ## 📈 **Get Involved** 129 | 130 | We’re always open to collaboration, feedback, and suggestions! Feel free to reach out to any of us through our GitHub profiles. Join us in pushing the boundaries of data science and machine learning in healthcare. 131 | 132 | --- 133 | 134 | **Thank you for visiting our project repository!** 135 | 136 | Stay tuned for updates, and let’s make healthcare smarter and more accessible together. 
137 | -------------------------------------------------------------------------------- /Week 1/4. Search About Kaggle And NoteBook Part/README.md: -------------------------------------------------------------------------------- 1 | # 4. Kaggle and Notebook Exploration 2 | 3 | Kaggle is one of the largest and most popular platforms in the world for data scientists, machine learning practitioners, and AI enthusiasts. It has become an indispensable tool for anyone aiming to improve their skills, gain exposure to real-world problems, and connect with like-minded professionals. In this task, I was asked to explore Kaggle, create an account, and begin interacting with its features as part of my data science internship. 4 | 5 | ## What is Kaggle? 6 | 7 | Kaggle is essentially an online hub where data science and machine learning practitioners can come together to share their work, compete in coding challenges, and learn from one another. Founded in 2010, the platform was designed to bring together datasets, coding competitions, educational resources, and a collaborative community—all in one place. Whether you’re a beginner, intermediate, or advanced data scientist, Kaggle offers something valuable for every level of expertise. 8 | 9 | Here are some key features of Kaggle: 10 | - **Datasets**: Kaggle offers a vast collection of datasets across different fields such as healthcare, finance, technology, and social sciences. These datasets are often used in both machine learning projects and real-world applications. For data scientists, being able to find high-quality datasets is crucial for building models and conducting analyses. 11 | - **Competitions**: One of the most exciting features of Kaggle is its data science competitions. These competitions often challenge participants to solve complex real-world problems for rewards, sometimes even involving cash prizes. Kaggle’s competitions have been recognized globally, attracting participants from all over the world. 
By participating in these challenges, users can practice their skills, build a portfolio, and even get noticed by potential employers. 12 | - **Kaggle Notebooks (formerly known as Kernels)**: Kaggle provides users with the ability to write and execute code in an online Jupyter notebook environment. This feature is one of the platform's most valuable, as it allows users to experiment with datasets and models directly on the platform without needing to configure a local environment. 13 | - **Community & Learning Resources**: Kaggle's community is robust, filled with people who are eager to share their knowledge and insights. There are also plenty of tutorials and hands-on courses available for users who want to dive into new areas or strengthen their existing knowledge. 14 | 15 | ## My Kaggle Journey 16 | 17 | As part of my internship task, I was asked to explore Kaggle, set up an account, and learn how to work with its different features, including Kaggle Notebooks. I’m excited to announce that I’ve created my Kaggle account, and I’m looking forward to using the platform to further develop my data science skills. Here’s my Kaggle profile: 18 | 19 | [https://www.kaggle.com/amiingholami](https://www.kaggle.com/amiingholami) 20 | 21 | Feel free to follow me on Kaggle, and let’s connect! We can collaborate on projects, participate in competitions together, and help each other grow in this ever-evolving field of data science. 22 | 23 | ### One Issue I Encountered: Account Verification 24 | 25 | Unfortunately, while creating my account, I ran into a notable issue—Kaggle requires phone verification, and it turns out that they currently don’t support Iranian phone numbers for this process. This is a significant limitation for users from Iran, as phone verification is necessary for certain features on the platform, such as participating in competitions or earning medals for your work. 
26 | 27 | However, there are a few workarounds that can help you get started on Kaggle even if you cannot verify your account with an Iranian number: 28 | - **Using a Non-Iranian Phone Number**: If you have access to an international phone number (from another country), you can use that to verify your Kaggle account. Many people in the data science community use virtual phone services to bypass this issue. 29 | - **Limited Access Without Verification**: Even without phone verification, you can still access many of Kaggle’s most valuable features. You can create notebooks, explore datasets, and take part in the learning resources and discussions. The only downside is that certain competitions and advanced features will remain locked until your phone is verified. 30 | 31 | ## Kaggle Notebooks: A Game-Changer for Data Scientists 32 | 33 | One of Kaggle’s most powerful features is **Kaggle Notebooks**. These are online Jupyter notebooks that run directly in the browser, which means you don’t need to worry about setting up a development environment on your local machine. This feature is a huge time-saver, as you can get straight to coding without needing to install dependencies or troubleshoot your setup. 34 | 35 | Some key benefits of Kaggle Notebooks include: 36 | - **Collaboration**: Kaggle Notebooks allow you to share your code with the community. This makes it easier for others to view your work, offer suggestions, and help you improve your projects. 37 | - **Preconfigured Environment**: Kaggle provides a cloud environment that is preloaded with most of the libraries and tools that data scientists need. You don’t have to worry about installing pandas, scikit-learn, TensorFlow, or any other common packages—they’re all ready to go! 38 | - **Free GPU and TPU Access**: Kaggle provides free access to GPU and TPU resources, which is an enormous advantage for those who need to train large machine learning models but don’t have access to high-end hardware. 
39 | - **Notebook Tutorials**: The platform offers a variety of tutorials and example notebooks that guide you through common tasks like data preprocessing, exploratory data analysis (EDA), and model building. 40 | 41 | With Kaggle Notebooks, the learning curve for data science is greatly reduced. You can experiment, learn, and even run full machine learning pipelines without needing a powerful local setup. 42 | 43 | ## Why Should You Use Kaggle? 44 | 45 | If you’re interested in data science, machine learning, or artificial intelligence, Kaggle is an absolute must. Here’s why: 46 | - **Learning**: Kaggle provides a wealth of resources to help you improve your knowledge and skills in data science. Whether you’re a beginner looking to learn Python or a more experienced practitioner interested in deep learning, there are resources available for you. 47 | - **Real-World Experience**: Kaggle competitions are based on real-world problems, which means that you can apply the theories and algorithms you’ve learned to actual datasets. This practical experience is incredibly valuable when building a portfolio and seeking job opportunities. 48 | - **Collaboration**: Data science is not a solo sport. Kaggle allows you to collaborate with others from around the globe, making it easy to learn from experienced professionals and share your own insights. 49 | - **Portfolio Building**: By participating in competitions and creating your own notebooks, you can build a portfolio that showcases your data science expertise. Many Kaggle users have been recruited by top companies simply based on the work they’ve done on the platform. 50 | 51 | ## Final Thoughts 52 | 53 | Kaggle is an incredibly powerful tool for any aspiring data scientist. From the endless variety of datasets to the highly competitive challenges, the platform provides everything you need to grow in the field of data science. 
Although there are some challenges—like the phone verification issue for users in certain countries—the value Kaggle offers far outweighs these inconveniences. 54 | 55 | As I continue my journey in data science, I will be using Kaggle to hone my skills, participate in competitions, and share my projects with the world. Please follow me on Kaggle and join me in this exciting field: 56 | 57 | [Follow me on Kaggle](https://www.kaggle.com/amiingholami) 58 | 59 | Together, we can learn, grow, and contribute to the ever-expanding data science community. 60 | -------------------------------------------------------------------------------- /Week 3/4. Learn about Streamlit and Gradio Libraries/Readme.md: -------------------------------------------------------------------------------- 1 | # 📊 Report on Learning Streamlit and Gradio Libraries 2 | 3 | ## Task Overview 4 | As part of my internship, I was assigned the task of **learning about the Streamlit and Gradio libraries**, both of which are powerful tools for building interactive web applications tailored for data science and machine learning workflows. The goal was to understand their installation process, core functionalities, and how to build basic applications with each library. Here's a detailed explanation of the steps I followed and the results. 5 | 6 | 7 | ## Making My Own GDP Dashboard with Streamlit: 8 | 9 | 🌎 GDP dashboard: https://gdp-dashboard-of-amin.streamlit.app/ 10 | 11 | Repository: https://github.com/AmiinGholami/GDPdashboard 12 | 13 | --- 14 | 15 | ## 1. Learning Streamlit 16 | 17 | ### What is Streamlit? 18 | Streamlit is a fast and easy way to create custom web apps in Python, specifically for machine learning and data science. It allows developers to quickly turn data scripts into shareable web apps with very little effort. 19 | 20 | ### Step-by-Step Process 21 | 22 | #### a) **Installation and Setup** 23 | To get started with Streamlit, I first needed to install it.
The installation is quite straightforward: 24 | ```bash 25 | pip install streamlit 26 | ``` 27 | 28 | Once installed, you can start any Streamlit application with: 29 | ```bash 30 | streamlit run app.py 31 | ``` 32 | Streamlit automatically launches the app in the browser, making development quick and interactive. 33 | 34 | 35 | ### b) **Building a Basic App** 36 | 37 | I started with a simple example to understand the basic functionalities: 38 | ```python 39 | import streamlit as st 40 | 41 | st.title("Hello, Streamlit!") 42 | st.write("This is my first Streamlit application.") 43 | 44 | # Create a simple slider to adjust values 45 | slider_value = st.slider('Select a value', 0, 100, 50) 46 | st.write(f'Selected value: {slider_value}') 47 | ``` 48 | With this small code, I was able to launch a web app that allowed users to interact with a slider, providing instant visual feedback. 49 | 50 | 51 | ### c) **Displaying Data** 52 | Streamlit is especially useful for visualizing data in data science projects. Here’s how I learned to display a Pandas DataFrame: 53 | 54 | ```python 55 | import pandas as pd 56 | 57 | data = {'Name': ['John', 'Jane', 'Sam'], 58 | 'Age': [28, 34, 22]} 59 | df = pd.DataFrame(data) 60 | 61 | st.write("Displaying DataFrame:") 62 | st.write(df) 63 | ``` 64 | This helped me understand how easy it is to display and manipulate data within a Streamlit app. The integration with other Python libraries like Pandas is seamless. 65 | 66 | ### d) **Visualizing Data** 67 | One of the key features of Streamlit is the ability to quickly visualize data with minimal code. Here’s an example where I generated a simple line chart using Matplotlib: 68 | 69 | 70 | ```python 71 | import matplotlib.pyplot as plt 72 | 73 | fig, ax = plt.subplots() 74 | ax.plot([1, 2, 3, 4], [10, 20, 25, 30]) 75 | st.pyplot(fig) 76 | ``` 77 | 78 | Streamlit makes it incredibly easy to incorporate any data visualization tools into a web app, which is essential for data science projects.
79 | 80 | 81 | ### Key Takeaways from Streamlit: 82 | 83 | • Fast development: It allows for the rapid creation of apps. 84 | • Easy to use: You can turn simple Python scripts into powerful web apps. 85 | • Perfect for data science: It integrates seamlessly with popular data science libraries like Pandas, Matplotlib, and Seaborn. 86 | 87 | 88 | ## 2. Learning Gradio 89 | 90 | ### What is Gradio? 91 | Gradio is a Python library that makes it super simple to create UIs for machine learning models and share them via a web interface. It’s especially designed for users who want to quickly build and test ML models in a user-friendly environment. 92 | 93 | 94 | I used the Gradio Playground to become more familiar with Gradio. 95 | 96 | Gradio Playground: https://www.gradio.app/playground 97 | 98 | 99 | ### Step-by-Step Process 100 | 101 | ### a) **Installation and Setup** 102 | Just like Streamlit, Gradio has an easy setup process. To install: 103 | ```bash 104 | pip install gradio 105 | ``` 106 | 107 | Once installed, a Gradio app can be run from a Python script or directly in a Jupyter notebook. 108 | 109 | 110 | ### b) **Building a Simple Gradio Interface** 111 | 112 | The first step in learning Gradio was to create a basic interface: 113 | ```python 114 | import gradio as gr 115 | 116 | def greet(name): 117 | return f"Hello {name}!" 118 | 119 | iface = gr.Interface(fn=greet, inputs="text", outputs="text") 120 | iface.launch() 121 | ``` 122 | This simple app allows a user to input their name, and the app will greet them by displaying “Hello [Name]!”. I learned how easy it was to create interactive interfaces without needing extensive knowledge of web development. 123 | 124 | ### c) **Integrating Machine Learning Models** 125 | 126 | Gradio is particularly useful for building interfaces to interact with machine learning models.
I tried building an app that uses a pre-trained machine learning model for image classification: 127 | 128 | ```python 129 | import gradio as gr 130 | import tensorflow as tf 131 | from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions 132 | model = tf.keras.applications.MobileNetV2(weights="imagenet") 133 | 134 | def classify_image(img): 135 |     img = tf.image.resize(img, (224, 224)) 136 |     img = preprocess_input(tf.expand_dims(img, 0)) 137 |     prediction = model.predict(img) 138 |     return {label: float(p) for (_, label, p) in decode_predictions(prediction, top=3)[0]} 139 | 140 | iface = gr.Interface(fn=classify_image, inputs="image", outputs="label") 141 | iface.launch() 142 | ``` 143 | 144 | With this example, I learned how to create an interactive interface that allows users to upload an image, and the app will classify it using a pre-trained model, returning the top predicted labels with confidence scores. Gradio provides a convenient way to showcase machine learning models with minimal code. 145 | 146 | ### d) **Customization and Sharing** 147 | 148 | One of Gradio’s strong points is that it lets you share your interface with anyone instantly, just by using a shareable link. This was a valuable feature for quick collaboration and testing: 149 | 150 | ```python 151 | iface.launch(share=True) 152 | ``` 153 | This creates a public link that anyone can access to interact with the app. 154 | 155 | 156 | ### Key Takeaways from Gradio: 157 | 158 | • Simple and intuitive: It allows developers to build interfaces for ML models with very little code. 159 | • Real-time sharing: You can easily share your app with others through a public link. 160 | • Optimized for ML: Perfect for testing and showcasing machine learning models quickly. 161 | 162 | 163 | ## 3.
Comparison Between Streamlit and Gradio 164 | 165 | | Feature | Streamlit | Gradio | 166 | |-------------------------|--------------------------------------|-------------------------------------| 167 | | **Primary Use Case** | Data dashboards, visualization | Machine learning model interfaces | 168 | | **Ease of Use** | Very easy for building data apps | Extremely simple for ML interfaces | 169 | | **Installation** | `pip install streamlit` | `pip install gradio` | 170 | | **Customizability** | Highly customizable | Limited but effective | 171 | | **Best for** | Data science apps | ML model testing and deployment | 172 | | **App Sharing** | Needs deployment setup or Streamlit Cloud | One-click public link sharing | 173 | 174 | ## Conclusion 175 | 176 | After completing this task, I have a solid understanding of both Streamlit and Gradio, their use cases, and their strengths. Streamlit is best suited for building data dashboards and applications that display large datasets, while Gradio is more focused on creating quick, user-friendly interfaces for machine learning models. Both libraries offer fast prototyping and development, making them excellent choices for data scientists and machine learning engineers. 177 | 178 | ## Next Steps: 179 | 180 | • I plan to explore more advanced functionalities of Streamlit, such as integrating APIs. 181 | • For Gradio, I aim to build more complex interfaces for machine learning models that accept multiple types of input (e.g., images and text). 182 | -------------------------------------------------------------------------------- /Week 1/2. Add your previous notebook on Kaggle/README.md: -------------------------------------------------------------------------------- 1 | # My Kaggle Projects 2 | 3 | Welcome to my GitHub repository! Here, you'll find a collection of projects that I have completed on Kaggle. These projects showcase my skills in data analysis, machine learning, and data visualization. 
Each project is designed to solve real-world problems and includes detailed analyses and findings. 4 | 5 | Feel free to explore the projects below. You can click on the links to view the notebooks directly on Kaggle, and each section includes a brief description of the project and its objectives. 6 | 7 | ## 1. House Price Prediction 8 | **[Link to Project](https://www.kaggle.com/code/amiingholami/house-price-project)** 9 | 10 | This project focuses on predicting house prices in Tehran using machine learning techniques. The dataset includes various features such as area, number of rooms, presence of parking, warehouse, and elevator, along with the corresponding prices in both Toman and USD. 11 | 12 | ### Objectives: 13 | 14 | • Data Cleaning: Initial steps involve cleaning the dataset to handle missing values, outliers, and converting data types for better analysis. 15 | • Feature Engineering: Identifying and selecting the most relevant features that contribute to price determination, enhancing the model’s accuracy. 16 | • Model Development: Utilizing regression algorithms to create a predictive model that estimates house prices based on the provided features. 17 | • Model Evaluation: Assessing the model’s performance using metrics such as Mean Absolute Error (MAE) and R-squared to ensure its reliability in real-world applications. 18 | 19 | ### Outcome: 20 | 21 | The final model aims to provide a robust tool for potential home buyers and real estate investors to estimate house prices effectively in Tehran’s competitive real estate market. 22 | 23 | 24 | 25 | 26 | --- 27 | 28 | ## 2. Heart Attack Prediction 29 | **[Link to Project](https://www.kaggle.com/code/amiingholami/medical-ml-project)** 30 | 31 | 32 | In this project, various solutions and methodologies were tested for predicting the likelihood of heart attacks using a medical dataset. 
The dataset comprises essential features such as age, sex, exercise angina, chest pain type, resting blood pressure, cholesterol levels, fasting blood sugar, electrocardiographic results, maximum heart rate achieved, and other clinical indicators. 33 | 34 | ### Objectives: 35 | 36 | • Data Preprocessing: The initial phase involved cleaning the dataset by handling missing values and converting categorical variables into numerical formats for analysis. 37 | 38 | • Exploratory Data Analysis (EDA): Analyzing the relationships between different features and the target variable to uncover patterns and insights that could enhance model performance. 39 | 40 | • Model Development: Implementing a range of machine learning algorithms, including logistic regression, decision trees, and ensemble methods, to find the most effective model for predicting heart attack occurrences. 41 | 42 | • Model Evaluation: Assessing the models’ performances using metrics such as accuracy, precision, recall, and the F1 score to identify the best-performing solution. 43 | 44 | 45 | ### Outcome: 46 | 47 | Through rigorous testing of different approaches, a satisfactory prediction model was achieved that accurately estimates the risk of heart attacks. This model serves as a valuable tool for healthcare professionals and individuals seeking to understand their heart health better. 48 | 49 | 50 | --- 51 | 52 | ## 3. Mall Customer Segmentation & Clustering 53 | **[Link to Project](https://www.kaggle.com/code/amiingholami/mall-customer-segmentation-clustering)** 54 | 55 | The Mall Customer Segmentation & Clustering project focuses on analyzing customer data to identify distinct segments within a shopping mall. The goal is to understand customer behavior and preferences, which can inform marketing strategies and improve customer engagement. 
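The clustering step this project describes can be sketched in a few lines of scikit-learn. The synthetic data, feature choices, and cluster count below are illustrative assumptions for the sketch, not the project's actual configuration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer data: [annual income, spending score]
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([30, 70], 5, size=(50, 2)),  # lower income, high spenders
    rng.normal([80, 20], 5, size=(50, 2)),  # higher income, low spenders
    rng.normal([55, 50], 5, size=(50, 2)),  # mid income, mid spenders
])

# Scale features so both dimensions contribute equally to distances
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means and check cluster quality with the silhouette score
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
score = silhouette_score(X_scaled, labels)
print(f"silhouette score: {score:.2f}")
```

Running the same fit over several values of `n_clusters` and comparing silhouette scores (or inertia) is the usual way to pick the optimal cluster count mentioned in the evaluation step.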
56 | 57 | ### Objectives: 58 | 59 | • Data Acquisition: The project begins with the collection of customer data, including demographics, spending scores, and annual income. 60 | • Data Preprocessing: This step involves cleaning the dataset by handling missing values and normalizing numerical features to prepare it for analysis. 61 | • Exploratory Data Analysis (EDA): Conducting a thorough analysis of the dataset to uncover patterns, trends, and insights into customer behavior and spending habits. 62 | • Clustering Techniques: Implementing various clustering algorithms, such as K-Means, Hierarchical Clustering, and DBSCAN, to group customers based on their similarities in spending behavior and demographics. 63 | • Model Evaluation: Evaluating the clustering results using metrics like silhouette score and inertia to determine the optimal number of clusters and assess the effectiveness of the segmentation. 64 | 65 | ### Outcome: 66 | 67 | The project successfully segments customers into distinct groups, providing valuable insights into customer preferences and behaviors. These segments can be leveraged by marketing teams to tailor campaigns, enhance customer experiences, and ultimately drive sales and loyalty within the mall. 68 | 69 | 70 | 71 | --- 72 | 73 | ## 4. Recommendation System for IMDB Movies 74 | **[Link to Project](https://www.kaggle.com/code/amiingholami/recommendation-system-for-imdb-movies)** 75 | 76 | The Recommendation System for IMDB Movies project aims to develop a robust and efficient system that suggests movies to users based on their preferences and viewing history. By leveraging data science techniques, the project seeks to enhance user experience on movie platforms, guiding viewers to films they are likely to enjoy. 77 | 78 | ### Objectives: 79 | 80 | • Data Collection: The project begins with gathering a comprehensive dataset from IMDB, which includes movie titles, genres, ratings, user reviews, and other relevant attributes. 
This dataset serves as the foundation for building the recommendation system. 81 | • Data Preprocessing: The collected data undergoes cleaning and preprocessing. This includes handling missing values, normalizing ratings, and transforming categorical variables (e.g., genres) into a suitable format for analysis. The goal is to ensure high-quality data for effective modeling. 82 | • Exploratory Data Analysis (EDA): Through EDA, various insights about the dataset are uncovered. This includes understanding user demographics, popular movie genres, and the relationship between ratings and reviews. Visualizations are employed to present findings, helping to identify trends and patterns. 83 | • Building the Recommendation Model: The core of the project lies in developing the recommendation system. Various techniques are explored, including: 84 | • Collaborative Filtering: This method leverages user behavior to recommend movies based on similar users’ preferences. By analyzing user-item interactions, the system identifies movies that users with similar tastes enjoyed. 85 | • Content-Based Filtering: This approach recommends movies based on the attributes of the items themselves. For example, if a user enjoyed action films, the system suggests other action movies with similar features (like cast, directors, or themes). 86 | • Hybrid Model: Combining both collaborative and content-based filtering techniques provides a more comprehensive recommendation. This model takes advantage of the strengths of each approach, improving the overall accuracy of recommendations. 87 | • Evaluation Metrics: The effectiveness of the recommendation system is assessed using metrics such as precision, recall, F1 score, and mean absolute error (MAE). These metrics help gauge the system’s performance and identify areas for improvement. 88 | 89 | ### Outcome: 90 | 91 | The final output of this project is a fully functional recommendation system that provides users with personalized movie suggestions. 
Users can input their favorite films or ratings, and the system will respond with tailored recommendations that match their interests. 92 | 93 | ### Applications: 94 | 95 | The developed recommendation system can be integrated into movie streaming platforms, enhancing user engagement and satisfaction. By guiding users to relevant content, the system can increase viewership and retention rates, ultimately driving platform success. 96 | 97 | 98 | --- 99 | 100 | ## Thank you for visiting my repository! 101 | 102 | I hope you find these projects informative and inspiring. Feel free to reach out if you have any questions or feedback. 103 | -------------------------------------------------------------------------------- /Week 2/3. Find python finance data collector libraries and make list for that/Readme.md: -------------------------------------------------------------------------------- 1 | # Top Python Libraries for Financial Data Collection & Analysis 2 | 3 | Python has become a key player in the finance world, offering a vast array of libraries that cater to financial data collection, analysis, and modeling needs. Whether you're into trading, quantitative finance, or risk management, having the right tools can significantly enhance your workflow. This article outlines some of the best Python packages and libraries for finance, focusing on their unique features and applications. 4 | 5 | ## 1. **Pandas**: Data Manipulation & Analysis 6 | **Pandas** is the go-to library for handling and transforming datasets, especially in finance where time-series data is essential. It provides flexible data structures like DataFrames, allowing for easy data manipulation, indexing, and integration with other libraries. 
7 | 8 | - **Key Features:** 9 | - Efficient manipulation of structured data (DataFrames) 10 | - Tools for reading/writing between in-memory structures and file formats (CSV, Excel, SQL) 11 | - Time-series analysis tools 12 | 13 | - **Use Case:** Ideal for organizing, cleaning, and transforming financial data. 14 | 15 | --- 16 | 17 | ## 2. **NumPy**: Numerical Computing 18 | **NumPy** is fundamental for numerical computations in finance. It enables the creation of large, multi-dimensional arrays and provides a suite of mathematical functions to perform complex calculations like those used in options pricing and risk models. 19 | 20 | - **Key Features:** 21 | - Multi-dimensional array objects (ndarrays) 22 | - Mathematical functions for array operations 23 | - Linear algebra and random number generation 24 | 25 | - **Use Case:** Pricing options, risk assessment models, and financial simulations. 26 | 27 | --- 28 | 29 | ## 3. **Matplotlib**: Data Visualization 30 | **Matplotlib** is a powerful tool for creating static, animated, and interactive plots. In finance, it’s widely used to visualize trends, historical data, and model outputs. 31 | 32 | - **Key Features:** 33 | - Flexible plotting capabilities (line charts, bar charts, histograms, etc.) 34 | - Customization options for figures 35 | - Integration with NumPy and Pandas for easy plotting of financial data 36 | 37 | - **Use Case:** Visualizing financial trends, portfolio performance, and market simulations. 38 | 39 | --- 40 | 41 | ## 4. **SciPy**: Advanced Scientific Computing 42 | Building on top of NumPy, **SciPy** offers more advanced mathematical tools that are essential for financial modeling. It provides algorithms for optimization, linear algebra, and statistical functions commonly needed in finance. 
43 | 44 | - **Key Features:** 45 | - Modules for optimization and integration 46 | - Advanced signal and image processing 47 | - Linear algebra, interpolation, and statistical functions 48 | 49 | - **Use Case:** Financial modeling, risk management, and derivative pricing. 50 | 51 | --- 52 | 53 | ## 5. **Statsmodels**: Statistical Modeling 54 | **Statsmodels** provides classes and functions to implement various statistical models and tests, which are indispensable in finance for tasks like time series analysis, regression modeling, and hypothesis testing. 55 | 56 | - **Key Features:** 57 | - Time-series analysis and econometrics models 58 | - Tools for linear regression, logistic regression, and more 59 | - Conducting hypothesis testing and statistical analysis 60 | 61 | - **Use Case:** Time-series forecasting, econometric modeling, and risk assessment. 62 | 63 | --- 64 | 65 | ## 6. **Scikit-learn**: Machine Learning 66 | **Scikit-learn** is the leading library for machine learning in Python, and it’s extensively used in finance for predictive modeling and developing algorithmic trading strategies. 67 | 68 | - **Key Features:** 69 | - A wide range of supervised and unsupervised learning algorithms 70 | - Cross-validation tools for model evaluation 71 | - Integration with NumPy and Pandas 72 | 73 | - **Use Case:** Predictive analytics, algorithmic trading, and credit risk modeling. 74 | 75 | --- 76 | 77 | ## 7. **QuantLib**: Quantitative Finance 78 | **QuantLib** is a dedicated library for quantitative finance, built mainly for derivative pricing, interest rate modeling, and risk management. It's written in C++ but has Python bindings for ease of use. 79 | 80 | - **Key Features:** 81 | - Tools for derivatives pricing and risk management 82 | - Interest rate models and term structure calculations 83 | - Support for exotic options and fixed-income instruments 84 | 85 | - **Use Case:** Derivatives pricing, bond valuation, and risk assessments. 
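To give a flavor of the kind of calculation QuantLib automates, here is a minimal NumPy sketch of my own (not QuantLib code) that prices a fixed-coupon bond by discounting its cash flows at a flat yield:

```python
import numpy as np

def bond_price(face, coupon_rate, ytm, years, freq=2):
    """Present value of a fixed-coupon bond at a flat yield to maturity."""
    periods = np.arange(1, years * freq + 1)   # coupon dates, in periods
    coupon = face * coupon_rate / freq         # cash flow per period
    discount = (1 + ytm / freq) ** -periods    # per-period discount factors
    # Discount every coupon plus the face value repaid at maturity
    return float(np.sum(coupon * discount) + face * discount[-1])

# A 10-year, 5% semiannual-coupon bond yielding 4% trades above par
price = bond_price(face=100, coupon_rate=0.05, ytm=0.04, years=10)
print(round(price, 2))  # about 108.18
```

QuantLib generalizes this with day-count conventions, full yield curves instead of a flat rate, and instruments far beyond plain bonds.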
86 | 87 | --- 88 | 89 | ## 8. **Pyfolio**: Portfolio and Risk Analytics 90 | **Pyfolio** is designed for detailed risk and performance analytics of financial portfolios. It helps investors and analysts understand the risk-return profile of their investments. 91 | 92 | - **Key Features:** 93 | - Tear sheet creation for analyzing portfolio performance 94 | - Tools for analyzing returns, positions, and transactions 95 | - Risk-adjusted performance metrics 96 | 97 | - **Use Case:** Backtesting, risk analysis, and portfolio management. 98 | 99 | --- 100 | 101 | ## 9. **Zipline**: Algorithmic Trading 102 | **Zipline** is an open-source backtesting framework that allows for the simulation of trading strategies. It is the core library behind Quantopian’s algorithmic trading platform. 103 | 104 | - **Key Features:** 105 | - Backtesting of trading algorithms 106 | - Integration with Pandas for data handling 107 | - Tools for creating, testing, and executing trading strategies 108 | 109 | - **Use Case:** Algorithmic trading, strategy development, and performance tracking. 110 | 111 | --- 112 | 113 | ## 10. **FBProphet**: Time Series Forecasting 114 | Developed by Facebook, **FBProphet** is a great tool for time series forecasting, commonly used in stock market prediction and economic trend analysis. 115 | 116 | - **Key Features:** 117 | - Tools for detecting daily, weekly, and yearly trends 118 | - Capable of handling missing data and outliers 119 | - Models time-series data with seasonal patterns 120 | 121 | - **Use Case:** Stock market analysis, demand forecasting, and trend prediction. 122 | 123 | --- 124 | 125 | ## 11. **Seaborn**: Statistical Data Visualization 126 | **Seaborn** builds on top of Matplotlib and simplifies complex statistical plots. It is particularly useful in finance for visualizing the relationships between variables in a dataset. 
127 | 128 | - **Key Features:** 129 | - Statistical plots such as heatmaps, time-series plots, and regression plots 130 | - Integrates seamlessly with Pandas DataFrames 131 | - Enhanced aesthetics for easier interpretation of plots 132 | 133 | - **Use Case:** Correlation analysis, risk-return heatmaps, and trend visualization. 134 | 135 | --- 136 | 137 | ## 12. **Keras**: Deep Learning for Finance 138 | **Keras** is a high-level neural network API that simplifies the creation of deep learning models. It’s particularly useful in finance for tasks like fraud detection, algorithmic trading, and predictive analytics. 139 | 140 | - **Key Features:** 141 | - Easy model building with deep neural networks 142 | - Runs on top of TensorFlow, CNTK, or Theano 143 | - Wide range of neural network layers and architectures 144 | 145 | - **Use Case:** Fraud detection, algorithmic trading, and financial forecasting. 146 | 147 | --- 148 | 149 | ## 13. **Plotly**: Interactive Graphs 150 | **Plotly** is a graphing library for creating interactive, publication-quality graphs online. 151 | 152 | --- 153 | 154 | ## 14. **ECOS**: Convex Optimization 155 | **ECOS** (Embedded Conic Solver) is numerical software for solving convex optimization problems. 156 | 157 | --- 158 | 159 | ## 15. **SCS**: Large-scale Convex Optimization 160 | **SCS** (Splitting Conic Solver) is a numerical optimization algorithm for solving large-scale convex cone problems – useful in financial contexts where robust optimization is required. 161 | 162 | --- 163 | 164 | ## Additional Tools in Financial Analysis 165 | 166 | - TensorFlow: Often used alongside Keras, TensorFlow is a tool for machine learning and deep learning, offering extensive capabilities in modeling complex financial systems and predictive analytics. 167 | - PyMC3: Ideal for Bayesian modeling and probabilistic machine learning, PyMC3 is useful in finance for risk management and econometric analysis.
168 | - Dash: A Python framework for building analytical web applications. Dash can be used to create interactive, web-based dashboards for data visualization and financial analysis without requiring complex web development skills. 169 | 170 | 171 | 172 | ----- 173 | 174 | # Categorizing Python Libraries for Financial Data Collection 175 | 176 | ## 1. Categorization Based on Primary Functionality 177 | 178 | These libraries are grouped based on their primary usage or functionality in financial processes. 179 | 180 | ### **Data Manipulation and Analysis** 181 | - **Pandas**: Provides efficient data structures for manipulating large datasets, especially time-series data, which is crucial for financial analysis. 182 | - **NumPy**: A fundamental package for numerical computing. It allows complex financial calculations, such as options pricing and risk management. 183 | 184 | ### **Visualization** 185 | - **Matplotlib**: Generates static, animated, and interactive plots, widely used for visualizing financial trends. 186 | - **Seaborn**: Built on top of Matplotlib, Seaborn is designed for creating attractive and informative statistical visualizations, ideal for financial heatmaps and time-series analysis. 187 | - **Plotly**: Focuses on creating interactive plots such as candlestick charts and 3D graphs for financial instruments. 188 | 189 | ### **Statistical Modeling and Econometrics** 190 | - **Statsmodels**: Offers tools for estimating and testing statistical models, particularly useful for time-series analysis and financial risk assessment. 191 | - **SciPy**: Extends NumPy with advanced scientific computing tools like linear algebra and optimization, often used in financial modeling. 192 | 193 | ### **Machine Learning and AI** 194 | - **Scikit-learn**: A widely-used machine learning library for predictive modeling in finance, such as algorithmic trading. 
195 | - **Keras**: Simplifies deep learning with neural networks, helping to detect fraud, optimize portfolios, and enhance algorithmic trading. 196 | 197 | ### **Algorithmic Trading** 198 | - **Zipline**: Designed specifically for backtesting trading algorithms, this library is favored by finance professionals for developing and deploying trading strategies. 199 | 200 | ### **Quantitative Finance** 201 | - **QuantLib**: A quantitative finance library offering tools for derivatives pricing, interest rate models, and risk management. 202 | 203 | ### **Portfolio and Risk Analytics** 204 | - **Pyfolio**: A specialized library for portfolio performance and risk analytics, used to analyze returns, positions, and risks in investment strategies. 205 | 206 | ### **Forecasting** 207 | - **FBProphet**: Developed by Facebook, this library specializes in time-series forecasting, ideal for stock market and economic trend analysis. 208 | 209 | --- 210 | 211 | ## 2. Categorization Based on Complexity 212 | 213 | ### **Basic Usage** 214 | - **Pandas**: A general-purpose library for working with financial data. 215 | - **NumPy**: Suitable for fundamental numerical tasks like calculations and matrix operations. 216 | 217 | ### **Intermediate Usage** 218 | - **Matplotlib**: Used for creating standard financial visualizations like line plots and histograms. 219 | - **Statsmodels**: Offers basic econometrics and statistical models, often used in time series forecasting. 220 | 221 | ### **Advanced Usage** 222 | - **Scikit-learn**: Incorporates machine learning algorithms for predictive analysis. 223 | - **Keras**: A deep learning library that allows advanced financial forecasting and fraud detection using neural networks. 224 | - **QuantLib**: Requires a deeper understanding of financial mathematics for complex derivative pricing and quantitative models. 225 | 226 | --- 227 | 228 | ## 3. 
Categorization Based on Financial Application 229 | 230 | ### **Risk Management and Portfolio Analysis** 231 | - **Pyfolio**: For risk analytics and performance metrics of financial portfolios. 232 | - **SCS**: Helps in solving large-scale convex optimization problems relevant in financial risk management. 233 | - **ECOS**: Used for convex optimization in portfolio management and asset allocation. 234 | 235 | ### **Time-Series Forecasting** 236 | - **FBProphet**: Designed for time series data with patterns across different scales, particularly useful in stock and economic forecasting. 237 | 238 | ### **Algorithmic Trading** 239 | - **Zipline**: Tailored for developing and testing algorithmic trading strategies. 240 | 241 | --- 242 | 243 | ## 4. Categorization Based on Integration with Other Libraries 244 | 245 | ### **Built on NumPy** 246 | - **SciPy**: Extends NumPy with additional scientific computation tools. 247 | - **Pandas**: Built on NumPy for more efficient data manipulation, especially for time-series data. 248 | 249 | ### **Built on Matplotlib** 250 | - **Seaborn**: Extends Matplotlib’s capabilities, making it easier to generate complex statistical plots. 251 | - **Plotly**: Offers interactive graphing functionalities that enhance Matplotlib’s basic features. 252 | 253 | --- 254 | 255 | ## 5. Additional Libraries to Consider 256 | 257 | ### **TensorFlow** 258 | Used alongside Keras for deep learning tasks, especially in building more complex financial models for prediction and algorithmic trading. 259 | 260 | ### **Dash** 261 | A web application framework for building data visualization dashboards, often used in financial analysis to create interactive reports and monitoring tools. 262 | 263 | ### **PyMC3** 264 | Ideal for probabilistic programming, used in Bayesian modeling for risk assessment and econometric analysis. 
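
### A Minimal Combined Example

To make the "Built on NumPy" relationship above concrete, here is a minimal sketch of a typical workflow that combines Pandas (time-series bookkeeping) with NumPy (the return and volatility math). The price series is invented purely for illustration, and the 252 trading-day annualization factor is a common convention, not something prescribed here.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices (illustration only, not real market data)
prices = pd.Series(
    [100.0, 102.0, 101.0, 103.0, 106.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="close",
)

# Pandas handles the time-series indexing; NumPy supplies the math.
returns = prices.pct_change().dropna()                   # simple daily returns
log_returns = np.log(prices / prices.shift(1)).dropna()  # log returns
annualized_vol = returns.std(ddof=1) * np.sqrt(252)      # 252 trading days/year

print(returns.round(4).to_dict())
print(f"annualized volatility: {annualized_vol:.4f}")
```

Because Pandas is built on NumPy, NumPy ufuncs such as `np.log` apply elementwise to a `Series` and return a `Series` with the same index, which is exactly the integration described in section 4.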
265 | 266 | --- 267 | 268 | ## Conclusion 269 | By categorizing Python libraries based on their functionality, complexity, and financial applications, it's easier to determine which ones are most suitable for specific tasks in finance. Whether you're focused on algorithmic trading, time-series forecasting, risk management, or quantitative finance, there's a Python library tailored to meet your needs. 270 | -------------------------------------------------------------------------------- /Week 4/2. Implement an ML algorithm for predict etherium coin/ethereum-price-prediction.ipynb: -------------------------------------------------------------------------------- 1 | {"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.10.12","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[{"sourceId":1085416,"sourceType":"datasetVersion","datasetId":478632}],"dockerImageVersionId":30822,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"code","source":"import pandas as pd\n\n# Load the dataset\ndf = pd.read_csv('/kaggle/input/ethereum-historical-dataset/ETH_1H.csv')\n\n# View the first few rows of the dataset\nprint(df.head())","metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:16:42.989082Z","iopub.execute_input":"2024-12-26T14:16:42.989489Z","iopub.status.idle":"2024-12-26T14:16:43.065711Z","shell.execute_reply.started":"2024-12-26T14:16:42.989446Z","shell.execute_reply":"2024-12-26T14:16:43.064642Z"}},"outputs":[{"name":"stdout","text":" Unix Timestamp Date Symbol Open High Low \\\n0 1586995200000 2020-04-16 00:00:00 ETHUSD 152.94 
152.94 150.39 \n1 1586991600000 2020-04-15 23:00:00 ETHUSD 155.81 155.81 151.39 \n2 1586988000000 2020-04-15 22:00:00 ETHUSD 157.18 157.30 155.32 \n3 1586984400000 2020-04-15 21:00:00 ETHUSD 158.04 158.31 157.16 \n4 1586980800000 2020-04-15 20:00:00 ETHUSD 157.10 158.10 156.87 \n\n Close Volume \n0 150.39 650.188125 \n1 152.94 4277.567299 \n2 155.81 106.337279 \n3 157.18 55.244131 \n4 158.04 144.262622 \n","output_type":"stream"}],"execution_count":18},{"cell_type":"code","source":"# Check for missing data\nprint(df.isnull().sum())\n\n# Fill missing data (if needed)\ndf = df.fillna(method='ffill')\n\n# Check the data types\nprint(df.dtypes)\n\n# If necessary, convert the data to an appropriate type\n# Example: convert the date column to DateTime format\ndf['Date'] = pd.to_datetime(df['Date'])","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:16:45.223020Z","iopub.execute_input":"2024-12-26T14:16:45.223430Z","iopub.status.idle":"2024-12-26T14:16:45.265082Z","shell.execute_reply.started":"2024-12-26T14:16:45.223394Z","shell.execute_reply":"2024-12-26T14:16:45.263886Z"}},"outputs":[{"name":"stdout","text":"Unix Timestamp 0\nDate 0\nSymbol 0\nOpen 0\nHigh 0\nLow 0\nClose 0\nVolume 0\ndtype: int64\nUnix Timestamp int64\nDate object\nSymbol object\nOpen float64\nHigh float64\nLow float64\nClose float64\nVolume float64\ndtype: object\n","output_type":"stream"},{"name":"stderr","text":":5: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. 
Use obj.ffill() or obj.bfill() instead.\n df = df.fillna(method='ffill')\n","output_type":"stream"}],"execution_count":19},{"cell_type":"code","source":"# We assume the target column is 'Close'\nX = df.drop(columns=['Close'])\ny = df['Close']","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:16:48.933348Z","iopub.execute_input":"2024-12-26T14:16:48.933688Z","iopub.status.idle":"2024-12-26T14:16:48.940894Z","shell.execute_reply.started":"2024-12-26T14:16:48.933663Z","shell.execute_reply":"2024-12-26T14:16:48.939874Z"}},"outputs":[],"execution_count":20},{"cell_type":"code","source":"from sklearn.model_selection import train_test_split\n\n# Split the data into training and test sets\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:16:51.017059Z","iopub.execute_input":"2024-12-26T14:16:51.017465Z","iopub.status.idle":"2024-12-26T14:16:51.029741Z","shell.execute_reply.started":"2024-12-26T14:16:51.017432Z","shell.execute_reply":"2024-12-26T14:16:51.028497Z"}},"outputs":[],"execution_count":21},{"cell_type":"code","source":"# Convert the date to a number (seconds since 1970-01-01)\ndf['Date_numeric'] = df['Date'].apply(lambda x: x.timestamp())\n\n# Select only the numeric columns\nnumeric_features = df.select_dtypes(include=['float64', 'int64']).columns\ndf_numeric = df[numeric_features]\n\nprint(df.dtypes)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:16:53.473940Z","iopub.execute_input":"2024-12-26T14:16:53.474353Z","iopub.status.idle":"2024-12-26T14:16:53.563829Z","shell.execute_reply.started":"2024-12-26T14:16:53.474319Z","shell.execute_reply":"2024-12-26T14:16:53.562777Z"}},"outputs":[{"name":"stdout","text":"Unix Timestamp int64\nDate datetime64[ns]\nSymbol object\nOpen float64\nHigh float64\nLow float64\nClose float64\nVolume float64\nDate_numeric float64\ndtype: 
object\n","output_type":"stream"}],"execution_count":22},{"cell_type":"code","source":"df = df.drop(columns=['Date'])","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:15:27.403089Z","iopub.execute_input":"2024-12-26T14:15:27.403498Z","iopub.status.idle":"2024-12-26T14:15:27.410179Z","shell.execute_reply.started":"2024-12-26T14:15:27.403465Z","shell.execute_reply":"2024-12-26T14:15:27.408997Z"}},"outputs":[],"execution_count":15},{"cell_type":"code","source":"import pandas as pd\n\n# Read the data\ndf = pd.read_csv('/kaggle/input/ethereum-historical-dataset/ETH_1H.csv')\n\n# Check the column types\nprint(df.dtypes)\n\n# Convert the date column to datetime (if needed)\ndf['Date'] = pd.to_datetime(df['Date'])\n\n# If the model needs the date column, convert it to a numeric value:\ndf['Date_numeric'] = df['Date'].apply(lambda x: x.timestamp())\n\n# Drop the original date column if it is not needed\ndf = df.drop(columns=['Date'])\n\n# Inspect the data\nprint(df.head())","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:17:28.434562Z","iopub.execute_input":"2024-12-26T14:17:28.434905Z","iopub.status.idle":"2024-12-26T14:17:28.594485Z","shell.execute_reply.started":"2024-12-26T14:17:28.434876Z","shell.execute_reply":"2024-12-26T14:17:28.593594Z"}},"outputs":[{"name":"stdout","text":"Unix Timestamp int64\nDate object\nSymbol object\nOpen float64\nHigh float64\nLow float64\nClose float64\nVolume float64\ndtype: object\n Unix Timestamp Symbol Open High Low Close Volume \\\n0 1586995200000 ETHUSD 152.94 152.94 150.39 150.39 650.188125 \n1 1586991600000 ETHUSD 155.81 155.81 151.39 152.94 4277.567299 \n2 1586988000000 ETHUSD 157.18 157.30 155.32 155.81 106.337279 \n3 1586984400000 ETHUSD 158.04 158.31 157.16 157.18 55.244131 \n4 1586980800000 ETHUSD 157.10 158.10 156.87 158.04 144.262622 \n\n Date_numeric \n0 1.586995e+09 \n1 1.586992e+09 \n2 1.586988e+09 \n3 1.586984e+09 \n4 1.586981e+09 
\n","output_type":"stream"}],"execution_count":24},{"cell_type":"code","source":"# انتخاب فقط ستون‌های عددی\nnumeric_columns = df.select_dtypes(include=['float64', 'int64']).columns\ndf_numeric = df[numeric_columns]\n\n# بررسی ویژگی‌های نهایی برای مدل\nprint(df_numeric.head())","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:17:52.353898Z","iopub.execute_input":"2024-12-26T14:17:52.354232Z","iopub.status.idle":"2024-12-26T14:17:52.370586Z","shell.execute_reply.started":"2024-12-26T14:17:52.354205Z","shell.execute_reply":"2024-12-26T14:17:52.369406Z"}},"outputs":[{"name":"stdout","text":" Unix Timestamp Open High Low Close Volume Date_numeric\n0 1586995200000 152.94 152.94 150.39 150.39 650.188125 1.586995e+09\n1 1586991600000 155.81 155.81 151.39 152.94 4277.567299 1.586992e+09\n2 1586988000000 157.18 157.30 155.32 155.81 106.337279 1.586988e+09\n3 1586984400000 158.04 158.31 157.16 157.18 55.244131 1.586984e+09\n4 1586980800000 157.10 158.10 156.87 158.04 144.262622 1.586981e+09\n","output_type":"stream"}],"execution_count":26},{"cell_type":"code","source":"# پیدا کردن مشکل در داده‌ها\nprint(df.dtypes)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:18:14.049919Z","iopub.execute_input":"2024-12-26T14:18:14.050291Z","iopub.status.idle":"2024-12-26T14:18:14.056370Z","shell.execute_reply.started":"2024-12-26T14:18:14.050238Z","shell.execute_reply":"2024-12-26T14:18:14.055237Z"}},"outputs":[{"name":"stdout","text":"Unix Timestamp int64\nSymbol object\nOpen float64\nHigh float64\nLow float64\nClose float64\nVolume float64\nDate_numeric float64\ndtype: object\n","output_type":"stream"}],"execution_count":28},{"cell_type":"code","source":"from sklearn.model_selection import train_test_split\nfrom sklearn.ensemble import RandomForestRegressor\n\n# آماده‌سازی داده‌ها\nX = df_numeric.drop(columns=['Close'])\ny = df_numeric['Close']\n\n# تقسیم داده‌ها\nX_train, X_test, y_train, y_test = train_test_split(X, y, 
test_size=0.2, random_state=42)\n\n# Create and train the model\nmodel = RandomForestRegressor()\nmodel.fit(X_train, y_train)\n\n# Predict\npredictions = model.predict(X_test)\nprint(predictions)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:18:27.420398Z","iopub.execute_input":"2024-12-26T14:18:27.420804Z","iopub.status.idle":"2024-12-26T14:18:44.779339Z","shell.execute_reply.started":"2024-12-26T14:18:27.420775Z","shell.execute_reply":"2024-12-26T14:18:44.778038Z"}},"outputs":[{"name":"stdout","text":"[148.2095 11.39 540.4283 ... 153.0184 185.0882 141.0508]\n","output_type":"stream"}],"execution_count":29},{"cell_type":"code","source":"from sklearn.tree import DecisionTreeRegressor\n\n# Build the model\nmodel = DecisionTreeRegressor(random_state=42)\n\n# Train the model\nmodel.fit(X_train, y_train)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:18:44.780688Z","iopub.execute_input":"2024-12-26T14:18:44.780988Z","iopub.status.idle":"2024-12-26T14:18:45.069988Z","shell.execute_reply.started":"2024-12-26T14:18:44.780961Z","shell.execute_reply":"2024-12-26T14:18:45.068931Z"}},"outputs":[{"execution_count":30,"output_type":"execute_result","data":{"text/plain":"DecisionTreeRegressor(random_state=42)","text/html":"
DecisionTreeRegressor(random_state=42)
"},"metadata":{}}],"execution_count":30},{"cell_type":"code","source":"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n\n# پیش‌بینی قیمت‌ها\ny_pred = model.predict(X_test)\n\n# محاسبه معیارهای ارزیابی\nmae = mean_absolute_error(y_test, y_pred)\nmse = mean_squared_error(y_test, y_pred)\nr2 = r2_score(y_test, y_pred)\n\nprint(f'Mean Absolute Error: {mae}')\nprint(f'Mean Squared Error: {mse}')\nprint(f'R-squared: {r2}')","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:19:09.124730Z","iopub.execute_input":"2024-12-26T14:19:09.125108Z","iopub.status.idle":"2024-12-26T14:19:09.139348Z","shell.execute_reply.started":"2024-12-26T14:19:09.125076Z","shell.execute_reply":"2024-12-26T14:19:09.138064Z"}},"outputs":[{"name":"stdout","text":"Mean Absolute Error: 1.4305681159420292\nMean Squared Error: 19.186084202898556\nR-squared: 0.9996716545572174\n","output_type":"stream"}],"execution_count":31},{"cell_type":"code","source":"# هایپرپارامترهای مدل Decision Tree\nfrom sklearn.model_selection import GridSearchCV\n\n# تعیین پارامترها برای جستجوی بهترین ترکیب\nparam_grid = {\n 'max_depth': [5, 10, 15, 20, None],\n 'min_samples_split': [2, 5, 10],\n 'min_samples_leaf': [1, 2, 4]\n}\n\n# جستجو برای بهترین ترکیب هایپرپارامترها\ngrid_search = GridSearchCV(DecisionTreeRegressor(random_state=42), param_grid, cv=3, scoring='neg_mean_squared_error')\ngrid_search.fit(X_train, y_train)\n\n# مشاهده بهترین ترکیب پارامترها\nprint(\"Best Hyperparameters:\", grid_search.best_params_)","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:19:17.196708Z","iopub.execute_input":"2024-12-26T14:19:17.197056Z","iopub.status.idle":"2024-12-26T14:19:33.892141Z","shell.execute_reply.started":"2024-12-26T14:19:17.197028Z","shell.execute_reply":"2024-12-26T14:19:33.891143Z"}},"outputs":[{"name":"stdout","text":"Best Hyperparameters: {'max_depth': 10, 'min_samples_leaf': 4, 'min_samples_split': 
2}\n","output_type":"stream"}],"execution_count":32},{"cell_type":"code","source":"# Build the model with the best hyperparameters\nbest_model = grid_search.best_estimator_\n\n# Train the model with the best hyperparameters\nbest_model.fit(X_train, y_train)\n\n# Predict with the optimized model\ny_pred_best = best_model.predict(X_test)\n\n# Evaluate the optimized model\nmae_best = mean_absolute_error(y_test, y_pred_best)\nmse_best = mean_squared_error(y_test, y_pred_best)\nr2_best = r2_score(y_test, y_pred_best)\n\nprint(f'Optimized Model - Mean Absolute Error: {mae_best}')\nprint(f'Optimized Model - Mean Squared Error: {mse_best}')\nprint(f'Optimized Model - R-squared: {r2_best}')","metadata":{"trusted":true,"execution":{"iopub.status.busy":"2024-12-26T14:19:37.666083Z","iopub.execute_input":"2024-12-26T14:19:37.666492Z","iopub.status.idle":"2024-12-26T14:19:37.849922Z","shell.execute_reply.started":"2024-12-26T14:19:37.666460Z","shell.execute_reply":"2024-12-26T14:19:37.848784Z"}},"outputs":[{"name":"stdout","text":"Optimized Model - Mean Absolute Error: 1.3912500968665278\nOptimized Model - Mean Squared Error: 16.53824871335939\nOptimized Model - R-squared: 0.9997169688958304\n","output_type":"stream"}],"execution_count":33},{"cell_type":"code","source":"","metadata":{"trusted":true},"outputs":[],"execution_count":null}]} -------------------------------------------------------------------------------- /Week 3/2. 
Analysis Data that extract from one of best libraries/Healthcare.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "a086142d-f422-4dc6-9d93-bdfc123cfa2f", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "import pandas as pd\n", 11 | "import numpy as np\n", 12 | "from plotly.subplots import make_subplots\n", 13 | "import plotly.graph_objects as go\n", 14 | "import plotly.io as pio\n", 15 | "pio.renderers.default = 'notebook' \n", 16 | "import warnings\n", 17 | "warnings.filterwarnings(\"ignore\")" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 2, 23 | "id": "4cf47bc1-89b3-4d4b-81bd-620382196ff3", 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "data": { 28 | "text/html": [ 29 | "
\n", 30 | "\n", 43 | "\n", 44 | " \n", 45 | " \n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | "
idgenderagehypertensionheart_diseaseever_marriedwork_typeResidence_typeavg_glucose_levelbmismoking_statusstroke
09046Male67.001YesPrivateUrban228.6936.6formerly smoked1
151676Female61.000YesSelf-employedRural202.21NaNnever smoked1
231112Male80.001YesPrivateRural105.9232.5never smoked1
360182Female49.000YesPrivateUrban171.2334.4smokes1
41665Female79.010YesSelf-employedRural174.1224.0never smoked1
\n", 139 | "
" 140 | ], 141 | "text/plain": [ 142 | " id gender age hypertension heart_disease ever_married \\\n", 143 | "0 9046 Male 67.0 0 1 Yes \n", 144 | "1 51676 Female 61.0 0 0 Yes \n", 145 | "2 31112 Male 80.0 0 1 Yes \n", 146 | "3 60182 Female 49.0 0 0 Yes \n", 147 | "4 1665 Female 79.0 1 0 Yes \n", 148 | "\n", 149 | " work_type Residence_type avg_glucose_level bmi smoking_status \\\n", 150 | "0 Private Urban 228.69 36.6 formerly smoked \n", 151 | "1 Self-employed Rural 202.21 NaN never smoked \n", 152 | "2 Private Rural 105.92 32.5 never smoked \n", 153 | "3 Private Urban 171.23 34.4 smokes \n", 154 | "4 Self-employed Rural 174.12 24.0 never smoked \n", 155 | "\n", 156 | " stroke \n", 157 | "0 1 \n", 158 | "1 1 \n", 159 | "2 1 \n", 160 | "3 1 \n", 161 | "4 1 " 162 | ] 163 | }, 164 | "execution_count": 2, 165 | "metadata": {}, 166 | "output_type": "execute_result" 167 | } 168 | ], 169 | "source": [ 170 | "Healthcare=pd.read_csv(\"healthcare-dataset-stroke-data.csv\")\n", 171 | "Healthcare.head()" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 3, 177 | "id": "169b262c-fcf4-45c3-8783-2b937db89c86", 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "data": { 182 | "text/plain": [ 183 | "(5110, 12)" 184 | ] 185 | }, 186 | "execution_count": 3, 187 | "metadata": {}, 188 | "output_type": "execute_result" 189 | } 190 | ], 191 | "source": [ 192 | "Healthcare.shape" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 4, 198 | "id": "69ba7d6b-f732-4189-8882-b6af4610bebf", 199 | "metadata": { 200 | "scrolled": true 201 | }, 202 | "outputs": [ 203 | { 204 | "name": "stdout", 205 | "output_type": "stream", 206 | "text": [ 207 | "\n", 208 | "RangeIndex: 5110 entries, 0 to 5109\n", 209 | "Data columns (total 12 columns):\n", 210 | " # Column Non-Null Count Dtype \n", 211 | "--- ------ -------------- ----- \n", 212 | " 0 id 5110 non-null int64 \n", 213 | " 1 gender 5110 non-null object \n", 214 | " 2 age 5110 non-null 
float64\n", 215 | " 3 hypertension 5110 non-null int64 \n", 216 | " 4 heart_disease 5110 non-null int64 \n", 217 | " 5 ever_married 5110 non-null object \n", 218 | " 6 work_type 5110 non-null object \n", 219 | " 7 Residence_type 5110 non-null object \n", 220 | " 8 avg_glucose_level 5110 non-null float64\n", 221 | " 9 bmi 4909 non-null float64\n", 222 | " 10 smoking_status 5110 non-null object \n", 223 | " 11 stroke 5110 non-null int64 \n", 224 | "dtypes: float64(3), int64(4), object(5)\n", 225 | "memory usage: 479.2+ KB\n" 226 | ] 227 | } 228 | ], 229 | "source": [ 230 | "Healthcare.info()" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "id": "e75ed4c9-b873-4fbe-a612-ec304339dd31", 236 | "metadata": {}, 237 | "source": [ 238 | "MetaData" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "id": "75a72432-731a-4e9b-83d2-cc0889d73942", 244 | "metadata": {}, 245 | "source": [ 246 | "Dataset Name: Stroke Prediction Dataset\n", 247 | "\n", 248 | "Description: This dataset contains information on individuals' health factors, including medical conditions, lifestyle habits, and demographics, to predict the likelihood of a stroke occurrence.\n", 249 | "\n", 250 | "Data Columns:\n", 251 | "\n", 252 | "id (int64): Unique identifier for each record. \n", 253 | "gender (object): Gender of the individual (e.g., \"Male\", \"Female\"). \n", 254 | "age (float64): Age of the individual in years. \n", 255 | "hypertension (int64): Whether the individual has hypertension (1 = Yes, 0 = No). \n", 256 | "heart_disease (int64): Whether the individual has heart disease (1 = Yes, 0 = No). \n", 257 | "ever_married (object): Marital status (\"Yes\", \"No\"). \n", 258 | "work_type (object): The type of work the individual does (\"Private\", \"Self-employed\", \"Govt_job\", \"children\", \"Never_worked\"). \n", 259 | "Residence_type (object): Type of area where the individual resides (\"Urban\", \"Rural\"). 
\n", 260 | "avg_glucose_level (float64): Average glucose level in the blood. \n", 261 | "bmi (float64): Body Mass Index (BMI). Some values may be missing (4910 non-null entries). \n", 262 | "smoking_status (object): Smoking status (\"formerly smoked\", \"Never smoked\", \"Smokes\", \"Unknown\"). \n", 263 | "stroke (int64): Target variable indicating whether the individual had a stroke (1 = Yes, 0 = No)." 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": 5, 269 | "id": "a79feb38-a26d-4933-a1f9-e72590c7b53e", 270 | "metadata": {}, 271 | "outputs": [ 272 | { 273 | "data": { 274 | "text/plain": [ 275 | "np.int64(0)" 276 | ] 277 | }, 278 | "execution_count": 5, 279 | "metadata": {}, 280 | "output_type": "execute_result" 281 | } 282 | ], 283 | "source": [ 284 | "np.sum(Healthcare.duplicated())" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "id": "c6712ca0-d516-4abf-908c-d3c95d8e7636", 290 | "metadata": {}, 291 | "source": [ 292 | "There is no any duplicated." 
293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "id": "26670296-5451-44ef-af90-d4396ddc204b", 298 | "metadata": {}, 299 | "source": [ 300 | "Handling Missing Values" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 6, 306 | "id": "73db8488-da33-4217-8ed5-f883271258ac", 307 | "metadata": {}, 308 | "outputs": [ 309 | { 310 | "data": { 311 | "text/plain": [ 312 | "id 0\n", 313 | "gender 0\n", 314 | "age 0\n", 315 | "hypertension 0\n", 316 | "heart_disease 0\n", 317 | "ever_married 0\n", 318 | "work_type 0\n", 319 | "Residence_type 0\n", 320 | "avg_glucose_level 0\n", 321 | "bmi 201\n", 322 | "smoking_status 0\n", 323 | "stroke 0\n", 324 | "dtype: int64" 325 | ] 326 | }, 327 | "execution_count": 6, 328 | "metadata": {}, 329 | "output_type": "execute_result" 330 | } 331 | ], 332 | "source": [ 333 | "np.sum(Healthcare.isnull())" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "id": "b84328bb-a1a8-49bc-875a-b9ee2d26008b", 339 | "metadata": {}, 340 | "source": [ 341 | "Result: There are no missing values in the dataset except in bmi, so we only need to handle the missing values in bmi." 
342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": 7, 347 | "id": "4764c087-616a-4c61-a6b9-1f279ded3e48", 348 | "metadata": {}, 349 | "outputs": [ 350 | { 351 | "name": "stdout", 352 | "output_type": "stream", 353 | "text": [ 354 | "Percentage of missing values in 'bmi': 3.93%\n" 355 | ] 356 | } 357 | ], 358 | "source": [ 359 | "missing_percentage = Healthcare['bmi'].isnull().mean() * 100\n", 360 | "print(f\"Percentage of missing values in '{'bmi'}': {missing_percentage:.2f}%\")" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": 8, 366 | "id": "6f4611c1-f779-402c-b7e7-1a645b51435f", 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/plain": [ 372 | "(np.float64(28.893236911794666), np.float64(28.1))" 373 | ] 374 | }, 375 | "execution_count": 8, 376 | "metadata": {}, 377 | "output_type": "execute_result" 378 | } 379 | ], 380 | "source": [ 381 | "Healthcare.bmi.mean(),Healthcare.bmi.median()" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 9, 387 | "id": "f52b459e-e12c-46e0-a0a4-bfbe5f2a40c2", 388 | "metadata": {}, 389 | "outputs": [ 390 | { 391 | "data": { 392 | "text/plain": [ 393 | "" 394 | ] 395 | }, 396 | "execution_count": 9, 397 | "metadata": {}, 398 | "output_type": "execute_result" 399 | }, 400 | { 401 | "data": { 402 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAkcAAAGdCAYAAAAYDtcjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAwoUlEQVR4nO3deXRUZZ7G8acgVEEwC4tJJUOAiAiyb3bMsLQ0dAJkaBW6p9kjRFEMCkQxplsRpJtgaNPiSjtHQI8gyBlEBaUJu0pkNYZFIyAQlFRgBFIQJGS584eTO7cMtlCEVMV8P+fcc3Lf9617f9f3SD3n3reqbIZhGAIAAIAkqZ6vCwAAAPAnhCMAAAALwhEAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFgQjgAAACwCfF1AbVBRUaETJ04oKChINpvN1+UAAIArYBiGzp07p8jISNWrd+X3gwhHV+DEiROKiorydRkAAMALx48fV4sWLa54POHoCgQFBUn64T9ucHCwj6sBAABXwu12Kyoqynwfv1KEoytQ+SgtODiYcAQAQC1ztUtiWJANAABgQTgCAACwIBwBAABYEI4AAAAsfBqO0tPTddtttykoKEhhYWG66667lJeX5zHm4sWLSk5OVrNmzXTDDTdo+PDhKiws9BiTn5+vhIQEBQYGKiwsTNOnT1dZWZnHmM2bN6tHjx5yOBy6+eabtXjx4ut9eQAAoBbyaTjasmWLkpOT9emnnyorK0ulpaWKi4tTcXGxOWbatGl6//33tWLFCm3ZskUnTpzQsGHDzP7y8nIlJCTo0qVL2rZtm15//XUtXrxYM2bMMMccOXJECQkJ6t+/v3JycjR16lTde++9+uc//1mj1wsAAPyfzTAMw9dFVDp16pTCwsK0ZcsW9evXT0VFRbrxxhu1dOlS/f73v5ckffnll7r11luVnZ2t22+/XR9++KH+4z/+QydOnFB4eLgkacGCBUpNTdWpU6dkt9uVmpqqNWvWaN++fea5RowYobNnz2rt2rU/W5fb7VZISIiKior4KD8AALWEt+/ffrXmqKioSJLUtGlTSdLu3btVWlqqgQMHmmPat2+vli1bKjs7W5KUnZ2tzp07m8FIkuLj4+V2u7V//35zjPUYlWMqj/FjJSUlcrvdHhsAAKgb/CYcVVRUaOrUqerdu7c6deokSXK5XLLb7QoNDfUYGx4eLpfLZY6xBqPK/sq+fzXG7Xbr+++/r1JLenq6QkJCzI2fDgEAoO7wm3CUnJysffv2admyZb4uRWlpaSoqKjK348eP+7okAABQQ/zi50MmT56s1atXa+vWrR4/DOd0OnXp0iWdPXvW4+5RYWGhnE6nOWbHjh0ex6v8NJt1zI8/4VZYWKjg4GA1atSoSj0Oh0MOh6Narg0AANQuPr1zZBiGJk+erHfeeUcbN25UdHS0R3/Pnj3VoEEDbdiwwWzLy8tTfn6+YmNjJUmxsbHau3evTp48aY7JyspScHCwOnToYI6xHqNyTOUxAAAAKvn002oPPvigli5dqnfffVft2rUz20NCQsw7OpMmTdIHH3ygxYsXKzg4WA899JAkadu2bZJ++Ch/t27dFBkZqYyMDLlcLo0dO1b33nuv5syZI+mHj/J36tRJycnJmjBhgjZu3KiHH35Ya9asUXx8/M/WyafVAACofbx9//ZpOPqpX8ldtGiR7rnnHkk/fAnkI488orfeekslJSWKj4/Xyy+/bD4yk6Rjx45p0qRJ2rx5sxo3bqzExETNnTtXAQH//9Rw8+bNmjZtmg4cOKAWLVroySefNM/xcwhHAADUPrUyHNUWhKOqWj++xtclXLWjcxN8XQIAoAb9Ir7nCAAAwNcIRwAAABaEIwAAAAvCEQAAgAXhCAAAwIJwBAAAYEE4AgAAsCAcAQAAWBCOAAAALAhHAAAAFoQjAAAAC8IRAACABeEIAADAgnAEAABgQTgCAACwIBwBAABYEI4AAAAsCEcAAAAWhCM
AAAALwhEAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAIsAXxcA1JTWj6/xdQlX7ejcBF+XAAB1DneOAAAALAhHAAAAFoQjAAAAC8IRAACABeEIAADAwqfhaOvWrRo6dKgiIyNls9m0atUqj36bzXbZbd68eeaY1q1bV+mfO3eux3Fyc3PVt29fNWzYUFFRUcrIyKiJywMAALWQT8NRcXGxunbtqpdeeumy/QUFBR7bwoULZbPZNHz4cI9xTz/9tMe4hx56yOxzu92Ki4tTq1attHv3bs2bN08zZ87Uq6++el2vDQAA1E4+/Z6jwYMHa/DgwT/Z73Q6Pfbfffdd9e/fXzfddJNHe1BQUJWxlZYsWaJLly5p4cKFstvt6tixo3JycpSZmamJEyde+0UAAIBflFqz5qiwsFBr1qxRUlJSlb65c+eqWbNm6t69u+bNm6eysjKzLzs7W/369ZPdbjfb4uPjlZeXpzNnztRI7QAAoPaoNd+Q/frrrysoKEjDhg3zaH/44YfVo0cPNW3aVNu2bVNaWpoKCgqUmZkpSXK5XIqOjvZ4TXh4uNnXpEmTKucqKSlRSUmJue92u6v7cgAAgJ+qNeFo4cKFGj16tBo2bOjRnpKSYv7dpUsX2e123X///UpPT5fD4fDqXOnp6Zo1a9Y11QsAAGqnWvFY7aOPPlJeXp7uvffenx0bExOjsrIyHT16VNIP65YKCws9xlTu/9Q6pbS0NBUVFZnb8ePHr+0CAABArVErwtFrr72mnj17qmvXrj87NicnR/Xq1VNYWJgkKTY2Vlu3blVpaak5JisrS+3atbvsIzVJcjgcCg4O9tgAAEDd4NNwdP78eeXk5CgnJ0eSdOTIEeXk5Cg/P98c43a7tWLFisveNcrOztZzzz2nzz//XF9//bWWLFmiadOmacyYMWbwGTVqlOx2u5KSkrR//34tX75c8+fP93gcBwAAUMmna4527dql/v37m/uVgSUxMVGLFy+WJC1btkyGYWjkyJFVXu9wOLRs2TLNnDlTJSUlio6O1rRp0zyCT0hIiNatW6fk5GT17NlTzZs314wZM/gYPwAAuCybYRiGr4vwd263WyEhISoqKuIR2/9p/fgaX5dQJxydm+DrEgCg1vL2/btWrDkCAACoKYQjAAAAC8IRAACABeEIAADAgnAEAABgQTgCAACwIBwBAABYEI4AAAAsCEcAAAAWhCMAAAALwhEAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAAvCEQAAgAXhCAAAwIJwBAAAYEE4AgAAsCAcAQAAWBCOAAAALAhHAAAAFoQjAAAAC8IRAACABeEIAADAgnAEAABgQTgCAACwIBwBAABYEI4AAAAsCEcAAAAWhCMAAAALn4ajrVu3aujQoYqMjJTNZtOqVas8+u+55x7ZbDaPbdCgQR5jTp8+rdGjRys4OFihoaFKSkrS+fPnPcbk5uaqb9++atiwoaKiopSRkXG9Lw0AANRSPg1HxcXF6tq1q1566aWfHDNo0CAVFBSY21tvveXRP3r0aO3fv19ZWVlavXq1tm7dqokTJ5r9brdbcXFxatWqlXbv3q158+Zp5syZevXVV6/bdQEAgNorwJcnHzx4sAYPHvwvxzgcDjmdzsv2ffHFF1q7dq127typXr16SZJeeOEFDRkyRH/7298UGRmpJUuW6NKlS1q4cKHsdrs6duyonJwcZWZmeoQoAAAAqRasOdq8ebPCwsLUrl07TZo0Sd99953Zl52drdDQUDMYSdLAgQNVr149bd++3RzTr18/2e12c0x8fLzy8vJ05syZy56zpKREbrfbYwMAAHWDX4ejQYMG6Y033tCGDRv0zDPPaMuWLRo8eLDKy8slSS6XS2FhYR6vCQgIUNOmTeVyucwx4eHhHmMq9yvH/Fh6erpCQkLMLSoqqrovDQAA+CmfPlb7OSNGjDD
/7ty5s7p06aI2bdpo8+bNGjBgwHU7b1pamlJSUsx9t9tNQAIAoI7w6ztHP3bTTTepefPmOnTokCTJ6XTq5MmTHmPKysp0+vRpc52S0+lUYWGhx5jK/Z9ay+RwOBQcHOyxAQCAuqFWhaNvvvlG3333nSIiIiRJsbGxOnv2rHbv3m2O2bhxoyoqKhQTE2OO2bp1q0pLS80xWVlZateunZo0aVKzFwAAAPyeT8PR+fPnlZOTo5ycHEnSkSNHlJOTo/z8fJ0/f17Tp0/Xp59+qqNHj2rDhg268847dfPNNys+Pl6SdOutt2rQoEG67777tGPHDn3yySeaPHmyRowYocjISEnSqFGjZLfblZSUpP3792v58uWaP3++x2MzAACASj4NR7t27VL37t3VvXt3SVJKSoq6d++uGTNmqH79+srNzdXvfvc73XLLLUpKSlLPnj310UcfyeFwmMdYsmSJ2rdvrwEDBmjIkCHq06ePx3cYhYSEaN26dTpy5Ih69uypRx55RDNmzOBj/AAA4LJshmEYvi7C37ndboWEhKioqIj1R/+n9eNrfF1CnXB0boKvSwCAWsvb9+9ateYIAADgeiMcAQAAWBCOAAAALAhHAAAAFoQjAAAAC8IRAACABeEIAADAgnAEAABgQTgCAACwIBwBAABYEI4AAAAsCEcAAAAWhCMAAAALwhEAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAAvCEQAAgAXhCAAAwIJwBAAAYEE4AgAAsCAcAQAAWBCOAAAALAhHAAAAFoQjAAAAC8IRAACABeEIAADAgnAEAABgQTgCAACwIBwBAABY+DQcbd26VUOHDlVkZKRsNptWrVpl9pWWlio1NVWdO3dW48aNFRkZqXHjxunEiRMex2jdurVsNpvHNnfuXI8xubm56tu3rxo2bKioqChlZGTUxOUBAIBayKfhqLi4WF27dtVLL71Upe/ChQvas2ePnnzySe3Zs0crV65UXl6efve731UZ+/TTT6ugoMDcHnroIbPP7XYrLi5OrVq10u7duzVv3jzNnDlTr7766nW9NgAAUDsF+PLkgwcP1uDBgy/bFxISoqysLI+2F198Ub/61a+Un5+vli1bmu1BQUFyOp2XPc6SJUt06dIlLVy4UHa7XR07dlROTo4yMzM1ceLE6rsYAADwi1Cr1hwVFRXJZrMpNDTUo33u3Llq1qyZunfvrnnz5qmsrMzsy87OVr9+/WS32822+Ph45eXl6cyZM5c9T0lJidxut8cGAADqBp/eOboaFy9eVGpqqkaOHKng4GCz/eGHH1aPHj3UtGlTbdu2TWlpaSooKFBmZqYkyeVyKTo62uNY4eHhZl+TJk2qnCs9PV2zZs26jlcDAAD8Va0IR6WlpfrP//xPGYahV155xaMvJSXF/LtLly6y2+26//77lZ6eLofD4dX50tLSPI7rdrsVFRXlXfEAAKBW8ftwVBmMjh07po0bN3rcNbqcmJgYlZWV6ejRo2rXrp2cTqcKCws9xlTu/9Q6JYfD4XWwAgAAtZtfrzmqDEYHDx7U+vXr1axZs599TU5OjurVq6ewsDBJUmxsrLZu3arS0lJzTFZWltq1a3fZR2oAAKBu8+mdo/Pnz+vQoUPm/pEjR5STk6OmTZsqIiJCv//977Vnzx6tXr1a5eXlcrlckqSmTZvKbrcrOztb27dvV//+/RUUFKTs7GxNmzZNY8aMMYPPqFGjNGvWLCUlJSk1NVX79u3T/Pnz9fe//90n1wwAAPybzTAMw1cn37x5s/r371+lPTExUTNnzqyykLrSpk2bdMcdd2jPnj168MEH9eWXX6qkpETR0dEaO3asUlJSPB6L5ebmKjk5WTt37lTz5s310EMPKTU19YrrdLvdCgkJUVFR0c8+1qsrWj++xtcl1AlH5yb4ugQAqLW8ff/2aTiqLQhHVRGOagbhCAC85+37t1drjr7++mtvXgYAAOD
3vApHN998s/r3768333xTFy9erO6aAAAAfMarcLRnzx516dJFKSkpcjqduv/++7Vjx47qrg0AAKDGeRWOunXrpvnz5+vEiRNauHChCgoK1KdPH3Xq1EmZmZk6depUddcJAABQI67pe44CAgI0bNgwrVixQs8884wOHTqkRx99VFFRURo3bpwKCgqqq04AAIAacU3haNeuXXrwwQcVERGhzMxMPfroozp8+LCysrJ04sQJ3XnnndVVJwAAQI3w6ksgMzMztWjRIuXl5WnIkCF64403NGTIENWr90PWio6O1uLFi9W6devqrBUAAOC68yocvfLKK5owYYLuueceRUREXHZMWFiYXnvttWsqDgAAoKZ5FY4OHjz4s2PsdrsSExO9OTwAAIDPeLXmaNGiRVqxYkWV9hUrVuj111+/5qIAAAB8xatwlJ6erubNm1dpDwsL05w5c665KAAAAF/xKhzl5+df9kdhW7Vqpfz8/GsuCgAAwFe8CkdhYWHKzc2t0v7555+rWbNm11wUAACAr3gVjkaOHKmHH35YmzZtUnl5ucrLy7Vx40ZNmTJFI0aMqO4aAQAAaoxXn1abPXu2jh49qgEDBigg4IdDVFRUaNy4caw5AgAAtZpX4chut2v58uWaPXu2Pv/8czVq1EidO3dWq1atqrs+AACAGuVVOKp0yy236JZbbqmuWgAAAHzOq3BUXl6uxYsXa8OGDTp58qQqKio8+jdu3FgtxQEAANQ0r8LRlClTtHjxYiUkJKhTp06y2WzVXRcAAIBPeBWOli1bprfffltDhgyp7noAAAB8yquP8tvtdt18883VXQsAAIDPeRWOHnnkEc2fP1+GYVR3PQAAAD7l1WO1jz/+WJs2bdKHH36ojh07qkGDBh79K1eurJbiAAAAappX4Sg0NFR33313ddcCAADgc16Fo0WLFlV3HQAAAH7BqzVHklRWVqb169frH//4h86dOydJOnHihM6fP19txQEAANQ0r+4cHTt2TIMGDVJ+fr5KSkr029/+VkFBQXrmmWdUUlKiBQsWVHedAAAANcKrO0dTpkxRr169dObMGTVq1Mhsv/vuu7Vhw4ZqKw4AAKCmeXXn6KOPPtK2bdtkt9s92lu3bq1vv/22WgoDAADwBa/uHFVUVKi8vLxK+zfffKOgoKBrLgoAAMBXvApHcXFxeu6558x9m82m8+fP66mnnuInRQAAQK3m1WO1Z599VvHx8erQoYMuXryoUaNG6eDBg2revLneeuut6q4RAACgxngVjlq0aKHPP/9cy5YtU25urs6fP6+kpCSNHj3aY4E2AABAbeNVOJKkgIAAjRkzpjprAQAA8Dmv1hy98cYb/3K7Ulu3btXQoUMVGRkpm82mVatWefQbhqEZM2YoIiJCjRo10sCBA3Xw4EGPMadPn9bo0aMVHBys0NBQJSUlVfkiytzcXPXt21cNGzZUVFSUMjIyvLlsAABQB3h152jKlCke+6Wlpbpw4YLsdrsCAwM1bty4KzpOcXGxunbtqgkTJmjYsGFV+jMyMvT888/r9ddfV3R0tJ588knFx8frwIEDatiwoSRp9OjRKigoUFZWlkpLSzV+/HhNnDhRS5culSS53W7FxcVp4MCBWrBggfbu3asJEyYoNDRUEydO9ObyAQDAL5hX4ejMmTNV2g4ePKhJkyZp+vTpV3ycwYMHa/DgwZftMwxDzz33nJ544gndeeedkn64YxUeHq5Vq1ZpxIgR+uKLL7R27Vrt3LlTvXr1kiS98MILGjJkiP72t78pMjJSS5Ys0aVLl7Rw4ULZ7XZ17NhROTk5yszMJBwBAIAqvP5ttR9r27at5s6dW+WukreOHDkil8ulgQMHmm0hISGKiYlRdna2JCk7O1uhoaFmMJKkgQMHql69etq+fbs5pl+/fh5fWBkfH6+8vLzLhjxJKikpkdvt9tgAAEDdUG3hSPphkfaJEyeq5Vgul0uSFB4e7tEeHh5u9rlcLoWFhVWpoWn
Tph5jLncM6zl+LD09XSEhIeYWFRV17RcEAABqBa8eq7333nse+4ZhqKCgQC+++KJ69+5dLYX5UlpamlJSUsx9t9tNQAIAoI7wKhzdddddHvs2m0033nijfvOb3+jZZ5+tjrrkdDolSYWFhYqIiDDbCwsL1a1bN3PMyZMnPV5XVlam06dPm693Op0qLCz0GFO5XznmxxwOhxwOR7VcBwAAqF28/m0161ZeXi6Xy6WlS5d6BJlrER0dLafTqQ0bNphtbrdb27dvV2xsrCQpNjZWZ8+e1e7du80xGzduVEVFhWJiYswxW7duVWlpqTkmKytL7dq1U5MmTaqlVgAA8MtRrWuOrtb58+eVk5OjnJwcST8sws7JyVF+fr5sNpumTp2qv/zlL3rvvfe0d+9ejRs3TpGRkeadq1tvvVWDBg3Sfffdpx07duiTTz7R5MmTNWLECEVGRkqSRo0aJbvdrqSkJO3fv1/Lly/X/PnzPR6bAQAAVPLqsdrVBIvMzMyf7Nu1a5f69+9f5biJiYlavHixHnvsMRUXF2vixIk6e/as+vTpo7Vr15rfcSRJS5Ys0eTJkzVgwADVq1dPw4cP1/PPP2/2h4SEaN26dUpOTlbPnj3VvHlzzZgxg4/xAwCAy7IZhmFc7Yv69++vzz77TKWlpWrXrp0k6auvvlL9+vXVo0eP/z+4zaaNGzdWX7U+4na7FRISoqKiIgUHB/u6HL/Q+vE1vi6hTjg6N8HXJQBAreXt+7dXd46GDh2qoKAgvf766+a6nTNnzmj8+PHq27evHnnkEW8OCwAA4HNerTl69tlnlZ6e7rGguUmTJvrLX/5SbZ9WAwAA8AWvwpHb7dapU6eqtJ86dUrnzp275qIAAAB8xatwdPfdd2v8+PFauXKlvvnmG33zzTf67//+byUlJV32B2QBAABqC6/WHC1YsECPPvqoRo0aZX5/UEBAgJKSkjRv3rxqLRAAAKAmeRWOAgMD9fLLL2vevHk6fPiwJKlNmzZq3LhxtRYHAABQ067pSyALCgpUUFCgtm3bqnHjxvLiWwEAAAD8ilfh6LvvvtOAAQN0yy23aMiQISooKJAkJSUl8TF+AABQq3kVjqZNm6YGDRooPz9fgYGBZvsf//hHrV27ttqKAwAAqGlerTlat26d/vnPf6pFixYe7W3bttWxY8eqpTAAAABf8OrOUXFxsccdo0qnT5+Ww+G45qIAAAB8xatw1LdvX73xxhvmvs1mU0VFhTIyMjx+SBYAAKC28eqxWkZGhgYMGKBdu3bp0qVLeuyxx7R//36dPn1an3zySXXXCAAAUGO8unPUqVMnffXVV+rTp4/uvPNOFRcXa9iwYfrss8/Upk2b6q4RAACgxlz1naPS0lINGjRICxYs0J///OfrURMAAIDPXPWdowYNGig3N/d61AIAAOBzXj1WGzNmjF577bXqrgUAAMDnvFqQXVZWpoULF2r9+vXq2bNnld9Uy8zMrJbiAAAAatpVhaOvv/5arVu31r59+9SjRw9J0ldffeUxxmazVV91AAAANeyqwlHbtm1VUFCgTZs2Sfrh50Kef/55hYeHX5fiAAAAatpVrTkyDMNj/8MPP1RxcXG1FgQAAOBLXi3IrvTjsAQAAFDbXVU4stlsVdYUscYIAAD8klzVmiPDMHTPPfeYPy578eJFPfDAA1U+rbZy5crqqxAAAKAGXVU4SkxM9NgfM2ZMtRYDAADga1cVjhYtWnS96gAAAPAL17QgGwAA4JeGcAQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAAvCEQAAgIXfh6PWrVubv+lm3ZKTkyVJd9xxR5W+Bx54wOMY+fn5SkhIUGBgoMLCwjR9+nSVlZX54nIAAICfu6pvyPaFnTt3qry83Nzft2+ffvvb3+oPf/iD2Xbffffp6aefNvcDAwPNv8vLy5WQkCCn06lt27apoKBA48aNU4MGDTRnzpyauQgAAFBr+H04uvHGGz3
2586dqzZt2ujXv/612RYYGCin03nZ169bt04HDhzQ+vXrFR4erm7dumn27NlKTU3VzJkzZbfbr2v9AACgdvH7x2pWly5d0ptvvqkJEybIZrOZ7UuWLFHz5s3VqVMnpaWl6cKFC2Zfdna2OnfurPDwcLMtPj5ebrdb+/fvv+x5SkpK5Ha7PTYAAFA3+P2dI6tVq1bp7Nmzuueee8y2UaNGqVWrVoqMjFRubq5SU1OVl5enlStXSpJcLpdHMJJk7rtcrsueJz09XbNmzbo+FwEAAPxarQpHr732mgYPHqzIyEizbeLEiebfnTt3VkREhAYMGKDDhw+rTZs2Xp0nLS1NKSkp5r7b7VZUVJT3hQMAgFqj1oSjY8eOaf369eYdoZ8SExMjSTp06JDatGkjp9OpHTt2eIwpLCyUpJ9cp+RwOORwOKqhagAAUNvUmjVHixYtUlhYmBISEv7luJycHElSRESEJCk2NlZ79+7VyZMnzTFZWVkKDg5Whw4drlu9AACgdqoVd44qKiq0aNEiJSYmKiDg/0s+fPiwli5dqiFDhqhZs2bKzc3VtGnT1K9fP3Xp0kWSFBcXpw4dOmjs2LHKyMiQy+XSE088oeTkZO4OAQCAKmpFOFq/fr3y8/M1YcIEj3a73a7169frueeeU3FxsaKiojR8+HA98cQT5pj69etr9erVmjRpkmJjY9W4cWMlJiZ6fC8SAABApVoRjuLi4mQYRpX2qKgobdmy5Wdf36pVK33wwQfXozQAAPALU2vWHAEAANQEwhEAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAAvCEQAAgAXhCAAAwIJwBAAAYEE4AgAAsCAcAQAAWBCOAAAALAhHAAAAFoQjAAAAC8IRAACABeEIAADAgnAEAABgQTgCAACwIBwBAABYEI4AAAAsCEcAAAAWAb4uAMBPa/34Gl+XcNWOzk3wdQkAcE24cwQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAAvCEQAAgAXhCAAAwMKvw9HMmTNls9k8tvbt25v9Fy9eVHJyspo1a6YbbrhBw4cPV2Fhoccx8vPzlZCQoMDAQIWFhWn69OkqKyur6UsBAAC1hN9/Q3bHjh21fv16cz8g4P9LnjZtmtasWaMVK1YoJCREkydP1rBhw/TJJ59IksrLy5WQkCCn06lt27apoKBA48aNU4MGDTRnzpwavxYAAOD//D4cBQQEyOl0VmkvKirSa6+9pqVLl+o3v/mNJGnRokW69dZb9emnn+r222/XunXrdODAAa1fv17h4eHq1q2bZs+erdTUVM2cOVN2u72mLwcAAPg5v36sJkkHDx5UZGSkbrrpJo0ePVr5+fmSpN27d6u0tFQDBw40x7Zv314tW7ZUdna2JCk7O1udO3dWeHi4OSY+Pl5ut1v79++v2QsBAAC1gl/fOYqJidHixYvVrl07FRQUaNasWerbt6/27dsnl8slu92u0NBQj9eEh4fL5XJJklwul0cwquyv7PspJSUlKikpMffdbnc1XREAAPB3fh2OBg8ebP7dpUsXxcTEqFWrVnr77bfVqFGj63be9PR0zZo167odHwAA+C+/f6xmFRoaqltuuUWHDh2S0+nUpUuXdPbsWY8xhYWF5holp9NZ5dNrlfuXW8dUKS0tTUVFReZ2/Pjx6r0QAADgt2pVODp//rwOHz6siIgI9ezZUw0aNNCGDRvM/ry8POXn5ys2NlaSFBsbq7179+rkyZPmmKysLAUHB6tDhw4/eR6Hw6Hg4GCPDQAA1A1+/Vjt0Ucf1dChQ9WqVSudOHFCTz31lOrXr6+RI0cqJCRESUlJSklJUdOmTRUcHKyHHnpIsbGxuv322yVJcXFx6tChg8aOHauMjAy5XC498cQTSk5OlsPh8PHVAQAAf+TX4eibb77RyJEj9d133+nGG29Unz599Omnn+rGG2+UJP39739XvXr1NHz
4cJWUlCg+Pl4vv/yy+fr69etr9erVmjRpkmJjY9W4cWMlJibq6aef9tUlAQAAP2czDMPwdRH+zu12KyQkREVFRTxi+z+tH1/j6xLgp47OTfB1CQAgyfv371q15ggAAOB6IxwBAABYEI4AAAAsCEcAAAAWhCMAAAALwhEAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFj49W+r1RX8FAcAAP6DO0cAAAAWhCMAAAALwhEAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAAvCEQAAgAXhCAAAwIJwBAAAYEE4AgAAsCAcAQAAWBCOAAAALAhHAAAAFoQjAAAAC8IRAACABeEIAADAgnAEAABgQTgCAACwIBwBAABY+HU4Sk9P12233aagoCCFhYXprrvuUl5enseYO+64QzabzWN74IEHPMbk5+crISFBgYGBCgsL0/Tp01VWVlaTlwIAAGqJAF8X8K9s2bJFycnJuu2221RWVqY//elPiouL04EDB9S4cWNz3H333aenn37a3A8MDDT/Li8vV0JCgpxOp7Zt26aCggKNGzdODRo00Jw5c2r0egAAgP/z63C0du1aj/3FixcrLCxMu3fvVr9+/cz2wMBAOZ3Oyx5j3bp1OnDggNavX6/w8HB169ZNs2fPVmpqqmbOnCm73X5drwEAANQufv1Y7ceKiookSU2bNvVoX7JkiZo3b65OnTopLS1NFy5cMPuys7PVuXNnhYeHm23x8fFyu93av3//Zc9TUlIit9vtsQEAgLrBr+8cWVVUVGjq1Knq3bu3OnXqZLaPGjVKrVq1UmRkpHJzc5Wamqq8vDytXLlSkuRyuTyCkSRz3+VyXfZc6enpmjVr1nW6EgAA4M9qTThKTk7Wvn379PHHH3u0T5w40fy7c+fOioiI0IABA3T48GG1adPGq3OlpaUpJSXF3He73YqKivKucAAAUKvUisdqkydP1urVq7Vp0ya1aNHiX46NiYmRJB06dEiS5HQ6VVhY6DGmcv+n1ik5HA4FBwd7bAAAoG7w63BkGIYmT56sd955Rxs3blR0dPTPviYnJ0eSFBERIUmKjY3V3r17dfLkSXNMVlaWgoOD1aFDh+tSNwAAqL38+rFacnKyli5dqnfffVdBQUHmGqGQkBA1atRIhw8f1tKlSzVkyBA1a9ZMubm5mjZtmvr166cuXbpIkuLi4tShQweNHTtWGRkZcrlceuKJJ5ScnCyHw+HLywMAAH7Ir+8cvfLKKyoqKtIdd9yhiIgIc1u+fLkkyW63a/369YqLi1P79u31yCOPaPjw4Xr//ffNY9SvX1+rV69W/fr1FRsbqzFjxmjcuHEe34sEAABQya/vHBmG8S/7o6KitGXLlp89TqtWrfTBBx9UV1kAAOAXzK/vHAEAANQ0whEAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAAvCEQAAgAXhCAAAwIJwBAAAYEE4AgAAsCAcAQAAWBCOAAAALAhHAAAAFoQjAAAAC8IRAACABeEIAADAIsDXBQD4ZWn9+Bpfl3DVjs5N8HUJAPwId44AAAAsCEcAAAAWhCMAAAALwhEAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAAvCEQAAgAU/PAugzuPHcgFYcecIAADAok6Fo5deekmtW7dWw4YNFRMTox07dvi6JAAA4GfqTDhavny5UlJS9NRTT2nPnj3q2rWr4uPjdfLkSV+XBgAA/EidCUeZmZm67777NH78eHXo0EELFixQYGCgFi5c6OvSAACAH6kTC7IvXbqk3bt3Ky0tzWyrV6+eBg4cqOzs7CrjS0pKVFJSYu4XFRVJktxu93Wpr6LkwnU5LoBfruv17xHwS1L5/4lhGFf1ujoRjv7nf/5H5eXlCg8P92gPDw/Xl19+WWV8enq6Zs2aVaU9KirqutU
IAFcj5DlfVwDUHufOnVNISMgVj68T4ehqpaWlKSUlxdyvqKjQ6dOn1axZM9lsNh9W5r/cbreioqJ0/PhxBQcH+7ocWDA3/o358V/Mjf+60rkxDEPnzp1TZGTkVR2/ToSj5s2bq379+iosLPRoLywslNPprDLe4XDI4XB4tIWGhl7PEn8xgoOD+UfETzE3/o358V/Mjf+6krm5mjtGlerEgmy73a6ePXtqw4YNZltFRYU2bNig2NhYH1YGAAD8TZ24cyRJKSkpSkxMVK9evfSrX/1Kzz33nIqLizV+/HhflwYAAPxInQlHf/zjH3Xq1CnNmDFDLpdL3bp109q1a6ss0oZ3HA6HnnrqqSqPI+F7zI1/Y378F3Pjv6733NiMq/18GwAAwC9YnVhzBAAAcKUIRwAAABaEIwAAAAvCEQAAgAXhCFcsPT1dt912m4KCghQWFqa77rpLeXl5HmMuXryo5ORkNWvWTDfccIOGDx9e5cs3cf3NnTtXNptNU6dONduYG9/69ttvNWbMGDVr1kyNGjVS586dtWvXLrPfMAzNmDFDERERatSokQYOHKiDBw/6sOK6oby8XE8++aSio6PVqFEjtWnTRrNnz/b4LS7mpuZs3bpVQ4cOVWRkpGw2m1atWuXRfyVzcfr0aY0ePVrBwcEKDQ1VUlKSzp8/f1V1EI5wxbZs2aLk5GR9+umnysrKUmlpqeLi4lRcXGyOmTZtmt5//32tWLFCW7Zs0YkTJzRs2DAfVl337Ny5U//4xz/UpUsXj3bmxnfOnDmj3r17q0GDBvrwww914MABPfvss2rSpIk5JiMjQ88//7wWLFig7du3q3HjxoqPj9fFixd9WPkv3zPPPKNXXnlFL774or744gs988wzysjI0AsvvGCOYW5qTnFxsbp27aqXXnrpsv1XMhejR4/W/v37lZWVpdWrV2vr1q2aOHHi1RViAF46efKkIcnYsmWLYRiGcfbsWaNBgwbGihUrzDFffPGFIcnIzs72VZl1yrlz54y2bdsaWVlZxq9//WtjypQphmEwN76Wmppq9OnT5yf7KyoqDKfTacybN89sO3v2rOFwOIy33nqrJkqssxISEowJEyZ4tA0bNswYPXq0YRjMjS9JMt555x1z/0rm4sCBA4YkY+fOneaYDz/80LDZbMa33357xefmzhG8VlRUJElq2rSpJGn37t0qLS3VwIEDzTHt27dXy5YtlZ2d7ZMa65rk5GQlJCR4zIHE3Pjae++9p169eukPf/iDwsLC1L17d/3Xf/2X2X/kyBG5XC6P+QkJCVFMTAzzc539+7//uzZs2KCvvvpKkvT555/r448/1uDBgyUxN/7kSuYiOztboaGh6tWrlzlm4MCBqlevnrZv337F56oz35CN6lVRUaGpU6eqd+/e6tSpkyTJ5XLJbrdX+ZHe8PBwuVwuH1RZtyxbtkx79uzRzp07q/QxN7719ddf65VXXlFKSor+9Kc/aefOnXr44Ydlt9uVmJhozsGPv7Gf+bn+Hn/8cbndbrVv317169dXeXm5/vrXv2r06NGSxNz4kSuZC5fLpbCwMI/+gIAANW3a9Krmi3AEryQnJ2vfvn36+OOPfV0KJB0/flxTpkxRVlaWGjZs6Oty8CMVFRXq1auX5syZI0nq3r279u3bpwULFigxMdHH1dVtb7/9tpYsWaKlS5eqY8eOysnJ0dSpUxUZGcnc1GE8VsNVmzx5slavXq1NmzapRYsWZrvT6dSlS5d09uxZj/GFhYVyOp01XGXdsnv3bp08eVI9evRQQECAAgICtGXLFj3//PMKCAhQeHg4c+NDERER6tChg0fbrbfeqvz8fEky5+DHnx5kfq6/6dOn6/HHH9eIESPUuXNnjR07VtOmTVN6erok5safXMlcOJ1OnTx50qO/rKxMp0+fvqr5IhzhihmGocmTJ+udd97Rxo0bFR0d7dHfs2dPNWjQQBs2bDDb8vLylJ+fr9jY2Jout04ZMGCA9u7
dq5ycHHPr1auXRo8ebf7N3PhO7969q3ztxVdffaVWrVpJkqKjo+V0Oj3mx+12a/v27czPdXbhwgXVq+f5Vli/fn1VVFRIYm78yZXMRWxsrM6ePavdu3ebYzZu3KiKigrFxMRc+cmueTk56oxJkyYZISEhxubNm42CggJzu3DhgjnmgQceMFq2bGls3LjR2LVrlxEbG2vExsb6sOq6y/ppNcNgbnxpx44dRkBAgPHXv/7VOHjwoLFkyRIjMDDQePPNN80xc+fONUJDQ413333XyM3NNe68804jOjra+P77731Y+S9fYmKi8W//9m/G6tWrjSNHjhgrV640mjdvbjz22GPmGOam5pw7d8747LPPjM8++8yQZGRmZhqfffaZcezYMcMwrmwuBg0aZHTv3t3Yvn278fHHHxtt27Y1Ro4ceVV1EI5wxSRddlu0aJE55vvvvzcefPBBo0mTJkZgYKBx9913GwUFBb4rug77cThibnzr/fffNzp16mQ4HA6jffv2xquvvurRX1FRYTz55JNGeHi44XA4jAEDBhh5eXk+qrbucLvdxpQpU4yWLVsaDRs2NG666Sbjz3/+s1FSUmKOYW5qzqZNmy77PpOYmGgYxpXNxXfffWeMHDnSuOGGG4zg4GBj/Pjxxrlz566qDpthWL4GFAAAoI5jzREAAIAF4QgAAMCCcAQAAGBBOAIAALAgHAEAAFgQjgAAACwIRwAAABaEIwAAAAvCEQAAgAXhCAAAwIJwBAAAYEE4AgAAsPhfUcBSHb9b1GQAAAAASUVORK5CYII=", 403 | "text/plain": [ 404 | "
" 405 | ] 406 | }, 407 | "metadata": {}, 408 | "output_type": "display_data" 409 | } 410 | ], 411 | "source": [ 412 | "Healthcare['bmi'].plot(kind=\"hist\")" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 10, 418 | "id": "29f19551-49d9-4076-843d-b0a3656e23ff", 419 | "metadata": {}, 420 | "outputs": [ 421 | { 422 | "data": { 423 | "text/plain": [ 424 | "" 425 | ] 426 | }, 427 | "execution_count": 10, 428 | "metadata": {}, 429 | "output_type": "execute_result" 430 | }, 431 | { 432 | "data": { 433 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAigAAAGdCAYAAAA44ojeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAi8ElEQVR4nO3df3BU1f3/8ddmA2ElZBGE/NBgVkIrNmv9RfmhaYNkivXHsBNiS8VCLR+xFqkIVg2fomXEpCKIFVGqpYrWH60hRo0jHQcLrmNAxR9tWgejJjUKCXxQdhMJiLv7/YNvtq6mSuCGc7L7fMzskD3n7M07f4T7yrnnnuuKxWIxAQAAWCTNdAEAAABfREABAADWIaAAAADrEFAAAIB1CCgAAMA6BBQAAGAdAgoAALAOAQUAAFgn3XQBhyMajWr79u0aNGiQXC6X6XIAAMAhiMViam9vV15entLSvnqOpE8GlO3btys/P990GQAA4DC0tLTohBNO+MoxfTKgDBo0SNLBHzArK8twNQAA4FCEw2Hl5+fHz+NfpU8GlK7LOllZWQQUAAD6mENZntHjRbIvvPCCLrroIuXl5cnlcqm2tjahPxaL6cYbb1Rubq48Ho9KS0vV2NiYMOajjz7S9OnTlZWVpcGDB2vWrFnq6OjoaSkAACBJ9TigfPLJJ/r2t7+tVatWddu/dOlS3XnnnVq9erW2bNmigQMHavLkydq3b198zPTp0/XPf/5Tzz33nOrq6vTCCy9o9uzZh/9TAACApOKKxWKxw/6wy6UnnnhCgUBA0sHZk7y8PC1YsEDXXnutJCkUCik7O1sPPPCApk2bprfeekunnHKKXnnlFZ111lmSpPXr1+v888/XBx98oLy8vK/9vuFwWF6vV6FQiEs8AAD0ET05fzu6D0pTU5NaW1tVWloab/N6vRo7dqzq6+slSfX19Ro8eHA8nEhSaWmp0tLStGXLlm6Pu3//foXD4YQXAABIXo4GlNbWVklSdnZ2Qnt2dna8r7W1VcOHD0/oT09P15AhQ+Jjvqiqqkperzf+4hZjAACSW5/YSbaiokKhUCj+amlpMV0SAADoRY4GlJycHElSW1tbQntbW1u8LycnRzt37kzo/+yzz/TRRx/Fx3xRRkZG/JZibi0GACD5ORpQfD6fcnJytGHDhnhbOBzWli1bNH78eEnS+PHjtWfPHm3dujU+5vnnn1c0GtXYsWOdLAcAAPRRPd6oraOjQ++88078fVNTk9544w0NGTJEI0aM0Lx587RkyRKNGjVKPp9PixYtUl5eXvxOn9GjR+u8887T5ZdfrtWrV+vAgQO66qqrNG3atEO6gwdA8opEIgoGg9qxY4dyc3NVXFwst9ttuiwAJsR66G9/+1tM0pdeM2fOjMVisVg0Go0tWrQolp2dHcvIyIhNmjQptm3btoRj7N69O/bjH/84lpmZGcvKyo
pddtllsfb29kOuIRQKxSTFQqFQT8sHYKl169bFCgoKEv5fKSgoiK1bt850aQAc0pPz9xHtg2IK+6AAyaWmpkbl5eW68MILtXDhQhUVFamhoUGVlZWqq6tTdXW1ysrKTJcJ4Aj15PxNQAFgVCQSUWFhofx+v2praxMewR6NRhUIBNTQ0KDGxkYu9wB9nLGN2gCgp4LBoJqbm7Vw4cKEcCJJaWlpqqioUFNTk4LBoKEKAZhAQAFg1I4dOyRJRUVF3fZ3tXeNA5AaCCgAjMrNzZUkNTQ0dNvf1d41DkBqIKAAMKq4uFgFBQWqrKxUNBpN6ItGo6qqqpLP51NxcbGhCgGYQEABYJTb7dby5ctVV1enQCCg+vp6tbe3q76+XoFAQHV1dVq2bBkLZIEU0+ON2gDAaWVlZaqurtaCBQs0YcKEeLvP5+MWYyBFcZsxAGuwkyyQ3Hpy/mYGBYA13G63SkpKTJcBwAKsQQEAANYhoAAAAOsQUAAAgHUIKAAAwDoEFAAAYB0CCgAAsA4BBQAAWIeAAgAArENAAQAA1iGgAAAA67DVPQBr8CweAF2YQQFghZqaGhUWFmrixIm65JJLNHHiRBUWFqqmpsZ0aQAMIKAAMK6mpkbl5eXy+/2qr69Xe3u76uvr5ff7VV5eTkgBUpArFovFTBfRUz15XDMAu0UiERUWFsrv96u2tlZpaf/5uykajSoQCKihoUGNjY1c7gH6uJ6cv5lBAWBUMBhUc3OzFi5cmBBOJCktLU0VFRVqampSMBg0VCEAEwgoAIzasWOHJKmoqKjb/q72rnEAUgMBBYBRubm5kqSGhoZu+7vau8YBSA0EFABGFRcXq6CgQJWVlYpGowl90WhUVVVV8vl8Ki4uNlQhABMIKACMcrvdWr58uerq6hQIBBLu4gkEAqqrq9OyZctYIAukGDZqA2BcWVmZqqurtWDBAk2YMCHe7vP5VF1drbKyMoPVATCB24wBWIOdZIHk1pPzNzMoAKzhdrtVUlJiugwAFmANCgAAsA4BBQAAWIeAAgAArENAAQAA1iGgAAAA6xBQAACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYh4ACAACsQ0ABAADWIaAAAADrEFAAAIB1CCgAAMA6BBQAAGAdAgoAALAOAQUAAFiHgAIAAKxDQAEAANYhoAAAAOsQUAAAgHUIKAAAwDoEFAAAYB0CCgAAsA4BBQAAWIeAAgAArENAAQAA1iGgAAAA6xBQAACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYh4ACAACs43hAiUQiWrRokXw+nzwej0aOHKmbb75ZsVgsPiYWi+nGG29Ubm6uPB6PSktL1djY6HQpAACgj3I8oNx666265557dNddd+mtt97SrbfeqqVLl2rlypXxMUuXLtWdd96p1atXa8uWLRo4cKAmT56sffv2OV0OAADog1yxz09tOODCCy9Udna21qxZE2+bOnWqPB6P/vSnPykWiykvL08LFizQtddeK0kKhULKzs7WAw88oGnTpn3t9wiHw/J6vQqFQsrKynKyfAAA0Et6cv52fAZlwoQJ2rBhg95++21J0ptvvqkXX3xRP/jBDyRJTU1Nam1tVWlpafwzXq9XY8eOVX19fbfH3L9/v8LhcMILAAAkr3SnD3jDDTcoHA7r5JNPltvtViQS0S233KLp06dLklpbWyVJ2dnZCZ/Lzs6O931RVVWVFi9e7HSpAADAUo7PoPzlL3/Rww8/rEceeUSvvfaa1q5dq2XLlmnt2rWHfcyKigqFQqH4q6WlxcGKAQCAbRyfQfnVr36lG264Ib6WxO/369///reqqqo0c+ZM5eTkSJLa2tqUm5sb/1xbW5tOO+20bo+ZkZGhjIwMp0sFAACWcnwGZe/evUpLSzys2+1WNBqVJPl8PuXk5GjDhg3x/nA4rC1btmj8+PFOlwOgD4lEItq4caMeffRRbdy4UZFIxHRJAAxxfAbloosu0i233KIRI0
boW9/6ll5//XXdfvvt+tnPfiZJcrlcmjdvnpYsWaJRo0bJ5/Np0aJFysvLUyAQcLocAH1ETU2NFixYoObm5nhbQUGBli9frrKyMnOFATDC8RmUlStXqry8XL/4xS80evRoXXvttbriiit08803x8dcd911mjt3rmbPnq0xY8aoo6ND69ev14ABA5wuB0AfUFNTo/Lycvn9ftXX16u9vV319fXy+/0qLy9XTU2N6RIBHGWO74NyNLAPCpA8IpGICgsL5ff7VVtbm3CJOBqNKhAIqKGhQY2NjXK73QYrBXCkjO6DAgA9EQwG1dzcrIULF35p/VpaWpoqKirU1NSkYDBoqEIAJhBQABi1Y8cOSVJRUVG3/V3tXeMApAYCCgCjurYbaGho6La/q/3z2xIASH4EFABGFRcXq6CgQJWVlfHtCLpEo1FVVVXJ5/OpuLjYUIUATCCgADDK7XZr+fLlqqurUyAQSLiLJxAIqK6uTsuWLWOBLJBiHN8HBQB6qqysTNXV1VqwYIEmTJgQb/f5fKqurmYfFCAFcZsxAGtEIhEFg0Ht2LFDubm5Ki4uZuYESCI9OX8zgwLAGm63WyUlJabLAGAB1qAAAADrEFAAAIB1CCgAAMA6BBQAAGAdAgoAALAOAQUAAFiHgAIAAKxDQAEAANYhoAAAAOsQUAAAgHUIKAAAwDo8iweANXhYIIAuzKAAsEJNTY0KCws1ceJEXXLJJZo4caIKCwtVU1NjujQABhBQABhXU1Oj8vJy+f1+1dfXq729XfX19fL7/SovLyekACnIFYvFYqaL6KlwOCyv16tQKKSsrCzT5QA4ApFIRIWFhfL7/aqtrVVa2n/+bopGowoEAmpoaFBjYyOXe4A+rifnb2ZQABgVDAbV3NyshQsXJoQTSUpLS1NFRYWampoUDAYNVQjABAIKAKN27NghSSoqKuq2v6u9axyA1EBAAWBUbm6uJKmhoaHb/q72rnEAUgMBBYBRxcXFKigoUGVlpaLRaEJfNBpVVVWVfD6fiouLDVUIwAQCCgCj3G63li9frrq6OgUCgYS7eAKBgOrq6rRs2TIWyAIpho3aABhXVlam6upqLViwQBMmTIi3+3w+VVdXq6yszGB1AEzgNmMA1mAnWSC59eT8zQwKAGu43W6VlJSYLgOABViDAgAArENAAQAA1iGgAAAA6xBQAACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYh43aAFiDnWQBdGEGBYAVampqVFhYqIkTJ+qSSy7RxIkTVVhYqJqaGtOlATCAgALAuJqaGpWXl8vv9yc8zdjv96u8vJyQAqQgHhYIwKhIJKLCwkL5/X7V1tYqLe0/fzdFo1EFAgE1NDSosbGRyz1AH9eT8zczKACMCgaDam5u1sKFCxPCiSSlpaWpoqJCTU1NCgaDhioEYAIBBYBRO3bskCQVFRV129/V3jUOQGogoAAwKjc3V5LU0NDQbX9Xe9c4AKmBgALAqOLiYhUUFKiyslLRaDShLxqNqqqqSj6fT8XFxYYqBGAC+6AAMMrtdmv58uUqLy/XlClTdN5558nj8aizs1Pr16/XM888o+rqahbIAimGu3gAWOG6667T7bffrkgkEm9zu92aP3++li5darAyAE7hLh4AfUpNTY1uu+029e/fP6G9f//+uu2229gHBUhBBBQARkUiEf385z+XJE2aNClho7ZJkyZJkq688sqEmRUAyY+AAsCojRs3ateuXTrnnHP05JNPaty4ccrMzNS4ceP05JNP6pxzztHOnTu1ceNG06UCOIoIKACM6goeixcvViwW08aNG/Xoo49q48aNisViuummmxLGAUgN3MUDwArBYFCzZs1Sc3NzvK2goEAzZ840VxQAY5hBAWBUSUmJJOk3v/mNioqKEtagFBUVafHixQnjAKQGAgoAo4qLi+PP4InFYl96SQefycNGbUBq4RIPAKNeeuklRaNRuVwuPf/883rmmWfifcccc4xcLpei0a
heeuklZlGAFMIMCgCjuh4C+NBDDyk7OzuhLzs7Ww899FDCOACpgRkUAEZ1PQRw5MiReueddxQMBrVjxw7l5uaquLhYL7/8csI4AKmBGRQARvGwQADdYQYFgFFdDwucOnWqvF6vOjs7431dDw1ct24dDwsEUgwzKACs4HK5etQOILnxNGMARkUiERUWFuq4445TW1ubWlpa4n35+fnKzs7W7t271djYyCwK0Mf15PzNJR4ARgWDQTU3N6u5uflLsyUffPBBPLAEg0FuMwZSCAEFgFEffvhh/Othw4ZpxowZOumkk/Tee+/pwQcf1M6dO780DkDyI6AAMGr79u2SDi6I9Xg8WrZsWbzvxBNPjC+U7RoHIDX0yiLZDz/8UJdeeqmGDh0qj8cjv9+vV199Nd4fi8V04403Kjc3Vx6PR6WlpWpsbOyNUgBY7s0335QkdXZ2qqioSKtWrdIf//hHrVq1SkVFRfG7errGAUgNjs+gfPzxxzr77LM1ceJEPfvssxo2bJgaGxt17LHHxscsXbpUd955p9auXSufz6dFixZp8uTJ+te//qUBAwY4XRIAi3V0dMS//uJW9x6Pp9txAJKf4wHl1ltvVX5+vu6///54m8/ni38di8V0xx136Ne//rWmTJkiSXrwwQeVnZ2t2tpaTZs2zemSAFgsLy/vv/Z9ftHsV40DkHwcv8Tz1FNP6ayzztLFF1+s4cOH6/TTT9d9990X729qalJra6tKS0vjbV6vV2PHjlV9fX23x9y/f7/C4XDCC0ByGDt2bPzrkpIS3XXXXVqzZo3uuusufe973+t2HIDk5/gMynvvvad77rlH8+fP18KFC/XKK6/ol7/8pfr376+ZM2eqtbVVkrp9KFhX3xdVVVVp8eLFTpcKwAIff/xx/Ou//vWvevbZZ+PvP7/vyefHAUh+js+gRKNRnXHGGaqsrNTpp5+u2bNn6/LLL9fq1asP+5gVFRUKhULx1+c3cgLQtw0bNkzSwUvBX9wHxeVyxS8Rd40DkBocn0HJzc3VKaecktA2evRorVu3TpKUk5MjSWpra0t4OmlbW5tOO+20bo+ZkZGhjIwMp0sFYIHjjz9e0sHLvxdccIFGjhypffv2acCAAXr33Xfji2a7xgFIDY4HlLPPPlvbtm1LaHv77bd14oknSjr4V1JOTo42bNgQDyThcFhbtmzRlVde6XQ5ACzX9TTj4447Tn//+98T7uIZMWKEzjrrLO3evZunGQMpxvGAcs0112jChAmqrKzUD3/4Q7388su69957de+990o6OGU7b948LVmyRKNGjYrfZpyXl6dAIOB0OQAs9/mnGX/xEk9LS4vef/99nmYMpCDH16CMGTNGTzzxhB599FEVFRXp5ptv1h133KHp06fHx1x33XWaO3euZs+erTFjxqijo0Pr169nDxQgRW3evFnSwW0IPq/rfVc/gNTB04wBGPXpp5/K4/EoGo0qPT1dn332Wbyv631aWpo6OzvVv39/g5UCOFI9OX/3ylb3AHCoVq5cqWg0KkkJ4eTz76PRqFauXHnUawNgDgEFgFHBYDD+db9+/TRp0iRdeumlmjRpkvr169ftOADJj6cZAzCqa2dol8ulaDSqDRs2xPs+vzCWHaSB1EJAAWDUnj17JB1cEDt06FDNmDFDJ510kt577z09+OCD2rlzZ8I4AKmBgALAGrt27dKyZcvi77942zGA1MEaFABGfX57gf92m/EXxwFIfgQUAEb5/X5HxwFIDgQUAEYd6lZMfXDLJgBHgIACwKi3337b0XEAkgMBBYBRr7/+uqPjACQHAgoAo9LSDu2/oUMdByA58BsPwKhDfb4Oz+EBUgsBBYBRBBQA3SGgADBq9+7djo4DkBwIKACMYg0KgO7wGw/AKAIKgO7wGw/AqH79+jk6DkByIKAAMIoZFADd4TcegFGDBw92dByA5EBAAWDUSSed5Og4AMmBgALAqDfffNPRcQCSAwEFgFF79+51dByA5E
BAAWBURkaGo+MAJAcCCgCjRo8e7eg4AMmBgALAqI8++sjRcQCSAwEFgFEul8vRcQCSAwEFgFGnnHKKo+MAJAcCCgCjMjMzHR0HIDkQUAAYVV9f7+g4AMmBgALAqNbWVkfHAUgOBBQARu3Zs8fRcQCSAwEFgFGxWMzRcQCSAwEFAABYh4ACwCiPx+PoOADJgYACwCg2agPQHQIKAKMOHDjg6DgAyYGAAsAoFskC6A4BBYBRBBQA3SGgADAqPT3d0XEAkgMBBQAAWIc/SQAcsc5PI3p3V8dhfdblOrS/k1yuNDV8GDqs7zFyWKY8/d2H9VkAZhBQAByxd3d16MKVLx7WZ/e7PZI6D2nc4X6PurnnqOh472F9FoAZBBQAR2zksEzVzT3nsD67+IOLVP3w2q8dNzVwkW46zO8xcljmYX0OgDmuWB9cGh8Oh+X1ehUKhZSVlWW6HABHoLOzU8ccc8zXjtu7dy+7yQJ9XE/O3yySBWCUx+PRlClTvnLMlClTCCdAiiGgADCutrb2v4aUKVOmqLa29ugWBMA4AgoAK9TW1mrv3r2aNvN/NKDgdE2b+T/au3cv4QRIUSySBWANj8ej/71lmepXvqj/nXsOl3WAFMYMCgAAsA4BBQAAWIeAAgAArENAAQAA1iGgAAAA6xBQAACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYh4ACAACsQ0ABAADWIaAAAADrEFAAAIB1CCgAAMA6BBQAAGAdAgoAALAOAQUAAFin1wPKb3/7W7lcLs2bNy/etm/fPs2ZM0dDhw5VZmampk6dqra2tt4uBQAA9BG9GlBeeeUV/f73v9epp56a0H7NNdfo6aef1uOPP65NmzZp+/btKisr681SAABAH9JrAaWjo0PTp0/Xfffdp2OPPTbeHgqFtGbNGt1+++0699xzdeaZZ+r+++/XSy+9pM2bN/dWOQAAoA/ptYAyZ84cXXDBBSotLU1o37p1qw4cOJDQfvLJJ2vEiBGqr6/v9lj79+9XOBxOeAEAgOSV3hsHfeyxx/Taa6/plVde+VJfa2ur+vfvr8GDBye0Z2dnq7W1tdvjVVVVafHixb1RKgAAsJDjMygtLS26+uqr9fDDD2vAgAGOHLOiokKhUCj+amlpceS4AADATo4HlK1bt2rnzp0644wzlJ6ervT0dG3atEl33nmn0tPTlZ2drU8//VR79uxJ+FxbW5tycnK6PWZGRoaysrISXgAAIHk5foln0qRJ+sc//pHQdtlll+nkk0/W9ddfr/z8fPXr108bNmzQ1KlTJUnbtm3T+++/r/HjxztdDgAA6IMcDyiDBg1SUVFRQtvAgQM1dOjQePusWbM0f/58DRkyRFlZWZo7d67Gjx+vcePGOV0OAADog3plkezXWbFihdLS0jR16lTt379fkydP1t13322iFAAAYKGjElA2btyY8H7AgAFatWqVVq1adTS+PQAA6GN4Fg8AALAOAQUAAFiHgAIAAKxDQAEAANYhoAAAAOsQUAAAgHUIKAAAwDoEFAAAYB0CCgAAsA4BBQAAWIeAAgAArENAAQAA1iGgAAAA6xBQAACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYh4ACAACsk266AABmNf3fJ/pk/2emy4h7Z2dHwr+2GJiRLt9xA02XAaQMAgqQwpr+7xNNXLbRdBndmvfnN0yX8CV/u7aEkAIcJQQUIIV1zZzc8aPTVDg803A1B+07ENEHH3fqhGM9GtDPbbocSQdnc+b9+Q2rZpqAZEdAAaDC4ZkqOt5ruoy4swpMVwDANBbJAgAA6xBQAACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYh4ACAACsQ0ABAADWIaAAAADrEFAAAIB1CCgAAMA6BBQAAGAdAgoAALAOAQUAAFiHgAIAAKyTbroAAGa50sNqCm9T2oBM06VYqyncIVd62HQZQEohoAAprt/gLVr4cqXpMqzXb/AkSeebLgNIGQQUIM
Ud2DNWyy+4RCOHM4Py37y7s0O/fPhd02UAKYWAAqS42GdZ8mV9U6cM9ZouxVrRfSHFPttlugwgpbBIFgAAWIeAAgAArENAAQAA1iGgAAAA6xBQAACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYh4ACAACsQ0ABAADWIaAAAADrEFAAAIB1CCgAAMA66aYLAGBO54GIJKnhw5DhSv5j34GIPvi4Uycc69GAfm7T5UiS3tnZYboEIOUQUIAU9u7/P/HeUPMPw5X0DQMz+C8TOFr4bQNS2Pe/lSNJGjk8Ux6LZivm/fkN3fGj01Q4PNN0OXEDM9LlO26g6TKAlEFAAVLYkIH9Ne07I0yX0a3C4ZkqOt5rugwAhrBIFgAAWIeAAgAArENAAQAA1iGgAAAA6xBQAACAdQgoAADAOo4HlKqqKo0ZM0aDBg3S8OHDFQgEtG3btoQx+/bt05w5czR06FBlZmZq6tSpamtrc7oUAADQRzkeUDZt2qQ5c+Zo8+bNeu6553TgwAF9//vf1yeffBIfc8011+jpp5/W448/rk2bNmn79u0qKytzuhQAANBHOb5R2/r16xPeP/DAAxo+fLi2bt2q7373uwqFQlqzZo0eeeQRnXvuuZKk+++/X6NHj9bmzZs1btw4p0sCAAB9TK+vQQmFDj6EbMiQIZKkrVu36sCBAyotLY2POfnkkzVixAjV19d3e4z9+/crHA4nvAAAQPLq1YASjUY1b948nX322SoqKpIktba2qn///ho8eHDC2OzsbLW2tnZ7nKqqKnm93vgrPz+/N8sGAACG9WpAmTNnjhoaGvTYY48d0XEqKioUCoXir5aWFocqBAAANuq1hwVeddVVqqur0wsvvKATTjgh3p6Tk6NPP/1Ue/bsSZhFaWtrU05OTrfHysjIUEZGRm+VCgAALOP4DEosFtNVV12lJ554Qs8//7x8Pl9C/5lnnql+/fppw4YN8bZt27bp/fff1/jx450uBwAA9EGOz6DMmTNHjzzyiJ588kkNGjQovq7E6/XK4/HI6/Vq1qxZmj9/voYMGaKsrCzNnTtX48eP5w4eAAAgqRcCyj333CNJKikpSWi///779dOf/lSStGLFCqWlpWnq1Knav3+/Jk+erLvvvtvpUgAAQB/leECJxWJfO2bAgAFatWqVVq1a5fS3BwAASYBn8QAAAOsQUAAAgHUIKAAAwDoEFAAAYB0CCgAAsA4BBQAAWIeAAgAArENAAQAA1iGgAAAA6xBQAACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYh4ACAACsQ0ABAADWIaAAAADrEFAAAIB1CCgAAMA6BBQAAGAdAgoAALAOAQUAAFiHgAIAAKxDQAEAANYhoAAAAOsQUAAAgHUIKAAAwDoEFAAAYB0CCgAAsE666QIA9H2dn0b07q4OR471zs6OhH+dMHJYpjz93Y4dD0DvI6AAOGLv7urQhStfdPSY8/78hmPHqpt7joqO9zp2PAC9j4AC4IiNHJapurnnOHKsfQci+uDjTp1wrEcD+jkz6zFyWKYjxwFw9BBQABwxT3+3ozMUZxU4digAfRSLZAEAgHUIKAAAwDoEFAAAYB0CCgAAsA4BBQAAWIeAAgAArENAAQAA1iGgAAAA6xBQAACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYp08+zTgWi0mSwuGw4UoAAMCh6jpvd53Hv0qfDCjt7e2SpPz8fMOVAACAnmpvb5fX6/3KMa7YocQYy0SjUW3fvl2DBg2Sy+UyXQ4AB4XDYeXn56ulpUVZWVmmywHgoFgspvb2duXl5Skt7atXmfTJgAIgeYXDYXm9XoVCIQIKkMJYJAsAAKxDQAEAANYhoACwSkZGhm666SZlZGSYLgWAQaxBAQAA1mEGBQAAWIeAAgAArENAAQAA1iGgAOh1JSUlmjdvnqPHfOCBBzR48GBHjwnAHgQUAH3Sj370I7399tumywDQS/rks3gAwOPxyO
PxmC4DQC9hBgXAUfHZZ5/pqquuktfr1XHHHadFixbFn2haUFCgJUuWaMaMGcrMzNSJJ56op556Srt27dKUKVOUmZmpU089Va+++mr8eFziAZIbAQXAUbF27Vqlp6fr5Zdf1u9+9zvdfvvt+sMf/hDvX7Fihc4++2y9/vrruuCCC/STn/xEM2bM0KWXXqrXXntNI0eO1IwZMw7pMe0A+j4CCoCjIj8/XytWrNA3v/lNTZ8+XXPnztWKFSvi/eeff76uuOIKjRo1SjfeeKPC4bDGjBmjiy++WN/4xjd0/fXX66233lJbW5vBnwLA0UJAAXBUjBs3Ti6XK/5+/PjxamxsVCQSkSSdeuqp8b7s7GxJkt/v/1Lbzp07j0a5AAwjoACwQr9+/eJfdwWZ7tqi0ejRLQyAEQQUAEfFli1bEt5v3rxZo0aNktvtNlQRAJsRUAAcFe+//77mz5+vbdu26dFHH9XKlSt19dVXmy4LgKXYBwXAUTFjxgx1dnbqO9/5jtxut66++mrNnj3bdFkALOWKcc8eAACwDJd4AACAdQgoAADAOgQUAABgHQIKAACwDgEFAABYh4ACAACsQ0ABAADWIaAAAADrEFAAAIB1CCgAAMA6BBQAAGAdAgoAALDO/wPqw5jD3iEoswAAAABJRU5ErkJggg==", 434 | "text/plain": [ 435 | "
" 436 | ] 437 | }, 438 | "metadata": {}, 439 | "output_type": "display_data" 440 | } 441 | ], 442 | "source": [ 443 | "Healthcare['bmi'].plot(kind=\"box\")" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": 11, 449 | "id": "5d2293c5-305a-40d8-98f2-f9a8b164da81", 450 | "metadata": {}, 451 | "outputs": [ 452 | { 453 | "data": { 454 | "text/html": [ 455 | "
\n", 456 | "\n", 469 | "\n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | "
idgenderagehypertensionheart_diseaseever_marriedwork_typeResidence_typeavg_glucose_levelbmismoking_statusstroke
09046Male67.001YesPrivateUrban228.6936.6formerly smoked1
151676Female61.000YesSelf-employedRural202.2128.1never smoked1
231112Male80.001YesPrivateRural105.9232.5never smoked1
360182Female49.000YesPrivateUrban171.2334.4smokes1
41665Female79.010YesSelf-employedRural174.1224.0never smoked1
\n", 565 | "
" 566 | ], 567 | "text/plain": [ 568 | " id gender age hypertension heart_disease ever_married \\\n", 569 | "0 9046 Male 67.0 0 1 Yes \n", 570 | "1 51676 Female 61.0 0 0 Yes \n", 571 | "2 31112 Male 80.0 0 1 Yes \n", 572 | "3 60182 Female 49.0 0 0 Yes \n", 573 | "4 1665 Female 79.0 1 0 Yes \n", 574 | "\n", 575 | " work_type Residence_type avg_glucose_level bmi smoking_status \\\n", 576 | "0 Private Urban 228.69 36.6 formerly smoked \n", 577 | "1 Self-employed Rural 202.21 28.1 never smoked \n", 578 | "2 Private Rural 105.92 32.5 never smoked \n", 579 | "3 Private Urban 171.23 34.4 smokes \n", 580 | "4 Self-employed Rural 174.12 24.0 never smoked \n", 581 | "\n", 582 | " stroke \n", 583 | "0 1 \n", 584 | "1 1 \n", 585 | "2 1 \n", 586 | "3 1 \n", 587 | "4 1 " 588 | ] 589 | }, 590 | "execution_count": 11, 591 | "metadata": {}, 592 | "output_type": "execute_result" 593 | } 594 | ], 595 | "source": [ 596 | "Healthcare['bmi'] = Healthcare['bmi'].fillna(Healthcare['bmi'].median())\n", 597 | "Healthcare.head()" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "id": "1e8b09f7-2adc-4aae-9fcd-bdd604fb5b65", 603 | "metadata": {}, 604 | "source": [ 605 | "Insight: Since the charts show a right-skewed distribution, replacing the missing values with the median is a suitable way of handling them." 
606 | ] 607 | }, 608 | { 609 | "cell_type": "code", 610 | "execution_count": 12, 611 | "id": "a74177b0-d1a8-46f8-83d6-ac12a7414c4a", 612 | "metadata": {}, 613 | "outputs": [ 614 | { 615 | "data": { 616 | "text/plain": [ 617 | "id 0\n", 618 | "gender 0\n", 619 | "age 0\n", 620 | "hypertension 0\n", 621 | "heart_disease 0\n", 622 | "ever_married 0\n", 623 | "work_type 0\n", 624 | "Residence_type 0\n", 625 | "avg_glucose_level 0\n", 626 | "bmi 0\n", 627 | "smoking_status 0\n", 628 | "stroke 0\n", 629 | "dtype: int64" 630 | ] 631 | }, 632 | "execution_count": 12, 633 | "metadata": {}, 634 | "output_type": "execute_result" 635 | } 636 | ], 637 | "source": [ 638 | "np.sum(Healthcare.isnull())" 639 | ] 640 | }, 641 | { 642 | "cell_type": "markdown", 643 | "id": "bb377abb-a38f-4fab-a705-9be6ffff988d", 644 | "metadata": {}, 645 | "source": [ 646 | "Handling Outliers" 647 | ] 648 | }, 657 | { 658 | "cell_type": "code", 659 | "execution_count": 13, 660 | "id": "1cf54307-842e-454b-92c8-8d344a747f1d", 661 | "metadata": {}, 662 | "outputs": [], 663 | "source": [ 664 | "numeric_Healthcare = Healthcare.select_dtypes(include=['float64'])" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": 14, 670 | "id": "ac014b77-5696-425f-b1ef-65a9e5463a7e", 671 | "metadata": {}, 672 | "outputs": [ 673 | { 674 | "name": "stdout", 675 | "output_type": "stream", 676 | "text": [ 677 | "age 0\n", 678 | "avg_glucose_level 627\n", 679 | "bmi 126\n", 680 | "dtype: int64\n" 681 | ] 682 | } 683 | ], 684 | "source": [ 685 | "Q1 = numeric_Healthcare.quantile(0.25)\n", 686 | "Q3 = numeric_Healthcare.quantile(0.75)\n", 687 | "IQR = Q3 - Q1\n", 688 | "lower_bound = Q1 - 1.5 * IQR\n", 689 | "upper_bound = Q3 + 1.5 * IQR\n", 690 | "outliers = (numeric_Healthcare < lower_bound) | (numeric_Healthcare > 
upper_bound)\n", 691 | "outliers_count = outliers.sum()\n", 692 | "print(outliers_count)" 693 | ] 694 | }, 695 | { 696 | "cell_type": "markdown", 697 | "id": "febb3bf9-bcd0-4b95-af1d-72b2ed2c4899", 698 | "metadata": {}, 699 | "source": [ 700 | "Result: We have outliers in two variables, Average Glucose Level and BMI, so we investigate these features further to handle the outliers. The approach is to replace the outliers with the mean." 701 | ] 702 | }, 703 | { 704 | "cell_type": "code", 705 | "execution_count": 15, 706 | "id": "b002ad9a-3138-4781-9222-f17865525527", 707 | "metadata": {}, 708 | "outputs": [], 709 | "source": [ 710 | "def ImputeOutliersIQR(dataframeVariable):\n", 711 | " Q1 = dataframeVariable.quantile(0.25)\n", 712 | " Q3 = dataframeVariable.quantile(0.75)\n", 713 | " IQR = Q3 - Q1\n", 714 | " lower_bound = Q1 - 1.5 * IQR\n", 715 | " upper_bound = Q3 + 1.5 * IQR\n", 716 | "\n", 717 | " dataframeVariable = np.where(\n", 718 | " dataframeVariable >= upper_bound,\n", 719 | " dataframeVariable.mean(),\n", 720 | " np.where(\n", 721 | " dataframeVariable <= lower_bound,\n", 722 | " dataframeVariable.mean(),\n", 723 | " dataframeVariable\n", 724 | " )\n", 725 | " )\n", 726 | " return dataframeVariable\n" 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": 16, 732 | "id": "fa5438c7-4c61-4ae2-9657-68d562578325", 733 | "metadata": {}, 734 | "outputs": [ 735 | { 736 | "data": { 737 | "text/html": [ 738 | "\n" 746 | ] 747 | }, 748 | "metadata": {}, 749 | "output_type": "display_data" 750 | } 751 | ], 752 | "source": [ 753 | "HealthcareWithoutOutliersavg_glucose_level = Healthcare.copy()\n", 754 | "\n", 755 | "fig = make_subplots(\n", 756 | " rows=1, cols=3,\n", 757 | " specs=[[{}, {}, {}]], subplot_titles=(\"Initial Data\",\"Step 1\",\"Without Outliers\"))\n", 758 | "fig.add_trace(go.Box(y=HealthcareWithoutOutliersavg_glucose_level['avg_glucose_level'], name=\"avg_glucose_level\",marker_color='dodgerblue'), 1, 1)\n", 759 | "\n", 760 | 
"HealthcareWithoutOutliersavg_glucose_level['avg_glucose_level'] = ImputeOutliersIQR(HealthcareWithoutOutliersavg_glucose_level['avg_glucose_level']) \n", 761 | "fig.add_trace(go.Box(y=HealthcareWithoutOutliersavg_glucose_level['avg_glucose_level'], name=\"avg_glucose_level\",marker_color='royalblue'), 1, 2)\n", 762 | "\n", 763 | "HealthcareWithoutOutliersavg_glucose_level['avg_glucose_level'] = ImputeOutliersIQR(HealthcareWithoutOutliersavg_glucose_level['avg_glucose_level']) \n", 764 | "fig.add_trace(go.Box(y=HealthcareWithoutOutliersavg_glucose_level['avg_glucose_level'], name=\"avg_glucose_level\",marker_color='navy'), 1, 3)\n", 765 | "\n", 766 | "fig.update_layout(height=700, width=1300, showlegend=False, title_text=\"REMOVE OUTLIERS FROM avg_glucose_level\", \n", 767 | " title_font = {\"size\": 20, \"color\": 'gray', 'family':'Arial Black'}, title={ 'x':0.5,'xanchor': 'center','yanchor': 'top'})\n", 768 | "fig.show(renderer=\"iframe\")" 769 | ] 770 | }, 771 | { 772 | "cell_type": "code", 773 | "execution_count": 17, 774 | "id": "a06b1717-a436-4f4e-806b-f9168832a587", 775 | "metadata": {}, 776 | "outputs": [ 777 | { 778 | "data": { 779 | "text/html": [ 780 | "\n" 788 | ] 789 | }, 790 | "metadata": {}, 791 | "output_type": "display_data" 792 | } 793 | ], 794 | "source": [ 795 | "HealthcareWithoutOutliersbmi = HealthcareWithoutOutliersavg_glucose_level.copy()\n", 796 | "\n", 797 | "fig = make_subplots(\n", 798 | " rows=1, cols=5,\n", 799 | " specs=[[{}, {}, {}, {}, {}]], subplot_titles=(\"Initial Data\",\"Step 1\",\"Step 2\",\"Step 3\",\"Without Outliers\"))\n", 800 | "fig.add_trace(go.Box(y=Healthcare['bmi'], name=\"bmi\",marker_color='lightblue'), 1, 1)\n", 801 | "\n", 802 | "HealthcareWithoutOutliersbmi['bmi'] = ImputeOutliersIQR(HealthcareWithoutOutliersbmi['bmi']) \n", 803 | "fig.add_trace(go.Box(y=HealthcareWithoutOutliersbmi['bmi'], name=\"bmi\",marker_color='cornflowerblue'), 1, 2)\n", 804 | "\n", 805 | "HealthcareWithoutOutliersbmi['bmi'] = 
ImputeOutliersIQR(HealthcareWithoutOutliersbmi['bmi'])\n", 806 | "fig.add_trace(go.Box(y=HealthcareWithoutOutliersbmi['bmi'], name=\"bmi\",marker_color='dodgerblue'), 1, 3)\n", 807 | "\n", 808 | "HealthcareWithoutOutliersbmi['bmi'] = ImputeOutliersIQR(HealthcareWithoutOutliersbmi['bmi'])\n", 809 | "fig.add_trace(go.Box(y=HealthcareWithoutOutliersbmi['bmi'], name=\"bmi\",marker_color='royalblue'), 1, 4)\n", 810 | "\n", 811 | "HealthcareWithoutOutliersbmi['bmi'] = ImputeOutliersIQR(HealthcareWithoutOutliersbmi['bmi'])\n", 812 | "fig.add_trace(go.Box(y=HealthcareWithoutOutliersbmi['bmi'], name=\"bmi\",marker_color='navy'), 1, 5)\n", 813 | "\n", 814 | "fig.update_layout(height=700, width=1300, showlegend=False, title_text=\"REMOVE OUTLIERS FROM BMI\", \n", 815 | "                  title_font = {\"size\": 20, \"color\": 'gray', 'family':'Arial Black'}, title={ 'x':0.5,'xanchor': 'center','yanchor': 'top'})\n", 816 | "fig.show(renderer=\"iframe\")" 817 | ] 818 | }, 819 | { 820 | "cell_type": "markdown", 821 | "id": "5fe6f2e1-13fd-4cd8-a9ad-b9386250f9c6", 822 | "metadata": {}, 823 | "source": [ 824 | "Descriptive Statistics" 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": 18, 830 | "id": "5be9ce80-ba65-4c25-ada3-6771315a9892", 831 | "metadata": {}, 832 | "outputs": [ 833 | { 834 | "data": { 835 | "text/plain": [ 836 | "gender\n", 837 | "Female 2994\n", 838 | "Male 2115\n", 839 | "Other 1\n", 840 | "Name: count, dtype: int64" 841 | ] 842 | }, 843 | "execution_count": 18, 844 | "metadata": {}, 845 | "output_type": "execute_result" 846 | } 847 | ], 848 | "source": [ 849 | "Healthcare['gender'].value_counts()" 850 | ] 851 | }, 852 | { 853 | "cell_type": "code", 854 | "execution_count": 19, 855 | "id": "6e768100-c057-4f27-8477-3bcadd1c8720", 856 | "metadata": {}, 857 | "outputs": [ 858 | { 859 | "data": { 860 | "text/plain": [ 861 | "count 5110.000000\n", 862 | "mean 43.226614\n", 863 | "std 22.612647\n", 864 | "min 0.080000\n", 865 | "25% 25.000000\n", 866 | 
"50% 45.000000\n", 867 | "75% 61.000000\n", 868 | "max 82.000000\n", 869 | "Name: age, dtype: float64" 870 | ] 871 | }, 872 | "execution_count": 19, 873 | "metadata": {}, 874 | "output_type": "execute_result" 875 | } 876 | ], 877 | "source": [ 878 | "Healthcare['age'].describe()" 879 | ] 880 | }, 881 | { 882 | "cell_type": "code", 883 | "execution_count": 20, 884 | "id": "b0bf6bb7-7f2f-47b0-9f4d-57fbc278c8f5", 885 | "metadata": {}, 886 | "outputs": [ 887 | { 888 | "data": { 889 | "text/plain": [ 890 | "hypertension\n", 891 | "0 4612\n", 892 | "1 498\n", 893 | "Name: count, dtype: int64" 894 | ] 895 | }, 896 | "execution_count": 20, 897 | "metadata": {}, 898 | "output_type": "execute_result" 899 | } 900 | ], 901 | "source": [ 902 | "Healthcare['hypertension'].value_counts()" 903 | ] 904 | }, 905 | { 906 | "cell_type": "code", 907 | "execution_count": 21, 908 | "id": "21bfa655-1a4a-481b-b392-fe7e25024b10", 909 | "metadata": {}, 910 | "outputs": [ 911 | { 912 | "data": { 913 | "text/plain": [ 914 | "heart_disease\n", 915 | "0 4834\n", 916 | "1 276\n", 917 | "Name: count, dtype: int64" 918 | ] 919 | }, 920 | "execution_count": 21, 921 | "metadata": {}, 922 | "output_type": "execute_result" 923 | } 924 | ], 925 | "source": [ 926 | "Healthcare['heart_disease'].value_counts()" 927 | ] 928 | }, 929 | { 930 | "cell_type": "code", 931 | "execution_count": 22, 932 | "id": "6cd7a458-2d39-4725-9bdd-78f856d7209c", 933 | "metadata": {}, 934 | "outputs": [ 935 | { 936 | "data": { 937 | "text/plain": [ 938 | "ever_married\n", 939 | "Yes 3353\n", 940 | "No 1757\n", 941 | "Name: count, dtype: int64" 942 | ] 943 | }, 944 | "execution_count": 22, 945 | "metadata": {}, 946 | "output_type": "execute_result" 947 | } 948 | ], 949 | "source": [ 950 | "Healthcare['ever_married'].value_counts()" 951 | ] 952 | }, 953 | { 954 | "cell_type": "code", 955 | "execution_count": 23, 956 | "id": "2c817dfa-4134-4cd9-a816-2fd072cfe91b", 957 | "metadata": {}, 958 | "outputs": [ 959 | { 960 | "data": { 
961 | "text/plain": [ 962 | "work_type\n", 963 | "Private 2925\n", 964 | "Self-employed 819\n", 965 | "children 687\n", 966 | "Govt_job 657\n", 967 | "Never_worked 22\n", 968 | "Name: count, dtype: int64" 969 | ] 970 | }, 971 | "execution_count": 23, 972 | "metadata": {}, 973 | "output_type": "execute_result" 974 | } 975 | ], 976 | "source": [ 977 | "Healthcare['work_type'].value_counts()" 978 | ] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": 24, 983 | "id": "548b96ca-508b-420b-bcf7-2df552580960", 984 | "metadata": {}, 985 | "outputs": [ 986 | { 987 | "data": { 988 | "text/plain": [ 989 | "Residence_type\n", 990 | "Urban 2596\n", 991 | "Rural 2514\n", 992 | "Name: count, dtype: int64" 993 | ] 994 | }, 995 | "execution_count": 24, 996 | "metadata": {}, 997 | "output_type": "execute_result" 998 | } 999 | ], 1000 | "source": [ 1001 | "Healthcare['Residence_type'].value_counts()" 1002 | ] 1003 | }, 1004 | { 1005 | "cell_type": "code", 1006 | "execution_count": 25, 1007 | "id": "52686be2-4cea-416c-b213-510ad77eca53", 1008 | "metadata": {}, 1009 | "outputs": [ 1010 | { 1011 | "data": { 1012 | "text/plain": [ 1013 | "count 5110.000000\n", 1014 | "mean 106.147677\n", 1015 | "std 45.283560\n", 1016 | "min 55.120000\n", 1017 | "25% 77.245000\n", 1018 | "50% 91.885000\n", 1019 | "75% 114.090000\n", 1020 | "max 271.740000\n", 1021 | "Name: avg_glucose_level, dtype: float64" 1022 | ] 1023 | }, 1024 | "execution_count": 25, 1025 | "metadata": {}, 1026 | "output_type": "execute_result" 1027 | } 1028 | ], 1029 | "source": [ 1030 | "Healthcare['avg_glucose_level'].describe()" 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "code", 1035 | "execution_count": 26, 1036 | "id": "4686d90e-8c7c-46d6-b880-685ac6113745", 1037 | "metadata": {}, 1038 | "outputs": [ 1039 | { 1040 | "data": { 1041 | "text/plain": [ 1042 | "count 5110.000000\n", 1043 | "mean 28.862035\n", 1044 | "std 7.699562\n", 1045 | "min 10.300000\n", 1046 | "25% 23.800000\n", 1047 | "50% 28.100000\n", 
1048 | "75% 32.800000\n", 1049 | "max 97.600000\n", 1050 | "Name: bmi, dtype: float64" 1051 | ] 1052 | }, 1053 | "execution_count": 26, 1054 | "metadata": {}, 1055 | "output_type": "execute_result" 1056 | } 1057 | ], 1058 | "source": [ 1059 | "Healthcare['bmi'].describe()" 1060 | ] 1061 | }, 1062 | { 1063 | "cell_type": "code", 1064 | "execution_count": 27, 1065 | "id": "dacc6b9b-6daf-4cff-b8c0-ad949ebe50dd", 1066 | "metadata": {}, 1067 | "outputs": [ 1068 | { 1069 | "data": { 1070 | "text/plain": [ 1071 | "smoking_status\n", 1072 | "never smoked 1892\n", 1073 | "Unknown 1544\n", 1074 | "formerly smoked 885\n", 1075 | "smokes 789\n", 1076 | "Name: count, dtype: int64" 1077 | ] 1078 | }, 1079 | "execution_count": 27, 1080 | "metadata": {}, 1081 | "output_type": "execute_result" 1082 | } 1083 | ], 1084 | "source": [ 1085 | "Healthcare['smoking_status'].value_counts()" 1086 | ] 1087 | }, 1088 | { 1089 | "cell_type": "markdown", 1090 | "id": "3b5cf0b0-3c65-41c5-9f57-81e49e143631", 1091 | "metadata": {}, 1092 | "source": [ 1093 | "Encoding Categorical Variables" 1094 | ] 1095 | }, 1096 | { 1097 | "cell_type": "code", 1098 | "execution_count": 28, 1099 | "id": "a6722de4-97ed-4d1c-9966-bbdf9199b70c", 1100 | "metadata": {}, 1101 | "outputs": [], 1102 | "source": [ 1103 | "Healthcare.drop(\"id\",axis=1,inplace=True)" 1104 | ] 1105 | }, 1106 | { 1107 | "cell_type": "code", 1108 | "execution_count": 29, 1109 | "id": "5a8a410f-a481-4c5e-bcda-2d04f8fca297", 1110 | "metadata": {}, 1111 | "outputs": [], 1112 | "source": [ 1113 | "Healthcare['gender']=Healthcare['gender'].apply(lambda x: 0 if x==\"Male\" else(1 if x==\"Female\" else 2))" 1114 | ] 1115 | }, 1116 | { 1117 | "cell_type": "code", 1118 | "execution_count": 30, 1119 | "id": "2029dc4a-e9e8-41e2-a730-4d6f181f0171", 1120 | "metadata": {}, 1121 | "outputs": [], 1122 | "source": [ 1123 | "Healthcare['ever_married']=Healthcare['ever_married'].apply(lambda x: 0 if x==\"No\" else 1)" 1124 | ] 1125 | }, 1126 | { 1127 | 
"cell_type": "code", 1128 | "execution_count": 31, 1129 | "id": "8d521d29-ef79-4218-b0e3-68b77539384b", 1130 | "metadata": {}, 1131 | "outputs": [], 1132 | "source": [ 1133 | "Healthcare['Residence_type']=Healthcare['Residence_type'].apply(lambda x: 0 if x==\"Urban\" else 1)" 1134 | ] 1135 | }, 1136 | { 1137 | "cell_type": "code", 1138 | "execution_count": 32, 1139 | "id": "41b06d06-d249-4123-ade6-dfec88d0da4a", 1140 | "metadata": {}, 1141 | "outputs": [], 1142 | "source": [ 1143 | "Healthcare_encoded=pd.get_dummies(Healthcare,columns=[\"work_type\",\"smoking_status\"],drop_first=True)" 1144 | ] 1145 | }, 1146 | { 1147 | "cell_type": "code", 1148 | "execution_count": 33, 1149 | "id": "b4b81ab5-2476-4b85-b030-0e2593a9b9d4", 1150 | "metadata": {}, 1151 | "outputs": [ 1152 | { 1153 | "data": { 1154 | "text/html": [ 1155 | "
\n", 1156 | "\n", 1169 | "\n", 1170 | " \n", 1171 | " \n", 1172 | " \n", 1173 | " \n", 1174 | " \n", 1175 | " \n", 1176 | " \n", 1177 | " \n", 1178 | " \n", 1179 | " \n", 1180 | " \n", 1181 | " \n", 1182 | " \n", 1183 | " \n", 1184 | " \n", 1185 | " \n", 1186 | " \n", 1187 | " \n", 1188 | " \n", 1189 | " \n", 1190 | " \n", 1191 | " \n", 1192 | " \n", 1193 | " \n", 1194 | " \n", 1195 | " \n", 1196 | " \n", 1197 | " \n", 1198 | " \n", 1199 | " \n", 1200 | " \n", 1201 | " \n", 1202 | " \n", 1203 | " \n", 1204 | " \n", 1205 | " \n", 1206 | " \n", 1207 | " \n", 1208 | " \n", 1209 | " \n", 1210 | " \n", 1211 | " \n", 1212 | " \n", 1213 | " \n", 1214 | " \n", 1215 | " \n", 1216 | " \n", 1217 | " \n", 1218 | " \n", 1219 | " \n", 1220 | " \n", 1221 | " \n", 1222 | " \n", 1223 | " \n", 1224 | " \n", 1225 | " \n", 1226 | " \n", 1227 | " \n", 1228 | " \n", 1229 | " \n", 1230 | " \n", 1231 | " \n", 1232 | " \n", 1233 | " \n", 1234 | " \n", 1235 | " \n", 1236 | " \n", 1237 | " \n", 1238 | " \n", 1239 | " \n", 1240 | " \n", 1241 | " \n", 1242 | " \n", 1243 | " \n", 1244 | " \n", 1245 | " \n", 1246 | " \n", 1247 | " \n", 1248 | " \n", 1249 | " \n", 1250 | " \n", 1251 | " \n", 1252 | " \n", 1253 | " \n", 1254 | " \n", 1255 | " \n", 1256 | " \n", 1257 | " \n", 1258 | " \n", 1259 | " \n", 1260 | " \n", 1261 | " \n", 1262 | " \n", 1263 | " \n", 1264 | " \n", 1265 | " \n", 1266 | " \n", 1267 | " \n", 1268 | " \n", 1269 | " \n", 1270 | " \n", 1271 | " \n", 1272 | " \n", 1273 | " \n", 1274 | " \n", 1275 | " \n", 1276 | " \n", 1277 | " \n", 1278 | " \n", 1279 | " \n", 1280 | " \n", 1281 | " \n", 1282 | " \n", 1283 | " \n", 1284 | " \n", 1285 | " \n", 1286 | " \n", 1287 | " \n", 1288 | "
genderagehypertensionheart_diseaseever_marriedResidence_typeavg_glucose_levelbmistrokework_type_Never_workedwork_type_Privatework_type_Self-employedwork_type_childrensmoking_status_formerly smokedsmoking_status_never smokedsmoking_status_smokes
0067.00110228.6936.61FalseTrueFalseFalseTrueFalseFalse
1161.00011202.2128.11FalseFalseTrueFalseFalseTrueFalse
2080.00111105.9232.51FalseTrueFalseFalseFalseTrueFalse
3149.00010171.2334.41FalseTrueFalseFalseFalseFalseTrue
4179.01011174.1224.01FalseFalseTrueFalseFalseTrueFalse
\n", 1289 | "
" 1290 | ], 1291 | "text/plain": [ 1292 | " gender age hypertension heart_disease ever_married Residence_type \\\n", 1293 | "0 0 67.0 0 1 1 0 \n", 1294 | "1 1 61.0 0 0 1 1 \n", 1295 | "2 0 80.0 0 1 1 1 \n", 1296 | "3 1 49.0 0 0 1 0 \n", 1297 | "4 1 79.0 1 0 1 1 \n", 1298 | "\n", 1299 | " avg_glucose_level bmi stroke work_type_Never_worked work_type_Private \\\n", 1300 | "0 228.69 36.6 1 False True \n", 1301 | "1 202.21 28.1 1 False False \n", 1302 | "2 105.92 32.5 1 False True \n", 1303 | "3 171.23 34.4 1 False True \n", 1304 | "4 174.12 24.0 1 False False \n", 1305 | "\n", 1306 | " work_type_Self-employed work_type_children \\\n", 1307 | "0 False False \n", 1308 | "1 True False \n", 1309 | "2 False False \n", 1310 | "3 False False \n", 1311 | "4 True False \n", 1312 | "\n", 1313 | " smoking_status_formerly smoked smoking_status_never smoked \\\n", 1314 | "0 True False \n", 1315 | "1 False True \n", 1316 | "2 False True \n", 1317 | "3 False False \n", 1318 | "4 False True \n", 1319 | "\n", 1320 | " smoking_status_smokes \n", 1321 | "0 False \n", 1322 | "1 False \n", 1323 | "2 False \n", 1324 | "3 True \n", 1325 | "4 False " 1326 | ] 1327 | }, 1328 | "execution_count": 33, 1329 | "metadata": {}, 1330 | "output_type": "execute_result" 1331 | } 1332 | ], 1333 | "source": [ 1334 | "Healthcare_encoded.head()" 1335 | ] 1336 | }, 1337 | { 1338 | "cell_type": "code", 1339 | "execution_count": null, 1340 | "id": "a0a76149-5175-4c9a-aa28-d6e0d047c232", 1341 | "metadata": {}, 1342 | "outputs": [], 1343 | "source": [] 1344 | } 1345 | ], 1346 | "metadata": { 1347 | "kernelspec": { 1348 | "display_name": "Python 3 (ipykernel)", 1349 | "language": "python", 1350 | "name": "python3" 1351 | }, 1352 | "language_info": { 1353 | "codemirror_mode": { 1354 | "name": "ipython", 1355 | "version": 3 1356 | }, 1357 | "file_extension": ".py", 1358 | "mimetype": "text/x-python", 1359 | "name": "python", 1360 | "nbconvert_exporter": "python", 1361 | "pygments_lexer": "ipython3", 1362 | 
"version": "3.12.4" 1363 | } 1364 | }, 1365 | "nbformat": 4, 1366 | "nbformat_minor": 5 1367 | } 1368 | --------------------------------------------------------------------------------