\n", 171 | " | sl_no | \n", 172 | "gender | \n", 173 | "ssc_p | \n", 174 | "ssc_b | \n", 175 | "hsc_p | \n", 176 | "hsc_b | \n", 177 | "hsc_s | \n", 178 | "degree_p | \n", 179 | "degree_t | \n", 180 | "workex | \n", 181 | "etest_p | \n", 182 | "specialisation | \n", 183 | "mba_p | \n", 184 | "status | \n", 185 | "salary | \n", 186 | "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", 191 | "1 | \n", 192 | "M | \n", 193 | "67.00 | \n", 194 | "Others | \n", 195 | "91.00 | \n", 196 | "Others | \n", 197 | "Commerce | \n", 198 | "58.00 | \n", 199 | "Sci&Tech | \n", 200 | "No | \n", 201 | "55.0 | \n", 202 | "Mkt&HR | \n", 203 | "58.80 | \n", 204 | "Placed | \n", 205 | "270000.0 | \n", 206 | "
1 | \n", 209 | "2 | \n", 210 | "M | \n", 211 | "79.33 | \n", 212 | "Central | \n", 213 | "78.33 | \n", 214 | "Others | \n", 215 | "Science | \n", 216 | "77.48 | \n", 217 | "Sci&Tech | \n", 218 | "Yes | \n", 219 | "86.5 | \n", 220 | "Mkt&Fin | \n", 221 | "66.28 | \n", 222 | "Placed | \n", 223 | "200000.0 | \n", 224 | "
2 | \n", 227 | "3 | \n", 228 | "M | \n", 229 | "65.00 | \n", 230 | "Central | \n", 231 | "68.00 | \n", 232 | "Central | \n", 233 | "Arts | \n", 234 | "64.00 | \n", 235 | "Comm&Mgmt | \n", 236 | "No | \n", 237 | "75.0 | \n", 238 | "Mkt&Fin | \n", 239 | "57.80 | \n", 240 | "Placed | \n", 241 | "250000.0 | \n", 242 | "
3 | \n", 245 | "4 | \n", 246 | "M | \n", 247 | "56.00 | \n", 248 | "Central | \n", 249 | "52.00 | \n", 250 | "Central | \n", 251 | "Science | \n", 252 | "52.00 | \n", 253 | "Sci&Tech | \n", 254 | "No | \n", 255 | "66.0 | \n", 256 | "Mkt&HR | \n", 257 | "59.43 | \n", 258 | "Not Placed | \n", 259 | "NaN | \n", 260 | "
4 | \n", 263 | "5 | \n", 264 | "M | \n", 265 | "85.80 | \n", 266 | "Central | \n", 267 | "73.60 | \n", 268 | "Central | \n", 269 | "Commerce | \n", 270 | "73.30 | \n", 271 | "Comm&Mgmt | \n", 272 | "No | \n", 273 | "96.8 | \n", 274 | "Mkt&Fin | \n", 275 | "55.50 | \n", 276 | "Placed | \n", 277 | "425000.0 | \n", 278 | "
AdaBoostClassifier(n_estimators=100, random_state=0)
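The repr above shows an `AdaBoostClassifier(n_estimators=100, random_state=0)` fitted in the notebook. A hedged sketch of how such a model might be trained on the placement data follows; the feature/target choice, the label encoding, and the train/test split are assumptions, not necessarily the notebook's exact preprocessing:

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# "placement.csv" is an assumed file name for the dataset previewed above.
df = pd.read_csv("placement.csv")

# Assumed preprocessing: drop the serial number and salary (salary is only
# defined for placed students) and integer-encode the categorical columns.
df_model = df.drop(columns=["sl_no", "salary"]).copy()
for col in df_model.select_dtypes(include="object").columns:
    df_model[col] = LabelEncoder().fit_transform(df_model[col])

X = df_model.drop(columns=["status"])
y = df_model["status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the boosted ensemble with the hyperparameters shown in the output above.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```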
Saeed Aghabozorgi, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods such as machine learning and statistical modelling on large datasets.
\n", 469 | "\n", 470 | "Copyright © 2018 Cognitive Class. This notebook and its source code are released under the terms of the MIT License.
" 473 | ] 474 | } 475 | ], 476 | "metadata": { 477 | "kernelspec": { 478 | "display_name": "Python 3", 479 | "language": "python", 480 | "name": "python3" 481 | }, 482 | "language_info": { 483 | "codemirror_mode": { 484 | "name": "ipython", 485 | "version": 3 486 | }, 487 | "file_extension": ".py", 488 | "mimetype": "text/x-python", 489 | "name": "python", 490 | "nbconvert_exporter": "python", 491 | "pygments_lexer": "ipython3", 492 | "version": "3.6.6" 493 | }, 494 | "widgets": { 495 | "state": {}, 496 | "version": "1.1.2" 497 | } 498 | }, 499 | "nbformat": 4, 500 | "nbformat_minor": 2 501 | } 502 | -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/Reg-NoneLinearRegression-py-v1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Saeed Aghabozorgi, PhD is a Data Scientist in IBM with a track record of developing enterprise level applications that substantially increases clients’ ability to turn data into actionable knowledge. He is a researcher in data mining field and expert in developing advanced analytic methods like machine learning and statistical modelling on large datasets.
\n", 536 | "\n", 537 | "Copyright © 2018 Cognitive Class. This notebook and its source code are released under the terms of the MIT License.
" 540 | ] 541 | } 542 | ], 543 | "metadata": { 544 | "kernelspec": { 545 | "display_name": "Python 3", 546 | "language": "python", 547 | "name": "python3" 548 | }, 549 | "language_info": { 550 | "codemirror_mode": { 551 | "name": "ipython", 552 | "version": 3 553 | }, 554 | "file_extension": ".py", 555 | "mimetype": "text/x-python", 556 | "name": "python", 557 | "nbconvert_exporter": "python", 558 | "pygments_lexer": "ipython3", 559 | "version": "3.6.6" 560 | } 561 | }, 562 | "nbformat": 4, 563 | "nbformat_minor": 2 564 | } 565 | -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/Voting_Classifiers.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Voting Classifiers.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "toc_visible": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "language_info": { 16 | "name": "python" 17 | } 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "gL0AffZGOZxZ" 24 | }, 25 | "source": [ 26 | "# **Introduction**\n" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "id": "sEtW2gI7KVy5" 33 | }, 34 | "source": [ 35 | "Similarly, if we aggregate the predictions of a group of models (such as classifiers or regressors), we will often get better predictions than the best individual predictor. A group of predictors is called an **ensemble**. Thus this technique is called **ensemble learning**, and an ensemble learning algorithm is called an Ensemble Method." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": { 41 | "id": "lIDjC_tsK0HY" 42 | }, 43 | "source": [ 44 | "As an example of an ensemble method, we can train a **group of decision tree classifiers**, each on a random subset of the training data. **Such an ensemble of decision trees is called a random forest**. Despite its simplicity, this is one of the most powerful machine learning algorithms available today. In this chapter, we will discuss the most famous ensemble learning methods, including: **Bagging, Boosting, & Stacking.**" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "id": "Lr8nF4iiAxoS" 51 | }, 52 | "source": [ 53 | "# **Voting Classifiers**" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": { 59 | "id": "eJIKaFHZA7MB" 60 | }, 61 | "source": [ 62 | "Suppose we have trained a few classifiers, each achieving an 80% accuracy. A very simple way to create an even better classifiers is to aggregate the predictions of all our classifiers and choose the prediction that is the most frequent.\n", 63 | "\n", 64 | "**Majority voting classification is called Hard Voting**" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": { 70 | "id": "cUPzZRuhB53n" 71 | }, 72 | "source": [ 73 | "" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": { 80 | "id": "ytZeVqhkNGE_" 81 | }, 82 | "source": [ 83 | "Somewhat surprisingly, this classifier achieves an even better accuracy than the best predictor in the ensemble. Even if each classifier is a weak learner (does slightly better then random guessing). 
Assuming that we have a sufficient number of weak learners and enough diversity.\n", 84 | "\n", 85 | "Due to the law of large numbers, if we build an ensemble containing 1,000 classifiers with individual accuracies of $51%$ & trained for binary classification, If we predict the majority voting class, we can hope for up to $75%$ accuracy.\n", 86 | "\n", 87 | "This is only true if all classifiers are completely independent, making uncorrelated errors, which is clearly not the case because they are trained on the same data.\n", 88 | "\n", 89 | "One way to get diverse classifiers is use different algorithms for each one of them & train them on different subset of the training data.\n", 90 | "\n", 91 | "Let's implement a hard voting ensemble learner using scikit-learn:" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": { 97 | "id": "Bh-YhPsZCG7S" 98 | }, 99 | "source": [ 100 | "**Python implmentation**" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "metadata": { 106 | "id": "gMfrhXQhNVob" 107 | }, 108 | "source": [ 109 | "import numpy as np\n", 110 | "import pandas as pd\n", 111 | "import matplotlib.pyplot as plt\n", 112 | "import sklearn" 113 | ], 114 | "execution_count": 1, 115 | "outputs": [] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "metadata": { 120 | "id": "hprKmZLBNdNZ" 121 | }, 122 | "source": [ 123 | "from sklearn.ensemble import RandomForestClassifier\n", 124 | "from sklearn.ensemble import VotingClassifier\n", 125 | "from sklearn.linear_model import LogisticRegression\n", 126 | "from sklearn.svm import SVC" 127 | ], 128 | "execution_count": 2, 129 | "outputs": [] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "metadata": { 134 | "id": "oRZKUesUNiNn" 135 | }, 136 | "source": [ 137 | "log_clf = LogisticRegression(solver='lbfgs')\n", 138 | "rf_clf = RandomForestClassifier(n_estimators=100)\n", 139 | "svm_clf = SVC(gamma='scale')" 140 | ], 141 | "execution_count": 3, 142 | "outputs": [] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "metadata": { 147 | "id": "fXyBuAIjNnBZ" 148 | }, 149 | "source": [ 150 | "from sklearn import datasets\n", 151 | "from sklearn.model_selection import train_test_split" 152 | ], 153 | "execution_count": 4, 154 | "outputs": [] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "metadata": { 159 | "id": "xBa0B1EhNspZ" 160 | }, 161 | "source": [ 162 | "X, y = datasets.make_moons(n_samples=10000, noise=0.5)\n", 163 | "X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33)" 164 | ], 165 | "execution_count": 5, 166 | "outputs": [] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "metadata": { 171 | "colab": { 172 | "base_uri": "https://localhost:8080/" 173 | }, 174 | "id": "57JwyteWNw1Q", 175 | "outputId": "967b0632-dda0-4799-cf94-6a17410fdea2" 176 | }, 177 | "source": [ 178 | "X_train.shape, y_train.shape, X_val.shape, y_val.shape\n" 179 | ], 180 | "execution_count": 6, 181 | "outputs": [ 182 | { 183 | "output_type": "execute_result", 184 | "data": { 185 | "text/plain": [ 186 | "((6700, 2), (6700,), (3300, 2), (3300,))" 187 | ] 188 | }, 189 | "metadata": { 190 | "tags": [] 191 | }, 192 | "execution_count": 6 193 | } 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "metadata": { 199 | "id": "xUzphfgFN2V2" 200 | }, 201 | "source": [ 202 | "voting_clf = VotingClassifier(estimators=[('lr', log_clf), ('rf', rf_clf), ('svc', svm_clf)], voting='hard')" 203 | ], 204 | "execution_count": 7, 205 | "outputs": [] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "metadata": { 210 | "colab": { 211 | 
"base_uri": "https://localhost:8080/" 212 | }, 213 | "id": "zNROV_7TN6dO", 214 | "outputId": "fc519296-81a5-47c0-af5e-f6c8e8a599fb" 215 | }, 216 | "source": [ 217 | "voting_clf.fit(X_train, y_train)\n" 218 | ], 219 | "execution_count": 8, 220 | "outputs": [ 221 | { 222 | "output_type": "execute_result", 223 | "data": { 224 | "text/plain": [ 225 | "VotingClassifier(estimators=[('lr',\n", 226 | " LogisticRegression(C=1.0, class_weight=None,\n", 227 | " dual=False, fit_intercept=True,\n", 228 | " intercept_scaling=1,\n", 229 | " l1_ratio=None, max_iter=100,\n", 230 | " multi_class='auto',\n", 231 | " n_jobs=None, penalty='l2',\n", 232 | " random_state=None,\n", 233 | " solver='lbfgs', tol=0.0001,\n", 234 | " verbose=0, warm_start=False)),\n", 235 | " ('rf',\n", 236 | " RandomForestClassifier(bootstrap=True,\n", 237 | " ccp_alpha=0.0,\n", 238 | " class_weight=None,\n", 239 | " cr...\n", 240 | " oob_score=False,\n", 241 | " random_state=None,\n", 242 | " verbose=0,\n", 243 | " warm_start=False)),\n", 244 | " ('svc',\n", 245 | " SVC(C=1.0, break_ties=False, cache_size=200,\n", 246 | " class_weight=None, coef0=0.0,\n", 247 | " decision_function_shape='ovr', degree=3,\n", 248 | " gamma='scale', kernel='rbf', max_iter=-1,\n", 249 | " probability=False, random_state=None,\n", 250 | " shrinking=True, tol=0.001, verbose=False))],\n", 251 | " flatten_transform=True, n_jobs=None, voting='hard',\n", 252 | " weights=None)" 253 | ] 254 | }, 255 | "metadata": { 256 | "tags": [] 257 | }, 258 | "execution_count": 8 259 | } 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": { 265 | "id": "HqIKJAZrOD6u" 266 | }, 267 | "source": [ 268 | "Let's take a look at the performance of each classifier + ensemble method on the validation set:\n", 269 | "\n" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "metadata": { 275 | "id": "I77QMfXHOHRq" 276 | }, 277 | "source": [ 278 | "from sklearn.metrics import accuracy_score\n" 279 | ], 280 | "execution_count": 9, 281 | "outputs": [] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "metadata": { 286 | "colab": { 287 | "base_uri": "https://localhost:8080/" 288 | }, 289 | "id": "lQLu9-ZtOI7n", 290 | "outputId": "a09fc59f-13d1-4d4c-99f1-3206e4c877d9" 291 | }, 292 | "source": [ 293 | "for clf in [log_clf, rf_clf, svm_clf, voting_clf]:\n", 294 | " clf.fit(X_train, y_train)\n", 295 | " y_hat = clf.predict(X_val)\n", 296 | " print(clf.__class__.__name__, accuracy_score(y_val, y_hat))" 297 | ], 298 | "execution_count": 10, 299 | "outputs": [ 300 | { 301 | "output_type": "stream", 302 | "text": [ 303 | "LogisticRegression 0.8151515151515152\n", 304 | "RandomForestClassifier 0.803939393939394\n", 305 | "SVC 0.8303030303030303\n", 306 | "VotingClassifier 0.8254545454545454\n" 307 | ], 308 | "name": "stdout" 309 | } 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": { 315 | "id": "vodRbHN8OVam" 316 | }, 317 | "source": [ 318 | "There we have it! The voting classifier slightly outperforms the individual classifiers.\n", 319 | "\n", 320 | "If all ensemble method learners can estimate class probabilities, we can average their probabilities per class then predict the class with the highest probability. This is called Soft voting. It often yields results better than hard voting because it weights confidence." 
321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": { 326 | "id": "EgWJOp-40ADb" 327 | }, 328 | "source": [ 329 | "# **References**" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": { 335 | "id": "MIXjbz4hOO6i" 336 | }, 337 | "source": [ 338 | "[Chapter 7. Ensemble Learning & Random Forests](https://github.com/Akramz/Hands-on-Machine-Learning-with-Scikit-Learn-Keras-and-TensorFlow/blob/master/07.Ensembles_RFs.ipynb)" 339 | ] 340 | } 341 | ] 342 | } -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/XGBoost_in_Machine_Learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "XGBoost in Machine Learning.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "bIsI9dS-FNPO" 23 | }, 24 | "source": [ 25 | "# **Introduction**" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "ZdupLwSYFUHh" 32 | }, 33 | "source": [ 34 | "XGBoost or Gradient Boosting is a machine learning algorithm that goes through cycles to iteratively add models to a set. In this article, I will take you through the XGBoost algorithm in Machine Learning." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "id": "0EpPpnxScrD4" 41 | }, 42 | "source": [ 43 | "The cycle of the XGBoost algorithm begins by initializing the whole with a unique model, the predictions of which can be quite naive." 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": { 49 | "id": "22KV7aATHIU5" 50 | }, 51 | "source": [ 52 | "# **The Process of XGBoost Algorithm:**\n" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": { 58 | "id": "nQkZsGzvc1pi" 59 | }, 60 | "source": [ 61 | "- First, we use the current set to generate predictions for each observation in the dataset. To make a prediction, we add the predictions of all the models in the set.\n", 62 | "- These predictions are used to calculate a loss function.\n", 63 | "- Then we use the loss function to fit a new model which will be added to the set. Specifically, we determine the parameters of the model so that adding this new model to the set reduces the loss.\n", 64 | "- Finally, we add the new model to the set, and …\n", 65 | "then repeat!" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": { 71 | "id": "zpOJPL34rP5Z" 72 | }, 73 | "source": [ 74 | "# **XGBoost Algorithm in Action**\n" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": { 80 | "id": "ePqKoUWTre6h" 81 | }, 82 | "source": [ 83 | "I’ll start by loading the training and validation data into X_train, X_valid, y_train and y_valid. The dataset, I am using here can be easily downloaded from here." 
84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "metadata": { 89 | "colab": { 90 | "base_uri": "https://localhost:8080/" 91 | }, 92 | "id": "HoOKm_KQsBWL", 93 | "outputId": "1d16352d-393f-46a2-88dc-22c1ac52da25" 94 | }, 95 | "source": [ 96 | "\n", 97 | "from google.colab import drive\n", 98 | "drive.mount('/content/drive')" 99 | ], 100 | "execution_count": 9, 101 | "outputs": [ 102 | { 103 | "output_type": "stream", 104 | "text": [ 105 | "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" 106 | ], 107 | "name": "stdout" 108 | } 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "metadata": { 114 | "id": "Fw0YMemUsOWj" 115 | }, 116 | "source": [ 117 | "import pandas as pd\n", 118 | "from sklearn.model_selection import train_test_split\n", 119 | "\n", 120 | "# Read the data\n", 121 | "data = pd.read_csv('/content/drive/MyDrive/Datasets/melb_data.csv')\n", 122 | "\n", 123 | "# Select subset of predictors\n", 124 | "cols_to_use = ['Rooms', 'Distance', 'Landsize', 'BuildingArea', 'YearBuilt']\n", 125 | "X = data[cols_to_use]\n", 126 | "\n", 127 | "# Select target\n", 128 | "y = data.Price\n", 129 | "\n", 130 | "# Separate data into training and validation sets\n", 131 | "X_train, X_valid, y_train, y_valid = train_test_split(X,y) " 132 | ], 133 | "execution_count": 6, 134 | "outputs": [] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": { 139 | "id": "otrycWU6sqTK" 140 | }, 141 | "source": [ 142 | "Now, here you will learn how to use the XGBoost algorithm. Here we need to import the scikit-learn API for XGBoost (xgboost.XGBRegressor). This allows us to create and adjust a model like we would in scikit-learn. As you will see in the output, the XGBRegressor class has many adjustable parameters:" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "metadata": { 148 | "id": "IsTLbK6GssO6" 149 | }, 150 | "source": [ 151 | "from xgboost import XGBRegressor\n", 152 | "\n", 153 | "my_model = XGBRegressor()\n", 154 | "my_model.fit(X_train, y_train)" 155 | ], 156 | "execution_count": null, 157 | "outputs": [] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": { 162 | "id": "ZAYMrKnXs3aj" 163 | }, 164 | "source": [ 165 | "Now, we need to make predictions and evaluate our model:\n", 166 | "\n" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "metadata": { 172 | "colab": { 173 | "base_uri": "https://localhost:8080/" 174 | }, 175 | "id": "V_A5LMrws-5i", 176 | "outputId": "2c451630-6381-4aec-a62c-590e92000d0e" 177 | }, 178 | "source": [ 179 | "from sklearn.metrics import mean_absolute_error\n", 180 | "\n", 181 | "predictions = my_model.predict(X_valid)\n", 182 | "print(\"Mean Absolute Error: \" + str(mean_absolute_error(predictions, y_valid)))" 183 | ], 184 | "execution_count": 8, 185 | "outputs": [ 186 | { 187 | "output_type": "stream", 188 | "text": [ 189 | "Mean Absolute Error: 279829.9009295499\n" 190 | ], 191 | "name": "stdout" 192 | } 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": { 198 | "id": "XD-d44CUH_8n" 199 | }, 200 | "source": [ 201 | "# **Parameter Tuning**\n" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": { 207 | "id": "fYkOlpS737BL" 208 | }, 209 | "source": [ 210 | "XGBoost has a few features that can drastically affect the accuracy and speed of training. 
The first feature you need to understand are:\n", 211 | "\n" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": { 217 | "id": "eJwABNAd3_Jc" 218 | }, 219 | "source": [ 220 | "**n_estimators**\n" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": { 226 | "id": "0zaqY98h4Ct6" 227 | }, 228 | "source": [ 229 | "n_estimators specifies the number of times to skip the modelling cycle described above. It is equal to the number of models we include in the set." 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": { 235 | "id": "BECHRnND4Ih2" 236 | }, 237 | "source": [ 238 | "- Too low a value results in an underfitting, leading to inaccurate predictions on training data and test data.\n", 239 | "- Too high a value results in overfitting, resulting in accurate predictions on training data, but inaccurate predictions on test data (which is important to us)." 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": { 245 | "id": "m45TqHHI4THz" 246 | }, 247 | "source": [ 248 | "Typical the values lie between 100 to 1000, although it all depends a lot on the learning_rate parameter described below. Here is the code to set the number of models in the set:" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "metadata": { 254 | "id": "Uc_9nhDP4U19" 255 | }, 256 | "source": [ 257 | "XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", 258 | " colsample_bynode=1, colsample_bytree=1, gamma=0,\n", 259 | " importance_type='gain', learning_rate=0.1, max_delta_step=0,\n", 260 | " max_depth=3, min_child_weight=1, missing=None, n_estimators=500,\n", 261 | " n_jobs=1, nthread=None, objective='reg:linear', random_state=0,\n", 262 | " reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,\n", 263 | " silent=None, subsample=1, verbosity=1)" 264 | ], 265 | "execution_count": null, 266 | "outputs": [] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": { 271 | "id": "OA6GZob_4cCd" 272 | }, 273 | "source": [ 274 | "**early_stopping_rounds**\n" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": { 280 | "id": "U6wv3Tnv4eNL" 281 | }, 282 | "source": [ 283 | "early_stopping_rounds provides a way to automatically find the ideal value for n_estimators. Stopping early causes the iteration of the model to stop when the validation score stops improving, even though we are not stopping hard for n_estimators. It’s a good idea to set n_estimators high and then use early_stopping_rounds to find the optimal time to stop the iteration." 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": { 289 | "id": "tbHxfNEq4jH1" 290 | }, 291 | "source": [ 292 | "Since random chance sometimes causes a single round where validation scores do not improve, you must specify a number for the number of direct deterioration turns to allow before stopping. Setting early_stopping_rounds = 5 is a reasonable choice. In this case, we stop after 5 consecutive rounds of deterioration of validation scores. 
Now let’s see how we can use early_stopping:" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "metadata": { 298 | "id": "Q4nFErKq4r1T" 299 | }, 300 | "source": [ 301 | "my_model = XGBRegressor(n_estimators=500)\n", 302 | "my_model.fit(X_train, y_train, \n", 303 | " early_stopping_rounds=5, \n", 304 | " eval_set=[(X_valid, y_valid)],\n", 305 | " verbose=False)" 306 | ], 307 | "execution_count": null, 308 | "outputs": [] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": { 313 | "id": "4FdGAi104xp0" 314 | }, 315 | "source": [ 316 | "**learning_rate**\n" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": { 322 | "id": "dTaHClVE43Z0" 323 | }, 324 | "source": [ 325 | "Instead of getting predictions by simply adding up the predictions of each component model, we can multiply the predictions of each model by a small number before adding them.\n", 326 | "\n", 327 | "This means that every tree we add to the set helps us less. So we can set a high value for the n_estimators without overfitting. If we use early shutdown, the appropriate number of trees will be determined automatically. Now, let’s see how we can use learning_rate in XGBoost algorithm:" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "metadata": { 333 | "colab": { 334 | "base_uri": "https://localhost:8080/" 335 | }, 336 | "id": "C0TxFTgp45S6", 337 | "outputId": "cc0ed6cb-e257-431b-a1f9-28c3d9d9f431" 338 | }, 339 | "source": [ 340 | "my_model = XGBRegressor(n_estimators=1000, learning_rate=0.05)\n", 341 | "my_model.fit(X_train, y_train, \n", 342 | " early_stopping_rounds=5, \n", 343 | " eval_set=[(X_valid, y_valid)], \n", 344 | " verbose=False)" 345 | ], 346 | "execution_count": 12, 347 | "outputs": [ 348 | { 349 | "output_type": "stream", 350 | "text": [ 351 | "[07:22:39] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.\n" 352 | ], 353 | "name": "stdout" 354 | }, 355 | { 356 | "output_type": "execute_result", 357 | "data": { 358 | "text/plain": [ 359 | "XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", 360 | " colsample_bynode=1, colsample_bytree=1, gamma=0,\n", 361 | " importance_type='gain', learning_rate=0.05, max_delta_step=0,\n", 362 | " max_depth=3, min_child_weight=1, missing=None, n_estimators=1000,\n", 363 | " n_jobs=1, nthread=None, objective='reg:linear', random_state=0,\n", 364 | " reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,\n", 365 | " silent=None, subsample=1, verbosity=1)" 366 | ] 367 | }, 368 | "metadata": {}, 369 | "execution_count": 12 370 | } 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": { 376 | "id": "8fxR6L_c5E6F" 377 | }, 378 | "source": [ 379 | "**n_jobs**\n" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": { 385 | "id": "sxepfNxS5HMr" 386 | }, 387 | "source": [ 388 | "On larger datasets where execution is a consideration, you can use parallelism to build your models faster. It is common to set the n_jobs parameter equal to the number of cores on your machine. On smaller data sets, this won’t help.\n", 389 | "\n", 390 | "The resulting model will not be better, so micro-optimizing the timing of the fit is usually just a distraction. But it’s very useful in large datasets where you would spend a lot of time waiting for the fit command. 
Now, let’s see how to use this parameter in the XGBoost algorithm:" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "metadata": { 396 | "id": "k4CmOX1O5Ov2" 397 | }, 398 | "source": [ 399 | "my_model = XGBRegressor(n_estimators=1000, learning_rate=0.05, n_jobs=4)\n", 400 | "my_model.fit(X_train, y_train, \n", 401 | " early_stopping_rounds=5, \n", 402 | " eval_set=[(X_valid, y_valid)], \n", 403 | " verbose=False)" 404 | ], 405 | "execution_count": null, 406 | "outputs": [] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": { 411 | "id": "Fby7CCu5E0-C" 412 | }, 413 | "source": [ 414 | "# **References**\n", 415 | "[XGBoost in Machine Learning](https://thecleverprogrammer.com/2020/09/04/xgboost-in-machine-learning/)" 416 | ] 417 | } 418 | ] 419 | } -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/dataset/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/readm: -------------------------------------------------------------------------------- 1 | Supervised learning is a type of machine learning problem where users are given targets which they need to predict. 2 | Classification is a type of supervised learning where an algorithm predicts one output from a list of given classes. 3 | It can be a binary classification task where there are 2-classes or multi-class problems where there are more than 2-classes. 4 | Scikit-Learn - Naive Bayes¶ 5 | https://coderzcolumn.com/tutorials/machine-learning/scikit-learn-sklearn-naive-bayes?fbclid=IwAR2EUHN0XwJlCQ8hxjvYHh9Vl4g0AjllmD1ktHsNd7Mwu5g2bOLZEjdKld4 6 | -------------------------------------------------------------------------------- /Statistics/Readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /The-Art-of-Linear-Algebra.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/The-Art-of-Linear-Algebra.pdf -------------------------------------------------------------------------------- /readme: -------------------------------------------------------------------------------- 1 | Semi Supervised Learning – A Gentle Introduction for Beginners 2 | https://machinelearningknowledge.ai/semi-supervised-learning-a-gentle-introduction-for-beginners/?fbclid=IwAR2hWWec_bhDJpjr9nOSEUkS1zjW4LJ-IsqLXtL8dm3mCPT-JHFfjVUThWY 3 | Machine Learning with python for everyone 4 | https://drive.google.com/file/d/16q7D0W0CIGS4qOAjt18BpEquodqpE7EV/view?usp%3Ddrivesdk&fbclid=IwAR0y98UMt5ts7FFCN32AN29o8gUHnTGlB1sMNR_wvqEXV_GCefLqvCVlheE 5 | 6 | 7 | --------------------------------------------------------------------------------