├── 1_28_2020_Supervised_learning_with_Sklearn.ipynb ├── 1_fzqAsLGKIt3jzstwBLGsJA.png ├── 438058314_3872280102992838_2811757508132819156_n.jpg ├── Advertising.csv ├── Agglomerative_Clustering_using_scikit_learn.ipynb ├── Andrew_Linear_Regression_Exercise_1_By_Fida_Mohammad.ipynb ├── Anomaly_Detection.ipynb ├── Anomaly_Detection_using_Python_Library_.ipynb ├── Anomaly_Detection_using_Using_Gaussian_Mixture_Models.ipynb ├── Anomaly_Detection_with_PyCaret.ipynb ├── Auto_Model_Training_and_Evaluation_.ipynb ├── Build_Machine_Learning_Pipelines.ipynb ├── Clustering_With_Pycaret.ipynb ├── Collaborative_Filtering.ipynb ├── DALL·E 2025-02-20 09.38.02 - An enhanced AI-themed GitHub repository banner with a futuristic dark blue and black background, incorporating glowing abstract neural network pattern.webp ├── DBSCAN_Clustering_in_Machine_Learning.ipynb ├── Data Visualization ├── Automate_Exploratory_Data_Analysis.ipynb └── readme ├── Data_Exploratory_and_Ploting.ipynb ├── Data_Processing_in_Python_.ipynb ├── Datasets ├── CC GENERAL.csv ├── readme ├── test.csv └── train.csv ├── Deep Learning library ├── FastAI_in_Machine_Learning.ipynb └── readme ├── Distance_Measure_.ipynb ├── Exploring_Correlation_10_14_21.ipynb ├── Feature Selection ├── Feature_Selection.ipynb └── readme ├── Feature_Selection_10_14_21.ipynb ├── Feature_extraction.ipynb ├── Fine_Tuning_your_model.ipynb ├── Introduction of AI.md ├── KNN_with_Python_.ipynb ├── Linear_Regression_Andrew.ipynb ├── ML(Andrew) ├── 4-Linear Regression with Multiple Variables │ ├── Readme │ ├── ex1.pdf │ ├── ex1data1.txt │ ├── ex1data2.txt │ ├── exercise1.ipynb │ └── utils.py ├── 5-Logistic Regression (LR) │ ├── Multiclass_classification_using_onevsall.ipynb │ ├── ex3data1.mat │ ├── ex3weights.mat │ ├── neuralnetwork.png │ ├── readme │ ├── token.pkl │ └── utils.py ├── Neural Networks: Representation │ ├── ex4-backpropagation.png │ ├── ex4data1.mat │ ├── ex4weights.mat │ ├── exercise4.ipynb │ ├── neural_network.png │ ├── readme │ └── utils.py ├── Rread └── Rreadme ├── ML0101EN_RecSys_Collaborative_Filtering_movies_py_v1.ipynb ├── ML0101EN_RecSys_Content_Based_movies_py_v1.ipynb ├── Machine Leanring.png ├── Machine Learning ├── New Text Document.txt ├── 📚Chapter 1 - Introduction │ └── New Text Document.txt └── 📚Chapter 2 -Linear Regression with one Variable │ └── New Text Document.txt ├── Machine_Learning.ipynb ├── Model Evaluation ├── Bias_and_Variance_using_Python.ipynb ├── Scikit_Plot_Visualizing_Machine_Learning_Algorithm_Results_&_Performance (1).ipynb ├── What_is_Cross_Validation_in_Machine_Learning_.ipynb ├── hyperparameter_tuning.ipynb └── readme ├── Pandas.ipynb ├── Pipelines_in_scikit_learn.ipynb ├── Preprocessing ├── Create_new_Features_(Faker)_.ipynb ├── Creating_artificial_datasets.ipynb ├── Data_Processing_in_Python_.ipynb ├── Data_representation_in_scikit_learn.ipynb ├── StandardScaler_in_Machine_Learning.ipynb ├── Upload_Dataset_from_github_to_Colab.ipynb └── readme ├── ReadMe.md ├── Recommendation System ├── ML0101EN-RecSys-Collaborative-Filtering-movies-py-v1.ipynb ├── ML0101EN-RecSys-Content-Based-movies-py-v1.ipynb └── readme ├── Regression_With_Pycaret.ipynb ├── Scikit_Learn_Boosting_Methods.ipynb ├── Simple_Linear_Regression_using_scikit_learn.ipynb ├── Sklearn ├── Association Mining │ ├── Apriori_Algorithm (1).ipynb │ └── readm ├── Feature_Engineering_in_ML (1).ipynb ├── Graph Algorithms │ ├── Graph_Algorithem.ipynb │ └── readme ├── Introduction_of_SKLEARN.ipynb ├── Unsupervised Learning │ ├── Anomaly_Detection.ipynb │ ├── 
Anomaly_Detection_with_Isolation_Forest_algorithm.ipynb │ ├── BIRCH_Clustering_in_Machine_Learning.ipynb │ ├── Clus-DBSCN-weather-py-v1.ipynb │ ├── Clus-Hierarchical-Cars-py-v1.ipynb │ ├── Clus-K-Means-Customer-Seg-py-v1.ipynb │ ├── DBSCAN_Clustering_in_Machine_Learning.ipynb │ ├── Kmean .ipynb │ └── readme ├── dataset │ └── readme ├── readme └── supervised algorithm │ ├── 1-28-2020-Supervised_learning_with_Sklearn.ipynb │ ├── Bagging_&_Random_Forests.ipynb │ ├── Clas-Decision-Trees-drug-py-v1.ipynb │ ├── Clas-K-Nearest-neighbors-CustCat-py-v1.ipynb │ ├── Decision_Trees.ipynb │ ├── Linear_Regression_.ipynb │ ├── Model_Evaluation_&_Scoring_Matrices (1).ipynb │ ├── Naive_Bayes_.ipynb │ ├── Naive_Bayes_Algorithm_in_Machine_Learning.ipynb │ ├── Neural_Network.ipynb │ ├── Perceptron_in_Machine_Learning.ipynb │ ├── PyCaret_in_Machine_Learning.ipynb │ ├── Reg-Mulitple-Linear-Regression-Co2-py-v1.ipynb │ ├── Reg-NoneLinearRegression-py-v1.ipynb │ ├── Reg-Polynomial-Regression-Co2-py-v1.ipynb │ ├── Reg-Simple-Linear-Regression-Co2-py-v1.ipynb │ ├── Supervised_(Classification)_ML_Model_Training_and_Evulation_.ipynb │ ├── Support_Vector_Machine_(SVM)_.ipynb │ ├── Voting_Classifiers.ipynb │ ├── XGBoost.ipynb │ ├── XGBoost_in_Machine_Learning.ipynb │ ├── dataset │ └── readme │ ├── logistic_Regression_and_KNN_.ipynb │ └── readm ├── Statistics ├── Exploring_Correlation_10_14_21.ipynb └── Readme ├── Statistics_for_Machine_Learning_.ipynb ├── Supervised_(Classification)_ML_Model_Training_and_Evulation_ Backup.ipynb ├── Supervised_(Classification)_ML_Model_Training_and_Evulation_.ipynb ├── The-Art-of-Linear-Algebra.pdf ├── Unsupervised_learning.ipynb └── readme /1_fzqAsLGKIt3jzstwBLGsJA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/1_fzqAsLGKIt3jzstwBLGsJA.png -------------------------------------------------------------------------------- /438058314_3872280102992838_2811757508132819156_n.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/438058314_3872280102992838_2811757508132819156_n.jpg -------------------------------------------------------------------------------- /Advertising.csv: -------------------------------------------------------------------------------- 1 | ,TV,Radio,Newspaper,Sales 2 | 1,230.1,37.8,69.2,22.1 3 | 2,44.5,39.3,45.1,10.4 4 | 3,17.2,45.9,69.3,9.3 5 | 4,151.5,41.3,58.5,18.5 6 | 5,180.8,10.8,58.4,12.9 7 | 6,8.7,48.9,75,7.2 8 | 7,57.5,32.8,23.5,11.8 9 | 8,120.2,19.6,11.6,13.2 10 | 9,8.6,2.1,1,4.8 11 | 10,199.8,2.6,21.2,10.6 12 | 11,66.1,5.8,24.2,8.6 13 | 12,214.7,24,4,17.4 14 | 13,23.8,35.1,65.9,9.2 15 | 14,97.5,7.6,7.2,9.7 16 | 15,204.1,32.9,46,19 17 | 16,195.4,47.7,52.9,22.4 18 | 17,67.8,36.6,114,12.5 19 | 18,281.4,39.6,55.8,24.4 20 | 19,69.2,20.5,18.3,11.3 21 | 20,147.3,23.9,19.1,14.6 22 | 21,218.4,27.7,53.4,18 23 | 22,237.4,5.1,23.5,12.5 24 | 23,13.2,15.9,49.6,5.6 25 | 24,228.3,16.9,26.2,15.5 26 | 25,62.3,12.6,18.3,9.7 27 | 26,262.9,3.5,19.5,12 28 | 27,142.9,29.3,12.6,15 29 | 28,240.1,16.7,22.9,15.9 30 | 29,248.8,27.1,22.9,18.9 31 | 30,70.6,16,40.8,10.5 32 | 31,292.9,28.3,43.2,21.4 33 | 32,112.9,17.4,38.6,11.9 34 | 33,97.2,1.5,30,9.6 35 | 34,265.6,20,0.3,17.4 36 | 35,95.7,1.4,7.4,9.5 37 | 36,290.7,4.1,8.5,12.8 38 | 37,266.9,43.8,5,25.4 39 | 38,74.7,49.4,45.7,14.7 40 | 39,43.1,26.7,35.1,10.1 41 | 
40,228,37.7,32,21.5 42 | 41,202.5,22.3,31.6,16.6 43 | 42,177,33.4,38.7,17.1 44 | 43,293.6,27.7,1.8,20.7 45 | 44,206.9,8.4,26.4,12.9 46 | 45,25.1,25.7,43.3,8.5 47 | 46,175.1,22.5,31.5,14.9 48 | 47,89.7,9.9,35.7,10.6 49 | 48,239.9,41.5,18.5,23.2 50 | 49,227.2,15.8,49.9,14.8 51 | 50,66.9,11.7,36.8,9.7 52 | 51,199.8,3.1,34.6,11.4 53 | 52,100.4,9.6,3.6,10.7 54 | 53,216.4,41.7,39.6,22.6 55 | 54,182.6,46.2,58.7,21.2 56 | 55,262.7,28.8,15.9,20.2 57 | 56,198.9,49.4,60,23.7 58 | 57,7.3,28.1,41.4,5.5 59 | 58,136.2,19.2,16.6,13.2 60 | 59,210.8,49.6,37.7,23.8 61 | 60,210.7,29.5,9.3,18.4 62 | 61,53.5,2,21.4,8.1 63 | 62,261.3,42.7,54.7,24.2 64 | 63,239.3,15.5,27.3,15.7 65 | 64,102.7,29.6,8.4,14 66 | 65,131.1,42.8,28.9,18 67 | 66,69,9.3,0.9,9.3 68 | 67,31.5,24.6,2.2,9.5 69 | 68,139.3,14.5,10.2,13.4 70 | 69,237.4,27.5,11,18.9 71 | 70,216.8,43.9,27.2,22.3 72 | 71,199.1,30.6,38.7,18.3 73 | 72,109.8,14.3,31.7,12.4 74 | 73,26.8,33,19.3,8.8 75 | 74,129.4,5.7,31.3,11 76 | 75,213.4,24.6,13.1,17 77 | 76,16.9,43.7,89.4,8.7 78 | 77,27.5,1.6,20.7,6.9 79 | 78,120.5,28.5,14.2,14.2 80 | 79,5.4,29.9,9.4,5.3 81 | 80,116,7.7,23.1,11 82 | 81,76.4,26.7,22.3,11.8 83 | 82,239.8,4.1,36.9,12.3 84 | 83,75.3,20.3,32.5,11.3 85 | 84,68.4,44.5,35.6,13.6 86 | 85,213.5,43,33.8,21.7 87 | 86,193.2,18.4,65.7,15.2 88 | 87,76.3,27.5,16,12 89 | 88,110.7,40.6,63.2,16 90 | 89,88.3,25.5,73.4,12.9 91 | 90,109.8,47.8,51.4,16.7 92 | 91,134.3,4.9,9.3,11.2 93 | 92,28.6,1.5,33,7.3 94 | 93,217.7,33.5,59,19.4 95 | 94,250.9,36.5,72.3,22.2 96 | 95,107.4,14,10.9,11.5 97 | 96,163.3,31.6,52.9,16.9 98 | 97,197.6,3.5,5.9,11.7 99 | 98,184.9,21,22,15.5 100 | 99,289.7,42.3,51.2,25.4 101 | 100,135.2,41.7,45.9,17.2 102 | 101,222.4,4.3,49.8,11.7 103 | 102,296.4,36.3,100.9,23.8 104 | 103,280.2,10.1,21.4,14.8 105 | 104,187.9,17.2,17.9,14.7 106 | 105,238.2,34.3,5.3,20.7 107 | 106,137.9,46.4,59,19.2 108 | 107,25,11,29.7,7.2 109 | 108,90.4,0.3,23.2,8.7 110 | 109,13.1,0.4,25.6,5.3 111 | 110,255.4,26.9,5.5,19.8 112 | 111,225.8,8.2,56.5,13.4 113 | 112,241.7,38,23.2,21.8 114 | 113,175.7,15.4,2.4,14.1 115 | 114,209.6,20.6,10.7,15.9 116 | 115,78.2,46.8,34.5,14.6 117 | 116,75.1,35,52.7,12.6 118 | 117,139.2,14.3,25.6,12.2 119 | 118,76.4,0.8,14.8,9.4 120 | 119,125.7,36.9,79.2,15.9 121 | 120,19.4,16,22.3,6.6 122 | 121,141.3,26.8,46.2,15.5 123 | 122,18.8,21.7,50.4,7 124 | 123,224,2.4,15.6,11.6 125 | 124,123.1,34.6,12.4,15.2 126 | 125,229.5,32.3,74.2,19.7 127 | 126,87.2,11.8,25.9,10.6 128 | 127,7.8,38.9,50.6,6.6 129 | 128,80.2,0,9.2,8.8 130 | 129,220.3,49,3.2,24.7 131 | 130,59.6,12,43.1,9.7 132 | 131,0.7,39.6,8.7,1.6 133 | 132,265.2,2.9,43,12.7 134 | 133,8.4,27.2,2.1,5.7 135 | 134,219.8,33.5,45.1,19.6 136 | 135,36.9,38.6,65.6,10.8 137 | 136,48.3,47,8.5,11.6 138 | 137,25.6,39,9.3,9.5 139 | 138,273.7,28.9,59.7,20.8 140 | 139,43,25.9,20.5,9.6 141 | 140,184.9,43.9,1.7,20.7 142 | 141,73.4,17,12.9,10.9 143 | 142,193.7,35.4,75.6,19.2 144 | 143,220.5,33.2,37.9,20.1 145 | 144,104.6,5.7,34.4,10.4 146 | 145,96.2,14.8,38.9,11.4 147 | 146,140.3,1.9,9,10.3 148 | 147,240.1,7.3,8.7,13.2 149 | 148,243.2,49,44.3,25.4 150 | 149,38,40.3,11.9,10.9 151 | 150,44.7,25.8,20.6,10.1 152 | 151,280.7,13.9,37,16.1 153 | 152,121,8.4,48.7,11.6 154 | 153,197.6,23.3,14.2,16.6 155 | 154,171.3,39.7,37.7,19 156 | 155,187.8,21.1,9.5,15.6 157 | 156,4.1,11.6,5.7,3.2 158 | 157,93.9,43.5,50.5,15.3 159 | 158,149.8,1.3,24.3,10.1 160 | 159,11.7,36.9,45.2,7.3 161 | 160,131.7,18.4,34.6,12.9 162 | 161,172.5,18.1,30.7,14.4 163 | 162,85.7,35.8,49.3,13.3 164 | 163,188.4,18.1,25.6,14.9 165 | 164,163.5,36.8,7.4,18 166 | 
165,117.2,14.7,5.4,11.9 167 | 166,234.5,3.4,84.8,11.9 168 | 167,17.9,37.6,21.6,8 169 | 168,206.8,5.2,19.4,12.2 170 | 169,215.4,23.6,57.6,17.1 171 | 170,284.3,10.6,6.4,15 172 | 171,50,11.6,18.4,8.4 173 | 172,164.5,20.9,47.4,14.5 174 | 173,19.6,20.1,17,7.6 175 | 174,168.4,7.1,12.8,11.7 176 | 175,222.4,3.4,13.1,11.5 177 | 176,276.9,48.9,41.8,27 178 | 177,248.4,30.2,20.3,20.2 179 | 178,170.2,7.8,35.2,11.7 180 | 179,276.7,2.3,23.7,11.8 181 | 180,165.6,10,17.6,12.6 182 | 181,156.6,2.6,8.3,10.5 183 | 182,218.5,5.4,27.4,12.2 184 | 183,56.2,5.7,29.7,8.7 185 | 184,287.6,43,71.8,26.2 186 | 185,253.8,21.3,30,17.6 187 | 186,205,45.1,19.6,22.6 188 | 187,139.5,2.1,26.6,10.3 189 | 188,191.1,28.7,18.2,17.3 190 | 189,286,13.9,3.7,15.9 191 | 190,18.7,12.1,23.4,6.7 192 | 191,39.5,41.1,5.8,10.8 193 | 192,75.5,10.8,6,9.9 194 | 193,17.2,4.1,31.6,5.9 195 | 194,166.8,42,3.6,19.6 196 | 195,149.7,35.6,6,17.3 197 | 196,38.2,3.7,13.8,7.6 198 | 197,94.2,4.9,8.1,9.7 199 | 198,177,9.3,6.4,12.8 200 | 199,283.6,42,66.2,25.5 201 | 200,232.1,8.6,8.7,13.4 202 | -------------------------------------------------------------------------------- /Anomaly_Detection_using_Using_Gaussian_Mixture_Models.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Anomaly Detection using Using Gaussian Mixture Models.ipynb", 7 | "provenance": [], 8 | "toc_visible": true, 9 | "authorship_tag": "ABX9TyNX728rInjjG7OnOJLHXmfh", 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "python3", 14 | "display_name": "Python 3" 15 | }, 16 | "language_info": { 17 | "name": "python" 18 | } 19 | }, 20 | "cells": [ 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "view-in-github", 25 | "colab_type": "text" 26 | }, 27 | "source": [ 28 | "\"Open" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "source": [ 34 | "# **Data Loading**" 35 | ], 36 | "metadata": { 37 | "id": "ozLwZwg3QgOG" 38 | } 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": null, 43 | "metadata": { 44 | "id": "UgIn8MqKOwhu" 45 | }, 46 | "outputs": [], 47 | "source": [ 48 | "from google.colab import drive\n", 49 | "drive.mount('/content/drive')" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "source": [ 55 | "#**Import libaray**" 56 | ], 57 | "metadata": { 58 | "id": "5Lusrw9PQ3g9" 59 | } 60 | }, 61 | { 62 | "cell_type": "code", 63 | "source": [ 64 | "import numpy as np\n", 65 | "import pandas as pd\n", 66 | "import matplotlib.pyplot as plt\n", 67 | "import seaborn as sb\n", 68 | "from scipy.io import loadmat\n", 69 | "%matplotlib inline" 70 | ], 71 | "metadata": { 72 | "id": "Rf85GGg1RAdU" 73 | }, 74 | "execution_count": null, 75 | "outputs": [] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "source": [ 80 | "# **Dataset**" 81 | ], 82 | "metadata": { 83 | "id": "g1F04UK1REQG" 84 | } 85 | }, 86 | { 87 | "cell_type": "code", 88 | "source": [ 89 | "# Number of samples per component\n", 90 | "n_samples = 500" 91 | ], 92 | "metadata": { 93 | "id": "9FTMOy8vROPo" 94 | }, 95 | "execution_count": null, 96 | "outputs": [] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "source": [ 101 | "# Generate random sample, two components\n", 102 | "import numpy as np\n", 103 | "# Generate random sample, two components\n", 104 | "np.random.seed(0)\n", 105 | "C = np.array([[0., -0.1], [1.7, .4]])\n", 106 | "C2 = np.array([[1., -0.1], [2.7, .2]])\n", 107 | "#X = np.r_[np.dot(np.random.randn(n_samples, 2), C)]\n", 
108 | " #.7 * np.random.randn(n_samples, 2) + np.array([-6, 3])]\n", 109 | "X = np.r_[np.dot(np.random.randn(n_samples, 2), C),np.dot(np.random.randn(n_samples, 2), C2)]" 110 | ], 111 | "metadata": { 112 | "id": "NIdZm0CB6nuc" 113 | }, 114 | "execution_count": 4, 115 | "outputs": [] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "source": [ 120 | "# **Data Ploting**" 121 | ], 122 | "metadata": { 123 | "id": "k882mXlfRekE" 124 | } 125 | }, 126 | { 127 | "cell_type": "code", 128 | "source": [ 129 | "import matplotlib.pyplot as plt\n", 130 | "%matplotlib inline" 131 | ], 132 | "metadata": { 133 | "id": "iMvqCFDiRj0O" 134 | }, 135 | "execution_count": 7, 136 | "outputs": [] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "source": [ 141 | "X[-5:] = [[4,-1],[4.1,-1.1],[3.9,-1],[4.0,-1.2],[4.0,-1.3]]" 142 | ], 143 | "metadata": { 144 | "id": "ajQb_SjA7lyM" 145 | }, 146 | "execution_count": 8, 147 | "outputs": [] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "source": [ 152 | "plt.scatter(X[:,0], X[:,1],s=5)" 153 | ], 154 | "metadata": { 155 | "id": "Bnw4gYel7r8w" 156 | }, 157 | "execution_count": null, 158 | "outputs": [] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "source": [ 163 | "# **Model development**" 164 | ], 165 | "metadata": { 166 | "id": "JBcKxRPoRz1U" 167 | } 168 | }, 169 | { 170 | "cell_type": "code", 171 | "source": [ 172 | "from sklearn.mixture import GaussianMixture" 173 | ], 174 | "metadata": { 175 | "id": "Rmmy_9WTSDvG" 176 | }, 177 | "execution_count": 11, 178 | "outputs": [] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "source": [ 183 | "gmm = GaussianMixture(n_components=3)\n" 184 | ], 185 | "metadata": { 186 | "id": "7L8Duj5GSH42" 187 | }, 188 | "execution_count": 12, 189 | "outputs": [] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "source": [ 194 | "gmm.fit(X)" 195 | ], 196 | "metadata": { 197 | "id": "Zx3ahZXV9L9M", 198 | "outputId": "8b5237a5-fb9e-4eb0-cb5e-da87ae822929", 199 | "colab": { 200 | "base_uri": "https://localhost:8080/" 201 | } 202 | }, 203 | "execution_count": 13, 204 | "outputs": [ 205 | { 206 | "output_type": "execute_result", 207 | "data": { 208 | "text/plain": [ 209 | "GaussianMixture(n_components=3)" 210 | ] 211 | }, 212 | "metadata": {}, 213 | "execution_count": 13 214 | } 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "source": [ 220 | "pred = gmm.predict(X)" 221 | ], 222 | "metadata": { 223 | "id": "VaRONdmx9Qn6" 224 | }, 225 | "execution_count": 14, 226 | "outputs": [] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "source": [ 231 | "pred[:50]" 232 | ], 233 | "metadata": { 234 | "id": "9xpIlZmA9YIz", 235 | "outputId": "e430ddae-f5a8-4a57-8c39-230d812ad832", 236 | "colab": { 237 | "base_uri": "https://localhost:8080/" 238 | } 239 | }, 240 | "execution_count": 15, 241 | "outputs": [ 242 | { 243 | "output_type": "execute_result", 244 | "data": { 245 | "text/plain": [ 246 | "array([1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n", 247 | " 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n", 248 | " 2, 2, 2, 2, 2, 2])" 249 | ] 250 | }, 251 | "metadata": {}, 252 | "execution_count": 15 253 | } 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "source": [ 259 | "plt.scatter(X[:,0], X[:,1],s=10,c=pred)" 260 | ], 261 | "metadata": { 262 | "id": "_aStRaq29oSc" 263 | }, 264 | "execution_count": null, 265 | "outputs": [] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "source": [ 270 | "# **References**" 271 | ], 272 | "metadata": { 273 | "id": "husMyO00-Jxp" 274 | 
} 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "source": [ 279 | "[1-Anomaly Detection.ipynb](https://github.com/edyoda/data-science-complete-tutorial/blob/master/13.%20Anomaly%20Detection.ipynb)" 280 | ], 281 | "metadata": { 282 | "id": "PD3T4GsK-PB1" 283 | } 284 | } 285 | ] 286 | } -------------------------------------------------------------------------------- /DALL·E 2025-02-20 09.38.02 - An enhanced AI-themed GitHub repository banner with a futuristic dark blue and black background, incorporating glowing abstract neural network pattern.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/DALL·E 2025-02-20 09.38.02 - An enhanced AI-themed GitHub repository banner with a futuristic dark blue and black background, incorporating glowing abstract neural network pattern.webp -------------------------------------------------------------------------------- /Data Visualization/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Datasets/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Deep Learning library/FastAI_in_Machine_Learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "FastAI in Machine Learning", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "toc_visible": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "language_info": { 16 | "name": "python" 17 | } 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "gDzSpvtHNZPb" 24 | }, 25 | "source": [ 26 | "# **Introduction**" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "id": "7PanCid7NoX9" 33 | }, 34 | "source": [ 35 | "FastAI is a Machine Learning library used for Deep Learning tasks. It helps by providing top-level components that can be easily used to achieve cutting edge results. In this article, I will walk you through a tutorial on FastAI in Machine Learning using Python." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": { 41 | "id": "tAWclN5xNvSa" 42 | }, 43 | "source": [ 44 | "The FastAI library is created for two main goals:\n", 45 | "\n", 46 | "- to be approachable\n", 47 | "- rapidly productive\n", 48 | "\n" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": { 54 | "id": "bgfKKQUON9MK" 55 | }, 56 | "source": [ 57 | "FastAI aims to provide high-level components to researchers who use low-level components that can be used to build new approaches. The best part about this library is that it does all of this without substantial components in ease of use, flexibility, and performance." 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": { 63 | "id": "NS2_1L5qOImL" 64 | }, 65 | "source": [ 66 | "FastAI is built on Pytorch, NumPy, PIL, pandas, and a few other libraries. To achieve its goals, it does not aim to hide the lower levels of its foundation. 
Using this machine learning library, we can directly interact with the underlying PyTorch primitive models.\n", 67 | "\n", 68 | "By using the FastAI library in Machine Learning, we can easily build and train advanced neural network models using transfer learning with very few lines of code. In the section below, I’ll show you an example of this library." 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": { 74 | "id": "srGXt7ipON7a" 75 | }, 76 | "source": [ 77 | "In this section, I’ll walk you through an example of how to use the FastAI Machine Learning Library on a very popular task within the Machine Learning community that is about classifying dogs and cats." 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": { 83 | "id": "sSN_BlZ3OSvT" 84 | }, 85 | "source": [ 86 | "To use this library, you need to run these three commands below in your command prompt or terminal:" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "metadata": { 92 | "id": "qD7IiuE1OUL0" 93 | }, 94 | "source": [ 95 | "!pip install fastai\n", 96 | "!pip install fastbook --upgrade\n", 97 | "!pip install -Uqq fastbook" 98 | ], 99 | "execution_count": null, 100 | "outputs": [] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": { 105 | "id": "bkF7ZCndOovC" 106 | }, 107 | "source": [ 108 | "After executing the above commands we need to prepare the environment to work on this library, which we can easily do by importing the fastbook library and passing the setup_book() function:" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "metadata": { 114 | "colab": { 115 | "base_uri": "https://localhost:8080/" 116 | }, 117 | "id": "jr3xu6mmOsCV", 118 | "outputId": "5609f189-9ace-4bbb-f823-cd861d6b7018" 119 | }, 120 | "source": [ 121 | "import fastbook\n", 122 | "fastbook.setup_book()" 123 | ], 124 | "execution_count": 10, 125 | "outputs": [ 126 | { 127 | "output_type": "stream", 128 | "text": [ 129 | "Mounted at /content/gdrive\n" 130 | ], 131 | "name": "stdout" 132 | } 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": { 138 | "id": "4mosj6MlO1IB" 139 | }, 140 | "source": [ 141 | "Now let’s import the necessary libraries and the dataset that we need to work on in this tutorial:" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "metadata": { 147 | "colab": { 148 | "base_uri": "https://localhost:8080/", 149 | "height": 17 150 | }, 151 | "id": "_XSNLrm_O6hZ", 152 | "outputId": "d78baa28-6f79-4596-8616-b6ec0aa69608" 153 | }, 154 | "source": [ 155 | "from fastai.vision.all import *\n", 156 | "path = untar_data(URLs.PETS)" 157 | ], 158 | "execution_count": 11, 159 | "outputs": [ 160 | { 161 | "output_type": "display_data", 162 | "data": { 163 | "text/html": [ 164 | "" 165 | ], 166 | "text/plain": [ 167 | "" 168 | ] 169 | }, 170 | "metadata": { 171 | "tags": [] 172 | } 173 | } 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": { 179 | "id": "yNhIIIj5O_iq" 180 | }, 181 | "source": [ 182 | "In FastAI, untar_data is a very powerful convenience function to download files from a URL. We are using the PETS dataset here which includes 37 categories of pets with roughly around 200 images of each class. 
Now let’s determine the labels:" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "metadata": { 188 | "id": "bEIFujpKPPFQ" 189 | }, 190 | "source": [ 191 | "def is_cat(x):\n", 192 | " return x[0].isupper()" 193 | ], 194 | "execution_count": 12, 195 | "outputs": [] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": { 200 | "id": "qFDQ35bdPT2Y" 201 | }, 202 | "source": [ 203 | "Now I will use the ImageDataLoader function which raps around several data loaders for the problems of computer vision:" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "metadata": { 209 | "id": "7T4AzbKaPZm4" 210 | }, 211 | "source": [ 212 | "dls = ImageDataLoaders.from_name_func(\n", 213 | " path,\n", 214 | " get_image_files(path),\n", 215 | " valid_pct = 0.2,\n", 216 | " seed = 42,\n", 217 | " label_func = is_cat,\n", 218 | " item_tfms = Resize(224)\n", 219 | ")" 220 | ], 221 | "execution_count": 13, 222 | "outputs": [] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": { 227 | "id": "VAXSqaU7Pi7p" 228 | }, 229 | "source": [ 230 | "**Final Step: Making Predictions**\n" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": { 236 | "id": "gWHm8lNLPnDY" 237 | }, 238 | "source": [ 239 | "Now let’s train the model and make predictions:\n", 240 | "\n" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "metadata": { 246 | "id": "w5A80ZU_PlLR" 247 | }, 248 | "source": [ 249 | "learn =cnn_learner(dls,\n", 250 | " resnet34,\n", 251 | " metrics = error_rate)\n", 252 | "\n", 253 | "learn.fine_tune(1)\n", 254 | "import ipywidgets as widgets\n", 255 | "\n", 256 | "uploader = widgets.FileUpload()\n", 257 | "uploader\n", 258 | "def pred():\n", 259 | " img = PILImage.create(uploader.data[0])\n", 260 | " img.show()\n", 261 | "\n", 262 | " #Make Prediction\n", 263 | " is_cat,_,probs = learn.predict(img)\n", 264 | "\n", 265 | " print(f\"Image is of a Cat: {is_cat}.\")\n", 266 | " print(f\"Probability image is a cat: {probs[1].item():.6f}\")\n", 267 | " pred()" 268 | ], 269 | "execution_count": null, 270 | "outputs": [] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": { 275 | "id": "qOG-HMzONLd1" 276 | }, 277 | "source": [ 278 | "# **References**" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": { 284 | "id": "ljFIXtDbNPtz" 285 | }, 286 | "source": [ 287 | "[FastAI in Machine Learning](https://thecleverprogrammer.com/2021/01/22/fastai-in-machine-learning/)" 288 | ] 289 | } 290 | ] 291 | } -------------------------------------------------------------------------------- /Deep Learning library/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Distance_Measure_.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Distance Measure .ipynb", 7 | "provenance": [], 8 | "toc_visible": true, 9 | "authorship_tag": "ABX9TyNuKUxrnYYZ7AoByDw9rqRO", 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "python3", 14 | "display_name": "Python 3" 15 | }, 16 | "language_info": { 17 | "name": "python" 18 | } 19 | }, 20 | "cells": [ 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "view-in-github", 25 | "colab_type": "text" 26 | }, 27 | "source": [ 28 | "\"Open" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "source": [ 34 | "#**Euclidean 
Distance**" 35 | ], 36 | "metadata": { 37 | "id": "yf4g4m_Im8F1" 38 | } 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "source": [ 43 | "## **Using the SciPy library**" 44 | ], 45 | "metadata": { 46 | "id": "WCaSZlQymsn5" 47 | } 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": { 53 | "id": "h3_WjqkGmiTy" 54 | }, 55 | "outputs": [], 56 | "source": [ 57 | "from scipy.spatial import distance\n", 58 | "A = (5, 3)\n", 59 | "B = (2, 4)\n", 60 | "d = distance.euclidean(A, B)\n", 61 | "print('Euclidean Distance:',d)\n" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "source": [ 67 | "## **Using NumPy Library**" 68 | ], 69 | "metadata": { 70 | "id": "0-EPqZKNnJK6" 71 | } 72 | }, 73 | { 74 | "cell_type": "code", 75 | "source": [ 76 | "import numpy as np\n", 77 | "A = np.array((5, 3))\n", 78 | "B = np.array((2, 4))\n", 79 | "d = np.linalg.norm(A-B)\n", 80 | "print(\"Euclidean Distance: \",d)" 81 | ], 82 | "metadata": { 83 | "id": "sxFTX13JnPwC" 84 | }, 85 | "execution_count": null, 86 | "outputs": [] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "source": [ 91 | "# **Manhattan Distance**" 92 | ], 93 | "metadata": { 94 | "id": "WLxylSminTcE" 95 | } 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "source": [ 100 | "**Manhattan Distance using Python**\n" 101 | ], 102 | "metadata": { 103 | "id": "TRsCTYYZncFO" 104 | } 105 | }, 106 | { 107 | "cell_type": "code", 108 | "source": [ 109 | "from scipy.spatial import distance\n", 110 | "A = (5, 3)\n", 111 | "B = (2, 4)\n", 112 | "d = distance.cityblock(A, B)\n", 113 | "print('Manhattan Distance:',d)" 114 | ], 115 | "metadata": { 116 | "id": "OjZy5ym6neYk" 117 | }, 118 | "execution_count": null, 119 | "outputs": [] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "source": [ 124 | "#**Cosine Distance**" 125 | ], 126 | "metadata": { 127 | "id": "mn0JXkoOnma0" 128 | } 129 | }, 130 | { 131 | "cell_type": "code", 132 | "source": [ 133 | "from scipy.spatial import distance\n", 134 | "A = (5, 3)\n", 135 | "B = (2, 4)\n", 136 | "d = 1 - distance.cosine(A, B)\n", 137 | "print('Cosine Distance:',d)\n" 138 | ], 139 | "metadata": { 140 | "colab": { 141 | "base_uri": "https://localhost:8080/" 142 | }, 143 | "id": "EyiWTN0enyYR", 144 | "outputId": "bc656dc3-72fb-42c1-f449-b2f8addcaf96" 145 | }, 146 | "execution_count": 5, 147 | "outputs": [ 148 | { 149 | "output_type": "stream", 150 | "name": "stdout", 151 | "text": [ 152 | "Cosine Distance: 0.8436614877321075\n" 153 | ] 154 | } 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "source": [ 160 | "# **References**" 161 | ], 162 | "metadata": { 163 | "id": "ts_op__5n1jZ" 164 | } 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "source": [ 169 | "[Measure Distance between data points in Machine Learning](https://deepblade.com/measure-distance-between-data-points-in-machine-learning/)" 170 | ], 171 | "metadata": { 172 | "id": "n6KuV9W1o7rw" 173 | } 174 | } 175 | ] 176 | } -------------------------------------------------------------------------------- /Feature Selection/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /KNN_with_Python_.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "KNN with Python .ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "toc_visible": true, 10 | 
"authorship_tag": "ABX9TyP5F/MUuMECwDqfqT/hWEMn", 11 | "include_colab_link": true 12 | }, 13 | "kernelspec": { 14 | "name": "python3", 15 | "display_name": "Python 3" 16 | }, 17 | "language_info": { 18 | "name": "python" 19 | } 20 | }, 21 | "cells": [ 22 | { 23 | "cell_type": "markdown", 24 | "metadata": { 25 | "id": "view-in-github", 26 | "colab_type": "text" 27 | }, 28 | "source": [ 29 | "\"Open" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "source": [ 35 | "# **Dataset**" 36 | ], 37 | "metadata": { 38 | "id": "2EE6EUa9CgX9" 39 | } 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 1, 44 | "metadata": { 45 | "id": "3xvG_5WCB0xW" 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "# read in the iris data\n", 50 | "from sklearn.datasets import load_iris\n", 51 | "iris = load_iris()\n", 52 | "\n", 53 | "# create X (features) and y (response)\n", 54 | "X = iris.data\n", 55 | "y = iris.target" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "source": [ 61 | "# **KNN (K=5)**\n" 62 | ], 63 | "metadata": { 64 | "id": "edWE1NORCzEM" 65 | } 66 | }, 67 | { 68 | "cell_type": "code", 69 | "source": [ 70 | "from sklearn.neighbors import KNeighborsClassifier\n", 71 | "from sklearn import metrics\n", 72 | "knn = KNeighborsClassifier(n_neighbors=5)\n", 73 | "knn.fit(X, y)\n", 74 | "y_pred = knn.predict(X)\n", 75 | "print(metrics.accuracy_score(y, y_pred))" 76 | ], 77 | "metadata": { 78 | "id": "zUWFag3LC29s" 79 | }, 80 | "execution_count": null, 81 | "outputs": [] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "source": [ 86 | "# **KNN (K=1)**\n" 87 | ], 88 | "metadata": { 89 | "id": "1iiDPsmkDNcK" 90 | } 91 | }, 92 | { 93 | "cell_type": "code", 94 | "source": [ 95 | "knn = KNeighborsClassifier(n_neighbors=1)\n", 96 | "knn.fit(X, y)\n", 97 | "y_pred = knn.predict(X)\n", 98 | "print(metrics.accuracy_score(y, y_pred))" 99 | ], 100 | "metadata": { 101 | "colab": { 102 | "base_uri": "https://localhost:8080/" 103 | }, 104 | "id": "U-nSWOTvDR96", 105 | "outputId": "06264400-522e-44c8-c0c5-c05fe88f9052" 106 | }, 107 | "execution_count": 4, 108 | "outputs": [ 109 | { 110 | "output_type": "stream", 111 | "name": "stdout", 112 | "text": [ 113 | "1.0\n" 114 | ] 115 | } 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "source": [ 121 | "#**Evaluation procedure #2: Train/test split**\n" 122 | ], 123 | "metadata": { 124 | "id": "gZFVYcunDf16" 125 | } 126 | }, 127 | { 128 | "cell_type": "code", 129 | "source": [ 130 | "# print the shapes of X and y\n", 131 | "print(X.shape)\n", 132 | "print(y.shape)" 133 | ], 134 | "metadata": { 135 | "colab": { 136 | "base_uri": "https://localhost:8080/" 137 | }, 138 | "id": "MMlebVPgDq4j", 139 | "outputId": "7d4efdfd-3afa-474d-d494-ee916a0d887b" 140 | }, 141 | "execution_count": 5, 142 | "outputs": [ 143 | { 144 | "output_type": "stream", 145 | "name": "stdout", 146 | "text": [ 147 | "(150, 4)\n", 148 | "(150,)\n" 149 | ] 150 | } 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "source": [ 156 | "# STEP 1: split X and y into training and testing sets\n", 157 | "from sklearn.model_selection import train_test_split\n", 158 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=4)" 159 | ], 160 | "metadata": { 161 | "id": "UbBF5af_F9fM" 162 | }, 163 | "execution_count": 6, 164 | "outputs": [] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "source": [ 169 | "# print the shapes of the new X objects\n", 170 | "print(X_train.shape)\n", 171 | "print(X_test.shape)" 172 | ], 173 | "metadata": { 174 
| "colab": { 175 | "base_uri": "https://localhost:8080/" 176 | }, 177 | "id": "5ue1TOw_GDpY", 178 | "outputId": "3c8dfdc9-963e-462d-9222-055679529f83" 179 | }, 180 | "execution_count": 7, 181 | "outputs": [ 182 | { 183 | "output_type": "stream", 184 | "name": "stdout", 185 | "text": [ 186 | "(90, 4)\n", 187 | "(60, 4)\n" 188 | ] 189 | } 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "source": [ 195 | "# print the shapes of the new y objects\n", 196 | "print(y_train.shape)\n", 197 | "print(y_test.shape)" 198 | ], 199 | "metadata": { 200 | "id": "DyrZNBH5GHwI" 201 | }, 202 | "execution_count": null, 203 | "outputs": [] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "source": [ 208 | "**Repeat for KNN with K=5:**\n", 209 | "\n" 210 | ], 211 | "metadata": { 212 | "id": "WVrAozv6GO0Z" 213 | } 214 | }, 215 | { 216 | "cell_type": "code", 217 | "source": [ 218 | "knn = KNeighborsClassifier(n_neighbors=5)\n", 219 | "knn.fit(X_train, y_train)\n", 220 | "y_pred = knn.predict(X_test)\n", 221 | "print(metrics.accuracy_score(y_test, y_pred))" 222 | ], 223 | "metadata": { 224 | "colab": { 225 | "base_uri": "https://localhost:8080/" 226 | }, 227 | "id": "B1BXJmzJGRTR", 228 | "outputId": "f57550a9-6e05-40bd-c7cc-3d987cae7780" 229 | }, 230 | "execution_count": 9, 231 | "outputs": [ 232 | { 233 | "output_type": "stream", 234 | "name": "stdout", 235 | "text": [ 236 | "0.9666666666666667\n" 237 | ] 238 | } 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "source": [ 244 | "#**Repeat for KNN with K=1**\n", 245 | "\n" 246 | ], 247 | "metadata": { 248 | "id": "BkVRVhTtGZHu" 249 | } 250 | }, 251 | { 252 | "cell_type": "code", 253 | "source": [ 254 | "knn = KNeighborsClassifier(n_neighbors=1)\n", 255 | "knn.fit(X_train, y_train)\n", 256 | "y_pred = knn.predict(X_test)\n", 257 | "print(metrics.accuracy_score(y_test, y_pred))" 258 | ], 259 | "metadata": { 260 | "id": "bCyJ_HhzGkuC" 261 | }, 262 | "execution_count": null, 263 | "outputs": [] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "source": [ 268 | "# **Can we locate an even better value for K?**\n", 269 | "\n" 270 | ], 271 | "metadata": { 272 | "id": "jZlQPqiNGsJn" 273 | } 274 | }, 275 | { 276 | "cell_type": "code", 277 | "source": [ 278 | "# try K=1 through K=25 and record testing accuracy\n", 279 | "k_range = list(range(1, 26))\n", 280 | "scores = []\n", 281 | "for k in k_range:\n", 282 | " knn = KNeighborsClassifier(n_neighbors=k)\n", 283 | " knn.fit(X_train, y_train)\n", 284 | " y_pred = knn.predict(X_test)\n", 285 | " scores.append(metrics.accuracy_score(y_test, y_pred))" 286 | ], 287 | "metadata": { 288 | "id": "TnvErmrEGxDY" 289 | }, 290 | "execution_count": 12, 291 | "outputs": [] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "source": [ 296 | "# import Matplotlib (scientific plotting library)\n", 297 | "import matplotlib.pyplot as plt\n", 298 | "\n", 299 | "# allow plots to appear within the notebook\n", 300 | "%matplotlib inline\n", 301 | "\n", 302 | "# plot the relationship between K and testing accuracy\n", 303 | "plt.plot(k_range, scores)\n", 304 | "plt.xlabel('Value of K for KNN')\n", 305 | "plt.ylabel('Testing Accuracy')" 306 | ], 307 | "metadata": { 308 | "id": "DzRYpsiAG86g" 309 | }, 310 | "execution_count": null, 311 | "outputs": [] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "source": [ 316 | "# **References**\n" 317 | ], 318 | "metadata": { 319 | "id": "bqzF4sMlG_yX" 320 | } 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "source": [ 325 | "[1-Comparing Machine Learning models in 
scikit-learn](https://github.com/justmarkham/scikit-learn-videos/blob/master/05_model_evaluation.ipynb)" 326 | ], 327 | "metadata": { 328 | "id": "N_FzM6jhHFlR" 329 | } 330 | } 331 | ] 332 | } -------------------------------------------------------------------------------- /ML(Andrew)/4-Linear Regression with Multiple Variables/Readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /ML(Andrew)/4-Linear Regression with Multiple Variables/ex1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/ML(Andrew)/4-Linear Regression with Multiple Variables/ex1.pdf -------------------------------------------------------------------------------- /ML(Andrew)/4-Linear Regression with Multiple Variables/ex1data1.txt: -------------------------------------------------------------------------------- 1 | 6.1101,17.592 2 | 5.5277,9.1302 3 | 8.5186,13.662 4 | 7.0032,11.854 5 | 5.8598,6.8233 6 | 8.3829,11.886 7 | 7.4764,4.3483 8 | 8.5781,12 9 | 6.4862,6.5987 10 | 5.0546,3.8166 11 | 5.7107,3.2522 12 | 14.164,15.505 13 | 5.734,3.1551 14 | 8.4084,7.2258 15 | 5.6407,0.71618 16 | 5.3794,3.5129 17 | 6.3654,5.3048 18 | 5.1301,0.56077 19 | 6.4296,3.6518 20 | 7.0708,5.3893 21 | 6.1891,3.1386 22 | 20.27,21.767 23 | 5.4901,4.263 24 | 6.3261,5.1875 25 | 5.5649,3.0825 26 | 18.945,22.638 27 | 12.828,13.501 28 | 10.957,7.0467 29 | 13.176,14.692 30 | 22.203,24.147 31 | 5.2524,-1.22 32 | 6.5894,5.9966 33 | 9.2482,12.134 34 | 5.8918,1.8495 35 | 8.2111,6.5426 36 | 7.9334,4.5623 37 | 8.0959,4.1164 38 | 5.6063,3.3928 39 | 12.836,10.117 40 | 6.3534,5.4974 41 | 5.4069,0.55657 42 | 6.8825,3.9115 43 | 11.708,5.3854 44 | 5.7737,2.4406 45 | 7.8247,6.7318 46 | 7.0931,1.0463 47 | 5.0702,5.1337 48 | 5.8014,1.844 49 | 11.7,8.0043 50 | 5.5416,1.0179 51 | 7.5402,6.7504 52 | 5.3077,1.8396 53 | 7.4239,4.2885 54 | 7.6031,4.9981 55 | 6.3328,1.4233 56 | 6.3589,-1.4211 57 | 6.2742,2.4756 58 | 5.6397,4.6042 59 | 9.3102,3.9624 60 | 9.4536,5.4141 61 | 8.8254,5.1694 62 | 5.1793,-0.74279 63 | 21.279,17.929 64 | 14.908,12.054 65 | 18.959,17.054 66 | 7.2182,4.8852 67 | 8.2951,5.7442 68 | 10.236,7.7754 69 | 5.4994,1.0173 70 | 20.341,20.992 71 | 10.136,6.6799 72 | 7.3345,4.0259 73 | 6.0062,1.2784 74 | 7.2259,3.3411 75 | 5.0269,-2.6807 76 | 6.5479,0.29678 77 | 7.5386,3.8845 78 | 5.0365,5.7014 79 | 10.274,6.7526 80 | 5.1077,2.0576 81 | 5.7292,0.47953 82 | 5.1884,0.20421 83 | 6.3557,0.67861 84 | 9.7687,7.5435 85 | 6.5159,5.3436 86 | 8.5172,4.2415 87 | 9.1802,6.7981 88 | 6.002,0.92695 89 | 5.5204,0.152 90 | 5.0594,2.8214 91 | 5.7077,1.8451 92 | 7.6366,4.2959 93 | 5.8707,7.2029 94 | 5.3054,1.9869 95 | 8.2934,0.14454 96 | 13.394,9.0551 97 | 5.4369,0.61705 98 | -------------------------------------------------------------------------------- /ML(Andrew)/4-Linear Regression with Multiple Variables/ex1data2.txt: -------------------------------------------------------------------------------- 1 | 2104,3,399900 2 | 1600,3,329900 3 | 2400,3,369000 4 | 1416,2,232000 5 | 3000,4,539900 6 | 1985,4,299900 7 | 1534,3,314900 8 | 1427,3,198999 9 | 1380,3,212000 10 | 1494,3,242500 11 | 1940,4,239999 12 | 2000,3,347000 13 | 1890,3,329999 14 | 4478,5,699900 15 | 1268,3,259900 16 | 2300,4,449900 17 | 1320,2,299900 18 | 1236,3,199900 19 | 2609,4,499998 20 | 3031,4,599000 21 | 1767,3,252900 22 | 1888,2,255000 23 | 1604,3,242900 24 
| 1962,4,259900 25 | 3890,3,573900 26 | 1100,3,249900 27 | 1458,3,464500 28 | 2526,3,469000 29 | 2200,3,475000 30 | 2637,3,299900 31 | 1839,2,349900 32 | 1000,1,169900 33 | 2040,4,314900 34 | 3137,3,579900 35 | 1811,4,285900 36 | 1437,3,249900 37 | 1239,3,229900 38 | 2132,4,345000 39 | 4215,4,549000 40 | 2162,4,287000 41 | 1664,2,368500 42 | 2238,3,329900 43 | 2567,4,314000 44 | 1200,3,299000 45 | 852,2,179900 46 | 1852,4,299900 47 | 1203,3,239500 48 | -------------------------------------------------------------------------------- /ML(Andrew)/4-Linear Regression with Multiple Variables/utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import sys 3 | sys.path.append('..') 4 | 5 | from submission import SubmissionBase 6 | 7 | 8 | class Grader(SubmissionBase): 9 | X1 = np.column_stack((np.ones(20), np.exp(1) + np.exp(2) * np.linspace(0.1, 2, 20))) 10 | Y1 = X1[:, 1] + np.sin(X1[:, 0]) + np.cos(X1[:, 1]) 11 | X2 = np.column_stack((X1, X1[:, 1]**0.5, X1[:, 1]**0.25)) 12 | Y2 = np.power(Y1, 0.5) + Y1 13 | 14 | def __init__(self): 15 | part_names = ['Warm up exercise', 16 | 'Computing Cost (for one variable)', 17 | 'Gradient Descent (for one variable)', 18 | 'Feature Normalization', 19 | 'Computing Cost (for multiple variables)', 20 | 'Gradient Descent (for multiple variables)', 21 | 'Normal Equations'] 22 | super().__init__('linear-regression', part_names) 23 | 24 | def __iter__(self): 25 | for part_id in range(1, 8): 26 | try: 27 | func = self.functions[part_id] 28 | 29 | # Each part has different expected arguments/different function 30 | if part_id == 1: 31 | res = func() 32 | elif part_id == 2: 33 | res = func(self.X1, self.Y1, np.array([0.5, -0.5])) 34 | elif part_id == 3: 35 | res = func(self.X1, self.Y1, np.array([0.5, -0.5]), 0.01, 10) 36 | elif part_id == 4: 37 | res = func(self.X2[:, 1:4]) 38 | elif part_id == 5: 39 | res = func(self.X2, self.Y2, np.array([0.1, 0.2, 0.3, 0.4])) 40 | elif part_id == 6: 41 | res = func(self.X2, self.Y2, np.array([-0.1, -0.2, -0.3, -0.4]), 0.01, 10) 42 | elif part_id == 7: 43 | res = func(self.X2, self.Y2) 44 | else: 45 | raise KeyError 46 | yield part_id, res 47 | except KeyError: 48 | yield part_id, 0 49 | -------------------------------------------------------------------------------- /ML(Andrew)/5-Logistic Regression (LR)/ex3data1.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/ML(Andrew)/5-Logistic Regression (LR)/ex3data1.mat -------------------------------------------------------------------------------- /ML(Andrew)/5-Logistic Regression (LR)/ex3weights.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/ML(Andrew)/5-Logistic Regression (LR)/ex3weights.mat -------------------------------------------------------------------------------- /ML(Andrew)/5-Logistic Regression (LR)/neuralnetwork.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/ML(Andrew)/5-Logistic Regression (LR)/neuralnetwork.png -------------------------------------------------------------------------------- /ML(Andrew)/5-Logistic Regression (LR)/readme: 
-------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /ML(Andrew)/5-Logistic Regression (LR)/token.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/ML(Andrew)/5-Logistic Regression (LR)/token.pkl -------------------------------------------------------------------------------- /ML(Andrew)/5-Logistic Regression (LR)/utils.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy as np 3 | from matplotlib import pyplot 4 | 5 | sys.path.append('..') 6 | from submission import SubmissionBase 7 | 8 | 9 | def displayData(X, example_width=None, figsize=(10, 10)): 10 | """ 11 | Displays 2D data stored in X in a nice grid. 12 | """ 13 | # Compute rows, cols 14 | if X.ndim == 2: 15 | m, n = X.shape 16 | elif X.ndim == 1: 17 | n = X.size 18 | m = 1 19 | X = X[None] # Promote to a 2 dimensional array 20 | else: 21 | raise IndexError('Input X should be 1 or 2 dimensional.') 22 | 23 | example_width = example_width or int(np.round(np.sqrt(n))) 24 | example_height = n / example_width 25 | 26 | # Compute number of items to display 27 | display_rows = int(np.floor(np.sqrt(m))) 28 | display_cols = int(np.ceil(m / display_rows)) 29 | 30 | fig, ax_array = pyplot.subplots(display_rows, display_cols, figsize=figsize) 31 | fig.subplots_adjust(wspace=0.025, hspace=0.025) 32 | 33 | ax_array = [ax_array] if m == 1 else ax_array.ravel() 34 | 35 | for i, ax in enumerate(ax_array): 36 | ax.imshow(X[i].reshape(example_width, example_width, order='F'), 37 | cmap='Greys', extent=[0, 1, 0, 1]) 38 | ax.axis('off') 39 | 40 | 41 | def sigmoid(z): 42 | """ 43 | Computes the sigmoid of z. 
44 | """ 45 | return 1.0 / (1.0 + np.exp(-z)) 46 | 47 | 48 | class Grader(SubmissionBase): 49 | # Random Test Cases 50 | X = np.stack([np.ones(20), 51 | np.exp(1) * np.sin(np.arange(1, 21)), 52 | np.exp(0.5) * np.cos(np.arange(1, 21))], axis=1) 53 | 54 | y = (np.sin(X[:, 0] + X[:, 1]) > 0).astype(float) 55 | 56 | Xm = np.array([[-1, -1], 57 | [-1, -2], 58 | [-2, -1], 59 | [-2, -2], 60 | [1, 1], 61 | [1, 2], 62 | [2, 1], 63 | [2, 2], 64 | [-1, 1], 65 | [-1, 2], 66 | [-2, 1], 67 | [-2, 2], 68 | [1, -1], 69 | [1, -2], 70 | [-2, -1], 71 | [-2, -2]]) 72 | ym = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]) 73 | 74 | t1 = np.sin(np.reshape(np.arange(1, 25, 2), (4, 3), order='F')) 75 | t2 = np.cos(np.reshape(np.arange(1, 41, 2), (4, 5), order='F')) 76 | 77 | def __init__(self): 78 | part_names = ['Regularized Logistic Regression', 79 | 'One-vs-All Classifier Training', 80 | 'One-vs-All Classifier Prediction', 81 | 'Neural Network Prediction Function'] 82 | 83 | super().__init__('multi-class-classification-and-neural-networks', part_names) 84 | 85 | def __iter__(self): 86 | for part_id in range(1, 5): 87 | try: 88 | func = self.functions[part_id] 89 | 90 | # Each part has different expected arguments/different function 91 | if part_id == 1: 92 | res = func(np.array([0.25, 0.5, -0.5]), self.X, self.y, 0.1) 93 | res = np.hstack(res).tolist() 94 | elif part_id == 2: 95 | res = func(self.Xm, self.ym, 4, 0.1) 96 | elif part_id == 3: 97 | res = func(self.t1, self.Xm) + 1 98 | elif part_id == 4: 99 | res = func(self.t1, self.t2, self.Xm) + 1 100 | else: 101 | raise KeyError 102 | yield part_id, res 103 | except KeyError: 104 | yield part_id, 0 105 | -------------------------------------------------------------------------------- /ML(Andrew)/Neural Networks: Representation/ex4-backpropagation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/ML(Andrew)/Neural Networks: Representation/ex4-backpropagation.png -------------------------------------------------------------------------------- /ML(Andrew)/Neural Networks: Representation/ex4data1.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/ML(Andrew)/Neural Networks: Representation/ex4data1.mat -------------------------------------------------------------------------------- /ML(Andrew)/Neural Networks: Representation/ex4weights.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/ML(Andrew)/Neural Networks: Representation/ex4weights.mat -------------------------------------------------------------------------------- /ML(Andrew)/Neural Networks: Representation/neural_network.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/ML(Andrew)/Neural Networks: Representation/neural_network.png -------------------------------------------------------------------------------- /ML(Andrew)/Neural Networks: Representation/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /ML(Andrew)/Neural 
Networks: Representation/utils.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy as np 3 | from matplotlib import pyplot 4 | 5 | sys.path.append('..') 6 | from submission import SubmissionBase 7 | 8 | 9 | def displayData(X, example_width=None, figsize=(10, 10)): 10 | """ 11 | Displays 2D data stored in X in a nice grid. 12 | """ 13 | # Compute rows, cols 14 | if X.ndim == 2: 15 | m, n = X.shape 16 | elif X.ndim == 1: 17 | n = X.size 18 | m = 1 19 | X = X[None] # Promote to a 2 dimensional array 20 | else: 21 | raise IndexError('Input X should be 1 or 2 dimensional.') 22 | 23 | example_width = example_width or int(np.round(np.sqrt(n))) 24 | example_height = n / example_width 25 | 26 | # Compute number of items to display 27 | display_rows = int(np.floor(np.sqrt(m))) 28 | display_cols = int(np.ceil(m / display_rows)) 29 | 30 | fig, ax_array = pyplot.subplots(display_rows, display_cols, figsize=figsize) 31 | fig.subplots_adjust(wspace=0.025, hspace=0.025) 32 | 33 | ax_array = [ax_array] if m == 1 else ax_array.ravel() 34 | 35 | for i, ax in enumerate(ax_array): 36 | # Display Image 37 | h = ax.imshow(X[i].reshape(example_width, example_width, order='F'), 38 | cmap='Greys', extent=[0, 1, 0, 1]) 39 | ax.axis('off') 40 | 41 | 42 | def predict(Theta1, Theta2, X): 43 | """ 44 | Predict the label of an input given a trained neural network 45 | Outputs the predicted label of X given the trained weights of a neural 46 | network(Theta1, Theta2) 47 | """ 48 | # Useful values 49 | m = X.shape[0] 50 | num_labels = Theta2.shape[0] 51 | 52 | # You need to return the following variables correctly 53 | p = np.zeros(m) 54 | h1 = sigmoid(np.dot(np.concatenate([np.ones((m, 1)), X], axis=1), Theta1.T)) 55 | h2 = sigmoid(np.dot(np.concatenate([np.ones((m, 1)), h1], axis=1), Theta2.T)) 56 | p = np.argmax(h2, axis=1) 57 | return p 58 | 59 | 60 | def debugInitializeWeights(fan_out, fan_in): 61 | """ 62 | Initialize the weights of a layer with fan_in incoming connections and fan_out outgoings 63 | connections using a fixed strategy. This will help you later in debugging. 64 | 65 | Note that W should be set a matrix of size (1+fan_in, fan_out) as the first row of W handles 66 | the "bias" terms. 67 | 68 | Parameters 69 | ---------- 70 | fan_out : int 71 | The number of outgoing connections. 72 | 73 | fan_in : int 74 | The number of incoming connections. 75 | 76 | Returns 77 | ------- 78 | W : array_like (1+fan_in, fan_out) 79 | The initialized weights array given the dimensions. 80 | """ 81 | # Initialize W using "sin". This ensures that W is always of the same values and will be 82 | # useful for debugging 83 | W = np.sin(np.arange(1, 1 + (1+fan_in)*fan_out))/10.0 84 | W = W.reshape(fan_out, 1+fan_in, order='F') 85 | return W 86 | 87 | 88 | def computeNumericalGradient(J, theta, e=1e-4): 89 | """ 90 | Computes the gradient using "finite differences" and gives us a numerical estimate of the 91 | gradient. 92 | 93 | Parameters 94 | ---------- 95 | J : func 96 | The cost function which will be used to estimate its numerical gradient. 97 | 98 | theta : array_like 99 | The one dimensional unrolled network parameters. The numerical gradient is computed at 100 | those given parameters. 101 | 102 | e : float (optional) 103 | The value to use for epsilon for computing the finite difference. 104 | 105 | Notes 106 | ----- 107 | The following code implements numerical gradient checking, and 108 | returns the numerical gradient. 
It sets `numgrad[i]` to (a numerical 109 | approximation of) the partial derivative of J with respect to the 110 | i-th input argument, evaluated at theta. (i.e., `numgrad[i]` should 111 | be the (approximately) the partial derivative of J with respect 112 | to theta[i].) 113 | """ 114 | numgrad = np.zeros(theta.shape) 115 | perturb = np.diag(e * np.ones(theta.shape)) 116 | for i in range(theta.size): 117 | loss1, _ = J(theta - perturb[:, i]) 118 | loss2, _ = J(theta + perturb[:, i]) 119 | numgrad[i] = (loss2 - loss1)/(2*e) 120 | return numgrad 121 | 122 | 123 | def checkNNGradients(nnCostFunction, lambda_=0): 124 | """ 125 | Creates a small neural network to check the backpropagation gradients. It will output the 126 | analytical gradients produced by your backprop code and the numerical gradients 127 | (computed using computeNumericalGradient). These two gradient computations should result in 128 | very similar values. 129 | 130 | Parameters 131 | ---------- 132 | nnCostFunction : func 133 | A reference to the cost function implemented by the student. 134 | 135 | lambda_ : float (optional) 136 | The regularization parameter value. 137 | """ 138 | input_layer_size = 3 139 | hidden_layer_size = 5 140 | num_labels = 3 141 | m = 5 142 | 143 | # We generate some 'random' test data 144 | Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size) 145 | Theta2 = debugInitializeWeights(num_labels, hidden_layer_size) 146 | 147 | # Reusing debugInitializeWeights to generate X 148 | X = debugInitializeWeights(m, input_layer_size - 1) 149 | y = np.arange(1, 1+m) % num_labels 150 | # print(y) 151 | # Unroll parameters 152 | nn_params = np.concatenate([Theta1.ravel(), Theta2.ravel()]) 153 | 154 | # short hand for cost function 155 | costFunc = lambda p: nnCostFunction(p, input_layer_size, hidden_layer_size, 156 | num_labels, X, y, lambda_) 157 | cost, grad = costFunc(nn_params) 158 | numgrad = computeNumericalGradient(costFunc, nn_params) 159 | 160 | # Visually examine the two gradient computations.The two columns you get should be very similar. 161 | print(np.stack([numgrad, grad], axis=1)) 162 | print('The above two columns you get should be very similar.') 163 | print('(Left-Your Numerical Gradient, Right-Analytical Gradient)\n') 164 | 165 | # Evaluate the norm of the difference between two the solutions. If you have a correct 166 | # implementation, and assuming you used e = 0.0001 in computeNumericalGradient, then diff 167 | # should be less than 1e-9. 168 | diff = np.linalg.norm(numgrad - grad)/np.linalg.norm(numgrad + grad) 169 | 170 | print('If your backpropagation implementation is correct, then \n' 171 | 'the relative difference will be small (less than 1e-9). \n' 172 | 'Relative Difference: %g' % diff) 173 | 174 | 175 | def sigmoid(z): 176 | """ 177 | Computes the sigmoid of z. 
178 | """ 179 | return 1.0 / (1.0 + np.exp(-z)) 180 | 181 | 182 | class Grader(SubmissionBase): 183 | X = np.reshape(3 * np.sin(np.arange(1, 31)), (3, 10), order='F') 184 | Xm = np.reshape(np.sin(np.arange(1, 33)), (16, 2), order='F') / 5 185 | ym = np.arange(1, 17) % 4 186 | t1 = np.sin(np.reshape(np.arange(1, 25, 2), (4, 3), order='F')) 187 | t2 = np.cos(np.reshape(np.arange(1, 41, 2), (4, 5), order='F')) 188 | t = np.concatenate([t1.ravel(), t2.ravel()], axis=0) 189 | 190 | def __init__(self): 191 | part_names = ['Feedforward and Cost Function', 192 | 'Regularized Cost Function', 193 | 'Sigmoid Gradient', 194 | 'Neural Network Gradient (Backpropagation)', 195 | 'Regularized Gradient'] 196 | super().__init__('neural-network-learning', part_names) 197 | 198 | def __iter__(self): 199 | for part_id in range(1, 6): 200 | try: 201 | func = self.functions[part_id] 202 | 203 | # Each part has different expected arguments/different function 204 | if part_id == 1: 205 | res = func(self.t, 2, 4, 4, self.Xm, self.ym, 0)[0] 206 | elif part_id == 2: 207 | res = func(self.t, 2, 4, 4, self.Xm, self.ym, 1.5) 208 | elif part_id == 3: 209 | res = func(self.X, ) 210 | elif part_id == 4: 211 | J, grad = func(self.t, 2, 4, 4, self.Xm, self.ym, 0) 212 | grad1 = np.reshape(grad[:12], (4, 3)) 213 | grad2 = np.reshape(grad[12:], (4, 5)) 214 | grad = np.concatenate([grad1.ravel('F'), grad2.ravel('F')]) 215 | res = np.hstack([J, grad]).tolist() 216 | elif part_id == 5: 217 | J, grad = func(self.t, 2, 4, 4, self.Xm, self.ym, 1.5) 218 | grad1 = np.reshape(grad[:12], (4, 3)) 219 | grad2 = np.reshape(grad[12:], (4, 5)) 220 | grad = np.concatenate([grad1.ravel('F'), grad2.ravel('F')]) 221 | res = np.hstack([J, grad]).tolist() 222 | else: 223 | raise KeyError 224 | yield part_id, res 225 | except KeyError: 226 | yield part_id, 0 227 | -------------------------------------------------------------------------------- /ML(Andrew)/Rread: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /ML(Andrew)/Rreadme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Machine Leanring.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/Machine Leanring.png -------------------------------------------------------------------------------- /Machine Learning/New Text Document.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/Machine Learning/New Text Document.txt -------------------------------------------------------------------------------- /Machine Learning/📚Chapter 1 - Introduction/New Text Document.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/Machine Learning/📚Chapter 1 - Introduction/New Text Document.txt -------------------------------------------------------------------------------- /Machine Learning/📚Chapter 2 -Linear Regression with one Variable/New Text Document.txt: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/Machine Learning/📚Chapter 2 -Linear Regression with one Variable/New Text Document.txt -------------------------------------------------------------------------------- /Model Evaluation/Bias_and_Variance_using_Python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Bias and Variance using Python.ipynb", 7 | "provenance": [], 8 | "toc_visible": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "gL0AffZGOZxZ" 23 | }, 24 | "source": [ 25 | "# **Bias and Variance in Machine Learning**\n" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "O5iX_J1BY8UO" 32 | }, 33 | "source": [ 34 | "We use the terms bias and variance or bias-variance trade-off to describe the performance of a machine learning model. In this article, I will introduce you to the concept of bias and variance in machine learning\n", 35 | "\n", 36 | "When training a machine learning model, it is very important to understand the bias and variance of predictions of your model. It helps in analyzing prediction errors which help us in training more accurate machine learning models. In this article, I’ll walk you through how to calculate bias and variance using Python." 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "id": "2vGnZiIXuWdv" 43 | }, 44 | "source": [ 45 | "In machine learning, you must have heard that the model has a **high variance or high bias**." 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "id": "HCJMFdRIyEE_" 52 | }, 53 | "source": [ 54 | "**High bias**" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "id": "Uf9-H-oMuum6" 61 | }, 62 | "source": [ 63 | "To understand what bias and variance are, suppose we have a point estimator of a parameter or function. Then, **the bias is usually defined as the difference between the expected value of the estimator and the parameter we want to estimate.**" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": { 69 | "id": "yNYPasTnyj3W" 70 | }, 71 | "source": [ 72 | "high bias is proportional to the **underfitting**." 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": { 78 | "id": "2Q-c8Ae5yNEh" 79 | }, 80 | "source": [ 81 | "Bias is the difference between predicted values and expected results. A machine learning model with a low bias is a perfect model and a model with a high bias is expected with a high error rate on the training and test sets." 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": { 87 | "id": "OG4K7aCUvDcK" 88 | }, 89 | "source": [ 90 | "If the bias is greater than zero, we also say that the estimator is positively biased, if the bias is less than zero, the estimator is negatively biased, and if the bias is exactly zero, the estimator is unbiased. 
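As a compact restatement of the definition given above (the symbols are the standard ones, not notation introduced elsewhere in this notebook), the bias of a point estimator can be written as:

```latex
% Bias of a point estimator \hat{\theta} of a parameter \theta
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
% Bias > 0: positively biased;  Bias < 0: negatively biased;  Bias = 0: unbiased
```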
" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": { 96 | "id": "YRBAgEMEzC4W" 97 | }, 98 | "source": [ 99 | "**Variance**" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": { 105 | "id": "UvtrnQBozdsd" 106 | }, 107 | "source": [ 108 | "Variance as the difference between the expected value of the estimator squared minus the expectation squared of the estimator. A machine learning model with high variance indicates that the model may work well on the data it was trained on, but it will not generalize well on the dataset it has never seen before." 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": { 114 | "id": "IHDYldyNzI4l" 115 | }, 116 | "source": [ 117 | " In general, one could say that a high variance is proportional to the **overfitting**" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": { 123 | "id": "8w8p0-KTQWwo" 124 | }, 125 | "source": [ 126 | "**Bias and Variance using Python**\n" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": { 132 | "id": "brQLUbEU0Hpe" 133 | }, 134 | "source": [ 135 | "You must be using the scikit-learn library in Python for implementing most of the machine learning algorithms. But it does not have any function to calculate the bias and variance of your trained model. So to calculate the bias and variance of your model using Python, you have to install another library known as mlxtend. You can easily install it in your system by using the pip command:" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "metadata": { 141 | "id": "ixuoUgX4aQl8" 142 | }, 143 | "source": [ 144 | "!pip install mlxtend" 145 | ], 146 | "execution_count": null, 147 | "outputs": [] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "metadata": { 152 | "colab": { 153 | "base_uri": "https://localhost:8080/" 154 | }, 155 | "id": "YjfUljli1FiK", 156 | "outputId": "02b8d63f-23ac-4547-eb93-760afa9b8b29" 157 | }, 158 | "source": [ 159 | "!pip install mlxtend --upgrade" 160 | ], 161 | "execution_count": 1, 162 | "outputs": [ 163 | { 164 | "output_type": "stream", 165 | "text": [ 166 | "Requirement already satisfied: mlxtend in /usr/local/lib/python3.7/dist-packages (0.18.0)\n", 167 | "Requirement already satisfied: matplotlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (3.2.2)\n", 168 | "Requirement already satisfied: joblib>=0.13.2 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (1.0.1)\n", 169 | "Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from mlxtend) (57.2.0)\n", 170 | "Requirement already satisfied: scikit-learn>=0.20.3 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (0.22.2.post1)\n", 171 | "Requirement already satisfied: scipy>=1.2.1 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (1.4.1)\n", 172 | "Requirement already satisfied: pandas>=0.24.2 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (1.1.5)\n", 173 | "Requirement already satisfied: numpy>=1.16.2 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (1.19.5)\n", 174 | "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->mlxtend) (2.4.7)\n", 175 | "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->mlxtend) (2.8.1)\n", 176 | "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->mlxtend) 
(1.3.1)\n", 177 | "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->mlxtend) (0.10.0)\n", 178 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from cycler>=0.10->matplotlib>=3.0.0->mlxtend) (1.15.0)\n", 179 | "Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.2->mlxtend) (2018.9)\n" 180 | ], 181 | "name": "stdout" 182 | } 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": { 188 | "id": "kNEwf4BrQqhB" 189 | }, 190 | "source": [ 191 | "Now let’s train a machine learning model and then we will see how we can calculate its bias and variance using Python:\n", 192 | "\n", 193 | "\n", 194 | "\n" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "metadata": { 200 | "id": "K7Vcp8aQ0o3z" 201 | }, 202 | "source": [ 203 | "from mlxtend.evaluate import bias_variance_decomp\n", 204 | "import numpy as np\n", 205 | "import pandas as pd\n", 206 | "from sklearn.linear_model import LinearRegression\n", 207 | "from sklearn.utils import shuffle\n", 208 | "from sklearn.metrics import mean_squared_error\n", 209 | "\n", 210 | "data = pd.read_csv(\"https://raw.githubusercontent.com/amankharwal/Website-data/master/student-mat.csv\")\n", 211 | "data = data[[\"G1\", \"G2\", \"G3\", \"studytime\", \"failures\", \"absences\"]]\n", 212 | "\n", 213 | "predict = \"G3\"\n", 214 | "x = np.array(data.drop([predict], 1))\n", 215 | "y = np.array(data[predict])\n", 216 | "\n", 217 | "from sklearn.model_selection import train_test_split\n", 218 | "xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2)\n", 219 | "\n", 220 | "linear_regression = LinearRegression()\n", 221 | "linear_regression.fit(xtrain, ytrain)\n", 222 | "y_pred = linear_regression.predict(xtest)" 223 | ], 224 | "execution_count": 2, 225 | "outputs": [] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": { 230 | "id": "ZOpJ6on01UyC" 231 | }, 232 | "source": [ 233 | "So till now, we have trained a machine learning model by using the linear regression algorithm, below is how we can calculate its bias and variance using Python:" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "metadata": { 239 | "colab": { 240 | "base_uri": "https://localhost:8080/" 241 | }, 242 | "id": "gjnWuL-k1Tpm", 243 | "outputId": "a6c993d9-e5f6-44d9-fd2e-c28387585c35" 244 | }, 245 | "source": [ 246 | "mse, bias, variance = bias_variance_decomp(linear_regression, xtrain, ytrain, xtest, ytest, \n", 247 | " loss='mse', num_rounds=200, random_seed=123)\n", 248 | "print(\"Average Bias : \", bias)\n", 249 | "print(\"Average Variance : \", variance)" 250 | ], 251 | "execution_count": 3, 252 | "outputs": [ 253 | { 254 | "output_type": "stream", 255 | "text": [ 256 | "Average Bias : 4.910302451198915\n", 257 | "Average Variance : 0.05685635558630853\n" 258 | ], 259 | "name": "stdout" 260 | } 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": { 266 | "id": "nmU-Bn7q1gm-" 267 | }, 268 | "source": [ 269 | "Bias is the difference between predicted values and expected results. Variance is the variability of your model’s predictions over different sets of data. I hope you liked this article on how to calculate the bias and variance of a machine learning model. Feel free to ask your valuable questions in the comments section below." 
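To make the trade-off more concrete, the same decomposition can be repeated with a model that usually sits at the other end of the spectrum. The sketch below is illustrative and not part of the original notebook; it assumes the xtrain, xtest, ytrain and ytest arrays created above and that mlxtend is already installed.

```python
from mlxtend.evaluate import bias_variance_decomp
from sklearn.tree import DecisionTreeRegressor

# An unpruned decision tree is a classic low-bias, high-variance estimator.
tree = DecisionTreeRegressor(random_state=0)

mse, bias, variance = bias_variance_decomp(
    tree, xtrain, ytrain, xtest, ytest,
    loss='mse', num_rounds=200, random_seed=123)

print("Average Bias     : ", bias)      # usually lower than linear regression's
print("Average Variance : ", variance)  # usually noticeably higher
```

Comparing these numbers with the linear regression results above shows the trade-off directly: the flexible tree trades bias for variance, while the rigid linear model does the opposite.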
270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": { 275 | "id": "MIXjbz4hOO6i" 276 | }, 277 | "source": [ 278 | "[Bias and Variance using Python](https://thecleverprogrammer.com/2021/05/20/bias-and-variance-using-python/)" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": { 284 | "id": "kP2erWOV1vHB" 285 | }, 286 | "source": [ 287 | "[Bias and Variance in Machine Learning](https://thecleverprogrammer.com/2020/12/28/bias-and-variance-in-machine-learning/)" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": { 293 | "id": "YPEeaJW9170J" 294 | }, 295 | "source": [ 296 | "[Overfitting and Underfitting in Machine Learning\n", 297 | "](https://thecleverprogrammer.com/2020/09/04/overfitting-and-underfitting-in-machine-learning/)" 298 | ] 299 | } 300 | ] 301 | } -------------------------------------------------------------------------------- /Model Evaluation/What_is_Cross_Validation_in_Machine_Learning_.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "What is Cross-Validation in Machine Learning?.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "toc_visible": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "language_info": { 16 | "name": "python" 17 | } 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "bIsI9dS-FNPO" 24 | }, 25 | "source": [ 26 | "# **Introduction**" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "id": "ZdupLwSYFUHh" 33 | }, 34 | "source": [ 35 | "In Machine Learning, Cross-validation is a statistical method of evaluating generalization performance that is more stable and thorough than using a division of dataset into a training and test set. In this article, I’ll walk you through what cross-validation is and how to use it for machine learning using the Python programming language." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": { 41 | "id": "aEQSzsbvGKch" 42 | }, 43 | "source": [ 44 | "![](https://drive.google.com/uc?export=view&id=1ZKQVeYKrTuLUxD6QYZ1nsoxP7cqbhmAq)" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "id": "OhNzcC8XGWxO" 51 | }, 52 | "source": [ 53 | "In cross-validation, the data is instead split multiple times and multiple models are trained. The most commonly used version of cross-validation is k-times cross-validation, where k is a user-specified number, usually 5 or 10." 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": { 59 | "id": "aZE7qjSMGoGv" 60 | }, 61 | "source": [ 62 | "In five-way cross-validation, the data is first partitioned into five parts of (approximately) equal size, called folds. Then, a sequence of models is formed. The first model is trained using the first fold as a test set, and the remaining folds (2–5) are used as a training set." 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": { 68 | "id": "95HA3eL9HBzf" 69 | }, 70 | "source": [ 71 | "The model is built using data from folds 2 to 5, then the precision is evaluated on fold 1. Then another model is built, this time using fold 2 as the test set and the data from folds 1, 3, 4 and 5 as a training set.\n", 72 | "\n", 73 | "This process is repeated using folds 3, 4 and 5 as test sets. 
For each of these five divisions of the data into training and testing sets, we calculate the precision. In the end, we collected five precision values." 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": { 79 | "id": "22KV7aATHIU5" 80 | }, 81 | "source": [ 82 | "# **Implementation Of Cross-Validation with Python**\n" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": { 88 | "id": "8OYkq1D_HMbf" 89 | }, 90 | "source": [ 91 | "We can easily implement the process of Cross-validation with Python programming language by using the Scikit-learn library in Python." 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": { 97 | "id": "TfTjWdy9HWK_" 98 | }, 99 | "source": [ 100 | "Cross-validation is implemented in scikit-learn using the cross_val_score function of the model_selection module. The parameters of the cross_val_score function are the model we want to evaluate, the training data, and the ground truth labels. Let’s evaluate LogisticRegression on the iris dataset:" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "metadata": { 106 | "id": "6JPdJfM3EJ-b" 107 | }, 108 | "source": [ 109 | "from sklearn.model_selection import cross_val_score\n", 110 | "from sklearn.datasets import load_iris\n", 111 | "from sklearn.linear_model import LogisticRegression\n", 112 | "\n", 113 | "iris = load_iris()\n", 114 | "logreg = LogisticRegression()\n", 115 | "\n", 116 | "scores = cross_val_score(logreg, iris.data, iris.target)\n", 117 | "print(\"Cross-validation scores: {}\".format(scores))" 118 | ], 119 | "execution_count": null, 120 | "outputs": [] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": { 125 | "id": "316KH8M9HpDn" 126 | }, 127 | "source": [ 128 | "By default, cross_val_score performs triple cross-validation, returning three precision values. We can modify the number of folds used by modifying the cv parameter:" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "metadata": { 134 | "id": "w5hj18T1Htho" 135 | }, 136 | "source": [ 137 | "scores = cross_val_score(logreg, iris.data, iris.target, cv=5)\n", 138 | "print(\"Cross-validation scores: {}\".format(scores))" 139 | ], 140 | "execution_count": null, 141 | "outputs": [] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": { 146 | "id": "VsCWOjgEH0AQ" 147 | }, 148 | "source": [ 149 | "A common way to summarize the precision of cross-validation is to calculate the mean:\n", 150 | "\n" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "metadata": { 156 | "colab": { 157 | "base_uri": "https://localhost:8080/" 158 | }, 159 | "id": "TPFUGuP9H3_e", 160 | "outputId": "3334c1b9-6b81-4145-f6a9-7fd46ffb4e59" 161 | }, 162 | "source": [ 163 | "print(\"Average cross-validation score: {:.2f}\".format(scores.mean()))\n" 164 | ], 165 | "execution_count": 3, 166 | "outputs": [ 167 | { 168 | "output_type": "stream", 169 | "text": [ 170 | "Average cross-validation score: 0.97\n" 171 | ], 172 | "name": "stdout" 173 | } 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": { 179 | "id": "XD-d44CUH_8n" 180 | }, 181 | "source": [ 182 | "# **Benefits & Drawbacks of Using Cross-Validation**\n" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": { 188 | "id": "we1jyh0CIElI" 189 | }, 190 | "source": [ 191 | "There are several advantages to using cross-validation instead of a single division into one training and one set of tests. First of all, remember that train_test_split performs a random division of data." 
192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": { 197 | "id": "tUMXrkbEIRen" 198 | }, 199 | "source": [ 200 | "Imagine that we are “lucky” at randomly splitting the data, and all the hard-to-categorize examples end up in the training set. In this case, the test set will only contain “simple” examples, and the accuracy of our test set will be unrealistic.\n", 201 | "\n", 202 | "Conversely, if we are “unlucky” we may have randomly placed all of the hard-to-rank examples in the test set and therefore have an unrealistic score." 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": { 208 | "id": "9H9IvQmtIqWW" 209 | }, 210 | "source": [ 211 | "However, when using cross-validation, each example will be in the test set exactly once: each example is in one of the folds, and each fold is the test set once. Therefore, the model must generalize well to all samples in the dataset for all cross-validation scores (and their mean) to be high.\n", 212 | "\n", 213 | "Having multiple splits of the data also provides information about the sensitivity of our model to the selection of the training data set. For the iris dataset, we saw accuracies between 90% and 100%. That’s quite a range, and it gives us an idea of ​​how the model might work in the worst-case scenario and the best-case scenario when applied to new data.\n", 214 | "\n", 215 | "Another advantage of cross-validation over using a single data division is that we use our data more efficiently. When using train_test_split, we typically use 75% of the data for training and 25% of the data for evaluation.\n", 216 | "\n", 217 | "When using five-fold cross-validation, on each iteration we can use four-fifths of the data (80%) to fit the model. When using 10 cross-validations, we can use the nine-tenths of the data (90%) to fit the model. More data will generally result in more accurate models.\n", 218 | "\n", 219 | "The main disadvantage is the increase in computational costs. Since we are currently training k models instead of a single model, the cross-validation will be about k times slower than doing a single division of the data." 
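One practical detail worth adding to the discussion above: the cv parameter accepts not only an integer but also a splitter object, which gives explicit control over shuffling and stratification. The snippet below is an illustration along the lines of the earlier examples (for classifiers, scikit-learn already uses stratified folds by default when an integer cv is passed).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

iris = load_iris()
logreg = LogisticRegression(max_iter=1000)

# Plain k-fold with shuffling: rows are shuffled once before the 5 folds are cut.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(logreg, iris.data, iris.target, cv=kfold))

# Stratified k-fold keeps the class proportions of the full dataset in every fold,
# which matters most for imbalanced classification problems.
skfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(logreg, iris.data, iris.target, cv=skfold))
```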
220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": { 225 | "id": "Fby7CCu5E0-C" 226 | }, 227 | "source": [ 228 | "# **References**\n", 229 | "[What is Cross-Validation in Machine Learning?](https://thecleverprogrammer.com/2020/10/25/what-is-cross-validation-in-machine-learning/)" 230 | ] 231 | } 232 | ] 233 | } -------------------------------------------------------------------------------- /Model Evaluation/readme: -------------------------------------------------------------------------------- 1 | R 2 | -------------------------------------------------------------------------------- /Preprocessing/Create_new_Features_(Faker)_.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Create new Features (Faker) .ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "code", 21 | "metadata": { 22 | "id": "VOLioVdPYeAI" 23 | }, 24 | "source": [ 25 | "" 26 | ], 27 | "execution_count": null, 28 | "outputs": [] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "N7LxyqR9Yh8d" 34 | }, 35 | "source": [ 36 | "# **Introduction**" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "id": "fu6PCH6GYmMi" 43 | }, 44 | "source": [ 45 | "Say you want to create data of a certain type (bool, text, ...) with specific characteristics (name, address, etc.) to test some Python library or a specific implementation. It takes time to find that specific kind of data. You wonder: is there a quick way to create your own data? What if there were a package that enables you to create fake data in one line of code?

Checkout Fake: a python package that does exactly that " 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "metadata": { 51 | "colab": { 52 | "base_uri": "https://localhost:8080/" 53 | }, 54 | "id": "gsOZ4jgDjUzd", 55 | "outputId": "708b0c31-3eb9-4d84-c87a-4e1d2f4bcf61" 56 | }, 57 | "source": [ 58 | "!pip install faker " 59 | ], 60 | "execution_count": 1, 61 | "outputs": [ 62 | { 63 | "output_type": "stream", 64 | "text": [ 65 | "Collecting faker\n", 66 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/95/c4/6abf74493bf4eb4cea24cab5932106921ce6d014a626966031da4ee7ad25/Faker-8.2.1-py3-none-any.whl (1.2MB)\n", 67 | "\u001b[K |████████████████████████████████| 1.2MB 5.3MB/s \n", 68 | "\u001b[?25hRequirement already satisfied: text-unidecode==1.3 in /usr/local/lib/python3.7/dist-packages (from faker) (1.3)\n", 69 | "Requirement already satisfied: python-dateutil>=2.4 in /usr/local/lib/python3.7/dist-packages (from faker) (2.8.1)\n", 70 | "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.4->faker) (1.15.0)\n", 71 | "Installing collected packages: faker\n", 72 | "Successfully installed faker-8.2.1\n" 73 | ], 74 | "name": "stdout" 75 | } 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "metadata": { 81 | "id": "CZR5qZrajbms" 82 | }, 83 | "source": [ 84 | "# Installing the package \n", 85 | "from faker import Faker\n", 86 | "fake=Faker()\n", 87 | "name=fake.name()\n", 88 | "color= fake.color_name()\n", 89 | "city= fake.city()\n", 90 | "job = fake.job()" 91 | ], 92 | "execution_count": 4, 93 | "outputs": [] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "metadata": { 98 | "colab": { 99 | "base_uri": "https://localhost:8080/" 100 | }, 101 | "id": "QHdtMY3ykUQb", 102 | "outputId": "87d4a505-f9aa-4412-d301-456ea376c9ac" 103 | }, 104 | "source": [ 105 | "print('Her name is {}. She lives in {}. her favorite color is {}. she work as {}'.format(name,city,color,job))" 106 | ], 107 | "execution_count": 7, 108 | "outputs": [ 109 | { 110 | "output_type": "stream", 111 | "text": [ 112 | "Her name is Jonathan Hernandez. She lives in West Marialand. her favorite color is Ivory. 
she work as TEFL teacher\n" 113 | ], 114 | "name": "stdout" 115 | } 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "metadata": { 121 | "id": "ZiTE9Z_PkFa3" 122 | }, 123 | "source": [ 124 | "" 125 | ], 126 | "execution_count": null, 127 | "outputs": [] 128 | } 129 | ] 130 | } -------------------------------------------------------------------------------- /Preprocessing/Creating_artificial_datasets.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Creating artificial datasets.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | }, 17 | "accelerator": "GPU" 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "BiE4HQuXA1jb" 24 | }, 25 | "source": [ 26 | "# **Import libaray**" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "id": "69-GFSEUYpQA" 33 | }, 34 | "source": [ 35 | "**Creating artificial datasets**\n", 36 | "\n", 37 | "For instance, you can create artificial datasets using scikit-learn (as shown below) that can be used to try out different machine learning workflow that you may have devised.\n" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "metadata": { 43 | "id": "j9t6Ik9SAUPX" 44 | }, 45 | "source": [ 46 | "from sklearn.datasets import make_classification" 47 | ], 48 | "execution_count": 5, 49 | "outputs": [] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "metadata": { 54 | "id": "f52X64tDAYXY" 55 | }, 56 | "source": [ 57 | "X, Y = make_classification(n_samples=200, n_classes=2, n_features=10, n_redundant=0, random_state=1)\n" 58 | ], 59 | "execution_count": 9, 60 | "outputs": [] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": { 65 | "id": "5CKqdX86iIBc" 66 | }, 67 | "source": [ 68 | "# **References**" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": { 74 | "id": "OZF79FvqiNhO" 75 | }, 76 | "source": [ 77 | "[How to Master Scikit-learn for Data Science](https://towardsdatascience.com/how-to-master-scikit-learn-for-data-science-c29214ec25b0)" 78 | ] 79 | } 80 | ] 81 | } -------------------------------------------------------------------------------- /Preprocessing/Data_representation_in_scikit_learn.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Data representation in scikit-learn.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | }, 17 | "accelerator": "GPU" 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "BiE4HQuXA1jb" 24 | }, 25 | "source": [ 26 | "# **Import libaray**" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "metadata": { 32 | "id": "CYcSX5ofA8Yz" 33 | }, 34 | "source": [ 35 | "import numpy as np \n", 36 | "import pandas as pd \n", 37 | "import seaborn as sns \n", 38 | "import matplotlib.pyplot as plt\n", 39 | "from prettytable import PrettyTable\n", 40 | "from sklearn.metrics import roc_curve, auc\n", 41 | "from mlxtend.plotting import plot_confusion_matrix \n", 42 | "from sklearn.model_selection import train_test_split\n", 43 | "from 
sklearn.metrics import classification_report, confusion_matrix\n", 44 | "from sklearn.tree import DecisionTreeClassifier\n", 45 | "from sklearn.ensemble import RandomForestClassifier\n", 46 | "from sklearn.svm import LinearSVC\n", 47 | "from sklearn.linear_model import LogisticRegression\n", 48 | "from sklearn.neighbors import KNeighborsClassifier\n", 49 | "import warnings\n", 50 | "warnings.filterwarnings(\"ignore\")" 51 | ], 52 | "execution_count": 1, 53 | "outputs": [] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": { 58 | "id": "-dEH-aOwYdNx" 59 | }, 60 | "source": [ 61 | "# **Introduction**\n", 62 | "\n", 63 | "Let’s start with the basics and consider the data representation used in scikit-learn, which is essentially a tabular dataset.At a high-level, for a supervised learning problem the tabular dataset will be comprised of both X and y variables while an unsupervised learning problem will constitute of only X variables.\n", 64 | "\n", 65 | "At a high-level, X variables are also known as independent variables and they can be either quantitative or qualitative descriptions of samples of interests while the y variable is also known as the dependent variable and they are essentially the target or response variable that predictive models are built to predict." 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": { 71 | "id": "9Rl_C9QLeAc8" 72 | }, 73 | "source": [ 74 | "![](https://drive.google.com/uc?export=view&id=1OH5lF4gI10mL3T4dG2YYhhjp2Ich7p_U)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": { 80 | "id": "dSRa4CGgeNHG" 81 | }, 82 | "source": [ 83 | "For example, if we’re building a predictive model to predict whether individuals have a disease or not the disease/non-disease status is the y variable whereas health indicators obtained by clinical test results are used as X variables.\n" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": { 89 | "id": "69-GFSEUYpQA" 90 | }, 91 | "source": [ 92 | "# **Loading data from CSV files via Pandas**\n", 93 | "\n", 94 | "Practically, the contents of a dataset can be stored in a CSV file and it can be read in using the Pandas library via the pd.read_csv() function. Thus, the data structure of the loaded data is known as the Pandas DataFrame." 
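As a minimal generic sketch of that workflow (the file name and the "target" column are hypothetical placeholders, not files from this repository), loading a CSV and separating it into X and y typically looks like the snippet below; the notebook's own example, which reads a file stored on Google Drive, follows.

```python
import pandas as pd

# Load a tabular dataset into a DataFrame (the file name is a placeholder).
df = pd.read_csv("data.csv")

# Typical clean-up steps described later in this notebook:
df = df.dropna(subset=["target"])               # drop rows whose label is missing
df = df.fillna(df.median(numeric_only=True))    # impute remaining numeric gaps

X = df.drop(columns=["target"])   # independent variables
y = df["target"]                  # dependent / response variable
```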
95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "metadata": { 100 | "colab": { 101 | "base_uri": "https://localhost:8080/" 102 | }, 103 | "id": "fYNNsGgxmxyQ", 104 | "outputId": "db152256-b938-4017-cba9-ab9d37289e47" 105 | }, 106 | "source": [ 107 | "from google.colab import drive\n", 108 | "drive.mount('/content/drive')" 109 | ], 110 | "execution_count": 2, 111 | "outputs": [ 112 | { 113 | "output_type": "stream", 114 | "text": [ 115 | "Mounted at /content/drive\n" 116 | ], 117 | "name": "stdout" 118 | } 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "metadata": { 124 | "id": "bkoAdyxInLR4" 125 | }, 126 | "source": [ 127 | "import pandas as pd\n", 128 | "import numpy as np\n", 129 | "data = pd.read_csv(\"//content/drive/MyDrive/Datasets/Student field Recommendation /Placement_Data_Full_Class.csv\")" 130 | ], 131 | "execution_count": 7, 132 | "outputs": [] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "metadata": { 137 | "id": "OE_f4-bToHWA", 138 | "colab": { 139 | "base_uri": "https://localhost:8080/", 140 | "height": 241 141 | }, 142 | "outputId": "47538ec6-98c7-43f0-d835-f443e3e86808" 143 | }, 144 | "source": [ 145 | "data.head()" 146 | ], 147 | "execution_count": 8, 148 | "outputs": [ 149 | { 150 | "output_type": "execute_result", 151 | "data": { 152 | "text/html": [ 153 | "
\n", 154 | "\n", 167 | "\n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | "
sl_nogenderssc_pssc_bhsc_phsc_bhsc_sdegree_pdegree_tworkexetest_pspecialisationmba_pstatussalary
01M67.00Others91.00OthersCommerce58.00Sci&TechNo55.0Mkt&HR58.80Placed270000.0
12M79.33Central78.33OthersScience77.48Sci&TechYes86.5Mkt&Fin66.28Placed200000.0
23M65.00Central68.00CentralArts64.00Comm&MgmtNo75.0Mkt&Fin57.80Placed250000.0
34M56.00Central52.00CentralScience52.00Sci&TechNo66.0Mkt&HR59.43Not PlacedNaN
45M85.80Central73.60CentralCommerce73.30Comm&MgmtNo96.8Mkt&Fin55.50Placed425000.0
\n", 281 | "
" 282 | ], 283 | "text/plain": [ 284 | " sl_no gender ssc_p ssc_b ... specialisation mba_p status salary\n", 285 | "0 1 M 67.00 Others ... Mkt&HR 58.80 Placed 270000.0\n", 286 | "1 2 M 79.33 Central ... Mkt&Fin 66.28 Placed 200000.0\n", 287 | "2 3 M 65.00 Central ... Mkt&Fin 57.80 Placed 250000.0\n", 288 | "3 4 M 56.00 Central ... Mkt&HR 59.43 Not Placed NaN\n", 289 | "4 5 M 85.80 Central ... Mkt&Fin 55.50 Placed 425000.0\n", 290 | "\n", 291 | "[5 rows x 15 columns]" 292 | ] 293 | }, 294 | "metadata": { 295 | "tags": [] 296 | }, 297 | "execution_count": 8 298 | } 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": { 304 | "id": "ShNfkqfBeoWq" 305 | }, 306 | "source": [ 307 | "Afterwards, data processing can be performed on the DataFrame using the wide range of Pandas functions for handling missing data (i.e. dropping missing data or filling them in with imputed values), selecting specific column or range of columns, performing feature transformations, conditional filtering of data, etc.\n", 308 | "In the following example, we will separate the DataFrame as X and y variables, which will be used shortly for model building." 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": { 314 | "id": "Y7r4GCZ_r578" 315 | }, 316 | "source": [ 317 | "# **Prepraring X and Y**" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "metadata": { 323 | "id": "j9t6Ik9SAUPX" 324 | }, 325 | "source": [ 326 | "X=data.drop('specialisation',axis=1)" 327 | ], 328 | "execution_count": null, 329 | "outputs": [] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "metadata": { 334 | "id": "f52X64tDAYXY" 335 | }, 336 | "source": [ 337 | "y=data[['specialisation']]\n" 338 | ], 339 | "execution_count": null, 340 | "outputs": [] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": { 345 | "id": "5CKqdX86iIBc" 346 | }, 347 | "source": [ 348 | "# **References**" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": { 354 | "id": "OZF79FvqiNhO" 355 | }, 356 | "source": [ 357 | "[How to Master Scikit-learn for Data Science](https://towardsdatascience.com/how-to-master-scikit-learn-for-data-science-c29214ec25b0)" 358 | ] 359 | } 360 | ] 361 | } -------------------------------------------------------------------------------- /Preprocessing/StandardScaler_in_Machine_Learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "StandardScaler in Machine Learning.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "qIlLHw-C-vA8" 23 | }, 24 | "source": [ 25 | "# **Introduction**" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "d_TDMTfJ_nuh" 32 | }, 33 | "source": [ 34 | "In Machine Learning, StandardScaler is used to resize the distribution of values ​​so that the mean of the observed values ​​is 0 and the standard deviation is 1. In this article, I will walk you through how to use StandardScaler in Machine Learning.\n", 35 | "\n", 36 | "StandardScaler is an important technique that is mainly performed as a preprocessing step before many machine learning models, in order to standardize the range of functionality of the input dataset." 
37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "id": "cP4xM3Va_4k2" 43 | }, 44 | "source": [ 45 | "Some machine learning practitioners tend to standardize their data blindly before each machine learning model without making the effort to understand why it should be used, or even whether it is needed or not. So you need to understand when you should use the StandardScaler to scale your data." 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "id": "n19_fR7xADFO" 52 | }, 53 | "source": [ 54 | "#**When and How To Use StandardScaler?**" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "id": "6gsDegOhAOnv" 61 | }, 62 | "source": [ 63 | "StandardScaler comes into play when the characteristics of the input dataset differ greatly between their ranges, or simply when they are measured in different units of measure.\n", 64 | "\n", 65 | "StandardScaler removes the mean and scales the data to the unit variance. However, outliers have an influence when calculating the empirical mean and standard deviation, which narrows the range of characteristic values.\n", 66 | "\n", 67 | "These differences in the initial features can cause problems for many machine learning models. For example, for models based on the calculation of distance, if one of the features has a wide range of values, the distance will be governed by that particular characteristic.\n", 68 | "\n", 69 | "The idea behind the StandardScaler is that variables that are measured at different scales do not contribute equally to the fit of the model and the learning function of the model and could end up creating a bias. \n", 70 | "\n", 71 | "So, to deal with this potential problem, we need to standardize the data (μ = 0, σ = 1) that is typically used before we integrate it into the machine learning model." 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "metadata": { 77 | "id": "4U-xaO6I6RQZ" 78 | }, 79 | "source": [ 80 | "from sklearn.preprocessing import StandardScaler\n", 81 | "import numpy as np\n", 82 | "\n", 83 | "# 4 samples/observations and 2 variables/features\n", 84 | "X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])\n", 85 | "# the scaler object (model)\n", 86 | "scaler = StandardScaler()\n", 87 | "# fit and transform the data\n", 88 | "scaled_data = scaler.fit_transform(X)\n", 89 | "print(X)" 90 | ], 91 | "execution_count": null, 92 | "outputs": [] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "metadata": { 97 | "id": "igUk56jpAaw8", 98 | "outputId": "ce938b03-2f3e-497b-bee2-d07e8dbd1fc2", 99 | "colab": { 100 | "base_uri": "https://localhost:8080/" 101 | } 102 | }, 103 | "source": [ 104 | "print(scaled_data)\n" 105 | ], 106 | "execution_count": 6, 107 | "outputs": [ 108 | { 109 | "output_type": "stream", 110 | "text": [ 111 | "[[-1. -1.]\n", 112 | " [ 1. -1.]\n", 113 | " [-1. 1.]\n", 114 | " [ 1. 
1.]]\n" 115 | ], 116 | "name": "stdout" 117 | } 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": { 123 | "id": "6-B4D3yZ-f9t" 124 | }, 125 | "source": [ 126 | "# **References**\n", 127 | "\n", 128 | "[StandardScaler in Machine Learning](https://thecleverprogrammer.com/2020/09/22/standardscaler-in-machine-learning/)" 129 | ] 130 | } 131 | ] 132 | } -------------------------------------------------------------------------------- /Preprocessing/Upload_Dataset_from_github_to_Colab.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Upload Dataset from github to Colab.ipynb", 7 | "provenance": [] 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "S2tqYTAohEb8" 22 | }, 23 | "source": [ 24 | "# New Section" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "metadata": { 30 | "id": "9Lc_LrjBht7s" 31 | }, 32 | "source": [ 33 | "import pandas as pd # data processing" 34 | ], 35 | "execution_count": 16, 36 | "outputs": [] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "metadata": { 41 | "id": "Yd6mvSMgg9ir" 42 | }, 43 | "source": [ 44 | "import zipfile\n", 45 | "import os" 46 | ], 47 | "execution_count": 10, 48 | "outputs": [] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "metadata": { 53 | "colab": { 54 | "base_uri": "https://localhost:8080/" 55 | }, 56 | "id": "7j52livQkVAb", 57 | "outputId": "3d27d14f-b9e3-4e78-daf5-b819bb787aba" 58 | }, 59 | "source": [ 60 | "!wget --no-check-certificate \\\n", 61 | " \"https://github.com/hussain0048/Machine-Learning/archive/refs/heads/master.zip\" \\\n", 62 | " -O \"/tmp/Machine-Learning.zip\"\n", 63 | "\n", 64 | "zip_ref = zipfile.ZipFile('/tmp/Machine-Learning.zip', 'r') #Opens the zip file in read mode\n", 65 | "zip_ref.extractall('/tmp') #Extracts the files into the /tmp folder\n", 66 | "zip_ref.close()" 67 | ], 68 | "execution_count": 13, 69 | "outputs": [ 70 | { 71 | "output_type": "stream", 72 | "text": [ 73 | "--2021-07-15 05:16:04-- https://github.com/hussain0048/Machine-Learning/archive/refs/heads/master.zip\n", 74 | "Resolving github.com (github.com)... 140.82.114.4\n", 75 | "Connecting to github.com (github.com)|140.82.114.4|:443... connected.\n", 76 | "HTTP request sent, awaiting response... 302 Found\n", 77 | "Location: https://codeload.github.com/hussain0048/Machine-Learning/zip/refs/heads/master [following]\n", 78 | "--2021-07-15 05:16:04-- https://codeload.github.com/hussain0048/Machine-Learning/zip/refs/heads/master\n", 79 | "Resolving codeload.github.com (codeload.github.com)... 140.82.112.10\n", 80 | "Connecting to codeload.github.com (codeload.github.com)|140.82.112.10|:443... connected.\n", 81 | "HTTP request sent, awaiting response... 
200 OK\n", 82 | "Length: unspecified [application/zip]\n", 83 | "Saving to: ‘/tmp/Machine-Learning.zip’\n", 84 | "\n", 85 | "/tmp/Machine-Learni [ <=> ] 16.18M 24.6MB/s in 0.7s \n", 86 | "\n", 87 | "2021-07-15 05:16:05 (24.6 MB/s) - ‘/tmp/Machine-Learning.zip’ saved [16964377]\n", 88 | "\n" 89 | ], 90 | "name": "stdout" 91 | } 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "metadata": { 97 | "id": "t_ePp3U1l_D5" 98 | }, 99 | "source": [ 100 | "data = pd.read_csv(\"/tmp/Machine-Learning-master/Datasets/train.csv\")" 101 | ], 102 | "execution_count": 17, 103 | "outputs": [] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": { 108 | "id": "KIOinRuDjNVc" 109 | }, 110 | "source": [ 111 | "go to github rep\n", 112 | "right click on Download ZIP> copy link addres" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": { 118 | "id": "LDY0ocfYiL4z" 119 | }, 120 | "source": [ 121 | "![](https://drive.google.com/uc?export=view&id=19K_dvCKXahqLVa41-3klLp5Y4dlkkNwV)" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "metadata": { 127 | "id": "ggg1KCFXhbsW" 128 | }, 129 | "source": [ 130 | "zip_ref = zipfile.ZipFile('/tmp/Website-data.zip', 'r') #Opens the zip file in read mode\n", 131 | "zip_ref.extractall('/tmp') #Extracts the files into the /tmp folder\n", 132 | "zip_ref.close()" 133 | ], 134 | "execution_count": null, 135 | "outputs": [] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "metadata": { 140 | "id": "khdJwRrphnkd" 141 | }, 142 | "source": [ 143 | "data = pd.read_csv(\"/tmp/Website-data-master/Groceries_dataset.csv\")\n" 144 | ], 145 | "execution_count": null, 146 | "outputs": [] 147 | } 148 | ] 149 | } -------------------------------------------------------------------------------- /Preprocessing/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Recommendation System/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Scikit_Learn_Boosting_Methods.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "toc_visible": true, 8 | "authorship_tag": "ABX9TyPKLNYUQulZukfKIw41xtdM", 9 | "include_colab_link": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "language_info": { 16 | "name": "python" 17 | } 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "view-in-github", 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "\"Open" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": null, 33 | "metadata": { 34 | "id": "K8CUatxoYDcT" 35 | }, 36 | "outputs": [], 37 | "source": [] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "source": [ 42 | "# **1- AdaBoost**" 43 | ], 44 | "metadata": { 45 | "id": "SjTKXSYlZZDS" 46 | } 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "source": [ 51 | "## **1.1- Classification with AdaBoost**" 52 | ], 53 | "metadata": { 54 | "id": "rTCbgzqXZnUK" 55 | } 56 | }, 57 | { 58 | "cell_type": "code", 59 | "source": [ 60 | "from sklearn.ensemble import AdaBoostClassifier\n", 61 | "from sklearn.datasets import make_classification\n", 62 | "X, y = make_classification(n_samples = 1000, n_features = 10,n_informative = 
2, n_redundant = 0,random_state = 0, shuffle = False)\n", 63 | "ADBclf = AdaBoostClassifier(n_estimators = 100, random_state = 0)\n", 64 | "ADBclf.fit(X, y)" 65 | ], 66 | "metadata": { 67 | "colab": { 68 | "base_uri": "https://localhost:8080/", 69 | "height": 75 70 | }, 71 | "id": "7-KTAVDqZva6", 72 | "outputId": "e012f872-7c87-4c9b-aacd-2f8c3f19acab" 73 | }, 74 | "execution_count": 1, 75 | "outputs": [ 76 | { 77 | "output_type": "execute_result", 78 | "data": { 79 | "text/plain": [ 80 | "AdaBoostClassifier(n_estimators=100, random_state=0)" 81 | ], 82 | "text/html": [ 83 | "
AdaBoostClassifier(n_estimators=100, random_state=0)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" 84 | ] 85 | }, 86 | "metadata": {}, 87 | "execution_count": 1 88 | } 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "source": [ 94 | "print(ADBclf.predict([[0, 2, 3, 0, 1, 1, 1, 1, 2, 2]]))" 95 | ], 96 | "metadata": { 97 | "colab": { 98 | "base_uri": "https://localhost:8080/" 99 | }, 100 | "id": "OB5MvAqdZ37z", 101 | "outputId": "abd2b6e8-bd0d-49fc-fff6-1598fb302272" 102 | }, 103 | "execution_count": 2, 104 | "outputs": [ 105 | { 106 | "output_type": "stream", 107 | "name": "stdout", 108 | "text": [ 109 | "[1]\n" 110 | ] 111 | } 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "source": [ 117 | "**Extra-Tree method**" 118 | ], 119 | "metadata": { 120 | "id": "J8Jvz6VNaFXz" 121 | } 122 | }, 123 | { 124 | "cell_type": "code", 125 | "source": [ 126 | "from pandas import read_csv\n", 127 | "from sklearn.model_selection import KFold\n", 128 | "from sklearn.model_selection import cross_val_score\n", 129 | "from sklearn.ensemble import AdaBoostClassifier\n", 130 | "seed = 5\n", 131 | "kfold = KFold(n_splits = 10)\n", 132 | "num_trees = 100\n", 133 | "max_features = 5\n", 134 | "ADBclf = AdaBoostClassifier(n_estimators = num_trees, max_features = max_features)\n", 135 | "results = cross_val_score(ADBclf, X, y, cv = kfold)\n", 136 | "print(results.mean())" 137 | ], 138 | "metadata": { 139 | "colab": { 140 | "base_uri": "https://localhost:8080/", 141 | "height": 235 142 | }, 143 | "id": "LIsbQo61aJNK", 144 | "outputId": "e19af9fb-a1e7-480c-ac7d-e652b0db6217" 145 | }, 146 | "execution_count": 6, 147 | "outputs": [ 148 | { 149 | "output_type": "error", 150 | "ename": "TypeError", 151 | "evalue": "ignored", 152 | "traceback": [ 153 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 154 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 155 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0mnum_trees\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m100\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0mmax_features\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0mADBclf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mAdaBoostClassifier\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn_estimators\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnum_trees\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmax_features\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmax_features\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 10\u001b[0m \u001b[0mresults\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcross_val_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mADBclf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mkfold\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresults\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmean\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 156 | "\u001b[0;31mTypeError\u001b[0m: AdaBoostClassifier.__init__() got an unexpected keyword argument 'max_features'" 157 | ] 158 | } 159 | ] 160 | } 161 | ] 162 | } -------------------------------------------------------------------------------- 
/Sklearn/Association Mining/readm: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Sklearn/Graph Algorithms/readme: -------------------------------------------------------------------------------- 1 | 2 | Introduction to Graph Neural Network 3 | https://heartbeat.fritz.ai/introduction-to-graph-neural-networks-c5a9f4aa9e99 4 | -------------------------------------------------------------------------------- /Sklearn/Unsupervised Learning/BIRCH_Clustering_in_Machine_Learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "BIRCH Clustering in Machine Learning.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "qIlLHw-C-vA8" 23 | }, 24 | "source": [ 25 | "# **Introduction**" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "nAgC69Io5PCm" 32 | }, 33 | "source": [ 34 | "BIRCH is a clustering algorithm in machine learning. It stands for Balanced Iterative Reducing and Clustering using Hierarchies. In this article, I will take you through the concept of BIRCH Clustering in Machine Learning and its implementation using Python." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "id": "7IyZxpFq5Zd1" 41 | }, 42 | "source": [ 43 | "BIRCH is a clustering algorithm in machine learning that has been specially designed for clustering on a very large data set. It is often faster than other clustering algorithms like batch K-Means. It provides a very similar result to the batch K-Means algorithm if the number of features in the dataset is not more than 20." 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": { 49 | "id": "BzlX_zN35iNC" 50 | }, 51 | "source": [ 52 | "While training, the BIRCH algorithm builds a tree structure with enough summary information to quickly assign each data point to a cluster. By storing compact summaries of the data points in the tree, this algorithm can work with limited memory on a very large data set. In the section below, I will take you through its implementation by using the Python programming language." 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": { 58 | "id": "n19_fR7xADFO" 59 | }, 60 | "source": [ 61 | "# **BIRCH Clustering using Python**" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": { 67 | "id": "BZaXy3GY5yR4" 68 | }, 69 | "source": [ 70 | "The BIRCH algorithm starts with a threshold value, then learns from the data, then inserts data points into the tree. In the process, if it goes out of memory while learning from the data, it increases the threshold value and repeats the process. Now let’s see how to implement BIRCH clustering using Python. I’ll start this task by importing the necessary Python libraries and the dataset:" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": { 76 | "id": "6rTGVD0nChvK" 77 | }, 78 | "source": [ 79 | "# **2.
Preparing the Data**" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "metadata": { 85 | "colab": { 86 | "base_uri": "https://localhost:8080/" 87 | }, 88 | "id": "tupzfDqeCprt", 89 | "outputId": "28d2a545-5163-4ade-8c15-2ca912cae60d" 90 | }, 91 | "source": [ 92 | "from google.colab import drive\n", 93 | "drive.mount('/content/drive')" 94 | ], 95 | "execution_count": 1, 96 | "outputs": [ 97 | { 98 | "output_type": "stream", 99 | "text": [ 100 | "Mounted at /content/drive\n" 101 | ], 102 | "name": "stdout" 103 | } 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": { 109 | "id": "BeDM7bY6C2a9" 110 | }, 111 | "source": [ 112 | "We’re ready to start building our neural network!\n", 113 | "\n" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": { 119 | "id": "r1HpibBgDJGI" 120 | }, 121 | "source": [ 122 | "# **3. Building the Model**" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "metadata": { 128 | "id": "2H50zmCjDQzS" 129 | }, 130 | "source": [ 131 | "import numpy as np\n", 132 | "import pandas as pd\n", 133 | "import matplotlib.pyplot as plt\n", 134 | "import seaborn as sns\n", 135 | "sns.set()\n", 136 | "\n", 137 | "data = pd.read_csv(\"/content/drive/MyDrive/Datasets/Customer Segmentation /customers.csv\")\n", 138 | "print(data.head())" 139 | ], 140 | "execution_count": null, 141 | "outputs": [] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": { 146 | "id": "hYAVqkXz59S0" 147 | }, 148 | "source": [ 149 | "The dataset that I am using here is based on customer segmentation. Now let’s prepare the data for implementing the clustering algorithm. Here I will rename the columns for simplicity and then I will only select two columns for implementing the BIRCH clustering algorithm using Python:" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "metadata": { 155 | "id": "vAeN5ZBFEHHc" 156 | }, 157 | "source": [ 158 | "data[\"Income\"] = data[[\"Annual Income (k$)\"]]\n", 159 | "data[\"Spending\"] = data[[\"Spending Score (1-100)\"]]\n", 160 | "data = data[[\"Income\", \"Spending\"]]\n", 161 | "print(data.head())" 162 | ], 163 | "execution_count": null, 164 | "outputs": [] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": { 169 | "id": "swpeW1fh6LA5" 170 | }, 171 | "source": [ 172 | "So we have prepared the data and now let’s import the BIRCH class from the sklearn library in Python and use it on the data and have a look at the results by visualizing the clusters:" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "metadata": { 178 | "id": "4U-xaO6I6RQZ" 179 | }, 180 | "source": [ 181 | "from sklearn.cluster import Birch\n", 182 | "model = Birch(branching_factor=30, n_clusters=5, threshold=2.5)\n", 183 | "model.fit(data)\n", 184 | "pred = model.predict(data)\n", 185 | "plt.scatter(data[\"Income\"], data[\"Spending\"], c=pred, cmap='rainbow', alpha=0.5, edgecolors='b')\n", 186 | "plt.show()" 187 | ], 188 | "execution_count": null, 189 | "outputs": [] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": { 194 | "id": "6-B4D3yZ-f9t" 195 | }, 196 | "source": [ 197 | "# **References**\n", 198 | "\n", 199 | "[BIRCH Clustering in Machine Learning](https://thecleverprogrammer.com/2021/03/15/birch-clustering-in-machine-learning/)" 200 | ] 201 | } 202 | ] 203 | } -------------------------------------------------------------------------------- /Sklearn/Unsupervised Learning/readme: -------------------------------------------------------------------------------- 1 | Unsupervised Machine 
Learning Algorithms 2 | https://thecleverprogrammer.com/2020/11/28/unsupervised-machine-learning-algorithms/?fbclid=IwAR1hrTax3aATD-tBz3qrAvMXgehghFudPa07M2gKetyWWb4vyPcmywEi60Y 3 | -------------------------------------------------------------------------------- /Sklearn/dataset/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Sklearn/readme: -------------------------------------------------------------------------------- 1 | 2 | Machine Learning with Scikit-Learn Python 3 | https://www.youtube.com/watch?v=FEksNK_i7lQ&list=PLM8wYQRetTxDHDWU-YBPfKXV3G0TKXvpy&index=4 4 | Create Your Own Demand By Enhancing Your Career With CoderzColumn 5 | https://coderzcolumn.com/ 6 | https://dataaspirant.com/save-scikit-learn-models-with-python-pickle/ 7 | https://dafriedman97.github.io/mlbook/content/c4/code.html 8 | https://www.coursera.org/learn/python-machine-learning?ranMID=40328&ranEAID=d1QCig2q3qI&ranSiteID=d1QCig2q3qI-7SLRY0cj7Y2ybRuVHN.HAw&siteID=d1QCig2q3qI-7SLRY0cj7Y2ybRuVHN.HAw&utm_content=2&utm_medium=partners&utm_source=linkshare&utm_campaign=d1QCig2q3qI#syllabus 9 | Machine learning with python 10 | https://www.youtube.com/playlist?list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v 11 | https://www.dataspoof.info/ 12 | Graph Algorithms with Python 13 | https://thecleverprogrammer.com/2020/10/09/graph-algorithms-with-python/?fbclid=IwAR15GrrQsBy6Hu3kr2gQZCWu38SZcQKu9rZELwIFOKgEIczeErD0cFmi1mA 14 | Python Machine Learning 15 | https://www.youtube.com/watch?v=jg5paDArl3E&list=PL7yh-TELLS1EZGz1-VDltwdwZvPV-jliQ 16 | All Machine Learning Algorithms Explained 17 | https://thecleverprogrammer.com/2020/06/05/all-machine-learning-algorithms-explained/?fbclid=IwAR1agYHZMyEKGESgIMtHCGTgwjNaAtmG-EnI80C7qeuo29QdvAV_VxJuILs 18 | All posts in Machine Learning 19 | https://skilllx.com/category/technologies/machine-learning/ 20 | What is Anomaly Detection in Machine Learning? 21 | https://thecleverprogrammer.com/2020/11/04/what-is-anomaly-detection-in-machine-learning/?fbclid=IwAR1fCr_LKHuHjc3XlsyVpxtQpAGpKIo0baBzmXUoyvO0WUtdNgP4WIWKG2I 22 | MACHINE LEARNING FROM SCRATCH PYTHON- TABLE OF CONTENT 23 | https://aihubprojects.com/machine-learning-from-scratch-python/?fbclid=IwAR0bgDZiVBczdqqZLlkLc4Towas2QMjsYNjJpOxqRFOUzfY-0e-U6mO-ZIs 24 | 25 | -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/Naive_Bayes_Algorithm_in_Machine_Learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Naive Bayes Algorithm in Machine Learning.ipynb", 7 | "provenance": [] 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "2ReRIaHRDJwq" 22 | }, 23 | "source": [ 24 | "# **Introduction**" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": { 30 | "id": "2vldBcOaDU9B" 31 | }, 32 | "source": [ 33 | "In machine learning, the Naive Bayes algorithm is based on Bayes’ theorem with naïve assumptions. This makes it easier to train a model by assuming that the features are independent of each other. In this article, I will give you an introduction to the Naive Bayes algorithm in Machine Learning and its implementation using Python." 
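For reference, the naive independence assumption is what makes the model simple to train: for a class $y$ and features $x_1, \dots, x_n$, Bayes’ theorem reduces to

$$P(y \mid x_1, \dots, x_n) \propto P(y)\prod_{i=1}^{n} P(x_i \mid y),$$

and the classifier predicts the class $y$ that maximizes this product.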
34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": { 39 | "id": "3yJkU0w5DWsy" 40 | }, 41 | "source": [ 42 | "In machine learning, Naive Bayes is a classification algorithm based on Bayes’ theorem. It is said to be naive because the foundation of this algorithm is based on naive assumptions. Some of the advantages of this algorithm are:\n", 43 | "\n", 44 | "- It is a very simple algorithm for classification problems compared to other classification algorithms.\n", 45 | "- It is also a very fast algorithm, which means that predicting labels with it is quicker than with many other classification algorithms.\n", 46 | "- Another advantage of using it is that it can also give better results on small datasets compared to other algorithms." 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": { 52 | "id": "Tzi1tw53DrUS" 53 | }, 54 | "source": [ 55 | "Like all other machine learning algorithms, it also has some drawbacks. One of the biggest drawbacks, which sometimes matters in classification problems, is that Naive Bayes has a very strong assumption that features are independent of each other. In real problems it is difficult to find datasets where the features really are independent of each other." 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": { 61 | "id": "BifXbMvDDze7" 62 | }, 63 | "source": [ 64 | "# **Assumptions:**" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": { 70 | "id": "7e6AK4UHD8uz" 71 | }, 72 | "source": [ 73 | "The naive hypothesis of the Naive Bayes classifier states that each feature in the dataset makes an independent and equal contribution to the prediction of the labels.\n", 74 | "\n", 75 | "Simply put, we assume that there is no correlation between features and that each feature has equal importance for the formation of a classification model.\n", 76 | "\n", 77 | "These assumptions are usually not true when working on real-life problems, but the algorithm still works well, which is why it is known as the “naive” Bayes." 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": { 83 | "id": "i2y-KQh1EHSJ" 84 | }, 85 | "source": [ 86 | "**Types:**\n", 87 | "\n", 88 | "There are three types of Naive Bayes classifiers, which depend on the distribution of the dataset, namely Gaussian, Multinomial and Bernoulli. Let’s review the types of Naive Bayes classifier before we implement this algorithm using Python:\n", 89 | "\n", 90 | "- Gaussian: It is used when the features are continuous and approximately normally distributed.\n", 91 | "- Multinomial: It is used when the dataset contains discrete values, such as word counts.\n", 92 | "- Bernoulli: It is used when the features are binary (boolean) values, which is common in binary text-classification problems." 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": { 98 | "id": "N6VCaPDyEVAy" 99 | }, 100 | "source": [ 101 | "**Naive Bayes Algorithm Using Python**\n", 102 | "\n", 103 | "Hopefully you have now learned quite a few facts about the Naive Bayes classification algorithm in machine learning. In this section, I will walk you through how to implement it using the Python programming language. 
Here I will be using the classic iris dataset for this task:" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "metadata": { 109 | "colab": { 110 | "base_uri": "https://localhost:8080/" 111 | }, 112 | "id": "Rl8S77LBEUTb", 113 | "outputId": "d20590c5-759c-4f34-966d-74bfb6fcd2c0" 114 | }, 115 | "source": [ 116 | "from sklearn.naive_bayes import GaussianNB\n", 117 | "from sklearn.naive_bayes import MultinomialNB\n", 118 | "from sklearn import datasets\n", 119 | "from sklearn.metrics import confusion_matrix\n", 120 | "iris = datasets.load_iris()\n", 121 | "gnb = GaussianNB()\n", 122 | "mnb = MultinomialNB()\n", 123 | "y_pred_gnb = gnb.fit(iris.data, iris.target).predict(iris.data)\n", 124 | "cnf_matrix_gnb = confusion_matrix(iris.target, y_pred_gnb)\n", 125 | "print(cnf_matrix_gnb)" 126 | ], 127 | "execution_count": 1, 128 | "outputs": [ 129 | { 130 | "output_type": "stream", 131 | "text": [ 132 | "[[50 0 0]\n", 133 | " [ 0 47 3]\n", 134 | " [ 0 3 47]]\n" 135 | ], 136 | "name": "stdout" 137 | } 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "metadata": { 143 | "colab": { 144 | "base_uri": "https://localhost:8080/" 145 | }, 146 | "id": "DiTu6btfEg8B", 147 | "outputId": "dbb74cb4-57c7-4db6-e54b-1fe70f9294a3" 148 | }, 149 | "source": [ 150 | "y_pred_mnb = mnb.fit(iris.data, iris.target).predict(iris.data)\n", 151 | "cnf_matrix_mnb = confusion_matrix(iris.target, y_pred_mnb)\n", 152 | "print(cnf_matrix_mnb)" 153 | ], 154 | "execution_count": 2, 155 | "outputs": [ 156 | { 157 | "output_type": "stream", 158 | "text": [ 159 | "[[50 0 0]\n", 160 | " [ 0 46 4]\n", 161 | " [ 0 3 47]]\n" 162 | ], 163 | "name": "stdout" 164 | } 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": { 170 | "id": "FDJ9vY_0EloR" 171 | }, 172 | "source": [ 173 | "**Conclusion**\n", 174 | "\n", 175 | "This is how easy it is to implement the Naive Bayes algorithm using Python for classification problems in machine learning. Some of the real-time issues where the Naive Bayes classifier can be used are:\n", 176 | "\n", 177 | "- Text Classification\n", 178 | "- Spam Detection\n", 179 | "- Sentiment Analysis\n", 180 | "- Recommendation Systems\n", 181 | "\n", 182 | "I hope you liked this article on Naive Bayes classifier in machine learning and its implementation using Python. Feel free to ask your valuable questions in the comments section below." 
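As a quick illustration of the text-classification use case listed above, here is a minimal sketch of how MultinomialNB is typically paired with word counts; the example sentences and labels below are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus: 1 = spam, 0 = not spam (hypothetical data)
texts = ["win a free prize now", "meeting moved to monday",
         "free cash offer inside", "see you at lunch"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)  # discrete word-count features

clf = MultinomialNB()
clf.fit(X_counts, labels)
print(clf.predict(vectorizer.transform(["claim your free prize"])))  # most likely [1]
```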
183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": { 188 | "id": "AbDjpQd3C8mi" 189 | }, 190 | "source": [ 191 | "# **References**" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": { 197 | "id": "L4CtOLMnDBuL" 198 | }, 199 | "source": [ 200 | "[Naive Bayes Algorithm in Machine Learning](https://thecleverprogrammer.com/2021/02/07/naive-bayes-algorithm-in-machine-learning/)" 201 | ] 202 | } 203 | ] 204 | } -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/Perceptron_in_Machine_Learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Perceptron in Machine Learning.ipynb", 7 | "provenance": [], 8 | "toc_visible": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "XgG-pUdEI3CK" 23 | }, 24 | "source": [ 25 | "# **Introduction**\n", 26 | "Perceptron is one of the simplest architecture of **Artificial Neural Networks** in Machine Learning. It was invented by **Frank Rosenblatt in 1957**. In this article, I will take you through an introduction to Perceptron in Machine Learning and its implementation using Python.\n", 27 | "\n", 28 | "Perceptron is a type of neural network architecture that falls under the category of the simplest form of artificial neural networks. The Perceptrons are generally based on different types of artificial neurons known as **Threshold Logic Unit (TLU) or sometimes Linear Threshold Unit (LTU).** The inputs and outputs of a perceptron are numbers, unlike the values we see using a classification algorithm like logistic regression (True or False values).\n", 29 | "\n", 30 | "Perceptrons are made up of a single layer of Threshold Logic Unit where each TLU is connected to all inputs. A single TLU can be used to solve the binary classification problem and if all neurons in one layer are connected to each neuron in the previous layer, it is called a fully connected layer or dense layer. Such types of architectures can be used in the problems of multiclass classification.\n" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": { 36 | "id": "hBtctHfzJGya" 37 | }, 38 | "source": [ 39 | "# **Implementation of Perceptron using Python**\n", 40 | "\n", 41 | "Thus, a Perceptron is the simplest architecture of an artificial neural network that can be used to train binary or multiclass classification models. Now let’s see the implementation of perceptrons using Python. Here I will use a perceptron on the classic iris dataset to classify iris species. 
Here is how we can implement Perceptron using Python:\n" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "metadata": { 47 | "colab": { 48 | "base_uri": "https://localhost:8080/" 49 | }, 50 | "id": "EwAapSDzIlmZ", 51 | "outputId": "2570f288-e0f1-4f8e-8f0d-e777e0ba3222" 52 | }, 53 | "source": [ 54 | "import numpy as np\n", 55 | "from sklearn.datasets import load_iris\n", 56 | "from sklearn.linear_model import Perceptron\n", 57 | "iris = load_iris()\n", 58 | "x = iris.data[:,(2,3)] #petal length, petal width\n", 59 | "y = (iris.target == 0).astype(np.int) #iris setosa\n", 60 | "perceptron = Perceptron()\n", 61 | "perceptron.fit(x, y)\n", 62 | "ypred = perceptron.predict([[2, 0.5]])\n", 63 | "print(ypred)" 64 | ], 65 | "execution_count": 8, 66 | "outputs": [ 67 | { 68 | "output_type": "stream", 69 | "text": [ 70 | "[0]\n" 71 | ], 72 | "name": "stdout" 73 | } 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": { 79 | "id": "v4H1RuMkS4-L" 80 | }, 81 | "source": [ 82 | "The performance of Perceptrons strongly resembles the stochastic gradient descent algorithm in machine learning. But unlike a classification algorithm, perceptrons do not produce a binary class output because they make predictions on hard thresholds. This is why machine learning classification algorithms are more preferred than using a Perceptron architecture to solve a classification problem." 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": { 88 | "id": "K-wD6XrbKwKJ" 89 | }, 90 | "source": [ 91 | "# **Summary**\n", 92 | "Perceptrons are a type of neural network architecture that falls under the category of the simplest form of artificial neural networks. I hope you liked this article on an introduction to Perceptrons in machine learning and its implementation using the Python programming language. Feel free to ask your valuable questions in the comments section below.\n" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": { 98 | "id": "uOu2ttBhIqPr" 99 | }, 100 | "source": [ 101 | "# **References**\n", 102 | "\n", 103 | "[Perceptron in Machine Learning](https://thecleverprogrammer.com/2021/05/23/perceptron-in-machine-learning/)" 104 | ] 105 | } 106 | ] 107 | } -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/Reg-Mulitple-Linear-Regression-Co2-py-v1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "button": false, 7 | "deletable": true, 8 | "new_sheet": false, 9 | "run_control": { 10 | "read_only": false 11 | } 12 | }, 13 | "source": [ 14 | "\n", 15 | "\n", 16 | "

Multiple Linear Regression

\n", 17 | "\n", 18 | "

About this Notebook

\n", 19 | "In this notebook, we learn how to use scikit-learn to implement Multiple linear regression. We download a dataset that is related to fuel consumption and Carbon dioxide emission of cars. Then, we split our data into training and test sets, create a model using training set, Evaluate your model using test set, and finally use model to predict unknown value\n" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "

Table of contents

\n", 27 | "\n", 28 | "
\n", 29 | "
    \n", 30 | "
  1. Understanding the Data\n", 31 | "
  2. Reading the Data in\n", 32 | "
  3. Multiple Regression Model\n", 33 | "
  4. Prediction\n", 34 | "
  5. Practice\n", 35 | "
\n", 36 | "
\n", 37 | "
\n", 38 | "
" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": { 44 | "button": false, 45 | "deletable": true, 46 | "new_sheet": false, 47 | "run_control": { 48 | "read_only": false 49 | } 50 | }, 51 | "source": [ 52 | "### Importing Needed packages" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": null, 58 | "metadata": { 59 | "button": false, 60 | "collapsed": true, 61 | "deletable": true, 62 | "new_sheet": false, 63 | "run_control": { 64 | "read_only": false 65 | } 66 | }, 67 | "outputs": [], 68 | "source": [ 69 | "import matplotlib.pyplot as plt\n", 70 | "import pandas as pd\n", 71 | "import pylab as pl\n", 72 | "import numpy as np\n", 73 | "%matplotlib inline" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": { 79 | "button": false, 80 | "deletable": true, 81 | "new_sheet": false, 82 | "run_control": { 83 | "read_only": false 84 | } 85 | }, 86 | "source": [ 87 | "### Downloading Data\n", 88 | "To download the data, we will use !wget to download it from IBM Object Storage." 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": { 95 | "button": false, 96 | "collapsed": true, 97 | "deletable": true, 98 | "new_sheet": false, 99 | "run_control": { 100 | "read_only": false 101 | } 102 | }, 103 | "outputs": [], 104 | "source": [ 105 | "!wget -O FuelConsumption.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/FuelConsumptionCo2.csv" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "__Did you know?__ When it comes to Machine Learning, you will likely be working with large datasets. As a business, where can you host your data? IBM is offering a unique opportunity for businesses, with 10 Tb of IBM Cloud Object Storage: [Sign up now for free](http://cocl.us/ML0101EN-IBM-Offer-CC)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": { 118 | "button": false, 119 | "deletable": true, 120 | "new_sheet": false, 121 | "run_control": { 122 | "read_only": false 123 | } 124 | }, 125 | "source": [ 126 | "\n", 127 | "

Understanding the Data

\n", 128 | "\n", 129 | "### `FuelConsumption.csv`:\n", 130 | "We have downloaded a fuel consumption dataset, **`FuelConsumption.csv`**, which contains model-specific fuel consumption ratings and estimated carbon dioxide emissions for new light-duty vehicles for retail sale in Canada. [Dataset source](http://open.canada.ca/data/en/dataset/98f1a129-f628-4ce4-b24d-6f16bf24dd64)\n", 131 | "\n", 132 | "- **MODELYEAR** e.g. 2014\n", 133 | "- **MAKE** e.g. Acura\n", 134 | "- **MODEL** e.g. ILX\n", 135 | "- **VEHICLE CLASS** e.g. SUV\n", 136 | "- **ENGINE SIZE** e.g. 4.7\n", 137 | "- **CYLINDERS** e.g 6\n", 138 | "- **TRANSMISSION** e.g. A6\n", 139 | "- **FUELTYPE** e.g. z\n", 140 | "- **FUEL CONSUMPTION in CITY(L/100 km)** e.g. 9.9\n", 141 | "- **FUEL CONSUMPTION in HWY (L/100 km)** e.g. 8.9\n", 142 | "- **FUEL CONSUMPTION COMB (L/100 km)** e.g. 9.2\n", 143 | "- **CO2 EMISSIONS (g/km)** e.g. 182 --> low --> 0\n" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": { 149 | "button": false, 150 | "deletable": true, 151 | "new_sheet": false, 152 | "run_control": { 153 | "read_only": false 154 | } 155 | }, 156 | "source": [ 157 | "

Reading the data in

" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": { 164 | "button": false, 165 | "collapsed": true, 166 | "deletable": true, 167 | "new_sheet": false, 168 | "run_control": { 169 | "read_only": false 170 | } 171 | }, 172 | "outputs": [], 173 | "source": [ 174 | "df = pd.read_csv(\"FuelConsumption.csv\")\n", 175 | "\n", 176 | "# take a look at the dataset\n", 177 | "df.head()" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "Lets select some features that we want to use for regression." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": { 191 | "button": false, 192 | "collapsed": true, 193 | "deletable": true, 194 | "new_sheet": false, 195 | "run_control": { 196 | "read_only": false 197 | } 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "cdf = df[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_CITY','FUELCONSUMPTION_HWY','FUELCONSUMPTION_COMB','CO2EMISSIONS']]\n", 202 | "cdf.head(9)" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "Lets plot Emission values with respect to Engine size:" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": { 216 | "button": false, 217 | "collapsed": true, 218 | "deletable": true, 219 | "new_sheet": false, 220 | "run_control": { 221 | "read_only": false 222 | }, 223 | "scrolled": true 224 | }, 225 | "outputs": [], 226 | "source": [ 227 | "plt.scatter(cdf.ENGINESIZE, cdf.CO2EMISSIONS, color='blue')\n", 228 | "plt.xlabel(\"Engine size\")\n", 229 | "plt.ylabel(\"Emission\")\n", 230 | "plt.show()" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": { 236 | "button": false, 237 | "deletable": true, 238 | "new_sheet": false, 239 | "run_control": { 240 | "read_only": false 241 | } 242 | }, 243 | "source": [ 244 | "#### Creating train and test dataset\n", 245 | "Train/Test Split involves splitting the dataset into training and testing sets respectively, which are mutually exclusive. After which, you train with the training set and test with the testing set. \n", 246 | "This will provide a more accurate evaluation on out-of-sample accuracy because the testing dataset is not part of the dataset that have been used to train the data. It is more realistic for real world problems.\n", 247 | "\n", 248 | "This means that we know the outcome of each data point in this dataset, making it great to test with! And since this data has not been used to train the model, the model has no knowledge of the outcome of these data points. 
So, in essence, it’s truly an out-of-sample testing.\n", 249 | "\n" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": { 256 | "button": false, 257 | "collapsed": true, 258 | "deletable": true, 259 | "new_sheet": false, 260 | "run_control": { 261 | "read_only": false 262 | } 263 | }, 264 | "outputs": [], 265 | "source": [ 266 | "msk = np.random.rand(len(df)) < 0.8\n", 267 | "train = cdf[msk]\n", 268 | "test = cdf[~msk]" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": { 274 | "button": false, 275 | "deletable": true, 276 | "new_sheet": false, 277 | "run_control": { 278 | "read_only": false 279 | } 280 | }, 281 | "source": [ 282 | "#### Train data distribution" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": null, 288 | "metadata": { 289 | "button": false, 290 | "collapsed": true, 291 | "deletable": true, 292 | "new_sheet": false, 293 | "run_control": { 294 | "read_only": false 295 | } 296 | }, 297 | "outputs": [], 298 | "source": [ 299 | "plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')\n", 300 | "plt.xlabel(\"Engine size\")\n", 301 | "plt.ylabel(\"Emission\")\n", 302 | "plt.show()" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": { 308 | "button": false, 309 | "deletable": true, 310 | "new_sheet": false, 311 | "run_control": { 312 | "read_only": false 313 | } 314 | }, 315 | "source": [ 316 | "

Multiple Regression Model

\n" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "In reality, there are multiple variables that predict the Co2emission. When more than one independent variable is present, the process is called multiple linear regression. For example, predicting co2emission using FUELCONSUMPTION_COMB, EngineSize and Cylinders of cars. The good thing here is that Multiple linear regression is the extension of simple linear regression model." 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": { 330 | "button": false, 331 | "collapsed": true, 332 | "deletable": true, 333 | "new_sheet": false, 334 | "run_control": { 335 | "read_only": false 336 | } 337 | }, 338 | "outputs": [], 339 | "source": [ 340 | "from sklearn import linear_model\n", 341 | "regr = linear_model.LinearRegression()\n", 342 | "x = np.asanyarray(train[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])\n", 343 | "y = np.asanyarray(train[['CO2EMISSIONS']])\n", 344 | "regr.fit (x, y)\n", 345 | "# The coefficients\n", 346 | "print ('Coefficients: ', regr.coef_)" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "As mentioned before, __Coefficient__ and __Intercept__ , are the parameters of the fit line. \n", 354 | "Given that it is a multiple linear regression, with 3 parameters, and knowing that the parameters are the intercept and coefficients of hyperplane, sklearn can estimate them from our data. Scikit-learn uses plain Ordinary Least Squares method to solve this problem.\n", 355 | "\n", 356 | "#### Ordinary Least Squares (OLS)\n", 357 | "OLS is a method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by minimizing the sum of the squares of the differences between the target dependent variable and those predicted by the linear function. In other words, it tries to minimizes the sum of squared errors (SSE) or mean squared error (MSE) between the target variable (y) and our predicted output ($\\hat{y}$) over all samples in the dataset.\n", 358 | "\n", 359 | "OLS can find the best parameters using of the following methods:\n", 360 | " - Solving the model parameters analytically using closed-form equations\n", 361 | " - Using an optimization algorithm (Gradient Descent, Stochastic Gradient Descent, Newton’s Method, etc.)" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "

Prediction

" 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": null, 374 | "metadata": { 375 | "button": false, 376 | "collapsed": true, 377 | "deletable": true, 378 | "new_sheet": false, 379 | "run_control": { 380 | "read_only": false 381 | } 382 | }, 383 | "outputs": [], 384 | "source": [ 385 | "y_hat= regr.predict(test[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])\n", 386 | "x = np.asanyarray(test[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])\n", 387 | "y = np.asanyarray(test[['CO2EMISSIONS']])\n", 388 | "print(\"Residual sum of squares: %.2f\"\n", 389 | " % np.mean((y_hat - y) ** 2))\n", 390 | "\n", 391 | "# Explained variance score: 1 is perfect prediction\n", 392 | "print('Variance score: %.2f' % regr.score(x, y))" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "__explained variance regression score:__ \n", 400 | "If $\\hat{y}$ is the estimated target output, y the corresponding (correct) target output, and Var is Variance, the square of the standard deviation, then the explained variance is estimated as follow:\n", 401 | "\n", 402 | "$\\texttt{explainedVariance}(y, \\hat{y}) = 1 - \\frac{Var\\{ y - \\hat{y}\\}}{Var\\{y\\}}$ \n", 403 | "The best possible score is 1.0, lower values are worse." 404 | ] 405 | }, 406 | { 407 | "cell_type": "markdown", 408 | "metadata": {}, 409 | "source": [ 410 | "

Practice

\n", 411 | "Try to use a multiple linear regression with the same dataset but this time use __FUEL CONSUMPTION in CITY__ and \n", 412 | "__FUEL CONSUMPTION in HWY__ instead of FUELCONSUMPTION_COMB. Does it result in better accuracy?" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": null, 418 | "metadata": {}, 419 | "outputs": [], 420 | "source": [ 421 | "# write your code here\n", 422 | "\n" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "Double-click __here__ for the solution.\n", 430 | "\n", 431 | "" 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": { 451 | "button": false, 452 | "deletable": true, 453 | "new_sheet": false, 454 | "run_control": { 455 | "read_only": false 456 | } 457 | }, 458 | "source": [ 459 | "

Want to learn more?

\n", 460 | "\n", 461 | "IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems – by your enterprise as a whole. A free trial is available through this course, available here: SPSS Modeler\n", 462 | "\n", 463 | "Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at Watson Studio\n", 464 | "\n", 465 | "

Thanks for completing this lesson!

\n", 466 | "\n", 467 | "

Author: Saeed Aghabozorgi

\n", 468 | "

Saeed Aghabozorgi, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients’ ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods like machine learning and statistical modelling on large datasets.

\n", 469 | "\n", 470 | "
\n", 471 | "\n", 472 | "

Copyright © 2018 Cognitive Class. This notebook and its source code are released under the terms of the MIT License.

" 473 | ] 474 | } 475 | ], 476 | "metadata": { 477 | "kernelspec": { 478 | "display_name": "Python 3", 479 | "language": "python", 480 | "name": "python3" 481 | }, 482 | "language_info": { 483 | "codemirror_mode": { 484 | "name": "ipython", 485 | "version": 3 486 | }, 487 | "file_extension": ".py", 488 | "mimetype": "text/x-python", 489 | "name": "python", 490 | "nbconvert_exporter": "python", 491 | "pygments_lexer": "ipython3", 492 | "version": "3.6.6" 493 | }, 494 | "widgets": { 495 | "state": {}, 496 | "version": "1.1.2" 497 | } 498 | }, 499 | "nbformat": 4, 500 | "nbformat_minor": 2 501 | } 502 | -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/Reg-NoneLinearRegression-py-v1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "\n", 9 | "

Non Linear Regression Analysis

" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "If the data shows a curvy trend, then linear regression will not produce very accurate results when compared to a non-linear regression because, as the name implies, linear regression presumes that the data is linear. \n", 17 | "Let's learn about non linear regressions and apply an example on python. In this notebook, we fit a non-linear model to the datapoints corrensponding to China's GDP from 1960 to 2014." 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "

Importing required libraries

" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": { 31 | "collapsed": false 32 | }, 33 | "outputs": [], 34 | "source": [ 35 | "import numpy as np\n", 36 | "import matplotlib.pyplot as plt\n", 37 | "%matplotlib inline" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "Though Linear regression is very good to solve many problems, it cannot be used for all datasets. First recall how linear regression, could model a dataset. It models a linear relation between a dependent variable y and independent variable x. It had a simple equation, of degree 1, for example y = $2x$ + 3." 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": null, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "x = np.arange(-5.0, 5.0, 0.1)\n", 54 | "\n", 55 | "##You can adjust the slope and intercept to verify the changes in the graph\n", 56 | "y = 2*(x) + 3\n", 57 | "y_noise = 2 * np.random.normal(size=x.size)\n", 58 | "ydata = y + y_noise\n", 59 | "#plt.figure(figsize=(8,6))\n", 60 | "plt.plot(x, ydata, 'bo')\n", 61 | "plt.plot(x,y, 'r') \n", 62 | "plt.ylabel('Dependent Variable')\n", 63 | "plt.xlabel('Indepdendent Variable')\n", 64 | "plt.show()" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "Non-linear regressions are a relationship between independent variables $x$ and a dependent variable $y$ which result in a non-linear function modeled data. Essentially any relationship that is not linear can be termed as non-linear, and is usually represented by the polynomial of $k$ degrees (maximum power of $x$). \n", 72 | "\n", 73 | "$$ \\ y = a x^3 + b x^2 + c x + d \\ $$\n", 74 | "\n", 75 | "Non-linear functions can have elements like exponentials, logarithms, fractions, and others. For example: $$ y = \\log(x)$$\n", 76 | " \n", 77 | "Or even, more complicated such as :\n", 78 | "$$ y = \\log(a x^3 + b x^2 + c x + d)$$" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "Let's take a look at a cubic function's graph." 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": null, 91 | "metadata": { 92 | "collapsed": false 93 | }, 94 | "outputs": [], 95 | "source": [ 96 | "x = np.arange(-5.0, 5.0, 0.1)\n", 97 | "\n", 98 | "##You can adjust the slope and intercept to verify the changes in the graph\n", 99 | "y = 1*(x**3) + 1*(x**2) + 1*x + 3\n", 100 | "y_noise = 20 * np.random.normal(size=x.size)\n", 101 | "ydata = y + y_noise\n", 102 | "plt.plot(x, ydata, 'bo')\n", 103 | "plt.plot(x,y, 'r') \n", 104 | "plt.ylabel('Dependent Variable')\n", 105 | "plt.xlabel('Indepdendent Variable')\n", 106 | "plt.show()" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "As you can see, this function has $x^3$ and $x^2$ as independent variables. Also, the graphic of this function is not a straight line over the 2D plane. So this is a non-linear function." 
114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "Some other types of non-linear functions are:" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "### Quadratic" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "$$ Y = X^2 $$" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": null, 140 | "metadata": { 141 | "collapsed": false 142 | }, 143 | "outputs": [], 144 | "source": [ 145 | "x = np.arange(-5.0, 5.0, 0.1)\n", 146 | "\n", 147 | "##You can adjust the slope and intercept to verify the changes in the graph\n", 148 | "\n", 149 | "y = np.power(x,2)\n", 150 | "y_noise = 2 * np.random.normal(size=x.size)\n", 151 | "ydata = y + y_noise\n", 152 | "plt.plot(x, ydata, 'bo')\n", 153 | "plt.plot(x,y, 'r') \n", 154 | "plt.ylabel('Dependent Variable')\n", 155 | "plt.xlabel('Indepdendent Variable')\n", 156 | "plt.show()" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "### Exponential" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "An exponential function with base c is defined by $$ Y = a + b c^X$$ where b ≠0, c > 0 , c ≠1, and x is any real number. The base, c, is constant and the exponent, x, is a variable. \n", 171 | "\n" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": { 178 | "collapsed": false 179 | }, 180 | "outputs": [], 181 | "source": [ 182 | "X = np.arange(-5.0, 5.0, 0.1)\n", 183 | "\n", 184 | "##You can adjust the slope and intercept to verify the changes in the graph\n", 185 | "\n", 186 | "Y= np.exp(X)\n", 187 | "\n", 188 | "plt.plot(X,Y) \n", 189 | "plt.ylabel('Dependent Variable')\n", 190 | "plt.xlabel('Indepdendent Variable')\n", 191 | "plt.show()" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "### Logarithmic\n", 199 | "\n", 200 | "The response $y$ is a results of applying logarithmic map from input $x$'s to output variable $y$. It is one of the simplest form of __log()__: i.e. $$ y = \\log(x)$$\n", 201 | "\n", 202 | "Please consider that instead of $x$, we can use $X$, which can be polynomial representation of the $x$'s. 
In general form it would be written as \n", 203 | "\\begin{equation}\n", 204 | "y = \\log(X)\n", 205 | "\\end{equation}" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": null, 211 | "metadata": { 212 | "collapsed": false 213 | }, 214 | "outputs": [], 215 | "source": [ 216 | "X = np.arange(-5.0, 5.0, 0.1)\n", 217 | "\n", 218 | "Y = np.log(X)\n", 219 | "\n", 220 | "plt.plot(X,Y) \n", 221 | "plt.ylabel('Dependent Variable')\n", 222 | "plt.xlabel('Indepdendent Variable')\n", 223 | "plt.show()" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "### Sigmoidal/Logistic" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "$$ Y = a + \\frac{b}{1+ c^{(X-d)}}$$" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "X = np.arange(-5.0, 5.0, 0.1)\n", 247 | "\n", 248 | "\n", 249 | "Y = 1-4/(1+np.power(3, X-2))\n", 250 | "\n", 251 | "plt.plot(X,Y) \n", 252 | "plt.ylabel('Dependent Variable')\n", 253 | "plt.xlabel('Indepdendent Variable')\n", 254 | "plt.show()" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "\n", 262 | "# Non-Linear Regression example" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": {}, 268 | "source": [ 269 | "For an example, we're going to try and fit a non-linear model to the datapoints corresponding to China's GDP from 1960 to 2014. We download a dataset with two columns, the first, a year between 1960 and 2014, the second, China's corresponding annual gross domestic income in US dollars for that year. " 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": { 276 | "collapsed": false 277 | }, 278 | "outputs": [], 279 | "source": [ 280 | "import numpy as np\n", 281 | "import pandas as pd\n", 282 | "\n", 283 | "#downloading dataset\n", 284 | "!wget -nv -O china_gdp.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/china_gdp.csv\n", 285 | " \n", 286 | "df = pd.read_csv(\"china_gdp.csv\")\n", 287 | "df.head(10)" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "__Did you know?__ When it comes to Machine Learning, you will likely be working with large datasets. As a business, where can you host your data? IBM is offering a unique opportunity for businesses, with 10 Tb of IBM Cloud Object Storage: [Sign up now for free](http://cocl.us/ML0101EN-IBM-Offer-CC)" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "### Plotting the Dataset ###\n", 302 | "This is what the datapoints look like. It kind of looks like an either logistic or exponential function. The growth starts off slow, then from 2005 on forward, the growth is very significant. And finally, it decelerate slightly in the 2010s." 
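One quick, optional way to tell those two shapes apart visually is a semilog plot: pure exponential growth appears as a roughly straight line on a logarithmic y-axis, while logistic growth bends over as it saturates. A minimal sketch, reusing the df loaded above:

```python
import matplotlib.pyplot as plt

# GDP on a log scale: a straight line would suggest exponential growth,
# a curve that flattens out suggests logistic (saturating) growth.
plt.figure(figsize=(8, 5))
plt.semilogy(df["Year"].values, df["Value"].values, 'ro')
plt.ylabel('GDP (log scale)')
plt.xlabel('Year')
plt.show()
```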
303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": null, 308 | "metadata": { 309 | "collapsed": false 310 | }, 311 | "outputs": [], 312 | "source": [ 313 | "plt.figure(figsize=(8,5))\n", 314 | "x_data, y_data = (df[\"Year\"].values, df[\"Value\"].values)\n", 315 | "plt.plot(x_data, y_data, 'ro')\n", 316 | "plt.ylabel('GDP')\n", 317 | "plt.xlabel('Year')\n", 318 | "plt.show()" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "### Choosing a model ###\n", 326 | "\n", 327 | "From an initial look at the plot, we determine that the logistic function could be a good approximation,\n", 328 | "since it has the property of starting with a slow growth, increasing growth in the middle, and then decreasing again at the end; as illustrated below:" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": null, 334 | "metadata": { 335 | "collapsed": false 336 | }, 337 | "outputs": [], 338 | "source": [ 339 | "X = np.arange(-5.0, 5.0, 0.1)\n", 340 | "Y = 1.0 / (1.0 + np.exp(-X))\n", 341 | "\n", 342 | "plt.plot(X,Y) \n", 343 | "plt.ylabel('Dependent Variable')\n", 344 | "plt.xlabel('Indepdendent Variable')\n", 345 | "plt.show()" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "\n", 353 | "\n", 354 | "The formula for the logistic function is the following:\n", 355 | "\n", 356 | "$$ \\hat{Y} = \\frac1{1+e^{\\beta_1(X-\\beta_2)}}$$\n", 357 | "\n", 358 | "$\\beta_1$: Controls the curve's steepness,\n", 359 | "\n", 360 | "$\\beta_2$: Slides the curve on the x-axis." 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "### Building The Model ###\n", 368 | "Now, let's build our regression model and initialize its parameters. " 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": null, 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "def sigmoid(x, Beta_1, Beta_2):\n", 378 | " y = 1 / (1 + np.exp(-Beta_1*(x-Beta_2)))\n", 379 | " return y" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "Lets look at a sample sigmoid line that might fit with the data:" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": null, 392 | "metadata": { 393 | "collapsed": false 394 | }, 395 | "outputs": [], 396 | "source": [ 397 | "beta_1 = 0.10\n", 398 | "beta_2 = 1990.0\n", 399 | "\n", 400 | "#logistic function\n", 401 | "Y_pred = sigmoid(x_data, beta_1 , beta_2)\n", 402 | "\n", 403 | "#plot initial prediction against datapoints\n", 404 | "plt.plot(x_data, Y_pred*15000000000000.)\n", 405 | "plt.plot(x_data, y_data, 'ro')" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "Our task here is to find the best parameters for our model. Lets first normalize our x and y:" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": null, 418 | "metadata": {}, 419 | "outputs": [], 420 | "source": [ 421 | "# Lets normalize our data\n", 422 | "xdata =x_data/max(x_data)\n", 423 | "ydata =y_data/max(y_data)" 424 | ] 425 | }, 426 | { 427 | "cell_type": "markdown", 428 | "metadata": {}, 429 | "source": [ 430 | "#### How we find the best parameters for our fit line?\n", 431 | "we can use __curve_fit__ which uses non-linear least squares to fit our sigmoid function, to data. 
Optimal values for the parameters so that the sum of the squared residuals of sigmoid(xdata, *popt) - ydata is minimized.\n", 432 | "\n", 433 | "popt are our optimized parameters." 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": null, 439 | "metadata": {}, 440 | "outputs": [], 441 | "source": [ 442 | "from scipy.optimize import curve_fit\n", 443 | "popt, pcov = curve_fit(sigmoid, xdata, ydata)\n", 444 | "#print the final parameters\n", 445 | "print(\" beta_1 = %f, beta_2 = %f\" % (popt[0], popt[1]))" 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "Now we plot our resulting regression model." 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": null, 458 | "metadata": {}, 459 | "outputs": [], 460 | "source": [ 461 | "x = np.linspace(1960, 2015, 55)\n", 462 | "x = x/max(x)\n", 463 | "plt.figure(figsize=(8,5))\n", 464 | "y = sigmoid(x, *popt)\n", 465 | "plt.plot(xdata, ydata, 'ro', label='data')\n", 466 | "plt.plot(x,y, linewidth=3.0, label='fit')\n", 467 | "plt.legend(loc='best')\n", 468 | "plt.ylabel('GDP')\n", 469 | "plt.xlabel('Year')\n", 470 | "plt.show()" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "## Practice\n", 478 | "Can you calculate what is the accuracy of our model?" 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": null, 484 | "metadata": {}, 485 | "outputs": [], 486 | "source": [ 487 | "# write your code here\n", 488 | "\n", 489 | "\n" 490 | ] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "metadata": {}, 495 | "source": [ 496 | "Double-click __here__ for the solution.\n", 497 | "\n", 498 | "" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "

Want to learn more?

\n", 527 | "\n", 528 | "IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems – by your enterprise as a whole. A free trial is available through this course, available here: SPSS Modeler\n", 529 | "\n", 530 | "Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at Watson Studio\n", 531 | "\n", 532 | "

Thanks for completing this lesson!

\n", 533 | "\n", 534 | "

Author: Saeed Aghabozorgi

\n", 535 | "

Saeed Aghabozorgi, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients’ ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods like machine learning and statistical modelling on large datasets.

\n", 536 | "\n", 537 | "
\n", 538 | "\n", 539 | "

Copyright © 2018 Cognitive Class. This notebook and its source code are released under the terms of the MIT License.

" 540 | ] 541 | } 542 | ], 543 | "metadata": { 544 | "kernelspec": { 545 | "display_name": "Python 3", 546 | "language": "python", 547 | "name": "python3" 548 | }, 549 | "language_info": { 550 | "codemirror_mode": { 551 | "name": "ipython", 552 | "version": 3 553 | }, 554 | "file_extension": ".py", 555 | "mimetype": "text/x-python", 556 | "name": "python", 557 | "nbconvert_exporter": "python", 558 | "pygments_lexer": "ipython3", 559 | "version": "3.6.6" 560 | } 561 | }, 562 | "nbformat": 4, 563 | "nbformat_minor": 2 564 | } 565 | -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/Voting_Classifiers.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Voting Classifiers.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "toc_visible": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "language_info": { 16 | "name": "python" 17 | } 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "gL0AffZGOZxZ" 24 | }, 25 | "source": [ 26 | "# **Introduction**\n" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "id": "sEtW2gI7KVy5" 33 | }, 34 | "source": [ 35 | "Similarly, if we aggregate the predictions of a group of models (such as classifiers or regressors), we will often get better predictions than the best individual predictor. A group of predictors is called an **ensemble**. Thus this technique is called **ensemble learning**, and an ensemble learning algorithm is called an Ensemble Method." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": { 41 | "id": "lIDjC_tsK0HY" 42 | }, 43 | "source": [ 44 | "As an example of an ensemble method, we can train a **group of decision tree classifiers**, each on a random subset of the training data. **Such an ensemble of decision trees is called a random forest**. Despite its simplicity, this is one of the most powerful machine learning algorithms available today. In this chapter, we will discuss the most famous ensemble learning methods, including: **Bagging, Boosting, & Stacking.**" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "id": "Lr8nF4iiAxoS" 51 | }, 52 | "source": [ 53 | "# **Voting Classifiers**" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": { 59 | "id": "eJIKaFHZA7MB" 60 | }, 61 | "source": [ 62 | "Suppose we have trained a few classifiers, each achieving an 80% accuracy. A very simple way to create an even better classifiers is to aggregate the predictions of all our classifiers and choose the prediction that is the most frequent.\n", 63 | "\n", 64 | "**Majority voting classification is called Hard Voting**" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": { 70 | "id": "cUPzZRuhB53n" 71 | }, 72 | "source": [ 73 | "![](https://drive.google.com/uc?export=view&id=1Y01QJdvZ4mucKd2HIfZjnZPPdn35ISIc\n", 74 | ")" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": { 80 | "id": "ytZeVqhkNGE_" 81 | }, 82 | "source": [ 83 | "Somewhat surprisingly, this classifier achieves an even better accuracy than the best predictor in the ensemble. Even if each classifier is a weak learner (does slightly better then random guessing). 
Assuming that we have a sufficient number of weak learners and enough diversity.\n", 84 | "\n", 85 | "Due to the law of large numbers, if we build an ensemble containing 1,000 classifiers with individual accuracies of $51%$ & trained for binary classification, If we predict the majority voting class, we can hope for up to $75%$ accuracy.\n", 86 | "\n", 87 | "This is only true if all classifiers are completely independent, making uncorrelated errors, which is clearly not the case because they are trained on the same data.\n", 88 | "\n", 89 | "One way to get diverse classifiers is use different algorithms for each one of them & train them on different subset of the training data.\n", 90 | "\n", 91 | "Let's implement a hard voting ensemble learner using scikit-learn:" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": { 97 | "id": "Bh-YhPsZCG7S" 98 | }, 99 | "source": [ 100 | "**Python implmentation**" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "metadata": { 106 | "id": "gMfrhXQhNVob" 107 | }, 108 | "source": [ 109 | "import numpy as np\n", 110 | "import pandas as pd\n", 111 | "import matplotlib.pyplot as plt\n", 112 | "import sklearn" 113 | ], 114 | "execution_count": 1, 115 | "outputs": [] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "metadata": { 120 | "id": "hprKmZLBNdNZ" 121 | }, 122 | "source": [ 123 | "from sklearn.ensemble import RandomForestClassifier\n", 124 | "from sklearn.ensemble import VotingClassifier\n", 125 | "from sklearn.linear_model import LogisticRegression\n", 126 | "from sklearn.svm import SVC" 127 | ], 128 | "execution_count": 2, 129 | "outputs": [] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "metadata": { 134 | "id": "oRZKUesUNiNn" 135 | }, 136 | "source": [ 137 | "log_clf = LogisticRegression(solver='lbfgs')\n", 138 | "rf_clf = RandomForestClassifier(n_estimators=100)\n", 139 | "svm_clf = SVC(gamma='scale')" 140 | ], 141 | "execution_count": 3, 142 | "outputs": [] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "metadata": { 147 | "id": "fXyBuAIjNnBZ" 148 | }, 149 | "source": [ 150 | "from sklearn import datasets\n", 151 | "from sklearn.model_selection import train_test_split" 152 | ], 153 | "execution_count": 4, 154 | "outputs": [] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "metadata": { 159 | "id": "xBa0B1EhNspZ" 160 | }, 161 | "source": [ 162 | "X, y = datasets.make_moons(n_samples=10000, noise=0.5)\n", 163 | "X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33)" 164 | ], 165 | "execution_count": 5, 166 | "outputs": [] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "metadata": { 171 | "colab": { 172 | "base_uri": "https://localhost:8080/" 173 | }, 174 | "id": "57JwyteWNw1Q", 175 | "outputId": "967b0632-dda0-4799-cf94-6a17410fdea2" 176 | }, 177 | "source": [ 178 | "X_train.shape, y_train.shape, X_val.shape, y_val.shape\n" 179 | ], 180 | "execution_count": 6, 181 | "outputs": [ 182 | { 183 | "output_type": "execute_result", 184 | "data": { 185 | "text/plain": [ 186 | "((6700, 2), (6700,), (3300, 2), (3300,))" 187 | ] 188 | }, 189 | "metadata": { 190 | "tags": [] 191 | }, 192 | "execution_count": 6 193 | } 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "metadata": { 199 | "id": "xUzphfgFN2V2" 200 | }, 201 | "source": [ 202 | "voting_clf = VotingClassifier(estimators=[('lr', log_clf), ('rf', rf_clf), ('svc', svm_clf)], voting='hard')" 203 | ], 204 | "execution_count": 7, 205 | "outputs": [] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "metadata": { 210 | "colab": { 211 | 
"base_uri": "https://localhost:8080/" 212 | }, 213 | "id": "zNROV_7TN6dO", 214 | "outputId": "fc519296-81a5-47c0-af5e-f6c8e8a599fb" 215 | }, 216 | "source": [ 217 | "voting_clf.fit(X_train, y_train)\n" 218 | ], 219 | "execution_count": 8, 220 | "outputs": [ 221 | { 222 | "output_type": "execute_result", 223 | "data": { 224 | "text/plain": [ 225 | "VotingClassifier(estimators=[('lr',\n", 226 | " LogisticRegression(C=1.0, class_weight=None,\n", 227 | " dual=False, fit_intercept=True,\n", 228 | " intercept_scaling=1,\n", 229 | " l1_ratio=None, max_iter=100,\n", 230 | " multi_class='auto',\n", 231 | " n_jobs=None, penalty='l2',\n", 232 | " random_state=None,\n", 233 | " solver='lbfgs', tol=0.0001,\n", 234 | " verbose=0, warm_start=False)),\n", 235 | " ('rf',\n", 236 | " RandomForestClassifier(bootstrap=True,\n", 237 | " ccp_alpha=0.0,\n", 238 | " class_weight=None,\n", 239 | " cr...\n", 240 | " oob_score=False,\n", 241 | " random_state=None,\n", 242 | " verbose=0,\n", 243 | " warm_start=False)),\n", 244 | " ('svc',\n", 245 | " SVC(C=1.0, break_ties=False, cache_size=200,\n", 246 | " class_weight=None, coef0=0.0,\n", 247 | " decision_function_shape='ovr', degree=3,\n", 248 | " gamma='scale', kernel='rbf', max_iter=-1,\n", 249 | " probability=False, random_state=None,\n", 250 | " shrinking=True, tol=0.001, verbose=False))],\n", 251 | " flatten_transform=True, n_jobs=None, voting='hard',\n", 252 | " weights=None)" 253 | ] 254 | }, 255 | "metadata": { 256 | "tags": [] 257 | }, 258 | "execution_count": 8 259 | } 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": { 265 | "id": "HqIKJAZrOD6u" 266 | }, 267 | "source": [ 268 | "Let's take a look at the performance of each classifier + ensemble method on the validation set:\n", 269 | "\n" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "metadata": { 275 | "id": "I77QMfXHOHRq" 276 | }, 277 | "source": [ 278 | "from sklearn.metrics import accuracy_score\n" 279 | ], 280 | "execution_count": 9, 281 | "outputs": [] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "metadata": { 286 | "colab": { 287 | "base_uri": "https://localhost:8080/" 288 | }, 289 | "id": "lQLu9-ZtOI7n", 290 | "outputId": "a09fc59f-13d1-4d4c-99f1-3206e4c877d9" 291 | }, 292 | "source": [ 293 | "for clf in [log_clf, rf_clf, svm_clf, voting_clf]:\n", 294 | " clf.fit(X_train, y_train)\n", 295 | " y_hat = clf.predict(X_val)\n", 296 | " print(clf.__class__.__name__, accuracy_score(y_val, y_hat))" 297 | ], 298 | "execution_count": 10, 299 | "outputs": [ 300 | { 301 | "output_type": "stream", 302 | "text": [ 303 | "LogisticRegression 0.8151515151515152\n", 304 | "RandomForestClassifier 0.803939393939394\n", 305 | "SVC 0.8303030303030303\n", 306 | "VotingClassifier 0.8254545454545454\n" 307 | ], 308 | "name": "stdout" 309 | } 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": { 315 | "id": "vodRbHN8OVam" 316 | }, 317 | "source": [ 318 | "There we have it! The voting classifier slightly outperforms the individual classifiers.\n", 319 | "\n", 320 | "If all ensemble method learners can estimate class probabilities, we can average their probabilities per class then predict the class with the highest probability. This is called Soft voting. It often yields results better than hard voting because it weights confidence." 
321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": { 326 | "id": "EgWJOp-40ADb" 327 | }, 328 | "source": [ 329 | "# **References**" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": { 335 | "id": "MIXjbz4hOO6i" 336 | }, 337 | "source": [ 338 | "[Chapter 7. Ensemble Learning & Random Forests](https://github.com/Akramz/Hands-on-Machine-Learning-with-Scikit-Learn-Keras-and-TensorFlow/blob/master/07.Ensembles_RFs.ipynb)" 339 | ] 340 | } 341 | ] 342 | } -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/XGBoost_in_Machine_Learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "XGBoost in Machine Learning.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "bIsI9dS-FNPO" 23 | }, 24 | "source": [ 25 | "# **Introduction**" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "ZdupLwSYFUHh" 32 | }, 33 | "source": [ 34 | "XGBoost or Gradient Boosting is a machine learning algorithm that goes through cycles to iteratively add models to a set. In this article, I will take you through the XGBoost algorithm in Machine Learning." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "id": "0EpPpnxScrD4" 41 | }, 42 | "source": [ 43 | "The cycle of the XGBoost algorithm begins by initializing the whole with a unique model, the predictions of which can be quite naive." 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": { 49 | "id": "22KV7aATHIU5" 50 | }, 51 | "source": [ 52 | "# **The Process of XGBoost Algorithm:**\n" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": { 58 | "id": "nQkZsGzvc1pi" 59 | }, 60 | "source": [ 61 | "- First, we use the current set to generate predictions for each observation in the dataset. To make a prediction, we add the predictions of all the models in the set.\n", 62 | "- These predictions are used to calculate a loss function.\n", 63 | "- Then we use the loss function to fit a new model which will be added to the set. Specifically, we determine the parameters of the model so that adding this new model to the set reduces the loss.\n", 64 | "- Finally, we add the new model to the set, and …\n", 65 | "then repeat!" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": { 71 | "id": "zpOJPL34rP5Z" 72 | }, 73 | "source": [ 74 | "# **XGBoost Algorithm in Action**\n" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": { 80 | "id": "ePqKoUWTre6h" 81 | }, 82 | "source": [ 83 | "I’ll start by loading the training and validation data into X_train, X_valid, y_train and y_valid. The dataset, I am using here can be easily downloaded from here." 
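Before loading the data, it may help to see the cycle described above written out. Here is a minimal, illustrative sketch of gradient boosting for squared-error loss, using shallow scikit-learn trees as the base models; it shows the idea only and is not how the XGBoost library is implemented internally:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_sketch(X, y, n_rounds=100, learning_rate=0.1):
    """Illustrative gradient-boosting loop for squared-error loss."""
    prediction = np.full(len(y), float(y.mean()))  # naive initial model
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction                  # negative gradient of squared-error loss
        tree = DecisionTreeRegressor(max_depth=3)   # new model fit to the residuals
        tree.fit(X, residuals)
        prediction += learning_rate * tree.predict(X)  # add the new model to the set
        trees.append(tree)
    return trees
```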
84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "metadata": { 89 | "colab": { 90 | "base_uri": "https://localhost:8080/" 91 | }, 92 | "id": "HoOKm_KQsBWL", 93 | "outputId": "1d16352d-393f-46a2-88dc-22c1ac52da25" 94 | }, 95 | "source": [ 96 | "\n", 97 | "from google.colab import drive\n", 98 | "drive.mount('/content/drive')" 99 | ], 100 | "execution_count": 9, 101 | "outputs": [ 102 | { 103 | "output_type": "stream", 104 | "text": [ 105 | "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" 106 | ], 107 | "name": "stdout" 108 | } 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "metadata": { 114 | "id": "Fw0YMemUsOWj" 115 | }, 116 | "source": [ 117 | "import pandas as pd\n", 118 | "from sklearn.model_selection import train_test_split\n", 119 | "\n", 120 | "# Read the data\n", 121 | "data = pd.read_csv('/content/drive/MyDrive/Datasets/melb_data.csv')\n", 122 | "\n", 123 | "# Select subset of predictors\n", 124 | "cols_to_use = ['Rooms', 'Distance', 'Landsize', 'BuildingArea', 'YearBuilt']\n", 125 | "X = data[cols_to_use]\n", 126 | "\n", 127 | "# Select target\n", 128 | "y = data.Price\n", 129 | "\n", 130 | "# Separate data into training and validation sets\n", 131 | "X_train, X_valid, y_train, y_valid = train_test_split(X,y) " 132 | ], 133 | "execution_count": 6, 134 | "outputs": [] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": { 139 | "id": "otrycWU6sqTK" 140 | }, 141 | "source": [ 142 | "Now you will learn how to use the XGBoost algorithm. We need to import the scikit-learn API for XGBoost (xgboost.XGBRegressor). This allows us to create and fit a model just as we would with any scikit-learn estimator. As you will see in the output, the XGBRegressor class has many adjustable parameters:" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "metadata": { 148 | "id": "IsTLbK6GssO6" 149 | }, 150 | "source": [ 151 | "from xgboost import XGBRegressor\n", 152 | "\n", 153 | "my_model = XGBRegressor()\n", 154 | "my_model.fit(X_train, y_train)" 155 | ], 156 | "execution_count": null, 157 | "outputs": [] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": { 162 | "id": "ZAYMrKnXs3aj" 163 | }, 164 | "source": [ 165 | "Now, we need to make predictions and evaluate our model:\n", 166 | "\n" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "metadata": { 172 | "colab": { 173 | "base_uri": "https://localhost:8080/" 174 | }, 175 | "id": "V_A5LMrws-5i", 176 | "outputId": "2c451630-6381-4aec-a62c-590e92000d0e" 177 | }, 178 | "source": [ 179 | "from sklearn.metrics import mean_absolute_error\n", 180 | "\n", 181 | "predictions = my_model.predict(X_valid)\n", 182 | "print(\"Mean Absolute Error: \" + str(mean_absolute_error(predictions, y_valid)))" 183 | ], 184 | "execution_count": 8, 185 | "outputs": [ 186 | { 187 | "output_type": "stream", 188 | "text": [ 189 | "Mean Absolute Error: 279829.9009295499\n" 190 | ], 191 | "name": "stdout" 192 | } 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": { 198 | "id": "XD-d44CUH_8n" 199 | }, 200 | "source": [ 201 | "# **Parameter Tuning**\n" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": { 207 | "id": "fYkOlpS737BL" 208 | }, 209 | "source": [ 210 | "XGBoost has a few parameters that can drastically affect the accuracy and speed of training. 
The first parameter you need to understand is:\n", 211 | "\n" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": { 217 | "id": "eJwABNAd3_Jc" 218 | }, 219 | "source": [ 220 | "**n_estimators**\n" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": { 226 | "id": "0zaqY98h4Ct6" 227 | }, 228 | "source": [ 229 | "n_estimators specifies the number of times to go through the modelling cycle described above. It is equal to the number of models we include in the ensemble." 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": { 235 | "id": "BECHRnND4Ih2" 236 | }, 237 | "source": [ 238 | "- Too low a value results in underfitting, leading to inaccurate predictions on both the training data and the test data.\n", 239 | "- Too high a value results in overfitting, giving accurate predictions on the training data but inaccurate predictions on the test data (which is what we care about)." 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": { 245 | "id": "m45TqHHI4THz" 246 | }, 247 | "source": [ 248 | "Typical values lie between 100 and 1000, although the right choice depends a lot on the learning_rate parameter described below. Here is the model configuration with n_estimators set to 500 (the number of models in the ensemble):" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "metadata": { 254 | "id": "Uc_9nhDP4U19" 255 | }, 256 | "source": [ 257 | "XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", 258 | " colsample_bynode=1, colsample_bytree=1, gamma=0,\n", 259 | " importance_type='gain', learning_rate=0.1, max_delta_step=0,\n", 260 | " max_depth=3, min_child_weight=1, missing=None, n_estimators=500,\n", 261 | " n_jobs=1, nthread=None, objective='reg:linear', random_state=0,\n", 262 | " reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,\n", 263 | " silent=None, subsample=1, verbosity=1)" 264 | ], 265 | "execution_count": null, 266 | "outputs": [] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": { 271 | "id": "OA6GZob_4cCd" 272 | }, 273 | "source": [ 274 | "**early_stopping_rounds**\n" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": { 280 | "id": "U6wv3Tnv4eNL" 281 | }, 282 | "source": [ 283 | "early_stopping_rounds provides a way to automatically find the ideal value for n_estimators. Early stopping causes the model to stop iterating when the validation score stops improving, even if we have not yet reached the hard limit set by n_estimators. It’s a good idea to set n_estimators high and then use early_stopping_rounds to find the optimal time to stop iterating." 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": { 289 | "id": "tbHxfNEq4jH1" 290 | }, 291 | "source": [ 292 | "Since random chance sometimes causes a single round where validation scores do not improve, you must specify how many consecutive rounds without improvement to allow before stopping. Setting early_stopping_rounds = 5 is a reasonable choice. In this case, we stop after 5 consecutive rounds in which the validation score fails to improve. 
Now let’s see how we can use early_stopping_rounds:" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "metadata": { 298 | "id": "Q4nFErKq4r1T" 299 | }, 300 | "source": [ 301 | "my_model = XGBRegressor(n_estimators=500)\n", 302 | "my_model.fit(X_train, y_train, \n", 303 | " early_stopping_rounds=5, \n", 304 | " eval_set=[(X_valid, y_valid)],\n", 305 | " verbose=False)" 306 | ], 307 | "execution_count": null, 308 | "outputs": [] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": { 313 | "id": "4FdGAi104xp0" 314 | }, 315 | "source": [ 316 | "**learning_rate**\n" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": { 322 | "id": "dTaHClVE43Z0" 323 | }, 324 | "source": [ 325 | "Instead of getting predictions by simply adding up the predictions of each component model, we can multiply the predictions of each model by a small number (the learning rate) before adding them.\n", 326 | "\n", 327 | "This means that each tree we add to the ensemble helps us a little less. So we can set a high value for n_estimators without overfitting. If we use early stopping, the appropriate number of trees will be determined automatically. Now, let’s see how we can use learning_rate in the XGBoost algorithm:" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "metadata": { 333 | "colab": { 334 | "base_uri": "https://localhost:8080/" 335 | }, 336 | "id": "C0TxFTgp45S6", 337 | "outputId": "cc0ed6cb-e257-431b-a1f9-28c3d9d9f431" 338 | }, 339 | "source": [ 340 | "my_model = XGBRegressor(n_estimators=1000, learning_rate=0.05)\n", 341 | "my_model.fit(X_train, y_train, \n", 342 | " early_stopping_rounds=5, \n", 343 | " eval_set=[(X_valid, y_valid)], \n", 344 | " verbose=False)" 345 | ], 346 | "execution_count": 12, 347 | "outputs": [ 348 | { 349 | "output_type": "stream", 350 | "text": [ 351 | "[07:22:39] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.\n" 352 | ], 353 | "name": "stdout" 354 | }, 355 | { 356 | "output_type": "execute_result", 357 | "data": { 358 | "text/plain": [ 359 | "XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", 360 | " colsample_bynode=1, colsample_bytree=1, gamma=0,\n", 361 | " importance_type='gain', learning_rate=0.05, max_delta_step=0,\n", 362 | " max_depth=3, min_child_weight=1, missing=None, n_estimators=1000,\n", 363 | " n_jobs=1, nthread=None, objective='reg:linear', random_state=0,\n", 364 | " reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,\n", 365 | " silent=None, subsample=1, verbosity=1)" 366 | ] 367 | }, 368 | "metadata": {}, 369 | "execution_count": 12 370 | } 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": { 376 | "id": "8fxR6L_c5E6F" 377 | }, 378 | "source": [ 379 | "**n_jobs**\n" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": { 385 | "id": "sxepfNxS5HMr" 386 | }, 387 | "source": [ 388 | "On larger datasets where runtime is a consideration, you can use parallelism to build your models faster. It is common to set the n_jobs parameter equal to the number of cores on your machine. On smaller datasets, this won’t help.\n", 389 | "\n", 390 | "The resulting model will be no better, so micro-optimizing the fitting time is usually just a distraction. But parallelism is very useful on large datasets where you would otherwise spend a long time waiting for the fit command. 
Now, let’s see how to use this parameter in the XGBoost algorithm:" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "metadata": { 396 | "id": "k4CmOX1O5Ov2" 397 | }, 398 | "source": [ 399 | "my_model = XGBRegressor(n_estimators=1000, learning_rate=0.05, n_jobs=4)\n", 400 | "my_model.fit(X_train, y_train, \n", 401 | " early_stopping_rounds=5, \n", 402 | " eval_set=[(X_valid, y_valid)], \n", 403 | " verbose=False)" 404 | ], 405 | "execution_count": null, 406 | "outputs": [] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": { 411 | "id": "Fby7CCu5E0-C" 412 | }, 413 | "source": [ 414 | "# **References**\n", 415 | "[XGBoost in Machine Learning](https://thecleverprogrammer.com/2020/09/04/xgboost-in-machine-learning/)" 416 | ] 417 | } 418 | ] 419 | } -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/dataset/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Sklearn/supervised algorithm/readm: -------------------------------------------------------------------------------- 1 | Supervised learning is a type of machine learning problem where the data comes with known targets that the model learns to predict. 2 | Classification is a type of supervised learning where an algorithm predicts one output from a list of given classes. 3 | It can be a binary classification task with two classes, or a multi-class problem with more than two classes. 4 | Scikit-Learn - Naive Bayes 5 | https://coderzcolumn.com/tutorials/machine-learning/scikit-learn-sklearn-naive-bayes?fbclid=IwAR2EUHN0XwJlCQ8hxjvYHh9Vl4g0AjllmD1ktHsNd7Mwu5g2bOLZEjdKld4 6 | -------------------------------------------------------------------------------- /Statistics/Readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /The-Art-of-Linear-Algebra.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dr-mushtaq/Machine-Learning/6c5b957b7088d99ac86cc65988448f064b6fdd98/The-Art-of-Linear-Algebra.pdf -------------------------------------------------------------------------------- /readme: -------------------------------------------------------------------------------- 1 | Semi Supervised Learning – A Gentle Introduction for Beginners 2 | https://machinelearningknowledge.ai/semi-supervised-learning-a-gentle-introduction-for-beginners/?fbclid=IwAR2hWWec_bhDJpjr9nOSEUkS1zjW4LJ-IsqLXtL8dm3mCPT-JHFfjVUThWY 3 | Machine Learning with python for everyone 4 | https://drive.google.com/file/d/16q7D0W0CIGS4qOAjt18BpEquodqpE7EV/view?usp%3Ddrivesdk&fbclid=IwAR0y98UMt5ts7FFCN32AN29o8gUHnTGlB1sMNR_wvqEXV_GCefLqvCVlheE 5 | 6 | 7 | --------------------------------------------------------------------------------