├── README.md ├── Régression Linéaire Multiple.ipynb └── Régression Linéaire Numpy.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # MachineLearning-tutorial-French 2 | Code from my YouTube videos: https://www.youtube.com/channel/UCmpptkXu8iIFe6kfDK5o7VQ 3 | -------------------------------------------------------------------------------- /Régression Linéaire Multiple.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Multiple and Polynomial Linear Regression with NumPy\n", 8 | "\n", 9 | "Guillaume Saint-Cirgue\n", 10 | "https://machinelearnia.com/\n" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "import numpy as np\n", 20 | "from sklearn.datasets import make_regression\n", 21 | "import matplotlib.pyplot as plt" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "# 1. Polynomial Regression: 1 variable $x_1$" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "## 1.1 Dataset\n", 36 | "\n", 37 | "To build a polynomial model from the linear regression equations, it is enough to add the powers of $x$ as columns of the matrix $X$, and the same number of rows to the vector $\\theta$.\n", 38 | "\n", 39 | "In this notebook, we fit a degree-2 polynomial: $f(x) = ax^2 + bx + c$. This requires the following matrices:\n", 40 | "\n", 41 | "$X = \\begin{bmatrix} (x^{(1)})^2 & x^{(1)} & 1 \\\\ ... & ... & ... \\\\ (x^{(m)})^2 & x^{(m)} & 1 \\end{bmatrix}$\n", 42 | "\n", 43 | "$\\theta = \\begin{bmatrix} a\\\\b\\\\c \\end{bmatrix}$\n", 44 | "\n", 45 | "$y = \\begin{bmatrix} y^{(1)}\\\\...\\\\y^{(m)} \\end{bmatrix}$ *note: the vector $y$ is the same as for simple linear regression*" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 2, 51 | "metadata": {}, 52 | "outputs": [ 53 | { 54 | "data": { 55 | "text/plain": [ 56 | "" 57 | ] 58 | }, 59 | "execution_count": 2, 60 | "metadata": {}, 61 | "output_type": "execute_result" 62 | }, 63 | { 64 | "data": { 65 | "image/png": "\n", 66 | "text/plain": [ 67 | "
" 68 | ] 69 | }, 70 | "metadata": { 71 | "needs_background": "light" 72 | }, 73 | "output_type": "display_data" 74 | } 75 | ], 76 | "source": [ 77 | "np.random.seed(0) # fix the seed so the random data are reproducible\n", 78 | "\n", 79 | "x, y = make_regression(n_samples=100, n_features=1, noise = 10) # create a linear dataset (x, y)\n", 80 | "y = y + abs(y/2) # alter the values of y to make the dataset non-linear\n", 81 | "\n", 82 | "plt.scatter(x, y) # plot the data: x on the horizontal axis, y on the vertical axis" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 3, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "name": "stdout", 92 | "output_type": "stream", 93 | "text": [ 94 | "(100, 1)\n", 95 | "(100,)\n", 96 | "(100, 1)\n" 97 | ] 98 | } 99 | ], 100 | "source": [ 101 | "# Check the dimensions\n", 102 | "print(x.shape)\n", 103 | "print(y.shape)\n", 104 | "\n", 105 | "# reshape y into a column vector\n", 106 | "y = y.reshape(y.shape[0], 1)\n", 107 | "print(y.shape)" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 4, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | "name": "stdout", 117 | "output_type": "stream", 118 | "text": [ 119 | "(100, 3)\n", 120 | "[[ 0.12927848 -0.35955316 1. ]\n", 121 | " [ 0.95382381 0.97663904 1. ]\n", 122 | " [ 0.1618788 0.40234164 1. ]\n", 123 | " [ 0.66120688 -0.81314628 1. ]\n", 124 | " [ 0.78816353 -0.88778575 1. ]\n", 125 | " [ 0.19701457 0.44386323 1. ]\n", 126 | " [ 0.95507205 -0.97727788 1. ]\n", 127 | " [ 0.18346819 0.42833187 1. ]\n", 128 | " [ 0.04337847 0.20827498 1. ]\n", 129 | " [ 0.09706498 -0.31155253 1. 
]]\n" 130 | ] 131 | } 132 | ], 133 | "source": [ 134 | "# Build the matrix X, including the bias column\n", 135 | "X = np.hstack((x, np.ones(x.shape)))\n", 136 | "X = np.hstack((x**2, X)) # prepend the x^2 column to the left of X\n", 137 | "\n", 138 | "print(X.shape)\n", 139 | "print(X[:10])" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 5, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "data": { 149 | "text/plain": [ 150 | "array([[-0.63743703],\n", 151 | " [-0.39727181],\n", 152 | " [-0.13288058]])" 153 | ] 154 | }, 155 | "execution_count": 5, 156 | "metadata": {}, 157 | "output_type": "execute_result" 158 | } 159 | ], 160 | "source": [ 161 | "# Random initialization of theta, with 3 elements (because X has three columns)\n", 162 | "theta = np.random.randn(3, 1)\n", 163 | "theta" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "## 1.2 Linear Model\n", 171 | "We implement the model $F = X.\\theta$, then test it to check that it has no bug (a good habit). This also shows what the initial model, defined by the random value of $\\theta$, looks like." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 6, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "def model(X, theta):\n", 181 | "    return X.dot(theta)" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 7, 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "data": { 191 | "text/plain": [ 192 | "" 193 | ] 194 | }, 195 | "execution_count": 7, 196 | "metadata": {}, 197 | "output_type": "execute_result" 198 | }, 199 | { 200 | "data": { 201 | "image/png": "\n", 202 | "text/plain": [ 203 | "
" 204 | ] 205 | }, 206 | "metadata": { 207 | "needs_background": "light" 208 | }, 209 | "output_type": "display_data" 210 | } 211 | ], 212 | "source": [ 213 | "plt.scatter(x, y)\n", 214 | "plt.scatter(x, model(X, theta), c='r')" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": {}, 220 | "source": [ 221 | "## 1.3 Cost Function: Mean Squared Error\n", 222 | "We measure the model's errors on the dataset (X, y) by implementing the **Mean Squared Error (MSE)**, scaled here by 1/2 so that the factor 2 cancels in the gradient.\n", 223 | "\n", 224 | "$ J(\\theta) = \\frac{1}{2m} \\sum (X.\\theta - y)^2 $\n", 225 | "\n", 226 | "Then we test our function to check that it has no bug." 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": 8, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "def cost_function(X, y, theta):\n", 236 | "    m = len(y)\n", 237 | "    return 1/(2*m) * np.sum((model(X, theta) - y)**2)" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 9, 243 | "metadata": {}, 244 | "outputs": [ 245 | { 246 | "data": { 247 | "text/plain": [ 248 | "1328.6654828872622" 249 | ] 250 | }, 251 | "execution_count": 9, 252 | "metadata": {}, 253 | "output_type": "execute_result" 254 | } 255 | ], 256 | "source": [ 257 | "cost_function(X, y, theta)" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "## 1.4 Gradients and Gradient Descent\n", 265 | "We implement the gradient formula for the **MSE**\n", 266 | "\n", 267 | "$\\frac{\\partial J(\\theta) }{\\partial \\theta} = \\frac{1}{m} X^T.(X.\\theta - y)$\n", 268 | "\n", 269 | "Then we use this function in gradient descent:\n", 270 | "\n", 271 | "$\\theta = \\theta - \\alpha \\frac{\\partial J(\\theta) }{\\partial \\theta}$\n" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 10, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [ 280 | "def 
grad(X, y, theta):\n", 281 | "    m = len(y)\n", 282 | "    return 1/m * X.T.dot(model(X, theta) - y)" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": 11, 288 | "metadata": {}, 289 | "outputs": [], 290 | "source": [ 291 | "def gradient_descent(X, y, theta, learning_rate, n_iterations):\n", 292 | "    \n", 293 | "    cost_history = np.zeros(n_iterations) # array that stores the model's cost at each iteration\n", 294 | "    \n", 295 | "    for i in range(0, n_iterations):\n", 296 | "        theta = theta - learning_rate * grad(X, y, theta) # update theta (gradient descent formula)\n", 297 | "        cost_history[i] = cost_function(X, y, theta) # record the cost at iteration i in cost_history[i]\n", 298 | "        \n", 299 | "    return theta, cost_history" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "## 1.5 Training\n", 307 | "We choose a **number of iterations** and a **learning rate $\\alpha$**, and off we go!\n", 308 | "\n", 309 | "Once the model is trained, we compare its results with our dataset." 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": 12, 315 | "metadata": {}, 316 | "outputs": [], 317 | "source": [ 318 | "n_iterations = 1000\n", 319 | "learning_rate = 0.01\n", 320 | "\n", 321 | "theta_final, cost_history = gradient_descent(X, y, theta, learning_rate, n_iterations)" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 13, 327 | "metadata": {}, 328 | "outputs": [ 329 | { 330 | "data": { 331 | "text/plain": [ 332 | "array([[ 8.60077615],\n", 333 | " [42.23116732],\n", 334 | " [ 8.18143081]])" 335 | ] 336 | }, 337 | "execution_count": 13, 338 | "metadata": {}, 339 | "output_type": "execute_result" 340 | } 341 | ], 342 | "source": [ 343 | "theta_final # the model's parameters once training is done" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 14, 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "data": { 353 | "text/plain": [ 354 | "" 355 | ] 356 | }, 357 | "execution_count": 14, 358 | "metadata": {}, 359 | "output_type": "execute_result" 360 | }, 361 | { 362 | "data": { 363 | "image/png": "\n", 364 | "text/plain": [ 365 | "
" 366 | ] 367 | }, 368 | "metadata": { 369 | "needs_background": "light" 370 | }, 371 | "output_type": "display_data" 372 | } 373 | ], 374 | "source": [ 375 | "# build a predictions vector that holds the predictions of our final model\n", 376 | "predictions = model(X, theta_final)\n", 377 | "\n", 378 | "# Plot the predictions (in red) against our dataset (in blue)\n", 379 | "plt.scatter(x, y)\n", 380 | "plt.scatter(x, predictions, c='r')" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "## 1.6 Learning Curves\n", 388 | "To check that our gradient descent algorithm worked, we look at how the cost function evolves over the iterations. We expect a curve that decreases at each iteration until it plateaus at a minimum (close to zero only when the data contain little noise). If the curve does not follow this pattern, the step **learning_rate** may be too large and a smaller one should be used." 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 15, 394 | "metadata": {}, 395 | "outputs": [ 396 | { 397 | "data": { 398 | "text/plain": [ 399 | "[]" 400 | ] 401 | }, 402 | "execution_count": 15, 403 | "metadata": {}, 404 | "output_type": "execute_result" 405 | }, 406 | { 407 | "data": { 408 | "image/png": "\n", 409 | "text/plain": [ 410 | "
" 411 | ] 412 | }, 413 | "metadata": { 414 | "needs_background": "light" 415 | }, 416 | "output_type": "display_data" 417 | } 418 | ], 419 | "source": [ 420 | "plt.plot(range(n_iterations), cost_history)" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "## 1.7 Final Evaluation\n", 428 | "To evaluate the actual performance of our model with a popular metric (for your boss, client, or colleagues), we can use the **coefficient of determination**, also known as $R^2$. It comes from the method of least squares. The closer the result is to 1, the better your model." 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": 16, 434 | "metadata": {}, 435 | "outputs": [], 436 | "source": [ 437 | "def coef_determination(y, pred):\n", 438 | "    u = ((y - pred)**2).sum()\n", 439 | "    v = ((y - y.mean())**2).sum()\n", 440 | "    return 1 - u/v" 441 | ] 442 | }, 443 | { 444 | "cell_type": "code", 445 | "execution_count": 17, 446 | "metadata": {}, 447 | "outputs": [ 448 | { 449 | "data": { 450 | "text/plain": [ 451 | "0.9287186469389942" 452 | ] 453 | }, 454 | "execution_count": 17, 455 | "metadata": {}, 456 | "output_type": "execute_result" 457 | } 458 | ], 459 | "source": [ 460 | "coef_determination(y, predictions)" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": {}, 466 | "source": [ 467 | "# 2. Regression with Multiple Variables\n", 468 | "Things start to get really interesting once we include several variables $x_1, x_2, x_3$, etc. in our model. This may also be the point where people sometimes start talking about *artificial intelligence*, because it is hard for a human being to picture a model with many dimensions (we only live in a 3D space). 
The reasoning goes that the machine, for its part, manages to represent these spaces, since it finds the best model in them (with gradient descent), so people say that it is intelligent, when it is only mathematics.\n", 469 | "\n", 470 | "## 2.1 Dataset\n", 471 | "\n", 472 | "\n", 473 | "In this notebook, we build a model with 2 variables $x_1, x_2$. To do so, we simply put the variables $x_1, x_2$ (the **features**) into the matrix $X$ and create the matching vector $\\theta$:\n", 474 | "\n", 475 | "\n", 476 | "\n", 477 | "$X = \\begin{bmatrix} x^{(1)}_1 & x^{(1)}_2 & 1 \\\\ ... & ... & ... \\\\ x^{(m)}_1 & x^{(m)}_2 & 1 \\end{bmatrix}$\n", 478 | "\n", 479 | "$\\theta = \\begin{bmatrix} a\\\\b\\\\c \\end{bmatrix}$\n", 480 | "\n", 481 | "$y = \\begin{bmatrix} y^{(1)}\\\\...\\\\y^{(m)} \\end{bmatrix}$ *note: the vector $y$ is the same as for simple linear regression*\n" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": 18, 487 | "metadata": {}, 488 | "outputs": [ 489 | { 490 | "data": { 491 | "text/plain": [ 492 | "" 493 | ] 494 | }, 495 | "execution_count": 18, 496 | "metadata": {}, 497 | "output_type": "execute_result" 498 | }, 499 | { 500 | "data": { 501 | "image/png": "\n", 502 | "text/plain": [ 503 | "
" 504 | ] 505 | }, 506 | "metadata": { 507 | "needs_background": "light" 508 | }, 509 | "output_type": "display_data" 510 | } 511 | ], 512 | "source": [ 513 | "np.random.seed(0) # fix the seed so the random data are reproducible\n", 514 | "\n", 515 | "x, y = make_regression(n_samples=100, n_features=2, noise = 10) # create a linear dataset (x, y)\n", 516 | "\n", 517 | "plt.scatter(x[:,0], y) # plot the data: x_1 on the horizontal axis, y on the vertical axis" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "metadata": {}, 523 | "source": [ 524 | "This dataset contains only 2 variables, $x_1$ and $x_2$, so it can be visualized in 3D space. As you can see, the model can be represented by a surface. Incidentally, this surface is a plane because make_regression returns linear data. To get a non-planar surface, we would just modify y as we did at the start of this notebook. (We will not do it here.)" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": 25, 530 | "metadata": {}, 531 | "outputs": [ 532 | { 533 | "data": { 534 | "text/plain": [ 535 | "Text(0.5, 0, 'y')" 536 | ] 537 | }, 538 | "execution_count": 25, 539 | "metadata": {}, 540 | "output_type": "execute_result" 541 | } 542 | ], 543 | "source": [ 544 | "from mpl_toolkits.mplot3d import Axes3D\n", 545 | "#%matplotlib notebook # uncomment this line to interact with the 3D plot\n", 546 | "fig = plt.figure() # create the figure before adding the 3D axes to it\n", 547 | "ax = fig.add_subplot(111, projection='3d')\n", 548 | "\n", 549 | "ax.scatter(x[:,0], x[:,1], y) # 3D plot of the variables x_1, x_2 and the target y\n", 550 | "\n", 551 | "# label the axes\n", 552 | "ax.set_xlabel('x_1')\n", 553 | "ax.set_ylabel('x_2')\n", 554 | "ax.set_zlabel('y')" 555 | ] 556 | }, 557 | { 558 | "cell_type": "code", 559 | "execution_count": 26, 560 | "metadata": {}, 561 | "outputs": [ 562 | { 563 | "name": "stdout", 564 | "output_type": "stream", 565 | "text": [ 566 | "(100, 2)\n", 567 | "(100, 1)\n", 
568 | "(100, 1)\n" 569 | ] 570 | } 571 | ], 572 | "source": [ 573 | "# Check the dimensions\n", 574 | "print(x.shape)\n", 575 | "print(y.shape)\n", 576 | "\n", 577 | "# reshape y into a column vector\n", 578 | "y = y.reshape(y.shape[0], 1)\n", 579 | "print(y.shape)" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 27, 585 | "metadata": {}, 586 | "outputs": [ 587 | { 588 | "name": "stdout", 589 | "output_type": "stream", 590 | "text": [ 591 | "(100, 3)\n", 592 | "[[ 1.05445173 -1.07075262 1. ]\n", 593 | " [-0.36274117 -0.63432209 1. ]\n", 594 | " [-0.85409574 0.3130677 1. ]\n", 595 | " [ 1.3263859 0.29823817 1. ]\n", 596 | " [-0.4615846 -1.31590741 1. ]\n", 597 | " [ 1.94362119 -1.17312341 1. ]\n", 598 | " [-1.60205766 0.62523145 1. ]\n", 599 | " [-0.40178094 0.17742614 1. ]\n", 600 | " [-0.97727788 1.86755799 1. ]\n", 601 | " [ 0.37816252 0.15494743 1. ]]\n" 602 | ] 603 | } 604 | ], 605 | "source": [ 606 | "# Build the matrix X, including the bias column\n", 607 | "X = np.hstack((x, np.ones((x.shape[0], 1)))) # append a bias column of shape (x.shape[0], 1)\n", 608 | "\n", 609 | "print(X.shape)\n", 610 | "print(X[:10])" 611 | ] 612 | }, 613 | { 614 | "cell_type": "code", 615 | "execution_count": 28, 616 | "metadata": {}, 617 | "outputs": [ 618 | { 619 | "data": { 620 | "text/plain": [ 621 | "array([[-1.26459544],\n", 622 | " [ 0.85693318],\n", 623 | " [-0.49399435]])" 624 | ] 625 | }, 626 | "execution_count": 28, 627 | "metadata": {}, 628 | "output_type": "execute_result" 629 | } 630 | ], 631 | "source": [ 632 | "# Random initialization of theta, with 3 elements (because X has three columns)\n", 633 | "theta = np.random.randn(3, 1)\n", 634 | "theta" 635 | ] 636 | }, 637 | { 638 | "cell_type": "markdown", 639 | "metadata": {}, 640 | "source": [ 641 | "## 2.2 Linear Model\n", 642 | "## 2.3 Cost Function\n", 643 | "## 2.4 Gradient Descent\n", 644 | "\n", 645 | "Our functions are already implemented, so there is no need to rewrite them. 
Let's move straight on to the training phase!" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "## 2.5 Training Phase" 653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": 29, 658 | "metadata": {}, 659 | "outputs": [], 660 | "source": [ 661 | "n_iterations = 1000\n", 662 | "learning_rate = 0.01\n", 663 | "\n", 664 | "theta_final, cost_history = gradient_descent(X, y, theta, learning_rate, n_iterations)" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": 30, 670 | "metadata": {}, 671 | "outputs": [ 672 | { 673 | "data": { 674 | "text/plain": [ 675 | "array([[28.67153869],\n", 676 | " [97.2952491 ],\n", 677 | " [-0.51147823]])" 678 | ] 679 | }, 680 | "execution_count": 30, 681 | "metadata": {}, 682 | "output_type": "execute_result" 683 | } 684 | ], 685 | "source": [ 686 | "# build a predictions vector holding the predictions of our final model\n", 687 | "predictions = model(X, theta_final)\n", 688 | "\n", 689 | "theta_final" 690 | ] 691 | }, 692 | { 693 | "cell_type": "code", 694 | "execution_count": 31, 695 | "metadata": {}, 696 | "outputs": [ 697 | { 698 | "data": { 699 | "text/plain": [ 700 | "" 701 | ] 702 | }, 703 | "execution_count": 31, 704 | "metadata": {}, 705 | "output_type": "execute_result" 706 | }, 707 | { 708 | "data": { 709 | "image/png": "\n", 710 | "text/plain": [ 711 | "
" 712 | ] 713 | }, 714 | "metadata": { 715 | "needs_background": "light" 716 | }, 717 | "output_type": "display_data" 718 | } 719 | ], 720 | "source": [ 721 | "fig = plt.figure()\n", 722 | "ax = fig.add_subplot(111, projection='3d')\n", 723 | "\n", 724 | "ax.scatter(x[:,0], x[:,1], y)\n", 725 | "ax.scatter(x[:,0], x[:,1], predictions)" 726 | ] 727 | }, 728 | { 729 | "cell_type": "markdown", 730 | "metadata": {}, 731 | "source": [ 732 | "## 2.6 Learning Curve" 733 | ] 734 | }, 735 | { 736 | "cell_type": "code", 737 | "execution_count": 32, 738 | "metadata": {}, 739 | "outputs": [ 740 | { 741 | "data": { 742 | "text/plain": [ 743 | "[]" 744 | ] 745 | }, 746 | "execution_count": 32, 747 | "metadata": {}, 748 | "output_type": "execute_result" 749 | }, 750 | { 751 | "data": { 752 | "image/png": "\n", 753 | "text/plain": [ 754 | "
" 755 | ] 756 | }, 757 | "metadata": { 758 | "needs_background": "light" 759 | }, 760 | "output_type": "display_data" 761 | } 762 | ], 763 | "source": [ 764 | "plt.plot(range(n_iterations), cost_history)" 765 | ] 766 | }, 767 | { 768 | "cell_type": "markdown", 769 | "metadata": {}, 770 | "source": [ 771 | "## 2.7 Final Evaluation" 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": 33, 777 | "metadata": {}, 778 | "outputs": [ 779 | { 780 | "data": { 781 | "text/plain": [ 782 | "0.9916687122229687" 783 | ] 784 | }, 785 | "execution_count": 33, 786 | "metadata": {}, 787 | "output_type": "execute_result" 788 | } 789 | ], 790 | "source": [ 791 | "coef_determination(y, predictions)" 792 | ] 793 | }, 794 | { 795 | "cell_type": "markdown", 796 | "metadata": {}, 797 | "source": [ 798 | "Thank you for following this tutorial. Subscribe to my YouTube channel so you don't miss other tutorials (new videos every week!) https://www.youtube.com/channel/UCmpptkXu8iIFe6kfDK5o7VQ" 799 | ] 800 | }, 801 | { 802 | "cell_type": "code", 803 | "execution_count": null, 804 | "metadata": {}, 805 | "outputs": [], 806 | "source": [] 807 | } 808 | ], 809 | "metadata": { 810 | "kernelspec": { 811 | "display_name": "Python 3", 812 | "language": "python", 813 | "name": "python3" 814 | }, 815 | "language_info": { 816 | "codemirror_mode": { 817 | "name": "ipython", 818 | "version": 3 819 | }, 820 | "file_extension": ".py", 821 | "mimetype": "text/x-python", 822 | "name": "python", 823 | "nbconvert_exporter": "python", 824 | "pygments_lexer": "ipython3", 825 | "version": "3.7.3" 826 | } 827 | }, 828 | "nbformat": 4, 829 | "nbformat_minor": 2 830 | } 831 | -------------------------------------------------------------------------------- /Régression Linéaire Numpy.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Régression Linéaire 
Simple Numpy\n", 8 | "\n", 9 | "Guillaume Saint-Cirgue\n", 10 | "https://machinelearnia.com/\n", 11 | "\n", 12 | "Youtube: coming soon\n", 13 | "\n", 14 | "associated article: coming soon" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 31, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import numpy as np\n", 24 | "from sklearn.datasets import make_regression\n", 25 | "import matplotlib.pyplot as plt" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "# 1. Dataset\n", 33 | "We generate random data with a linear trend using make_regression: the dataset $(x, y)$ contains 100 examples and a single variable $x$. Note: each time the cell is executed, different data are generated. Use np.random.seed(0) to reproduce the same dataset every time." 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 32, 39 | "metadata": {}, 40 | "outputs": [ 41 | { 42 | "data": { 43 | "text/plain": [ 44 | "" 45 | ] 46 | }, 47 | "execution_count": 32, 48 | "metadata": {}, 49 | "output_type": "execute_result" 50 | }, 51 | { 52 | "data": { 53 | "image/png": "\n", 54 | "text/plain": [ 55 | "
" 56 | ] 57 | }, 58 | "metadata": { 59 | "needs_background": "light" 60 | }, 61 | "output_type": "display_data" 62 | } 63 | ], 64 | "source": [ 65 | "np.random.seed(0) # to always reproduce the same dataset\n", 66 | "x, y = make_regression(n_samples=100, n_features=1, noise=10)\n", 67 | "plt.scatter(x, y) # plot the data: x on the horizontal axis, y on the vertical axis" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "Important: check the dimensions of x and y. Note that y does not have shape (100, 1). Left as (100,), it would broadcast against (100, 1) arrays into a (100, 100) matrix. We fix the problem with reshape." 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 33, 80 | "metadata": {}, 81 | "outputs": [ 82 | { 83 | "name": "stdout", 84 | "output_type": "stream", 85 | "text": [ 86 | "(100, 1)\n", 87 | "(100,)\n", 88 | "(100, 1)\n" 89 | ] 90 | } 91 | ], 92 | "source": [ 93 | "print(x.shape)\n", 94 | "print(y.shape)\n", 95 | "\n", 96 | "# reshape y\n", 97 | "y = y.reshape(y.shape[0], 1)\n", 98 | "\n", 99 | "print(y.shape)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "Build the matrix X, which contains the bias column. To do so, we stack the vector x side by side with a vector of ones (with np.ones) of the same shape as x." 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 34, 112 | "metadata": {}, 113 | "outputs": [ 114 | { 115 | "name": "stdout", 116 | "output_type": "stream", 117 | "text": [ 118 | "(100, 2)\n" 119 | ] 120 | } 121 | ], 122 | "source": [ 123 | "X = np.hstack((x, np.ones(x.shape)))\n", 124 | "print(X.shape)" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "Finally, we create a parameter vector $\\theta$, initialized with random coefficients. This vector has shape (2, 1). To always reproduce the same vector $\\theta$, we use np.random.seed(0) as before." 
132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 35, 137 | "metadata": {}, 138 | "outputs": [ 139 | { 140 | "data": { 141 | "text/plain": [ 142 | "array([[1.76405235],\n", 143 | " [0.40015721]])" 144 | ] 145 | }, 146 | "execution_count": 35, 147 | "metadata": {}, 148 | "output_type": "execute_result" 149 | } 150 | ], 151 | "source": [ 152 | "np.random.seed(0) # to always produce the same random theta vector\n", 153 | "theta = np.random.randn(2, 1)\n", 154 | "theta" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "# 2. Linear Model\n", 162 | "We implement a model $F = X.\\theta$, then test it to check that there are no bugs (good practice). This also shows what the initial model, defined by the value of $\\theta$, looks like." 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 36, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [ 171 | "def model(X, theta):\n", 172 | "    return X.dot(theta)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 37, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "data": { 182 | "text/plain": [ 183 | "[]" 184 | ] 185 | }, 186 | "execution_count": 37, 187 | "metadata": {}, 188 | "output_type": "execute_result" 189 | }, 190 | { 191 | "data": { 192 | "image/png": "\n", 193 | "text/plain": [ 194 | "
" 195 | ] 196 | }, 197 | "metadata": { 198 | "needs_background": "light" 199 | }, 200 | "output_type": "display_data" 201 | } 202 | ], 203 | "source": [ 204 | "plt.scatter(x, y)\n", 205 | "plt.plot(x, model(X, theta), c='r')" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "# 3. Cost Function: Mean Squared Error\n", 213 | "We measure the model's errors on the dataset X, y by implementing the **Mean Squared Error (MSE)**. The extra factor $\\frac{1}{2}$ is a convention that cancels the 2 produced when differentiating the square.\n", 214 | "\n", 215 | "$ J(\\theta) = \\frac{1}{2m} \\sum (X.\\theta - y)^2 $\n", 216 | "\n", 217 | "Then we test the function to check that there are no bugs." 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 38, 223 | "metadata": {}, 224 | "outputs": [], 225 | "source": [ 226 | "def cost_function(X, y, theta):\n", 227 | "    m = len(y)\n", 228 | "    return 1/(2*m) * np.sum((model(X, theta) - y)**2)" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 39, 234 | "metadata": {}, 235 | "outputs": [ 236 | { 237 | "data": { 238 | "text/plain": [ 239 | "905.6306841935502" 240 | ] 241 | }, 242 | "execution_count": 39, 243 | "metadata": {}, 244 | "output_type": "execute_result" 245 | } 246 | ], 247 | "source": [ 248 | "cost_function(X, y, theta)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "# 4. 
Gradients and Gradient Descent\n", 256 | "We implement the gradient formula for the **MSE**:\n", 257 | "\n", 258 | "$\\frac{\\partial J(\\theta) }{\\partial \\theta} = \\frac{1}{m} X^T.(X.\\theta - y)$\n", 259 | "\n", 260 | "Then we use this function in gradient descent:\n", 261 | "\n", 262 | "$\\theta = \\theta - \\alpha \\frac{\\partial J(\\theta) }{\\partial \\theta}$\n" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 40, 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "def grad(X, y, theta):\n", 272 | "    m = len(y)\n", 273 | "    return 1/m * X.T.dot(model(X, theta) - y)" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": 45, 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [ 282 | "def gradient_descent(X, y, theta, learning_rate, n_iterations):\n", 283 | "    \n", 284 | "    cost_history = np.zeros(n_iterations) # array that records the model's cost at each iteration\n", 285 | "    \n", 286 | "    for i in range(0, n_iterations):\n", 287 | "        theta = theta - learning_rate * grad(X, y, theta) # update theta (gradient descent step)\n", 288 | "        cost_history[i] = cost_function(X, y, theta) # store the cost at iteration i in cost_history[i]\n", 289 | "    \n", 290 | "    return theta, cost_history" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "# 5. 
Training Phase\n", 298 | "We set a **number of iterations** and a **learning rate $\\alpha$**, and off we go!\n", 299 | "\n", 300 | "Once the model is trained, we compare its results with our dataset." 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 46, 306 | "metadata": {}, 307 | "outputs": [], 308 | "source": [ 309 | "n_iterations = 1000\n", 310 | "learning_rate = 0.01\n", 311 | "\n", 312 | "\n", 313 | "theta_final, cost_history = gradient_descent(X, y, theta, learning_rate, n_iterations)" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 47, 319 | "metadata": {}, 320 | "outputs": [ 321 | { 322 | "data": { 323 | "text/plain": [ 324 | "array([[42.61765864],\n", 325 | " [-0.81309274]])" 326 | ] 327 | }, 328 | "execution_count": 47, 329 | "metadata": {}, 330 | "output_type": "execute_result" 331 | } 332 | ], 333 | "source": [ 334 | "theta_final # the model parameters once training is complete" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 48, 340 | "metadata": {}, 341 | "outputs": [ 342 | { 343 | "data": { 344 | "text/plain": [ 345 | "[]" 346 | ] 347 | }, 348 | "execution_count": 48, 349 | "metadata": {}, 350 | "output_type": "execute_result" 351 | }, 352 | { 353 | "data": { 354 | "image/png": "\n", 355 | "text/plain": [ 356 | "
" 357 | ] 358 | }, 359 | "metadata": { 360 | "needs_background": "light" 361 | }, 362 | "output_type": "display_data" 363 | } 364 | ], 365 | "source": [ 366 | "# build a predictions vector holding the predictions of our final model\n", 367 | "predictions = model(X, theta_final)\n", 368 | "\n", 369 | "# Plot the predictions (in red) against our dataset (in blue)\n", 370 | "plt.scatter(x, y)\n", 371 | "plt.plot(x, predictions, c='r')" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "# 6. Learning Curves\n", 379 | "To check that our gradient descent algorithm worked, we look at how the cost function evolves over the iterations. We should get a curve that decreases at every iteration until it plateaus at a minimum level (low, though not exactly zero when the data are noisy). If the curve does not follow this pattern, the step **learning_rate** may be too high and a smaller step should be used." 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 49, 385 | "metadata": {}, 386 | "outputs": [ 387 | { 388 | "data": { 389 | "text/plain": [ 390 | "[]" 391 | ] 392 | }, 393 | "execution_count": 49, 394 | "metadata": {}, 395 | "output_type": "execute_result" 396 | }, 397 | { 398 | "data": { 399 | "image/png": "\n", 400 | "text/plain": [ 401 | "
" 402 | ] 403 | }, 404 | "metadata": { 405 | "needs_background": "light" 406 | }, 407 | "output_type": "display_data" 408 | } 409 | ], 410 | "source": [ 411 | "plt.plot(range(n_iterations), cost_history)" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": {}, 417 | "source": [ 418 | "# 7. Final Evaluation\n", 419 | "To assess the real performance of our model with a popular metric (for your boss, client, or colleagues), we can use the **coefficient of determination**, also known as $R^2$. It comes from the least squares method. The closer the result is to 1, the better the model." 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 50, 425 | "metadata": {}, 426 | "outputs": [], 427 | "source": [ 428 | "def coef_determination(y, pred):\n", 429 | "    u = ((y - pred)**2).sum()\n", 430 | "    v = ((y - y.mean())**2).sum()\n", 431 | "    return 1 - u/v" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": 51, 437 | "metadata": {}, 438 | "outputs": [ 439 | { 440 | "data": { 441 | "text/plain": [ 442 | "0.9417294706504984" 443 | ] 444 | }, 445 | "execution_count": 51, 446 | "metadata": {}, 447 | "output_type": "execute_result" 448 | } 449 | ], 450 | "source": [ 451 | "coef_determination(y, predictions)" 452 | ] 453 | }, 454 | { 455 | "cell_type": "markdown", 456 | "metadata": {}, 457 | "source": [ 458 | "Thank you for following this tutorial. Subscribe to my YouTube channel so you don't miss other tutorials (new videos every week!) 
https://www.youtube.com/channel/UCmpptkXu8iIFe6kfDK5o7VQ" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": {}, 465 | "outputs": [], 466 | "source": [] 467 | } 468 | ], 469 | "metadata": { 470 | "kernelspec": { 471 | "display_name": "Python 3", 472 | "language": "python", 473 | "name": "python3" 474 | }, 475 | "language_info": { 476 | "codemirror_mode": { 477 | "name": "ipython", 478 | "version": 3 479 | }, 480 | "file_extension": ".py", 481 | "mimetype": "text/x-python", 482 | "name": "python", 483 | "nbconvert_exporter": "python", 484 | "pygments_lexer": "ipython3", 485 | "version": "3.7.3" 486 | } 487 | }, 488 | "nbformat": 4, 489 | "nbformat_minor": 2 490 | } 491 | --------------------------------------------------------------------------------