├── GB KNN Model.ipynb
├── Linear_Regression.ipynb
└── Logistic Regression.ipynb
/Linear_Regression.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "GzfdMfk10NE6"
7 | },
8 | "source": [
9 | "## **Linear Regression with Python Scikit Learn**\n",
10 | "In this section we will see how the Python Scikit-Learn library for machine learning can be used to implement regression functions. We will start with simple linear regression involving two variables.\n",
11 | "\n",
12 | "### **Simple Linear Regression**\n",
13 | "In this regression task we will predict the percentage of marks that a student is expected to score based upon the number of hours they studied. This is a simple linear regression task as it involves just two variables."
14 | ]
15 | },
16 | {
17 | "cell_type": "code",
18 | "execution_count": 1,
19 | "metadata": {
20 | "id": "V9QN2ZxC38pB"
21 | },
22 | "outputs": [],
23 | "source": [
24 | "# Importing all libraries required in this notebook\n",
25 | "import pandas as pd\n",
26 | "import numpy as np \n",
27 | "import matplotlib.pyplot as plt "
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "metadata": {
34 | "colab": {
35 | "base_uri": "https://localhost:8080/",
36 | "height": 380
37 | },
38 | "id": "LtU4YMEhqm9m",
39 | "outputId": "cae4d898-e81c-4cb0-cae3-b668a921a074"
40 | },
41 | "outputs": [
42 | {
43 | "name": "stdout",
44 | "output_type": "stream",
45 | "text": [
46 | "Data imported successfully\n"
47 | ]
48 | },
49 | {
50 | "data": {
129 | "text/plain": [
130 | " Hours Scores\n",
131 | "0 2.5 21\n",
132 | "1 5.1 47\n",
133 | "2 3.2 27\n",
134 | "3 8.5 75\n",
135 | "4 3.5 30\n",
136 | "5 1.5 20\n",
137 | "6 9.2 88\n",
138 | "7 5.5 60\n",
139 | "8 8.3 81\n",
140 | "9 2.7 25"
141 | ]
142 | },
143 | "execution_count": 2,
144 | "metadata": {},
145 | "output_type": "execute_result"
146 | }
147 | ],
148 | "source": [
149 | "# Reading data from remote link\n",
150 | "url = \"https://raw.githubusercontent.com/mhassandata/Regression_model/main/score.csv\"\n",
151 | "s_data = pd.read_csv(url)\n",
152 | "print(\"Data imported successfully\")\n",
153 | "s_data.head(10)"
154 | ]
155 | },
156 | {
157 | "cell_type": "markdown",
158 | "metadata": {
159 | "id": "RHsPneuM4NgB"
160 | },
161 | "source": [
162 | "Let's plot our data points on a 2-D graph to eyeball the dataset and see whether any relationship between the two variables is visible. We can create the plot with the following script:"
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 3,
168 | "metadata": {
169 | "colab": {
170 | "base_uri": "https://localhost:8080/",
171 | "height": 472
172 | },
173 | "id": "qxYBZkhAqpn9",
174 | "outputId": "36a7e99a-3ffa-41bf-da42-ce089c3e5dbd"
175 | },
176 | "outputs": [
177 | {
178 | "data": {
179 | "image/png": "",
180 | "text/plain": [
181 | ""
182 | ]
183 | },
184 | "metadata": {},
185 | "output_type": "display_data"
186 | }
187 | ],
188 | "source": [
189 | "# Plotting scores against hours studied\n",
190 | "s_data.plot(x='Hours', y='Scores', style='o') \n",
191 | "plt.title('Hours vs Percentage') \n",
192 | "plt.xlabel('Hours Studied') \n",
193 | "plt.ylabel('Percentage Score') \n",
194 | "plt.show()"
195 | ]
196 | },
197 | {
198 | "cell_type": "markdown",
199 | "metadata": {
200 | "id": "fiQaULio4Rzr"
201 | },
202 | "source": [
203 | "**From the graph above, we can clearly see that there is a positive linear relationship between the number of hours studied and the percentage score.**"
204 | ]
205 | },
206 | {
207 | "cell_type": "markdown",
208 | "metadata": {
209 | "id": "WWtEr64M4jdz"
210 | },
211 | "source": [
212 | "### **Preparing the data**\n",
213 | "\n",
214 | "The next step is to divide the data into \"attributes\" (inputs) and \"labels\" (outputs)."
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 4,
220 | "metadata": {
221 | "id": "LiJ5210e4tNX"
222 | },
223 | "outputs": [],
224 | "source": [
225 | "X = s_data.iloc[:, :-1].values \n",
226 | "y = s_data.iloc[:, 1].values "
227 | ]
228 | },
229 | {
230 | "cell_type": "code",
231 | "execution_count": 5,
232 | "metadata": {
233 | "colab": {
234 | "base_uri": "https://localhost:8080/"
235 | },
236 | "id": "0DrNCxfV_0sS",
237 | "outputId": "31344c84-6445-4b80-f7a9-17ea56604245"
238 | },
239 | "outputs": [
240 | {
241 | "data": {
242 | "text/plain": [
243 | "array([[2.5],\n",
244 | " [5.1],\n",
245 | " [3.2],\n",
246 | " [8.5],\n",
247 | " [3.5],\n",
248 | " [1.5],\n",
249 | " [9.2],\n",
250 | " [5.5],\n",
251 | " [8.3],\n",
252 | " [2.7],\n",
253 | " [7.7],\n",
254 | " [5.9],\n",
255 | " [4.5],\n",
256 | " [3.3],\n",
257 | " [1.1],\n",
258 | " [8.9],\n",
259 | " [2.5],\n",
260 | " [1.9],\n",
261 | " [6.1],\n",
262 | " [7.4],\n",
263 | " [2.7],\n",
264 | " [4.8],\n",
265 | " [3.8],\n",
266 | " [6.9],\n",
267 | " [7.8]])"
268 | ]
269 | },
270 | "execution_count": 5,
271 | "metadata": {},
272 | "output_type": "execute_result"
273 | }
274 | ],
275 | "source": [
276 | "X"
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": 6,
282 | "metadata": {
283 | "colab": {
284 | "base_uri": "https://localhost:8080/"
285 | },
286 | "id": "HKJA37KL_0sT",
287 | "outputId": "6ecc3cb9-084d-42eb-de9f-1c57256cca6d"
288 | },
289 | "outputs": [
290 | {
291 | "data": {
292 | "text/plain": [
293 | "array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41, 42, 17, 95, 30,\n",
294 | " 24, 67, 69, 30, 54, 35, 76, 86])"
295 | ]
296 | },
297 | "execution_count": 6,
298 | "metadata": {},
299 | "output_type": "execute_result"
300 | }
301 | ],
302 | "source": [
303 | "y"
304 | ]
305 | },
306 | {
307 | "cell_type": "markdown",
308 | "metadata": {
309 | "id": "Riz-ZiZ34fO4"
310 | },
311 | "source": [
312 | "Now that we have our attributes and labels, the next step is to split this data into training and test sets. We'll do this with Scikit-Learn's built-in train_test_split() function, holding out 20% of the rows (5 of 25) for testing:"
313 | ]
314 | },
315 | {
316 | "cell_type": "code",
317 | "execution_count": 7,
318 | "metadata": {
319 | "id": "udFYso1M4BNw"
320 | },
321 | "outputs": [],
322 | "source": [
323 | "from sklearn.model_selection import train_test_split\n",
324 | "\n",
325 | "X_train, X_test, y_train, y_test = train_test_split(X, y, \n",
326 | " test_size=0.2, random_state=0) "
327 | ]
328 | },
329 | {
330 | "cell_type": "code",
331 | "execution_count": 8,
332 | "metadata": {
333 | "colab": {
334 | "base_uri": "https://localhost:8080/"
335 | },
336 | "id": "LDQkaigQ_0sT",
337 | "outputId": "6162a217-b98c-4c7d-a51a-e417cd8b95cd"
338 | },
339 | "outputs": [
340 | {
341 | "data": {
342 | "text/plain": [
343 | "array([[3.8],\n",
344 | " [1.9],\n",
345 | " [7.8],\n",
346 | " [6.9],\n",
347 | " [1.1],\n",
348 | " [5.1],\n",
349 | " [7.7],\n",
350 | " [3.3],\n",
351 | " [8.3],\n",
352 | " [9.2],\n",
353 | " [6.1],\n",
354 | " [3.5],\n",
355 | " [2.7],\n",
356 | " [5.5],\n",
357 | " [2.7],\n",
358 | " [8.5],\n",
359 | " [2.5],\n",
360 | " [4.8],\n",
361 | " [8.9],\n",
362 | " [4.5]])"
363 | ]
364 | },
365 | "execution_count": 8,
366 | "metadata": {},
367 | "output_type": "execute_result"
368 | }
369 | ],
370 | "source": [
371 | "X_train"
372 | ]
373 | },
374 | {
375 | "cell_type": "markdown",
376 | "metadata": {
377 | "id": "a6WXptFU5CkC"
378 | },
379 | "source": [
380 | "### **Training the Algorithm**\n",
381 | "Now that we have split our data into training and testing sets, we can finally train our algorithm on it. "
382 | ]
383 | },
384 | {
385 | "cell_type": "code",
386 | "execution_count": 9,
387 | "metadata": {
388 | "colab": {
389 | "base_uri": "https://localhost:8080/"
390 | },
391 | "id": "qddCuaS84fpK",
392 | "outputId": "41a29249-38ab-4773-f7f6-e1ede34e7536"
393 | },
394 | "outputs": [],
395 | "source": [
396 | "from sklearn.linear_model import LinearRegression \n",
397 | "\n",
398 | "regressor = LinearRegression()"
399 | ]
400 | },
401 | {
402 | "cell_type": "code",
403 | "execution_count": 10,
404 | "metadata": {},
405 | "outputs": [
406 | {
407 | "name": "stdout",
408 | "output_type": "stream",
409 | "text": [
410 | "Training complete.\n"
411 | ]
412 | }
413 | ],
414 | "source": [
415 | "regressor.fit(X_train, y_train) \n",
416 | "\n",
417 | "print(\"Training complete.\")"
418 | ]
419 | },
420 | {
421 | "cell_type": "code",
422 | "execution_count": 11,
423 | "metadata": {},
424 | "outputs": [
425 | {
426 | "data": {
427 | "text/plain": [
428 | "array([16.88414476, 33.73226078, 75.357018 , 26.79480124, 60.49103328])"
429 | ]
430 | },
431 | "execution_count": 11,
432 | "metadata": {},
433 | "output_type": "execute_result"
434 | }
435 | ],
436 | "source": [
437 | "regressor.predict(X_test)"
438 | ]
439 | },
440 | {
441 | "cell_type": "code",
442 | "execution_count": 12,
443 | "metadata": {
444 | "colab": {
445 | "base_uri": "https://localhost:8080/",
446 | "height": 430
447 | },
448 | "id": "LO3cIcQV_0sU",
449 | "outputId": "9f399f85-9909-4797-b6e2-8b54b017aeda"
450 | },
451 | "outputs": [
452 | {
453 | "data": {
454 | "text/plain": [
455 | ""
456 | ]
457 | },
458 | "execution_count": 12,
459 | "metadata": {},
460 | "output_type": "execute_result"
461 | },
462 | {
463 | "data": {
464 | "image/png": "",
465 | "text/plain": [
466 | ""
467 | ]
468 | },
469 | "metadata": {},
470 | "output_type": "display_data"
471 | }
472 | ],
473 | "source": [
474 | "plt.scatter(X,y)"
475 | ]
476 | },
477 | {
478 | "cell_type": "code",
479 | "execution_count": 13,
480 | "metadata": {},
481 | "outputs": [
482 | {
483 | "name": "stdout",
484 | "output_type": "stream",
485 | "text": [
486 | "The calculated parameters are theta_1: 9.910656480642233, and theta_2: 2.0181600414346974\n"
487 | ]
488 | }
489 | ],
490 | "source": [
491 | "print(f\"The calculated parameters are theta_1: {regressor.coef_[0]}, and theta_2: {regressor.intercept_}\")"
492 | ]
493 | },
494 | {
495 | "cell_type": "code",
496 | "execution_count": 14,
497 | "metadata": {},
498 | "outputs": [],
499 | "source": [
500 | "# Computing the fitted regression line at the test inputs\n",
501 | "line = regressor.coef_*X_test+regressor.intercept_"
502 | ]
503 | },
504 | {
505 | "cell_type": "code",
506 | "execution_count": 15,
507 | "metadata": {},
508 | "outputs": [
509 | {
510 | "data": {
511 | "text/plain": [
512 | "array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41, 42, 17, 95, 30,\n",
513 | " 24, 67, 69, 30, 54, 35, 76, 86])"
514 | ]
515 | },
516 | "execution_count": 15,
517 | "metadata": {},
518 | "output_type": "execute_result"
519 | }
520 | ],
521 | "source": [
522 | "y"
523 | ]
524 | },
525 | {
526 | "cell_type": "code",
527 | "execution_count": 16,
528 | "metadata": {
529 | "colab": {
530 | "base_uri": "https://localhost:8080/",
531 | "height": 430
532 | },
533 | "id": "J61NX2_2-px7",
534 | "outputId": "20d96bf4-8f2c-4a15-c004-b34a63f0f815"
535 | },
536 | "outputs": [
537 | {
538 | "data": {
539 | "image/png": "",
540 | "text/plain": [
541 | ""
542 | ]
543 | },
544 | "metadata": {},
545 | "output_type": "display_data"
546 | },
547 | {
548 | "data": {
549 | "image/png": "",
550 | "text/plain": [
551 | ""
552 | ]
553 | },
554 | "metadata": {},
555 | "output_type": "display_data"
556 | }
557 | ],
558 | "source": [
559 | "preds = regressor.predict(X)\n",
560 | "\n",
561 | "plt.subplot(1, 2, 1)\n",
562 | "plt.scatter(X, y)\n",
563 | "plt.plot(X, preds)\n",
564 | "plt.show()\n",
565 | "\n",
566 | "# Plotting for the test data\n",
567 | "plt.subplot(1, 2, 2)\n",
568 | "plt.scatter(X_test, y_test, color=\"red\")\n",
569 | "plt.plot(X_test, line)\n",
570 | "plt.show()"
571 | ]
572 | },
573 | {
574 | "cell_type": "markdown",
575 | "metadata": {
576 | "id": "JCQn-g4m5OK2"
577 | },
578 | "source": [
579 | "### **Making Predictions**\n",
580 | "Now that we have trained our algorithm, it's time to make some predictions."
581 | ]
582 | },
583 | {
584 | "cell_type": "code",
585 | "execution_count": 15,
586 | "metadata": {
587 | "colab": {
588 | "base_uri": "https://localhost:8080/"
589 | },
590 | "id": "Tt-Fmzu55EGM",
591 | "outputId": "f03a010b-399e-49c2-ab18-1f41f4b13a89"
592 | },
593 | "outputs": [
594 | {
595 | "name": "stdout",
596 | "output_type": "stream",
597 | "text": [
598 | "[[1.5]\n",
599 | " [3.2]\n",
600 | " [7.4]\n",
601 | " [2.5]\n",
602 | " [5.9]]\n"
603 | ]
604 | }
605 | ],
606 | "source": [
607 | "print(X_test) # Testing data - In Hours\n",
608 | "y_pred = regressor.predict(X_test) # Predicting the scores"
609 | ]
610 | },
611 | {
612 | "cell_type": "code",
613 | "execution_count": 16,
614 | "metadata": {
615 | "colab": {
616 | "base_uri": "https://localhost:8080/",
617 | "height": 206
618 | },
619 | "id": "6bmZUMZh5QLb",
620 | "outputId": "943e567f-fe9d-43ad-9c81-a841d1026ef1"
621 | },
622 | "outputs": [
623 | {
624 | "data": {
757 | "text/plain": [
758 | " Actual Predicted\n",
759 | "0 20 16.884145\n",
760 | "1 27 33.732261\n",
761 | "2 69 75.357018\n",
762 | "3 30 26.794801\n",
763 | "4 62 60.491033"
764 | ]
765 | },
766 | "execution_count": 16,
767 | "metadata": {},
768 | "output_type": "execute_result"
769 | }
770 | ],
771 | "source": [
772 | "# Comparing Actual vs Predicted\n",
773 | "df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred}) \n",
774 | "df "
775 | ]
776 | },
777 | {
778 | "cell_type": "code",
779 | "execution_count": 21,
780 | "metadata": {
781 | "colab": {
782 | "base_uri": "https://localhost:8080/"
783 | },
784 | "id": "KAFO8zbx-AH1",
785 | "outputId": "3c003ec3-e681-47fb-a684-e71ab2cee7f5"
786 | },
787 | "outputs": [
788 | {
789 | "name": "stdout",
790 | "output_type": "stream",
791 | "text": [
792 | "No of Hours = 9.5\n",
793 | "Predicted Score = 96.16939660753593\n"
794 | ]
795 | }
796 | ],
797 | "source": [
798 | "# You can also test with your own data\n",
799 | "hours = 9.5\n",
800 | "own_pred = regressor.predict([[hours]])\n",
801 | "print(\"No of Hours = {}\".format(hours))\n",
802 | "print(\"Predicted Score = {}\".format(own_pred[0]))"
803 | ]
804 | },
805 | {
806 | "cell_type": "markdown",
807 | "metadata": {
808 | "id": "0AAsPVA_6KmK"
809 | },
810 | "source": [
811 | "### **Evaluating the model**\n",
812 | "\n",
813 | "The final step is to evaluate the performance of the algorithm. This step is particularly important for comparing how well different algorithms perform on a particular dataset. For simplicity, we have chosen the mean absolute error here. There are many other such metrics."
814 | ]
815 | },
816 | {
817 | "cell_type": "code",
818 | "execution_count": 22,
819 | "metadata": {
820 | "colab": {
821 | "base_uri": "https://localhost:8080/"
822 | },
823 | "id": "r5UOrRH-5VCQ",
824 | "outputId": "a4bc5295-e596-40c4-faee-f72ef065f366"
825 | },
826 | "outputs": [
827 | {
828 | "name": "stdout",
829 | "output_type": "stream",
830 | "text": [
831 | "Mean Absolute Error: 4.183859899002982\n"
832 | ]
833 | }
834 | ],
835 | "source": [
836 | "from sklearn import metrics \n",
837 | "print('Mean Absolute Error:', \n",
838 | " metrics.mean_absolute_error(y_test, y_pred)) "
839 | ]
840 | },
841 | {
842 | "cell_type": "code",
843 | "execution_count": 18,
844 | "metadata": {
845 | "id": "1MzDbtLh_0sX"
846 | },
847 | "outputs": [],
848 | "source": []
849 | }
850 | ],
851 | "metadata": {
852 | "colab": {
853 | "provenance": []
854 | },
855 | "kernelspec": {
856 | "display_name": "Python 3 (ipykernel)",
857 | "language": "python",
858 | "name": "python3"
859 | },
860 | "language_info": {
861 | "codemirror_mode": {
862 | "name": "ipython",
863 | "version": 3
864 | },
865 | "file_extension": ".py",
866 | "mimetype": "text/x-python",
867 | "name": "python",
868 | "nbconvert_exporter": "python",
869 | "pygments_lexer": "ipython3",
870 | "version": "3.12.4"
871 | }
872 | },
873 | "nbformat": 4,
874 | "nbformat_minor": 0
875 | }
876 |
--------------------------------------------------------------------------------
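The evaluation cell in Linear_Regression.ipynb reports only the mean absolute error and notes that many other metrics exist. As a minimal sketch, the snippet below computes MAE alongside two common alternatives (MSE and RMSE) in plain Python. The `actual` and `predicted` lists are copied from the notebook's Actual vs Predicted output table; the helper name `regression_errors` is hypothetical and not part of the notebook or of scikit-learn.

```python
import math

# Values copied from the notebook's "Actual vs Predicted" output table.
actual = [20, 27, 69, 30, 62]
predicted = [16.884145, 33.732261, 75.357018, 26.794801, 60.491033]


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length, non-empty sequences.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    return mae, mse, rmse


mae, mse, rmse = regression_errors(actual, predicted)
print(f"Mean Absolute Error:     {mae:.4f}")
print(f"Mean Squared Error:      {mse:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
```

MAE and RMSE are both in the same units as the score, while MSE and RMSE penalise large individual errors more heavily than MAE does. With only five test rows, any of these figures should be read as a rough indication rather than a stable estimate.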