├── Housing Price Prediction.md
├── README.md
├── output_29_1.png
└── output_31_1.png

/Housing Price Prediction.md:
--------------------------------------------------------------------------------

Data Analysis with Python

# House Sales in King County, USA

This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015.

id: a notation for a house

date: date the house was sold

price: sale price (the prediction target)

bedrooms: number of bedrooms per house

bathrooms: number of bathrooms per house

sqft_living: square footage of the home

sqft_lot: square footage of the lot

floors: total floors (levels) in the house

waterfront: whether the house has a view to a waterfront

view: has been viewed

condition: how good the overall condition is

grade: overall grade given to the housing unit, based on the King County grading system

sqft_above: square footage of the house apart from the basement

sqft_basement: square footage of the basement

yr_built: year built

yr_renovated: year when the house was renovated

zipcode: zip code

lat: latitude coordinate

long: longitude coordinate

sqft_living15: living room area in 2015 (implies some renovations); this might or might not have affected the lot size area

sqft_lot15: lot size area in 2015 (implies some renovations)

You will require the following libraries:

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
# Jupyter magic: render plots inside the notebook
%matplotlib inline
```

# 1.0 Importing the Data

Load the csv:

```python
file_name = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/coursera/project/kc_house_data_NaN.csv'
df = pd.read_csv(file_name)
```

We use the method head to display the first 5 rows of the dataframe.

```python
df.head(5)
```
| | Unnamed: 0 | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 7129300520 | 20141013T000000 | 221900.0 | 3.0 | 1.00 | 1180 | 5650 | 1.0 | 0 | ... | 7 | 1180 | 0 | 1955 | 0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 1 | 6414100192 | 20141209T000000 | 538000.0 | 3.0 | 2.25 | 2570 | 7242 | 2.0 | 0 | ... | 7 | 2170 | 400 | 1951 | 1991 | 98125 | 47.7210 | -122.319 | 1690 | 7639 |
| 2 | 2 | 5631500400 | 20150225T000000 | 180000.0 | 2.0 | 1.00 | 770 | 10000 | 1.0 | 0 | ... | 6 | 770 | 0 | 1933 | 0 | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 3 | 2487200875 | 20141209T000000 | 604000.0 | 4.0 | 3.00 | 1960 | 5000 | 1.0 | 0 | ... | 7 | 1050 | 910 | 1965 | 0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 4 | 1954400510 | 20150218T000000 | 510000.0 | 3.0 | 2.00 | 1680 | 8080 | 1.0 | 0 | ... | 8 | 1680 | 0 | 1987 | 0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |

5 rows × 22 columns
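As a minimal sketch of what `head` returns, the snippet below builds a small hypothetical dataframe (three rows whose values are copied from the table above; the name `sample` is an assumption for illustration) and shows that `head(n)` selects the first n rows while keeping every column.

```python
import pandas as pd

# Hypothetical three-row sample; the values mirror the first rows of the table above.
sample = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0],
    "bedrooms": [3.0, 3.0, 2.0],
    "sqft_living": [1180, 2570, 770],
})

first_two = sample.head(2)  # first 2 rows, all columns
print(first_two)
print(sample.shape)         # (number of rows, number of columns)
```

Calling `head()` without an argument uses the default of 5 rows, so `df.head(5)` and `df.head()` give the same result.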
#### Question 1
Display the data types of each column using the attribute dtypes, then take a screenshot and submit it; include your code in the image.

```python
df.dtypes
```

    Unnamed: 0         int64
    id                 int64
    date              object
    price            float64
    bedrooms         float64
    bathrooms        float64
    sqft_living        int64
    sqft_lot           int64
    floors           float64
    waterfront         int64
    view               int64
    condition          int64
    grade              int64
    sqft_above         int64
    sqft_basement      int64
    yr_built           int64
    yr_renovated       int64
    zipcode            int64
    lat              float64
    long             float64
    sqft_living15      int64
    sqft_lot15         int64
    dtype: object

We use the method describe to obtain a statistical summary of the dataframe.

```python
df.describe()
```
| | Unnamed: 0 | id | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 21613.00000 | 2.161300e+04 | 2.161300e+04 | 21600.000000 | 21603.000000 | 21613.000000 | 2.161300e+04 | 21613.000000 | 21613.000000 | 21613.000000 | ... | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 |
| mean | 10806.00000 | 4.580302e+09 | 5.400881e+05 | 3.372870 | 2.115736 | 2079.899736 | 1.510697e+04 | 1.494309 | 0.007542 | 0.234303 | ... | 7.656873 | 1788.390691 | 291.509045 | 1971.005136 | 84.402258 | 98077.939805 | 47.560053 | -122.213896 | 1986.552492 | 12768.455652 |
| std | 6239.28002 | 2.876566e+09 | 3.671272e+05 | 0.926657 | 0.768996 | 918.440897 | 4.142051e+04 | 0.539989 | 0.086517 | 0.766318 | ... | 1.175459 | 828.090978 | 442.575043 | 29.373411 | 401.679240 | 53.505026 | 0.138564 | 0.140828 | 685.391304 | 27304.179631 |
| min | 0.00000 | 1.000102e+06 | 7.500000e+04 | 1.000000 | 0.500000 | 290.000000 | 5.200000e+02 | 1.000000 | 0.000000 | 0.000000 | ... | 1.000000 | 290.000000 | 0.000000 | 1900.000000 | 0.000000 | 98001.000000 | 47.155900 | -122.519000 | 399.000000 | 651.000000 |
| 25% | 5403.00000 | 2.123049e+09 | 3.219500e+05 | 3.000000 | 1.750000 | 1427.000000 | 5.040000e+03 | 1.000000 | 0.000000 | 0.000000 | ... | 7.000000 | 1190.000000 | 0.000000 | 1951.000000 | 0.000000 | 98033.000000 | 47.471000 | -122.328000 | 1490.000000 | 5100.000000 |
| 50% | 10806.00000 | 3.904930e+09 | 4.500000e+05 | 3.000000 | 2.250000 | 1910.000000 | 7.618000e+03 | 1.500000 | 0.000000 | 0.000000 | ... | 7.000000 | 1560.000000 | 0.000000 | 1975.000000 | 0.000000 | 98065.000000 | 47.571800 | -122.230000 | 1840.000000 | 7620.000000 |
| 75% | 16209.00000 | 7.308900e+09 | 6.450000e+05 | 4.000000 | 2.500000 | 2550.000000 | 1.068800e+04 | 2.000000 | 0.000000 | 0.000000 | ... | 8.000000 | 2210.000000 | 560.000000 | 1997.000000 | 0.000000 | 98118.000000 | 47.678000 | -122.125000 | 2360.000000 | 10083.000000 |
| max | 21612.00000 | 9.900000e+09 | 7.700000e+06 | 33.000000 | 8.000000 | 13540.000000 | 1.651359e+06 | 3.500000 | 1.000000 | 4.000000 | ... | 13.000000 | 9410.000000 | 4820.000000 | 2015.000000 | 2015.000000 | 98199.000000 | 47.777600 | -121.315000 | 6210.000000 | 871200.000000 |

8 rows × 21 columns
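The summary above has 21 columns while the dataframe has 22. For a dataframe with mixed types, `describe()` summarizes only the numeric columns by default, so the non-numeric `date` column is left out. The sketch below illustrates this on a small hypothetical dataframe (`sample` and its values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical sample with one text column and two numeric columns.
sample = pd.DataFrame({
    "date": ["20141013T000000", "20141209T000000", "20150225T000000"],
    "price": [221900.0, 538000.0, 180000.0],
    "sqft_living": [1180, 2570, 770],
})

summary = sample.describe()               # numeric columns only by default
print(summary.columns.tolist())           # the text column "date" is absent

full_summary = sample.describe(include="all")  # include="all" keeps every column
print(full_summary.columns.tolist())
```

With `include="all"`, the rows that do not apply to a column (for example `mean` for text) are filled with NaN.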
# 2.0 Data Wrangling

#### Question 2
Drop the columns "id" and "Unnamed: 0" from axis 1 using the method drop(), then use the method describe() to obtain a statistical summary of the data. Take a screenshot and submit it; make sure the inplace parameter is set to True.

```python
# axis=1 drops columns; inplace=True modifies df directly
df.drop('id', axis=1, inplace=True)
df.drop('Unnamed: 0', axis=1, inplace=True)
df.describe()
```
| | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2.161300e+04 | 21600.000000 | 21603.000000 | 21613.000000 | 2.161300e+04 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 | 21613.000000 |
| mean | 5.400881e+05 | 3.372870 | 2.115736 | 2079.899736 | 1.510697e+04 | 1.494309 | 0.007542 | 0.234303 | 3.409430 | 7.656873 | 1788.390691 | 291.509045 | 1971.005136 | 84.402258 | 98077.939805 | 47.560053 | -122.213896 | 1986.552492 | 12768.455652 |
| std | 3.671272e+05 | 0.926657 | 0.768996 | 918.440897 | 4.142051e+04 | 0.539989 | 0.086517 | 0.766318 | 0.650743 | 1.175459 | 828.090978 | 442.575043 | 29.373411 | 401.679240 | 53.505026 | 0.138564 | 0.140828 | 685.391304 | 27304.179631 |
| min | 7.500000e+04 | 1.000000 | 0.500000 | 290.000000 | 5.200000e+02 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 290.000000 | 0.000000 | 1900.000000 | 0.000000 | 98001.000000 | 47.155900 | -122.519000 | 399.000000 | 651.000000 |
| 25% | 3.219500e+05 | 3.000000 | 1.750000 | 1427.000000 | 5.040000e+03 | 1.000000 | 0.000000 | 0.000000 | 3.000000 | 7.000000 | 1190.000000 | 0.000000 | 1951.000000 | 0.000000 | 98033.000000 | 47.471000 | -122.328000 | 1490.000000 | 5100.000000 |
| 50% | 4.500000e+05 | 3.000000 | 2.250000 | 1910.000000 | 7.618000e+03 | 1.500000 | 0.000000 | 0.000000 | 3.000000 | 7.000000 | 1560.000000 | 0.000000 | 1975.000000 | 0.000000 | 98065.000000 | 47.571800 | -122.230000 | 1840.000000 | 7620.000000 |
| 75% | 6.450000e+05 | 4.000000 | 2.500000 | 2550.000000 | 1.068800e+04 | 2.000000 | 0.000000 | 0.000000 | 4.000000 | 8.000000 | 2210.000000 | 560.000000 | 1997.000000 | 0.000000 | 98118.000000 | 47.678000 | -122.125000 | 2360.000000 | 10083.000000 |
| max | 7.700000e+06 | 33.000000 | 8.000000 | 13540.000000 | 1.651359e+06 | 3.500000 | 1.000000 | 4.000000 | 5.000000 | 13.000000 | 9410.000000 | 4820.000000 | 2015.000000 | 2015.000000 | 98199.000000 | 47.777600 | -121.315000 | 6210.000000 | 871200.000000 |
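The `count` row is what reveals the gaps: `describe()` counts only non-missing values, so in the summary above bedrooms (21600) and bathrooms (21603) fall short of the 21613 rows. A minimal sketch of that relationship, on a hypothetical four-row dataframe (`sample` and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few missing entries.
sample = pd.DataFrame({
    "bedrooms": [3.0, np.nan, 2.0, 4.0],
    "bathrooms": [1.0, 2.25, np.nan, np.nan],
    "floors": [1.0, 2.0, 1.0, 1.5],
})

non_null = sample.describe().loc["count"]  # non-missing values per column
missing = len(sample) - non_null           # rows minus non-missing = number of NaN
print(missing)
print(sample.isnull().sum())               # the same counts, computed directly
```

Both approaches agree; `isnull().sum()` is the more direct check and also works for non-numeric columns.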
We can see we have missing values for the columns bedrooms and bathrooms:

```python
print("number of NaN values for the column bedrooms :", df['bedrooms'].isnull().sum())
print("number of NaN values for the column bathrooms :", df['bathrooms'].isnull().sum())
```

    number of NaN values for the column bedrooms : 13
    number of NaN values for the column bathrooms : 10

We can replace the missing values of the column 'bedrooms' with the mean of the column 'bedrooms' using the method replace. Don't forget to set the inplace parameter to True.

```python
# Replace NaN in 'bedrooms' with the column mean
mean = df['bedrooms'].mean()
df['bedrooms'].replace(np.nan, mean, inplace=True)
```

We also replace the missing values of the column 'bathrooms' with the mean of the column 'bathrooms' using the method replace. Don't forget to set the inplace parameter to True.

```python
# Replace NaN in 'bathrooms' with the column mean
mean = df['bathrooms'].mean()
df['bathrooms'].replace(np.nan, mean, inplace=True)
```

```python
print("number of NaN values for the column bedrooms :", df['bedrooms'].isnull().sum())
print("number of NaN values for the column bathrooms :", df['bathrooms'].isnull().sum())
```

    number of NaN values for the column bedrooms : 0
    number of NaN values for the column bathrooms : 0

# 3.0 Exploratory data analysis

#### Question 3
Use the method value_counts to count the number of houses with each unique floor value, then use the method .to_frame() to convert the result to a dataframe.

```python
df['floors'].value_counts().to_frame()
```
| | floors |
|---|---|
| 1.0 | 10680 |
| 2.0 | 8241 |
| 1.5 | 1910 |
| 3.0 | 613 |
| 2.5 | 161 |
| 3.5 | 8 |
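A minimal sketch of the same two calls on a hypothetical series (`floors` and its values are assumptions for illustration): `value_counts` maps each unique value to its number of occurrences, sorted from most to least frequent, and `to_frame` turns that series into a one-column dataframe.

```python
import pandas as pd

# Hypothetical floor values for six houses.
floors = pd.Series([1.0, 2.0, 1.0, 1.5, 2.0, 1.0], name="floors")

counts = floors.value_counts()    # unique value -> occurrences, most frequent first
counts_frame = counts.to_frame()  # same data as a one-column DataFrame
print(counts_frame)
```

The label of the resulting column depends on the pandas version: older releases name it after the series (`floors`, as in the table above), while pandas 2.x names it `count`.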
### Question 4
Use the function boxplot in the seaborn library to determine whether houses with a waterfront view or without a waterfront view have more price outliers.

```python
sns.boxplot(x='waterfront', y='price', data=df)
```

![png](output_29_1.png)

### Question 5
Use the function regplot in the seaborn library to determine if the feature sqft_above is negatively or positively correlated with price.

```python
sns.regplot(x='sqft_above', y='price', data=df)
```

![png](output_31_1.png)

We can use the Pandas method corr() to find the feature other than price that is most correlated with price.

```python
# Correlate numeric columns only; the 'date' column holds strings
df.select_dtypes(include='number').corr()['price'].sort_values()
```

# Module 4: Model Development

Import libraries:

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
```

We can fit a linear regression model using the longitude feature 'long' and calculate the R^2.

```python
X = df[['long']]
Y = df['price']
lm = LinearRegression()
lm
lm.fit(X, Y)
lm.score(X, Y)  # R^2 on the data used for fitting
```

### Question 6
Fit a linear regression model to predict the 'price' using the feature 'sqft_living', then calculate the R^2. Take a screenshot of your code and the value of the R^2.
```python
U = df[['sqft_living']]
V = df['price']
lm.fit(U, V)
lm.score(U, V)
```

    0.49285321790379316

### Question 7
Fit a linear regression model to predict the 'price' using the list of features:

```python
features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view", "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]
X = df[features]
Y = df['price']
lm.fit(X, Y)
```

    LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
             normalize=False)

Then calculate the R^2. Take a screenshot of your code.

```python
lm.score(X, Y)
```

    0.6576951666037504

#### This will help with Question 8

Create a list of tuples. The first element in each tuple contains the name of the estimator:

'scale'

'polynomial'

'model'

The second element in each tuple contains the model constructor:

StandardScaler()

PolynomialFeatures(include_bias=False)

LinearRegression()

```python
Input = [('scale', StandardScaler()), ('polynomial', PolynomialFeatures(include_bias=False)), ('model', LinearRegression())]
```

### Question 8
Use the list to create a pipeline object that predicts the 'price', fit the object using the features in the list features, then calculate the R^2.

```python
pipe = Pipeline(Input)
pipe
```

    Pipeline(memory=None,
         steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)), ('polynomial', PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)), ('model', LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
             normalize=False))])

```python
pipe.fit(X, Y)
```

    /opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/preprocessing/data.py:645: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
      return self.partial_fit(X, y)
    /opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/base.py:467: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
      return self.fit(X, y, **fit_params).transform(X)

    Pipeline(memory=None,
         steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)), ('polynomial', PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)), ('model', LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
             normalize=False))])

```python
pipe.score(X, Y)
```

    /opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/pipeline.py:511: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
      Xt = transform.transform(Xt)

    0.7513427797293394

# Module 5: Model Evaluation and Refinement

Import the necessary modules:

```python
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
print("done")
```

    done

We will split the data into training and testing sets:

```python
features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view", "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]
X = df[features]
Y = df['price']

# Hold out 15% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=1)

print("number of test samples :", x_test.shape[0])
print("number of training samples:", x_train.shape[0])
```

    number of test samples : 3242
    number of training samples: 18371

### Question 9
Create and fit a Ridge regression object using the training data, setting the regularization parameter to 0.1, and calculate the R^2 using the test data.

```python
from sklearn.linear_model import Ridge
```

```python
RigeModel = Ridge(alpha=0.1)
RigeModel.fit(x_train, y_train)
RigeModel.score(x_test, y_test)
```

    0.6478759163939111

### Question 10
Perform a second-order polynomial transform on both the training data and the testing data. Create and fit a Ridge regression object using the training data, setting the regularization parameter to 0.1. Calculate the R^2 using the test data. Take a screenshot of your code and the R^2.
```python
pr = PolynomialFeatures(degree=2)
x_train_pr = pr.fit_transform(x_train)  # fit the transformer on the training data only
x_test_pr = pr.transform(x_test)        # reuse the fitted transformer on the test data

RigeModel = Ridge(alpha=0.1)
RigeModel.fit(x_train_pr, y_train)
RigeModel.score(x_test_pr, y_test)
```

    0.7002744268659787
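The R^2 values reported throughout come from the `score` method, which for scikit-learn regressors returns the coefficient of determination, 1 - SS_res / SS_tot. The sketch below computes the same quantity by hand with NumPy. The helper `r_squared` and the data are hypothetical, for illustration only, and `np.polyfit` stands in for a fitted linear model.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical data: price is an exact linear function of area.
area = np.array([1000.0, 1500.0, 2000.0, 2500.0])
price = 200.0 * area + 50000.0

slope, intercept = np.polyfit(area, price, deg=1)  # least-squares straight line
fitted_r2 = r_squared(price, slope * area + intercept)
baseline_r2 = r_squared(price, np.full_like(price, price.mean()))

print(fitted_r2)    # close to 1 for a perfect linear fit
print(baseline_r2)  # 0 when always predicting the mean
```

Read this way, the test-set score of about 0.70 in Question 10 means the model's squared error on the test data is about 30% of the squared error of always predicting the test-set mean price.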

About this Project:

This project is part of a graded exercise in the "Data Analysis Using Python" course on Coursera offered by IBM.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# House-sale-price-prediction-using-python
This project analyzes and predicts housing sale prices based on features such as square footage, number of bedrooms, views, and location. It uses the dataset of house sale prices for King County, USA, covering home sales between May 2014 and May 2015.

It uses Python code to clean and analyse the data, create models for price prediction, and evaluate and refine those models. Major activities covered include:
- Numerical representation of data using correlation, linear and polynomial regression, R-squared values, etc.
- Graphical representation of data using seaborn's boxplot and regplot.
- Model refinement using a ridge regression object.
- Polynomial transform of training and test data, etc.

--------------------------------------------------------------------------------
/output_29_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/calistus-igwilo/House-sale-price-prediction-using-python/23d5b4735e772ff99a759b5548f76355ed63346d/output_29_1.png

--------------------------------------------------------------------------------
/output_31_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/calistus-igwilo/House-sale-price-prediction-using-python/23d5b4735e772ff99a759b5548f76355ed63346d/output_31_1.png

--------------------------------------------------------------------------------