├── README.md
├── RandomForestRegressor(400).ipynb
├── RandomForestRegressor-40.ipynb
├── SVR(linear).ipynb
├── SVR(sigmoid).ipynb
└── notebook containing correlation matrix_minor2_covarep.ipynb
/README.md:
--------------------------------------------------------------------------------
# Depression-Detection-using-Speech

Though social media provides a means to capture an individual's present state of mind, a person's feelings or thoughts may depend on one or more indirect causes, so this data alone cannot be used for depression detection. We therefore extended our approach by analysing the audio features of each individual interviewed by Ellie, a virtual interviewer controlled by a human interviewer in another room. Each session lasted 16 minutes on average. Prior to the interview, each participant completed a psychiatric questionnaire (PHQ-8) [5], from which a binary "truth" classification (depressed, not depressed) was derived. The work was implemented in the following phases, with illustrative code sketches at the end of this README:

• Data Collection: The data collected include audio and video recordings and extensive questionnaire responses, transcribed and annotated for a variety of verbal and non-verbal features. All audio recordings and associated depression metrics were provided by the DAIC-WOZ database [9], which was compiled by USC's Institute for Creative Technologies and released as part of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC 2016). The dataset consists of 189 sessions. For each session, AVEC provides an audio file of the entire interview, features pre-extracted with the COVAREP toolbox at 10-ms intervals over the whole recording (F0, NAQ, QOQ, H1H2, PSP, MDQ, peak slope, Rd, Rd conf, MCEP 0-24, HMPDM 1-24, HMPDD 1-12, and Formants 1-3), a transcript file containing the speaking times and utterances of the participant and the virtual interviewer, and a formant file.

• Data Pre-processing: Rows in which 50% or more of the values are zero carry no useful information and are removed. A PHQ-8 score column is then added to each file for training the models (see the loading sketch at the end of this README).

• Handling Large Data: The data are divided into 11 separate folders for training. A model is trained on each folder separately, and the per-folder results are averaged at test time. Only 10 percent of the data, randomly selected from every patient, is used for training, and the data types are reduced from 64-bit to 32-bit floats. After each model is trained, its data frames are deleted so that enough RAM is available for training on the next folder.

• Data Imbalance: The 189 interview sessions range from 7 to 33 minutes (16 minutes on average), so longer sessions can bias the models, and a larger volume of signal from one individual may emphasise characteristics specific to that person. Moreover, the dataset contains about four times as many non-depressed subjects as depressed ones. To rectify this imbalance, the dataset was undersampled (see the undersampling sketch below).

• Data Correlation: A correlation matrix is generated to identify the relationships between the different audio features and how they may influence one another. The correlation coefficients obtained lie between 0 and 0.4 (Fig. 6.7), indicating that the features are largely independent of each other. The impact of each feature on the target score is also analysed (Fig. 6.8).

• Classification: Support Vector Regression with a sigmoid kernel, Support Vector Regression with a linear kernel, a Random Forest Regressor with 40 estimators, and a Random Forest Regressor with 400 estimators are used to predict the PHQ-8 scores of the test participants. For the accuracy analysis, the hypothesis is that a person with a depression-scale value >= 10 is depressed and otherwise not depressed; a binary classification column marks depressed participants as "1". The measure of how well a model performs was taken as the similarity between the model's results and the questionnaire: if the majority of the questionnaire answers in a domain were marked "yes" and the model also gave a high value for that category, the case was taken as positive; any conflict resulted in a negative case. The models are evaluated by the root mean square error and mean absolute error of their predicted PHQ-8 scores (see the evaluation sketch below).

| Model | Accuracy | MAE (Mean Absolute Error) | RMSE (Root Mean Square Error) |
| --- | --- | --- | --- |
| SVR with linear kernel | 71.8% | 5.403 | 5.434 |
| SVR with sigmoid kernel | 71.8% | 5.394 | 5.413 |
| Random Forest (40 estimators) | 71.8% | 6.2117 | 6.2514 |
| Random Forest (400 estimators) | 71.8% | 6.233 | 6.274 |
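Each notebook implements the loading, sampling, and zero-row filtering steps as an inline loop repeated once per folder. The following is a minimal sketch of that same logic refactored into a function; it mirrors the notebook code, and the folder path is a placeholder:

```python
import glob
import random

import pandas as pd


def load_folder_sample(path, frac=10):
    """Load roughly 1/frac of the rows of every CSV in `path`, drop rows that
    are mostly zeroes, and downcast to 32-bit floats to save RAM."""
    frames = []
    for filename in glob.glob(path + "/*.csv"):
        n = sum(1 for _ in open(filename)) - 1   # data rows, header excluded
        s = n // frac                            # sample size (10% by default)
        # Row indices to skip; the range starts at 1 so the header row is kept.
        skip = sorted(random.sample(range(1, n + 1), n - s))
        df = pd.read_csv(filename, skiprows=skip, header=0)
        # Remove rows where more than half of the columns are zero.
        df = df[(df == 0).sum(axis=1) / len(df.columns) <= 0.50]
        frames.append(df)
    return pd.concat(frames, ignore_index=True).astype("float32")
```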
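The undersampling code itself is not included in this repository; the following is a minimal sketch of one common approach, under the assumption that the binary label lives in a `PHQ_Binary` column:

```python
import pandas as pd


def undersample(df, label_col="PHQ_Binary", random_state=42):
    """Randomly drop majority-class rows until both classes are equally large."""
    n_min = df[label_col].value_counts().min()
    parts = [
        df[df[label_col] == label].sample(n=n_min, random_state=random_state)
        for label in df[label_col].unique()
    ]
    return pd.concat(parts, ignore_index=True)
```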
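The correlation analysis lives in the correlation-matrix notebook; a sketch of the idea is below. The target column name `PHQ8_Score` and the plotting details are assumptions, not the notebook's exact code:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_feature_correlations(df, target="PHQ8_Score"):
    """Show the feature-feature correlation matrix (cf. Fig. 6.7) and print
    each feature's correlation with the target score (cf. Fig. 6.8)."""
    corr = df.corr()
    plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
    plt.colorbar(label="Pearson correlation coefficient")
    plt.title("Correlation matrix of COVAREP features")
    plt.show()
    # Correlation of every feature with the target, strongest first.
    print(corr[target].drop(target).sort_values(ascending=False))
```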
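Finally, prediction and evaluation follow the last cells of each notebook: the per-folder models' PHQ-8 predictions are averaged for each patient, thresholded at 10 for the binary label, and scored with accuracy, MAE, and RMSE. A sketch follows; the calling convention of passing a list of per-patient feature frames is an assumption:

```python
from math import sqrt

import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error


def evaluate(models, patient_frames, y_true_binary, y_true_score):
    """Average the predictions of the 11 per-folder models for each patient,
    derive the binary label with the >= 10 threshold, and report the metrics
    shown in the table above."""
    scores = np.array(
        [np.mean([m.predict(x).mean() for m in models]) for x in patient_frames]
    )
    labels = (scores >= 10).astype(int)  # depressed iff predicted PHQ-8 >= 10
    print("Accuracy:", accuracy_score(y_true_binary, labels) * 100)
    print("MAE     :", mean_absolute_error(y_true_score, scores))
    print("RMSE    :", sqrt(mean_squared_error(y_true_score, scores)))
```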
--------------------------------------------------------------------------------
/RandomForestRegressor(400).ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import matplotlib.pyplot as plt\n", 11 | "from sklearn.svm import SVR\n", 12 | "from sklearn.metrics import roc_curve\n", 13 | "from sklearn.metrics import auc\n", 14 | "from sklearn.model_selection import train_test_split \n", 15 | "from sklearn.metrics import classification_report, confusion_matrix \n", 16 | "from sklearn.metrics import accuracy_score\n", 17 | "import os\n", 18 | "from statistics import *\n", 19 | "from sklearn.ensemble import RandomForestRegressor" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 2, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "import pandas as pd\n", 29 | "import random \n", 30 | "import glob \n", 31 | "\n", 32 | "path = r'D:\New folder\1' # use your path\n", 33 | "\n", 34 | "all_files = glob.glob(path + \"/*.csv\")\n", 35 | "\n", 36 | "li = []\n", 37 | "\n", 38 | "for filename in all_files:\n", 39 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 40 | " s = n//10 # sample size of 10%\n", 41 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 42 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 43 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 44 | " li.append(df)\n", 45 | "\n", 46 | "frame1 = pd.concat(li, axis=0, ignore_index=True)\n", 47 | "frame1=frame1.astype('float32')\n", 48 | "df=frame1.copy()\n", 49 | "x=frame1.iloc[:,:]\n", 50 | "#x.drop(['BW'], axis=1)\n", 51 | "#y=frame[frame.columns[-1]]\n", 52 | "y=frame1.iloc[:,-1]\n", 53 | "del x['BW']\n", 54 | "models = []\n", 55 | "rf = RandomForestRegressor(n_estimators
= 400, random_state = 42)\n", 56 | "rf.fit(x,y)\n", 57 | "models.append(rf)" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 3, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "import gc\n", 67 | "del [[frame1,li,df]]\n", 68 | "frame1=pd.DataFrame()" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 4, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "path = r'D:\\New folder\\2' # use your path\n", 78 | "\n", 79 | "all_files = glob.glob(path + \"/*.csv\")\n", 80 | "\n", 81 | "li = []\n", 82 | "\n", 83 | "for filename in all_files:\n", 84 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 85 | " s = n//10 # sample size of 10%\n", 86 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 87 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 88 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 89 | " li.append(df)\n", 90 | "\n", 91 | "frame2 = pd.concat(li, axis=0, ignore_index=True)\n", 92 | "frame2=frame2.astype('float32')\n", 93 | "df=frame2.copy()\n", 94 | "x=frame2.iloc[:,:]\n", 95 | "y=frame2.iloc[:,-1]\n", 96 | "del x['BW']\n", 97 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 98 | "rf.fit(x,y)\n", 99 | "models.append(rf)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 5, 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "import gc\n", 109 | "del [[frame2,li,df]]\n", 110 | "frame2=pd.DataFrame()" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 6, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "path = r'D:\\New folder\\3' # use your path\n", 120 | "\n", 121 | "all_files = glob.glob(path + \"/*.csv\")\n", 122 | "\n", 123 | "li = []\n", 124 | "\n", 125 | "for filename in all_files:\n", 126 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 127 | " s = n//10 # sample size of 10%\n", 128 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 129 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 130 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 131 | " li.append(df)\n", 132 | "\n", 133 | "frame3 = pd.concat(li, axis=0, ignore_index=True)\n", 134 | "frame3=frame3.astype('float32')\n", 135 | "df=frame3.copy()\n", 136 | "x=frame3.iloc[:,:]\n", 137 | "y=frame3.iloc[:,-1]\n", 138 | "del x['BW']\n", 139 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 140 | "rf.fit(x,y)\n", 141 | "models.append(rf)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 7, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "import gc\n", 151 | "del [[frame3,li,df]]\n", 152 | "frame3=pd.DataFrame()" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 8, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "path = r'D:\\New folder\\4' # use your path\n", 162 | "\n", 163 | "all_files = glob.glob(path + \"/*.csv\")\n", 164 | "\n", 165 | "li = []\n", 166 | "\n", 167 | "for filename in all_files:\n", 168 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 169 | " s = n//10 # sample size of 10%\n", 170 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 171 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 172 | " df= 
df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 173 | " li.append(df)\n", 174 | "\n", 175 | "frame4 = pd.concat(li, axis=0, ignore_index=True)\n", 176 | "frame4=frame4.astype('float32')\n", 177 | "df=frame4.copy()\n", 178 | "x=frame4.iloc[:,:]\n", 179 | "y=frame4.iloc[:,-1]\n", 180 | "del x['BW']\n", 181 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 182 | "rf.fit(x,y)\n", 183 | "models.append(rf)" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 9, 189 | "metadata": {}, 190 | "outputs": [], 191 | "source": [ 192 | "import gc\n", 193 | "del [[frame4,li,df]]\n", 194 | "frame4=pd.DataFrame()" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 10, 200 | "metadata": {}, 201 | "outputs": [], 202 | "source": [ 203 | "path = r'D:\\New folder\\5' # use your path\n", 204 | "\n", 205 | "all_files = glob.glob(path + \"/*.csv\")\n", 206 | "\n", 207 | "li = []\n", 208 | "\n", 209 | "for filename in all_files:\n", 210 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 211 | " s = n//10 # sample size of 10%\n", 212 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 213 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 214 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 215 | " li.append(df)\n", 216 | "\n", 217 | "frame5 = pd.concat(li, axis=0, ignore_index=True)\n", 218 | "frame5=frame5.astype('float32')\n", 219 | "df=frame5.copy()\n", 220 | "x=frame5.iloc[:,:]\n", 221 | "y=frame5.iloc[:,-1]\n", 222 | "del x['BW']\n", 223 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 224 | "rf.fit(x,y)\n", 225 | "models.append(rf)" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 11, 231 | "metadata": {}, 232 | "outputs": [], 233 | "source": [ 234 | "import gc\n", 235 | "del [[frame5,li,df]]\n", 236 | "frame5=pd.DataFrame()" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 12, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "path = r'D:\\New folder\\6' # use your path\n", 246 | "\n", 247 | "all_files = glob.glob(path + \"/*.csv\")\n", 248 | "\n", 249 | "li = []\n", 250 | "\n", 251 | "for filename in all_files:\n", 252 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 253 | " s = n//10 # sample size of 10%\n", 254 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 255 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 256 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 257 | " li.append(df)\n", 258 | "\n", 259 | "frame6 = pd.concat(li, axis=0, ignore_index=True)\n", 260 | "frame6=frame6.astype('float32')\n", 261 | "df=frame6.copy()\n", 262 | "x=frame6.iloc[:,:]\n", 263 | "y=frame6.iloc[:,-1]\n", 264 | "del x['BW']\n", 265 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 266 | "rf.fit(x,y)\n", 267 | "models.append(rf)" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 13, 273 | "metadata": {}, 274 | "outputs": [], 275 | "source": [ 276 | "import gc\n", 277 | "del [[frame6,li,df]]\n", 278 | "frame6=pd.DataFrame()" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 14, 284 | "metadata": {}, 285 | "outputs": [], 286 | "source": [ 287 | "path = r'D:\\New folder\\7' # use your path\n", 288 | "\n", 289 | "all_files = glob.glob(path + \"/*.csv\")\n", 290 | "\n", 
291 | "li = []\n", 292 | "\n", 293 | "for filename in all_files:\n", 294 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 295 | " s = n//10 # sample size of 10%\n", 296 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 297 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 298 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 299 | " li.append(df)\n", 300 | "\n", 301 | "frame7 = pd.concat(li, axis=0, ignore_index=True)\n", 302 | "frame7=frame7.astype('float32')\n", 303 | "df=frame7.copy()\n", 304 | "x=frame7.iloc[:,:]\n", 305 | "y=frame7.iloc[:,-1]\n", 306 | "del x['BW']\n", 307 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 308 | "rf.fit(x,y)\n", 309 | "models.append(rf)" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": 15, 315 | "metadata": {}, 316 | "outputs": [], 317 | "source": [ 318 | "import gc\n", 319 | "del [[frame7,li,df]]\n", 320 | "frame7=pd.DataFrame()" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 16, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "path = r'D:\\New folder\\8' # use your path\n", 330 | "\n", 331 | "all_files = glob.glob(path + \"/*.csv\")\n", 332 | "\n", 333 | "li = []\n", 334 | "\n", 335 | "for filename in all_files:\n", 336 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 337 | " s = n//10 # sample size of 10%\n", 338 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 339 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 340 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 341 | " li.append(df)\n", 342 | "\n", 343 | "frame8 = pd.concat(li, axis=0, ignore_index=True)\n", 344 | "frame8=frame8.astype('float32')\n", 345 | "df=frame8.copy()\n", 346 | "x=frame8.iloc[:,:]\n", 347 | "y=frame8.iloc[:,-1]\n", 348 | "del x['BW']\n", 349 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 350 | "rf.fit(x,y)\n", 351 | "models.append(rf)" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 17, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "import gc\n", 361 | "del [[frame8,li,df]]\n", 362 | "frame8=pd.DataFrame()" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": 18, 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "path = r'D:\\New folder\\9' # use your path\n", 372 | "\n", 373 | "all_files = glob.glob(path + \"/*.csv\")\n", 374 | "\n", 375 | "li = []\n", 376 | "\n", 377 | "for filename in all_files:\n", 378 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 379 | " s = n//10 # sample size of 10%\n", 380 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 381 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 382 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 383 | " li.append(df)\n", 384 | "\n", 385 | "frame9 = pd.concat(li, axis=0, ignore_index=True)\n", 386 | "frame9=frame9.astype('float32')\n", 387 | "df=frame9.copy()\n", 388 | "x=frame9.iloc[:,:]\n", 389 | "y=frame9.iloc[:,-1]\n", 390 | "del x['BW']\n", 391 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 392 | "rf.fit(x,y)\n", 393 | "models.append(rf)" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 19, 399 | "metadata": {}, 400 | 
"outputs": [], 401 | "source": [ 402 | "import gc\n", 403 | "del [[frame9,li,df]]\n", 404 | "frame9=pd.DataFrame()" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 20, 410 | "metadata": {}, 411 | "outputs": [], 412 | "source": [ 413 | "path = r'D:\\New folder\\10' # use your path\n", 414 | "\n", 415 | "all_files = glob.glob(path + \"/*.csv\")\n", 416 | "\n", 417 | "li = []\n", 418 | "\n", 419 | "for filename in all_files:\n", 420 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 421 | " s = n//10 # sample size of 10%\n", 422 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 423 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 424 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 425 | " li.append(df)\n", 426 | "\n", 427 | "frame10 = pd.concat(li, axis=0, ignore_index=True)\n", 428 | "frame10=frame10.astype('float32')\n", 429 | "df=frame10.copy()\n", 430 | "x=frame10.iloc[:,:]\n", 431 | "y=frame10.iloc[:,-1]\n", 432 | "del x['BW']\n", 433 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 434 | "rf.fit(x,y)\n", 435 | "models.append(rf)" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": 21, 441 | "metadata": {}, 442 | "outputs": [], 443 | "source": [ 444 | "import gc\n", 445 | "del [[frame10,li,df]]\n", 446 | "frame10=pd.DataFrame()" 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": 22, 452 | "metadata": {}, 453 | "outputs": [], 454 | "source": [ 455 | "path = r'D:\\New folder\\11' # use your path\n", 456 | "\n", 457 | "all_files = glob.glob(path + \"/*.csv\")\n", 458 | "\n", 459 | "li = []\n", 460 | "\n", 461 | "for filename in all_files:\n", 462 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 463 | " s = n//10 # sample size of 10%\n", 464 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 465 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 466 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 467 | " li.append(df)\n", 468 | "\n", 469 | "frame11 = pd.concat(li, axis=0, ignore_index=True)\n", 470 | "frame11=frame11.astype('float32')\n", 471 | "df=frame11.copy()\n", 472 | "x=frame11.iloc[:,:]\n", 473 | "y=frame11.iloc[:,-1]\n", 474 | "del x['BW']\n", 475 | "rf = RandomForestRegressor(n_estimators = 400, random_state = 42)\n", 476 | "rf.fit(x,y)\n", 477 | "models.append(rf)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 23, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "import gc\n", 487 | "del [[frame11,li,df]]\n", 488 | "frame11=pd.DataFrame()" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 24, 494 | "metadata": {}, 495 | "outputs": [ 496 | { 497 | "name": "stdout", 498 | "output_type": "stream", 499 | "text": [ 500 | "Accuracy when using Random Forest Regression is 71.7948717948718\n" 501 | ] 502 | } 503 | ], 504 | "source": [ 505 | "path = r'D:\\Training' # use your path\n", 506 | "all_files = glob.glob(path + \"/*.csv\")\n", 507 | "y_p=[]\n", 508 | "y_pid=[]\n", 509 | "for filename in all_files:\n", 510 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 511 | " s = n//10 # sample size of 10%\n", 512 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 513 | " df = pd.read_csv(filename, 
index_col=None,skiprows=skip,header=0)\n", 514 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 515 | " x=df.iloc[:,:]\n", 516 | "#y=frame[frame.columns[-1]]\n", 517 | " sum1=0\n", 518 | " count=0\n", 519 | " for model in models:\n", 520 | " count=count+1\n", 521 | " y_pred=model.predict(x)\n", 522 | " sum1=sum1+y_pred.mean()\n", 523 | " base=os.path.basename(filename)\n", 524 | " x=os.path.splitext(base)[0]\n", 525 | " x = x.split(\"_\")\n", 526 | " k=sum1/count\n", 527 | " if k<10:\n", 528 | " y_p=np.append(y_p,[0])\n", 529 | " y_pid=np.append(y_pid,[k])\n", 530 | " else:\n", 531 | " y_p=np.append(y_p,[1])\n", 532 | " y_pid=np.append(y_pid,[k])\n", 533 | "y_p\n", 534 | "filename=r'C:\\Users\\Ashutosh\\Downloads\\full_test_split.csv'\n", 535 | "#filename=r'E:\\Book.csv'\n", 536 | "dframe = pd.read_csv(filename, index_col=None,header=0)\n", 537 | "y_test=dframe[['PHQ_Binary']]\n", 538 | "y_test\n", 539 | "print (\"Accuracy when using Random Forest Regression is \", accuracy_score(y_test,y_p.round(), normalize=True)*100)" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": 27, 545 | "metadata": {}, 546 | "outputs": [ 547 | { 548 | "name": "stdout", 549 | "output_type": "stream", 550 | "text": [ 551 | "RMSE using Random Forest Regressor with 400 n_estimators is 6.246960537236372\n" 552 | ] 553 | } 554 | ], 555 | "source": [ 556 | "from sklearn.metrics import mean_squared_error\n", 557 | "from math import sqrt\n", 558 | "\n", 559 | "rms = sqrt(mean_squared_error(y_test, y_pid))\n", 560 | "print (\"RMSE using Random Forest Regressor with 400 n_estimators is \",rms) " 561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": 29, 566 | "metadata": {}, 567 | "outputs": [ 568 | { 569 | "name": "stdout", 570 | "output_type": "stream", 571 | "text": [ 572 | "Mean_Absolute_Error using Random Forest Regressor with 400 n_estimators is: 6.207468415346345\n" 573 | ] 574 | } 575 | ], 576 | "source": [ 577 | "from sklearn.metrics import mean_absolute_error\n", 578 | "a=mean_absolute_error(y_test, y_pid)\n", 579 | "print(\"Mean_Absolute_Error using Random Forest Regressor with 400 n_estimators is:\",a)" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": null, 585 | "metadata": {}, 586 | "outputs": [], 587 | "source": [] 588 | } 589 | ], 590 | "metadata": { 591 | "kernelspec": { 592 | "display_name": "Python 3", 593 | "language": "python", 594 | "name": "python3" 595 | }, 596 | "language_info": { 597 | "codemirror_mode": { 598 | "name": "ipython", 599 | "version": 3 600 | }, 601 | "file_extension": ".py", 602 | "mimetype": "text/x-python", 603 | "name": "python", 604 | "nbconvert_exporter": "python", 605 | "pygments_lexer": "ipython3", 606 | "version": "3.7.3" 607 | } 608 | }, 609 | "nbformat": 4, 610 | "nbformat_minor": 2 611 | } 612 | -------------------------------------------------------------------------------- /RandomForestRegressor-40.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import matplotlib.pyplot as plt\n", 11 | "from sklearn.svm import SVR\n", 12 | "from sklearn.metrics import roc_curve\n", 13 | "from sklearn.metrics import auc\n", 14 | "from sklearn.model_selection import train_test_split \n", 15 | "from sklearn.metrics import classification_report, confusion_matrix \n", 16 | "from sklearn.metrics import 
accuracy_score\n", 17 | "import os\n", 18 | "from statistics import *\n", 19 | "from sklearn.ensemble import RandomForestRegressor" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 2, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "import pandas as pd\n", 29 | "import random \n", 30 | "import glob \n", 31 | "\n", 32 | "path = r'D:\New folder\1' # use your path\n", 33 | "\n", 34 | "all_files = glob.glob(path + \"/*.csv\")\n", 35 | "\n", 36 | "li = []\n", 37 | "\n", 38 | "for filename in all_files:\n", 39 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 40 | " s = n//10 # sample size of 10%\n", 41 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 42 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 43 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 44 | " li.append(df)\n", 45 | "\n", 46 | "frame1 = pd.concat(li, axis=0, ignore_index=True)\n", 47 | "frame1=frame1.astype('float32')\n", 48 | "df=frame1.copy()\n", 49 | "x=frame1.iloc[:,:]\n", 50 | "#x.drop(['BW'], axis=1)\n", 51 | "#y=frame[frame.columns[-1]]\n", 52 | "y=frame1.iloc[:,-1]\n", 53 | "del x['BW']\n", 54 | "models = []\n", 55 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 56 | "rf.fit(x,y)\n", 57 | "models.append(rf)" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 3, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "import gc\n", 67 | "del [[frame1,li,df]]\n", 68 | "frame1=pd.DataFrame()" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 4, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "path = r'D:\New folder\2' # use your path\n", 78 | "\n", 79 | "all_files = glob.glob(path + \"/*.csv\")\n", 80 | "\n", 81 | "li = []\n", 82 | "\n", 83 | "for filename in all_files:\n", 84 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 85 | " s = n//10 # sample size of 10%\n", 86 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 87 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 88 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 89 | " li.append(df)\n", 90 | "\n", 91 | "frame2 = pd.concat(li, axis=0, ignore_index=True)\n", 92 | "frame2=frame2.astype('float32')\n", 93 | "df=frame2.copy()\n", 94 | "x=frame2.iloc[:,:]\n", 95 | "y=frame2.iloc[:,-1]\n", 96 | "del x['BW']\n", 97 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 98 | "rf.fit(x,y)\n", 99 | "models.append(rf)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 5, 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "import gc\n", 109 | "del [[frame2,li,df]]\n", 110 | "frame2=pd.DataFrame()" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 6, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "path = r'D:\New folder\3' # use your path\n", 120 | "\n", 121 | "all_files = glob.glob(path + \"/*.csv\")\n", 122 | "\n", 123 | "li = []\n", 124 | "\n", 125 | "for filename in all_files:\n", 126 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 127 | " s = n//10 # sample size of 10%\n", 128 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 129 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 130 | " df= df[(df==0).sum(axis=1)/len(df.columns)
<= 0.50]\n", 131 | " li.append(df)\n", 132 | "\n", 133 | "frame3 = pd.concat(li, axis=0, ignore_index=True)\n", 134 | "frame3=frame3.astype('float32')\n", 135 | "df=frame3.copy()\n", 136 | "x=frame3.iloc[:,:]\n", 137 | "y=frame3.iloc[:,-1]\n", 138 | "del x['BW']\n", 139 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 140 | "rf.fit(x,y)\n", 141 | "models.append(rf)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 7, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "import gc\n", 151 | "del [[frame3,li,df]]\n", 152 | "frame3=pd.DataFrame()" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 8, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "path = r'D:\\New folder\\4' # use your path\n", 162 | "\n", 163 | "all_files = glob.glob(path + \"/*.csv\")\n", 164 | "\n", 165 | "li = []\n", 166 | "\n", 167 | "for filename in all_files:\n", 168 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 169 | " s = n//10 # sample size of 10%\n", 170 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 171 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 172 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 173 | " li.append(df)\n", 174 | "\n", 175 | "frame4 = pd.concat(li, axis=0, ignore_index=True)\n", 176 | "frame4=frame4.astype('float32')\n", 177 | "df=frame4.copy()\n", 178 | "x=frame4.iloc[:,:]\n", 179 | "y=frame4.iloc[:,-1]\n", 180 | "del x['BW']\n", 181 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 182 | "rf.fit(x,y)\n", 183 | "models.append(rf)" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 9, 189 | "metadata": {}, 190 | "outputs": [], 191 | "source": [ 192 | "import gc\n", 193 | "del [[frame4,li,df]]\n", 194 | "frame4=pd.DataFrame()" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 10, 200 | "metadata": {}, 201 | "outputs": [], 202 | "source": [ 203 | "path = r'D:\\New folder\\5' # use your path\n", 204 | "\n", 205 | "all_files = glob.glob(path + \"/*.csv\")\n", 206 | "\n", 207 | "li = []\n", 208 | "\n", 209 | "for filename in all_files:\n", 210 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 211 | " s = n//10 # sample size of 10%\n", 212 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 213 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 214 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 215 | " li.append(df)\n", 216 | "\n", 217 | "frame5 = pd.concat(li, axis=0, ignore_index=True)\n", 218 | "frame5=frame5.astype('float32')\n", 219 | "df=frame5.copy()\n", 220 | "x=frame5.iloc[:,:]\n", 221 | "y=frame5.iloc[:,-1]\n", 222 | "del x['BW']\n", 223 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 224 | "rf.fit(x,y)\n", 225 | "models.append(rf)" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 11, 231 | "metadata": {}, 232 | "outputs": [], 233 | "source": [ 234 | "import gc\n", 235 | "del [[frame5,li,df]]\n", 236 | "frame5=pd.DataFrame()" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 12, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "path = r'D:\\New folder\\6' # use your path\n", 246 | "\n", 247 | "all_files = glob.glob(path + \"/*.csv\")\n", 248 | "\n", 249 | "li = []\n", 250 | "\n", 251 | "for 
filename in all_files:\n", 252 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 253 | " s = n//10 # sample size of 10%\n", 254 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 255 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 256 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 257 | " li.append(df)\n", 258 | "\n", 259 | "frame6 = pd.concat(li, axis=0, ignore_index=True)\n", 260 | "frame6=frame6.astype('float32')\n", 261 | "df=frame6.copy()\n", 262 | "x=frame6.iloc[:,:]\n", 263 | "y=frame6.iloc[:,-1]\n", 264 | "del x['BW']\n", 265 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 266 | "rf.fit(x,y)\n", 267 | "models.append(rf)" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 13, 273 | "metadata": {}, 274 | "outputs": [], 275 | "source": [ 276 | "import gc\n", 277 | "del [[frame6,li,df]]\n", 278 | "frame6=pd.DataFrame()" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 14, 284 | "metadata": {}, 285 | "outputs": [], 286 | "source": [ 287 | "path = r'D:\\New folder\\7' # use your path\n", 288 | "\n", 289 | "all_files = glob.glob(path + \"/*.csv\")\n", 290 | "\n", 291 | "li = []\n", 292 | "\n", 293 | "for filename in all_files:\n", 294 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 295 | " s = n//10 # sample size of 10%\n", 296 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 297 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 298 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 299 | " li.append(df)\n", 300 | "\n", 301 | "frame7 = pd.concat(li, axis=0, ignore_index=True)\n", 302 | "frame7=frame7.astype('float32')\n", 303 | "df=frame7.copy()\n", 304 | "x=frame7.iloc[:,:]\n", 305 | "y=frame7.iloc[:,-1]\n", 306 | "del x['BW']\n", 307 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 308 | "rf.fit(x,y)\n", 309 | "models.append(rf)" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": 15, 315 | "metadata": {}, 316 | "outputs": [], 317 | "source": [ 318 | "import gc\n", 319 | "del [[frame7,li,df]]\n", 320 | "frame7=pd.DataFrame()" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 16, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "path = r'D:\\New folder\\8' # use your path\n", 330 | "\n", 331 | "all_files = glob.glob(path + \"/*.csv\")\n", 332 | "\n", 333 | "li = []\n", 334 | "\n", 335 | "for filename in all_files:\n", 336 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 337 | " s = n//10 # sample size of 10%\n", 338 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 339 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 340 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 341 | " li.append(df)\n", 342 | "\n", 343 | "frame8 = pd.concat(li, axis=0, ignore_index=True)\n", 344 | "frame8=frame8.astype('float32')\n", 345 | "df=frame8.copy()\n", 346 | "x=frame8.iloc[:,:]\n", 347 | "y=frame8.iloc[:,-1]\n", 348 | "del x['BW']\n", 349 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 350 | "rf.fit(x,y)\n", 351 | "models.append(rf)" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 17, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "import 
gc\n", 361 | "del [[frame8,li,df]]\n", 362 | "frame8=pd.DataFrame()" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": 18, 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "path = r'D:\\New folder\\9' # use your path\n", 372 | "\n", 373 | "all_files = glob.glob(path + \"/*.csv\")\n", 374 | "\n", 375 | "li = []\n", 376 | "\n", 377 | "for filename in all_files:\n", 378 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 379 | " s = n//10 # sample size of 10%\n", 380 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 381 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 382 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 383 | " li.append(df)\n", 384 | "\n", 385 | "frame9 = pd.concat(li, axis=0, ignore_index=True)\n", 386 | "frame9=frame9.astype('float32')\n", 387 | "df=frame9.copy()\n", 388 | "x=frame9.iloc[:,:]\n", 389 | "y=frame9.iloc[:,-1]\n", 390 | "del x['BW']\n", 391 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 392 | "rf.fit(x,y)\n", 393 | "models.append(rf)" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 19, 399 | "metadata": {}, 400 | "outputs": [], 401 | "source": [ 402 | "import gc\n", 403 | "del [[frame9,li,df]]\n", 404 | "frame9=pd.DataFrame()" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 20, 410 | "metadata": {}, 411 | "outputs": [], 412 | "source": [ 413 | "path = r'D:\\New folder\\10' # use your path\n", 414 | "\n", 415 | "all_files = glob.glob(path + \"/*.csv\")\n", 416 | "\n", 417 | "li = []\n", 418 | "\n", 419 | "for filename in all_files:\n", 420 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 421 | " s = n//10 # sample size of 10%\n", 422 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 423 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 424 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 425 | " li.append(df)\n", 426 | "\n", 427 | "frame10 = pd.concat(li, axis=0, ignore_index=True)\n", 428 | "frame10=frame10.astype('float32')\n", 429 | "df=frame10.copy()\n", 430 | "x=frame10.iloc[:,:]\n", 431 | "y=frame10.iloc[:,-1]\n", 432 | "del x['BW']\n", 433 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 434 | "rf.fit(x,y)\n", 435 | "models.append(rf)" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": 21, 441 | "metadata": {}, 442 | "outputs": [], 443 | "source": [ 444 | "import gc\n", 445 | "del [[frame10,li,df]]\n", 446 | "frame10=pd.DataFrame()" 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": 22, 452 | "metadata": {}, 453 | "outputs": [], 454 | "source": [ 455 | "path = r'D:\\New folder\\11' # use your path\n", 456 | "\n", 457 | "all_files = glob.glob(path + \"/*.csv\")\n", 458 | "\n", 459 | "li = []\n", 460 | "\n", 461 | "for filename in all_files:\n", 462 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 463 | " s = n//10 # sample size of 10%\n", 464 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 465 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 466 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 467 | " li.append(df)\n", 468 | "\n", 469 | "frame11 = pd.concat(li, axis=0, ignore_index=True)\n", 470 | "frame11=frame11.astype('float32')\n", 
471 | "df=frame11.copy()\n", 472 | "x=frame11.iloc[:,:]\n", 473 | "y=frame11.iloc[:,-1]\n", 474 | "del x['BW']\n", 475 | "rf = RandomForestRegressor(n_estimators = 40, random_state = 42)\n", 476 | "rf.fit(x,y)\n", 477 | "models.append(rf)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 23, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "import gc\n", 487 | "del [[frame11,li,df]]\n", 488 | "frame11=pd.DataFrame()" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 25, 494 | "metadata": {}, 495 | "outputs": [ 496 | { 497 | "name": "stdout", 498 | "output_type": "stream", 499 | "text": [ 500 | "Accuracy when using Random Forest Regression is 71.7948717948718\n" 501 | ] 502 | } 503 | ], 504 | "source": [ 505 | "path = r'D:\\Training' # use your path\n", 506 | "all_files = glob.glob(path + \"/*.csv\")\n", 507 | "y_p=[]\n", 508 | "y_pid=[]\n", 509 | "for filename in all_files:\n", 510 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 511 | " s = n//10 # sample size of 10%\n", 512 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 513 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 514 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 515 | " x=df.iloc[:,:]\n", 516 | "#y=frame[frame.columns[-1]]\n", 517 | " sum1=0\n", 518 | " count=0\n", 519 | " for model in models:\n", 520 | " count=count+1\n", 521 | " y_pred=model.predict(x)\n", 522 | " sum1=sum1+y_pred.mean()\n", 523 | " base=os.path.basename(filename)\n", 524 | " x=os.path.splitext(base)[0]\n", 525 | " x = x.split(\"_\")\n", 526 | " k=sum1/count\n", 527 | " if k<10:\n", 528 | " y_p=np.append(y_p,[0])\n", 529 | " y_pid=np.append(y_pid,[k])\n", 530 | " else:\n", 531 | " y_p=np.append(y_p,[1])\n", 532 | " y_pid=np.append(y_pid,[k])\n", 533 | "y_p\n", 534 | "filename=r'C:\\Users\\Ashutosh\\Downloads\\full_test_split.csv'\n", 535 | "#filename=r'E:\\Book.csv'\n", 536 | "dframe = pd.read_csv(filename, index_col=None,header=0)\n", 537 | "y_test=dframe[['PHQ_Binary']]\n", 538 | "y_test\n", 539 | "print (\"Accuracy when using Random Forest Regression is \", accuracy_score(y_test,y_p.round(), normalize=True)*100)" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": 31, 545 | "metadata": {}, 546 | "outputs": [ 547 | { 548 | "name": "stdout", 549 | "output_type": "stream", 550 | "text": [ 551 | "RMSE using Random Forest Regressor with 40 n_estimators is 6.2514532324013725\n" 552 | ] 553 | } 554 | ], 555 | "source": [ 556 | "from sklearn.metrics import mean_squared_error\n", 557 | "from math import sqrt\n", 558 | "\n", 559 | "rms = sqrt(mean_squared_error(y_test, y_pid))\n", 560 | "print (\"RMSE using Random Forest Regressor with 40 n_estimators is \",rms) " 561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": 33, 566 | "metadata": {}, 567 | "outputs": [ 568 | { 569 | "name": "stdout", 570 | "output_type": "stream", 571 | "text": [ 572 | "Mean_Absolute_Error using Random Forest Regressor with 40 n_estimators is: 6.211735661537876\n" 573 | ] 574 | } 575 | ], 576 | "source": [ 577 | "from sklearn.metrics import mean_absolute_error\n", 578 | "a=mean_absolute_error(y_test, y_pid)\n", 579 | "print(\"Mean_Absolute_Error using Random Forest Regressor with 40 n_estimators is:\",a)" 580 | ] 581 | } 582 | ], 583 | "metadata": { 584 | "kernelspec": { 585 | "display_name": "Python 3", 586 | "language": "python", 587 | "name": "python3" 588 
| }, 589 | "language_info": { 590 | "codemirror_mode": { 591 | "name": "ipython", 592 | "version": 3 593 | }, 594 | "file_extension": ".py", 595 | "mimetype": "text/x-python", 596 | "name": "python", 597 | "nbconvert_exporter": "python", 598 | "pygments_lexer": "ipython3", 599 | "version": "3.7.3" 600 | } 601 | }, 602 | "nbformat": 4, 603 | "nbformat_minor": 2 604 | } 605 | -------------------------------------------------------------------------------- /SVR(linear).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 50, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import matplotlib.pyplot as plt\n", 11 | "from sklearn.svm import SVR\n", 12 | "from sklearn.metrics import roc_curve\n", 13 | "from sklearn.metrics import auc\n", 14 | "from sklearn.model_selection import train_test_split \n", 15 | "from sklearn.metrics import classification_report, confusion_matrix \n", 16 | "from sklearn.metrics import accuracy_score\n", 17 | "import os\n", 18 | "from statistics import *" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 3, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "import pandas as pd\n", 28 | "import random \n", 29 | "import glob\n", 30 | "\n", 31 | "path = r'D:\\New folder\\1' # use your path\n", 32 | "\n", 33 | "all_files = glob.glob(path + \"/*.csv\")\n", 34 | "\n", 35 | "li = []\n", 36 | "\n", 37 | "for filename in all_files:\n", 38 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 39 | " s = n//10 # sample size of 10%\n", 40 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 41 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 42 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 43 | " li.append(df)\n", 44 | "\n", 45 | "frame1 = pd.concat(li, axis=0, ignore_index=True)\n", 46 | "frame1=frame1.astype('float32')\n", 47 | "df=frame1.copy()\n", 48 | "x=frame1.iloc[:,:]\n", 49 | "#x.drop(['BW'], axis=1)\n", 50 | "#y=frame[frame.columns[-1]]\n", 51 | "y=frame1.iloc[:,-1]\n", 52 | "del x['BW']\n", 53 | "models = []\n", 54 | "svclassifier=SVR(kernel='linear')\n", 55 | "svclassifier.fit(x, y)\n", 56 | "models.append(svclassifier)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 4, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "import gc\n", 66 | "del [[frame1,li,df]]\n", 67 | "frame1=pd.DataFrame()" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 5, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "path = r'D:\\New folder\\2' # use your path\n", 77 | "\n", 78 | "all_files = glob.glob(path + \"/*.csv\")\n", 79 | "\n", 80 | "li = []\n", 81 | "\n", 82 | "for filename in all_files:\n", 83 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 84 | " s = n//10 # sample size of 10%\n", 85 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 86 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 87 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 88 | " li.append(df)\n", 89 | "\n", 90 | "frame2 = pd.concat(li, axis=0, ignore_index=True)\n", 91 | "frame2=frame2.astype('float32')\n", 92 | "df=frame2.copy()\n", 93 | "x=frame2.iloc[:,:]\n", 94 | "y=frame2.iloc[:,-1]\n", 95 | "del x['BW']\n", 96 | "svclassifier=SVR(kernel='linear')\n", 
97 | "svclassifier.fit(x, y)\n", 98 | "models.append(svclassifier)" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 6, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "import gc\n", 108 | "del [[frame2,li,df]]\n", 109 | "frame2=pd.DataFrame()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 7, 115 | "metadata": {}, 116 | "outputs": [], 117 | "source": [ 118 | "path = r'D:\\New folder\\3' # use your path\n", 119 | "\n", 120 | "all_files = glob.glob(path + \"/*.csv\")\n", 121 | "\n", 122 | "li = []\n", 123 | "\n", 124 | "for filename in all_files:\n", 125 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 126 | " s = n//10 # sample size of 10%\n", 127 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 128 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 129 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 130 | " li.append(df)\n", 131 | "\n", 132 | "frame3 = pd.concat(li, axis=0, ignore_index=True)\n", 133 | "frame3=frame3.astype('float32')\n", 134 | "df=frame3.copy()\n", 135 | "x=frame3.iloc[:,:]\n", 136 | "y=frame3.iloc[:,-1]\n", 137 | "del x['BW']\n", 138 | "svclassifier=SVR(kernel='linear')\n", 139 | "svclassifier.fit(x, y)\n", 140 | "models.append(svclassifier)" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 8, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "import gc\n", 150 | "del [[frame3,li,df]]\n", 151 | "frame3=pd.DataFrame()" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 9, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "path = r'D:\\New folder\\4' # use your path\n", 161 | "\n", 162 | "all_files = glob.glob(path + \"/*.csv\")\n", 163 | "\n", 164 | "li = []\n", 165 | "\n", 166 | "for filename in all_files:\n", 167 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 168 | " s = n//10 # sample size of 10%\n", 169 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 170 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 171 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 172 | " li.append(df)\n", 173 | "\n", 174 | "frame4 = pd.concat(li, axis=0, ignore_index=True)\n", 175 | "frame4=frame4.astype('float32')\n", 176 | "df=frame4.copy()\n", 177 | "x=frame4.iloc[:,:]\n", 178 | "y=frame4.iloc[:,-1]\n", 179 | "del x['BW']\n", 180 | "svclassifier=SVR(kernel='linear')\n", 181 | "svclassifier.fit(x, y)\n", 182 | "models.append(svclassifier)" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 10, 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "import gc\n", 192 | "del [[frame4,li,df]]\n", 193 | "frame4=pd.DataFrame()" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 11, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "path = r'D:\\New folder\\5' # use your path\n", 203 | "\n", 204 | "all_files = glob.glob(path + \"/*.csv\")\n", 205 | "\n", 206 | "li = []\n", 207 | "\n", 208 | "for filename in all_files:\n", 209 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 210 | " s = n//10 # sample size of 10%\n", 211 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 212 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 213 | " df= 
df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 214 | " li.append(df)\n", 215 | "\n", 216 | "frame5 = pd.concat(li, axis=0, ignore_index=True)\n", 217 | "frame5=frame5.astype('float32')\n", 218 | "df=frame5.copy()\n", 219 | "x=frame5.iloc[:,:]\n", 220 | "y=frame5.iloc[:,-1]\n", 221 | "del x['BW']\n", 222 | "svclassifier=SVR(kernel='linear')\n", 223 | "svclassifier.fit(x, y)\n", 224 | "models.append(svclassifier)" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 12, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 | "import gc\n", 234 | "del [[frame5,li,df]]\n", 235 | "frame5=pd.DataFrame()" 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": 13, 241 | "metadata": {}, 242 | "outputs": [], 243 | "source": [ 244 | "path = r'D:\\New folder\\6' # use your path\n", 245 | "\n", 246 | "all_files = glob.glob(path + \"/*.csv\")\n", 247 | "\n", 248 | "li = []\n", 249 | "\n", 250 | "for filename in all_files:\n", 251 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 252 | " s = n//10 # sample size of 10%\n", 253 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 254 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 255 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 256 | " li.append(df)\n", 257 | "\n", 258 | "frame6 = pd.concat(li, axis=0, ignore_index=True)\n", 259 | "frame6=frame6.astype('float32')\n", 260 | "df=frame6.copy()\n", 261 | "x=frame6.iloc[:,:]\n", 262 | "y=frame6.iloc[:,-1]\n", 263 | "del x['BW']\n", 264 | "svclassifier=SVR(kernel='linear')\n", 265 | "svclassifier.fit(x, y)\n", 266 | "models.append(svclassifier)" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 14, 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "import gc\n", 276 | "del [[frame6,li,df]]\n", 277 | "frame6=pd.DataFrame()" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 15, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [ 286 | "path = r'D:\\New folder\\7' # use your path\n", 287 | "\n", 288 | "all_files = glob.glob(path + \"/*.csv\")\n", 289 | "\n", 290 | "li = []\n", 291 | "\n", 292 | "for filename in all_files:\n", 293 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 294 | " s = n//10 # sample size of 10%\n", 295 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 296 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 297 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 298 | " li.append(df)\n", 299 | "\n", 300 | "frame7 = pd.concat(li, axis=0, ignore_index=True)\n", 301 | "frame7=frame7.astype('float32')\n", 302 | "df=frame7.copy()\n", 303 | "x=frame7.iloc[:,:]\n", 304 | "y=frame7.iloc[:,-1]\n", 305 | "del x['BW']\n", 306 | "svclassifier=SVR(kernel='linear')\n", 307 | "svclassifier.fit(x, y)\n", 308 | "models.append(svclassifier)" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 16, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "import gc\n", 318 | "del [[frame7,li,df]]\n", 319 | "frame7=pd.DataFrame()" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 17, 325 | "metadata": {}, 326 | "outputs": [], 327 | "source": [ 328 | "path = r'D:\\New folder\\8' # use your path\n", 329 | "\n", 330 | "all_files = glob.glob(path + \"/*.csv\")\n", 331 | "\n", 332 | "li = []\n", 333 | "\n", 
334 | "for filename in all_files:\n", 335 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 336 | " s = n//10 # sample size of 10%\n", 337 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 338 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 339 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 340 | " li.append(df)\n", 341 | "\n", 342 | "frame8 = pd.concat(li, axis=0, ignore_index=True)\n", 343 | "frame8=frame8.astype('float32')\n", 344 | "df=frame8.copy()\n", 345 | "x=frame8.iloc[:,:]\n", 346 | "y=frame8.iloc[:,-1]\n", 347 | "del x['BW']\n", 348 | "svclassifier=SVR(kernel='linear')\n", 349 | "svclassifier.fit(x, y)\n", 350 | "models.append(svclassifier)" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 18, 356 | "metadata": {}, 357 | "outputs": [], 358 | "source": [ 359 | "import gc\n", 360 | "del [[frame8,li,df]]\n", 361 | "frame8=pd.DataFrame()" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 19, 367 | "metadata": {}, 368 | "outputs": [], 369 | "source": [ 370 | "path = r'D:\\New folder\\9' # use your path\n", 371 | "\n", 372 | "all_files = glob.glob(path + \"/*.csv\")\n", 373 | "\n", 374 | "li = []\n", 375 | "\n", 376 | "for filename in all_files:\n", 377 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 378 | " s = n//10 # sample size of 10%\n", 379 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 380 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 381 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 382 | " li.append(df)\n", 383 | "\n", 384 | "frame9 = pd.concat(li, axis=0, ignore_index=True)\n", 385 | "frame9=frame9.astype('float32')\n", 386 | "df=frame9.copy()\n", 387 | "x=frame9.iloc[:,:]\n", 388 | "y=frame9.iloc[:,-1]\n", 389 | "del x['BW']\n", 390 | "svclassifier=SVR(kernel='linear')\n", 391 | "svclassifier.fit(x, y)\n", 392 | "models.append(svclassifier)" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 20, 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [ 401 | "import gc\n", 402 | "del [[frame9,li,df]]\n", 403 | "frame9=pd.DataFrame()" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": 21, 409 | "metadata": {}, 410 | "outputs": [], 411 | "source": [ 412 | "path = r'D:\\New folder\\10' # use your path\n", 413 | "\n", 414 | "all_files = glob.glob(path + \"/*.csv\")\n", 415 | "\n", 416 | "li = []\n", 417 | "\n", 418 | "for filename in all_files:\n", 419 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 420 | " s = n//10 # sample size of 10%\n", 421 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 422 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 423 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 424 | " li.append(df)\n", 425 | "\n", 426 | "frame10 = pd.concat(li, axis=0, ignore_index=True)\n", 427 | "frame10=frame10.astype('float32')\n", 428 | "df=frame10.copy()\n", 429 | "x=frame10.iloc[:,:]\n", 430 | "y=frame10.iloc[:,-1]\n", 431 | "del x['BW']\n", 432 | "svclassifier=SVR(kernel='linear')\n", 433 | "svclassifier.fit(x, y)\n", 434 | "models.append(svclassifier)" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 22, 440 | "metadata": {}, 441 | "outputs": [], 442 | "source": [ 443 | "import gc\n", 444 | "del 
[[frame10,li,df]]\n", 445 | "frame10=pd.DataFrame()" 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 23, 451 | "metadata": {}, 452 | "outputs": [], 453 | "source": [ 454 | "path = r'D:\\New folder\\11' # use your path\n", 455 | "\n", 456 | "all_files = glob.glob(path + \"/*.csv\")\n", 457 | "\n", 458 | "li = []\n", 459 | "\n", 460 | "for filename in all_files:\n", 461 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 462 | " s = n//10 # sample size of 10%\n", 463 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 464 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 465 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 466 | " li.append(df)\n", 467 | "\n", 468 | "frame11 = pd.concat(li, axis=0, ignore_index=True)\n", 469 | "frame11=frame11.astype('float32')\n", 470 | "df=frame11.copy()\n", 471 | "x=frame11.iloc[:,:]\n", 472 | "y=frame11.iloc[:,-1]\n", 473 | "del x['BW']\n", 474 | "svclassifier=SVR(kernel='linear')\n", 475 | "svclassifier.fit(x, y)\n", 476 | "models.append(svclassifier)" 477 | ] 478 | }, 479 | { 480 | "cell_type": "code", 481 | "execution_count": 24, 482 | "metadata": {}, 483 | "outputs": [], 484 | "source": [ 485 | "import gc\n", 486 | "del [[frame11,li,df]]\n", 487 | "frame11=pd.DataFrame()" 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": 70, 493 | "metadata": {}, 494 | "outputs": [ 495 | { 496 | "name": "stdout", 497 | "output_type": "stream", 498 | "text": [ 499 | "Accuracy when using SVR is 71.7948717948718\n" 500 | ] 501 | } 502 | ], 503 | "source": [ 504 | "path = r'D:\\Training' # use your path\n", 505 | "all_files = glob.glob(path + \"/*.csv\")\n", 506 | "y_p=[]\n", 507 | "y_pid=[]\n", 508 | "for filename in all_files:\n", 509 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 510 | " s = n//10 # sample size of 10%\n", 511 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 512 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 513 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 514 | " x=df.iloc[:,:]\n", 515 | "#y=frame[frame.columns[-1]]\n", 516 | " sum1=0\n", 517 | " count=0\n", 518 | " for model in models:\n", 519 | " count=count+1\n", 520 | " y_pred=model.predict(x)\n", 521 | " sum1=sum1+y_pred.mean()\n", 522 | " base=os.path.basename(filename)\n", 523 | " x=os.path.splitext(base)[0]\n", 524 | " x = x.split(\"_\")\n", 525 | " k=sum1/count\n", 526 | " if k<10:\n", 527 | " y_p=np.append(y_p,[0])\n", 528 | " y_pid=np.append(y_pid,[k])\n", 529 | " else:\n", 530 | " y_p=np.append(y_p,[1])\n", 531 | " y_pid=np.append(y_pid,[k])\n", 532 | "y_p\n", 533 | "filename=r'C:\\Users\\Ashutosh\\Downloads\\full_test_split.csv'\n", 534 | "#filename=r'E:\\Book.csv'\n", 535 | "dframe = pd.read_csv(filename, index_col=None,header=0)\n", 536 | "y_test=dframe[['PHQ_Binary']]\n", 537 | "y_test\n", 538 | "print (\"Accuracy when using SVR is \", accuracy_score(y_test,y_p.round(), normalize=True)*100)" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 72, 544 | "metadata": {}, 545 | "outputs": [ 546 | { 547 | "name": "stdout", 548 | "output_type": "stream", 549 | "text": [ 550 | "RMSE using SVR is 5.435206908553691\n" 551 | ] 552 | } 553 | ], 554 | "source": [ 555 | "from sklearn.metrics import mean_squared_error\n", 556 | "from math import sqrt\n", 557 | "\n", 558 | "rms = 
sqrt(mean_squared_error(y_test, y_pid))\n", 559 | "print (\"RMSE using SVR is \",rms) " 560 | ] 561 | }, 562 | { 563 | "cell_type": "code", 564 | "execution_count": 74, 565 | "metadata": {}, 566 | "outputs": [ 567 | { 568 | "name": "stdout", 569 | "output_type": "stream", 570 | "text": [ 571 | "Mean_Absolute_Error using SVR is: 5.403092898303896\n" 572 | ] 573 | } 574 | ], 575 | "source": [ 576 | "from sklearn.metrics import mean_absolute_error\n", 577 | "a=mean_absolute_error(y_test, y_pid)\n", 578 | "print(\"Mean_Absolute_Error using SVR is:\",a)" 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": null, 584 | "metadata": {}, 585 | "outputs": [], 586 | "source": [] 587 | } 588 | ], 589 | "metadata": { 590 | "kernelspec": { 591 | "display_name": "Python 3", 592 | "language": "python", 593 | "name": "python3" 594 | }, 595 | "language_info": { 596 | "codemirror_mode": { 597 | "name": "ipython", 598 | "version": 3 599 | }, 600 | "file_extension": ".py", 601 | "mimetype": "text/x-python", 602 | "name": "python", 603 | "nbconvert_exporter": "python", 604 | "pygments_lexer": "ipython3", 605 | "version": "3.7.3" 606 | } 607 | }, 608 | "nbformat": 4, 609 | "nbformat_minor": 2 610 | } 611 | -------------------------------------------------------------------------------- /SVR(sigmoid).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import matplotlib.pyplot as plt\n", 11 | "from sklearn.svm import SVR\n", 12 | "from sklearn.metrics import roc_curve\n", 13 | "from sklearn.metrics import auc\n", 14 | "from sklearn.model_selection import train_test_split \n", 15 | "from sklearn.metrics import classification_report, confusion_matrix \n", 16 | "from sklearn.metrics import accuracy_score\n", 17 | "import os\n", 18 | "from statistics import *" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "name": "stderr", 28 | "output_type": "stream", 29 | "text": [ 30 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. 
Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 31 | " \"avoid this warning.\", FutureWarning)\n" 32 | ] 33 | } 34 | ], 35 | "source": [ 36 | "import pandas as pd\n", 37 | "import random \n", 38 | "import glob\n", 39 | "\n", 40 | "path = r'D:\\New folder\\1' # use your path\n", 41 | "\n", 42 | "all_files = glob.glob(path + \"/*.csv\")\n", 43 | "\n", 44 | "li = []\n", 45 | "\n", 46 | "for filename in all_files:\n", 47 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 48 | " s = n//10 # sample size of 10%\n", 49 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 50 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 51 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 52 | " li.append(df)\n", 53 | "\n", 54 | "frame1 = pd.concat(li, axis=0, ignore_index=True)\n", 55 | "frame1=frame1.astype('float32')\n", 56 | "df=frame1.copy()\n", 57 | "x=frame1.iloc[:,:]\n", 58 | "#x.drop(['BW'], axis=1)\n", 59 | "#y=frame[frame.columns[-1]]\n", 60 | "y=frame1.iloc[:,-1]\n", 61 | "del x['BW']\n", 62 | "models = []\n", 63 | "svclassifier=SVR(kernel='sigmoid')\n", 64 | "svclassifier.fit(x, y)\n", 65 | "models.append(svclassifier)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 3, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "import gc\n", 75 | "del [[frame1,li,df]]\n", 76 | "frame1=pd.DataFrame()" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 4, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "name": "stderr", 86 | "output_type": "stream", 87 | "text": [ 88 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. 
Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 89 | " \"avoid this warning.\", FutureWarning)\n" 90 | ] 91 | } 92 | ], 93 | "source": [ 94 | "path = r'D:\\New folder\\2' # use your path\n", 95 | "\n", 96 | "all_files = glob.glob(path + \"/*.csv\")\n", 97 | "\n", 98 | "li = []\n", 99 | "\n", 100 | "for filename in all_files:\n", 101 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 102 | " s = n//10 # sample size of 10%\n", 103 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 104 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 105 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 106 | " li.append(df)\n", 107 | "\n", 108 | "frame2 = pd.concat(li, axis=0, ignore_index=True)\n", 109 | "frame2=frame2.astype('float32')\n", 110 | "df=frame2.copy()\n", 111 | "x=frame2.iloc[:,:]\n", 112 | "y=frame2.iloc[:,-1]\n", 113 | "del x['BW']\n", 114 | "svclassifier=SVR(kernel='sigmoid')\n", 115 | "svclassifier.fit(x, y)\n", 116 | "models.append(svclassifier)" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 5, 122 | "metadata": {}, 123 | "outputs": [], 124 | "source": [ 125 | "import gc\n", 126 | "del [[frame2,li,df]]\n", 127 | "frame2=pd.DataFrame()" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 6, 133 | "metadata": {}, 134 | "outputs": [ 135 | { 136 | "name": "stderr", 137 | "output_type": "stream", 138 | "text": [ 139 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 140 | " \"avoid this warning.\", FutureWarning)\n" 141 | ] 142 | } 143 | ], 144 | "source": [ 145 | "path = r'D:\\New folder\\3' # use your path\n", 146 | "\n", 147 | "all_files = glob.glob(path + \"/*.csv\")\n", 148 | "\n", 149 | "li = []\n", 150 | "\n", 151 | "for filename in all_files:\n", 152 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 153 | " s = n//10 # sample size of 10%\n", 154 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 155 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 156 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 157 | " li.append(df)\n", 158 | "\n", 159 | "frame3 = pd.concat(li, axis=0, ignore_index=True)\n", 160 | "frame3=frame3.astype('float32')\n", 161 | "df=frame3.copy()\n", 162 | "x=frame3.iloc[:,:]\n", 163 | "y=frame3.iloc[:,-1]\n", 164 | "del x['BW']\n", 165 | "svclassifier=SVR(kernel='sigmoid')\n", 166 | "svclassifier.fit(x, y)\n", 167 | "models.append(svclassifier)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 7, 173 | "metadata": {}, 174 | "outputs": [], 175 | "source": [ 176 | "import gc\n", 177 | "del [[frame3,li,df]]\n", 178 | "frame3=pd.DataFrame()" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 8, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "name": "stderr", 188 | "output_type": "stream", 189 | "text": [ 190 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. 
Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 191 | " \"avoid this warning.\", FutureWarning)\n" 192 | ] 193 | } 194 | ], 195 | "source": [ 196 | "path = r'D:\\New folder\\4' # use your path\n", 197 | "\n", 198 | "all_files = glob.glob(path + \"/*.csv\")\n", 199 | "\n", 200 | "li = []\n", 201 | "\n", 202 | "for filename in all_files:\n", 203 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 204 | " s = n//10 # sample size of 10%\n", 205 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 206 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 207 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 208 | " li.append(df)\n", 209 | "\n", 210 | "frame4 = pd.concat(li, axis=0, ignore_index=True)\n", 211 | "frame4=frame4.astype('float32')\n", 212 | "df=frame4.copy()\n", 213 | "x=frame4.iloc[:,:]\n", 214 | "y=frame4.iloc[:,-1]\n", 215 | "del x['BW']\n", 216 | "svclassifier=SVR(kernel='sigmoid')\n", 217 | "svclassifier.fit(x, y)\n", 218 | "models.append(svclassifier)" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 9, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "import gc\n", 228 | "del [[frame4,li,df]]\n", 229 | "frame4=pd.DataFrame()" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 10, 235 | "metadata": {}, 236 | "outputs": [ 237 | { 238 | "name": "stderr", 239 | "output_type": "stream", 240 | "text": [ 241 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 242 | " \"avoid this warning.\", FutureWarning)\n" 243 | ] 244 | } 245 | ], 246 | "source": [ 247 | "path = r'D:\\New folder\\5' # use your path\n", 248 | "\n", 249 | "all_files = glob.glob(path + \"/*.csv\")\n", 250 | "\n", 251 | "li = []\n", 252 | "\n", 253 | "for filename in all_files:\n", 254 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 255 | " s = n//10 # sample size of 10%\n", 256 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 257 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 258 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 259 | " li.append(df)\n", 260 | "\n", 261 | "frame5 = pd.concat(li, axis=0, ignore_index=True)\n", 262 | "frame5=frame5.astype('float32')\n", 263 | "df=frame5.copy()\n", 264 | "x=frame5.iloc[:,:]\n", 265 | "y=frame5.iloc[:,-1]\n", 266 | "del x['BW']\n", 267 | "svclassifier=SVR(kernel='sigmoid')\n", 268 | "svclassifier.fit(x, y)\n", 269 | "models.append(svclassifier)" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 11, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "import gc\n", 279 | "del [[frame5,li,df]]\n", 280 | "frame5=pd.DataFrame()" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 12, 286 | "metadata": {}, 287 | "outputs": [ 288 | { 289 | "name": "stderr", 290 | "output_type": "stream", 291 | "text": [ 292 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. 
Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 293 | " \"avoid this warning.\", FutureWarning)\n" 294 | ] 295 | } 296 | ], 297 | "source": [ 298 | "path = r'D:\\New folder\\6' # use your path\n", 299 | "\n", 300 | "all_files = glob.glob(path + \"/*.csv\")\n", 301 | "\n", 302 | "li = []\n", 303 | "\n", 304 | "for filename in all_files:\n", 305 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 306 | " s = n//10 # sample size of 10%\n", 307 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 308 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 309 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 310 | " li.append(df)\n", 311 | "\n", 312 | "frame6 = pd.concat(li, axis=0, ignore_index=True)\n", 313 | "frame6=frame6.astype('float32')\n", 314 | "df=frame6.copy()\n", 315 | "x=frame6.iloc[:,:]\n", 316 | "y=frame6.iloc[:,-1]\n", 317 | "del x['BW']\n", 318 | "svclassifier=SVR(kernel='sigmoid')\n", 319 | "svclassifier.fit(x, y)\n", 320 | "models.append(svclassifier)" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 13, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "import gc\n", 330 | "del [[frame6,li,df]]\n", 331 | "frame6=pd.DataFrame()" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": 15, 337 | "metadata": {}, 338 | "outputs": [ 339 | { 340 | "name": "stderr", 341 | "output_type": "stream", 342 | "text": [ 343 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 344 | " \"avoid this warning.\", FutureWarning)\n" 345 | ] 346 | } 347 | ], 348 | "source": [ 349 | "path = r'D:\\New folder\\7' # use your path\n", 350 | "\n", 351 | "all_files = glob.glob(path + \"/*.csv\")\n", 352 | "\n", 353 | "li = []\n", 354 | "\n", 355 | "for filename in all_files:\n", 356 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 357 | " s = n//10 # sample size of 10%\n", 358 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 359 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 360 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 361 | " li.append(df)\n", 362 | "\n", 363 | "frame7 = pd.concat(li, axis=0, ignore_index=True)\n", 364 | "frame7=frame7.astype('float32')\n", 365 | "df=frame7.copy()\n", 366 | "x=frame7.iloc[:,:]\n", 367 | "y=frame7.iloc[:,-1]\n", 368 | "del x['BW']\n", 369 | "svclassifier=SVR(kernel='sigmoid')\n", 370 | "svclassifier.fit(x, y)\n", 371 | "models.append(svclassifier)" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 16, 377 | "metadata": {}, 378 | "outputs": [], 379 | "source": [ 380 | "import gc\n", 381 | "del [[frame7,li,df]]\n", 382 | "frame7=pd.DataFrame()" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": 17, 388 | "metadata": {}, 389 | "outputs": [ 390 | { 391 | "name": "stderr", 392 | "output_type": "stream", 393 | "text": [ 394 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. 
Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 395 | " \"avoid this warning.\", FutureWarning)\n" 396 | ] 397 | } 398 | ], 399 | "source": [ 400 | "path = r'D:\\New folder\\8' # use your path\n", 401 | "\n", 402 | "all_files = glob.glob(path + \"/*.csv\")\n", 403 | "\n", 404 | "li = []\n", 405 | "\n", 406 | "for filename in all_files:\n", 407 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 408 | " s = n//10 # sample size of 10%\n", 409 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 410 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 411 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 412 | " li.append(df)\n", 413 | "\n", 414 | "frame8 = pd.concat(li, axis=0, ignore_index=True)\n", 415 | "frame8=frame8.astype('float32')\n", 416 | "df=frame8.copy()\n", 417 | "x=frame8.iloc[:,:]\n", 418 | "y=frame8.iloc[:,-1]\n", 419 | "del x['BW']\n", 420 | "svclassifier=SVR(kernel='sigmoid')\n", 421 | "svclassifier.fit(x, y)\n", 422 | "models.append(svclassifier)" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 18, 428 | "metadata": {}, 429 | "outputs": [], 430 | "source": [ 431 | "import gc\n", 432 | "del [[frame8,li,df]]\n", 433 | "frame8=pd.DataFrame()" 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": 19, 439 | "metadata": {}, 440 | "outputs": [ 441 | { 442 | "name": "stderr", 443 | "output_type": "stream", 444 | "text": [ 445 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 446 | " \"avoid this warning.\", FutureWarning)\n" 447 | ] 448 | } 449 | ], 450 | "source": [ 451 | "path = r'D:\\New folder\\9' # use your path\n", 452 | "\n", 453 | "all_files = glob.glob(path + \"/*.csv\")\n", 454 | "\n", 455 | "li = []\n", 456 | "\n", 457 | "for filename in all_files:\n", 458 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 459 | " s = n//10 # sample size of 10%\n", 460 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 461 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 462 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 463 | " li.append(df)\n", 464 | "\n", 465 | "frame9 = pd.concat(li, axis=0, ignore_index=True)\n", 466 | "frame9=frame9.astype('float32')\n", 467 | "df=frame9.copy()\n", 468 | "x=frame9.iloc[:,:]\n", 469 | "y=frame9.iloc[:,-1]\n", 470 | "del x['BW']\n", 471 | "svclassifier=SVR(kernel='sigmoid')\n", 472 | "svclassifier.fit(x, y)\n", 473 | "models.append(svclassifier)" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": 20, 479 | "metadata": {}, 480 | "outputs": [], 481 | "source": [ 482 | "import gc\n", 483 | "del [[frame9,li,df]]\n", 484 | "frame9=pd.DataFrame()" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 21, 490 | "metadata": {}, 491 | "outputs": [ 492 | { 493 | "name": "stderr", 494 | "output_type": "stream", 495 | "text": [ 496 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. 
Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 497 | " \"avoid this warning.\", FutureWarning)\n" 498 | ] 499 | } 500 | ], 501 | "source": [ 502 | "path = r'D:\\New folder\\10' # use your path\n", 503 | "\n", 504 | "all_files = glob.glob(path + \"/*.csv\")\n", 505 | "\n", 506 | "li = []\n", 507 | "\n", 508 | "for filename in all_files:\n", 509 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 510 | " s = n//10 # sample size of 10%\n", 511 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 512 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 513 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 514 | " li.append(df)\n", 515 | "\n", 516 | "frame10 = pd.concat(li, axis=0, ignore_index=True)\n", 517 | "frame10=frame10.astype('float32')\n", 518 | "df=frame10.copy()\n", 519 | "x=frame10.iloc[:,:]\n", 520 | "y=frame10.iloc[:,-1]\n", 521 | "del x['BW']\n", 522 | "svclassifier=SVR(kernel='sigmoid')\n", 523 | "svclassifier.fit(x, y)\n", 524 | "models.append(svclassifier)" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": 22, 530 | "metadata": {}, 531 | "outputs": [], 532 | "source": [ 533 | "import gc\n", 534 | "del [[frame10,li,df]]\n", 535 | "frame10=pd.DataFrame()" 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": 23, 541 | "metadata": {}, 542 | "outputs": [ 543 | { 544 | "name": "stderr", 545 | "output_type": "stream", 546 | "text": [ 547 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\svm\\base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n", 548 | " \"avoid this warning.\", FutureWarning)\n" 549 | ] 550 | } 551 | ], 552 | "source": [ 553 | "path = r'D:\\New folder\\11' # use your path\n", 554 | "\n", 555 | "all_files = glob.glob(path + \"/*.csv\")\n", 556 | "\n", 557 | "li = []\n", 558 | "\n", 559 | "for filename in all_files:\n", 560 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 561 | " s = n//10 # sample size of 10%\n", 562 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 563 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 564 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 565 | " li.append(df)\n", 566 | "\n", 567 | "frame11 = pd.concat(li, axis=0, ignore_index=True)\n", 568 | "frame11=frame11.astype('float32')\n", 569 | "df=frame11.copy()\n", 570 | "x=frame11.iloc[:,:]\n", 571 | "y=frame11.iloc[:,-1]\n", 572 | "del x['BW']\n", 573 | "svclassifier=SVR(kernel='sigmoid')\n", 574 | "svclassifier.fit(x, y)\n", 575 | "models.append(svclassifier)" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": 24, 581 | "metadata": {}, 582 | "outputs": [], 583 | "source": [ 584 | "import gc\n", 585 | "del [[frame11,li,df]]\n", 586 | "frame11=pd.DataFrame()" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": 25, 592 | "metadata": {}, 593 | "outputs": [ 594 | { 595 | "name": "stdout", 596 | "output_type": "stream", 597 | "text": [ 598 | "Accuracy when using SVR is 71.7948717948718\n" 599 | ] 600 | } 601 | ], 602 | "source": [ 603 | "path = r'D:\\Training' # use your path\n", 604 | "all_files = glob.glob(path + \"/*.csv\")\n", 605 | "y_p=[]\n", 606 | "y_pid=[]\n", 607 | "for filename in 
all_files:\n", 608 | " n = sum(1 for line in open(filename))-1 # Calculate number of rows in file\n", 609 | " s = n//10 # sample size of 10%\n", 610 | " skip = sorted(random.sample(range(1, n+1), n-s)) # n+1 to compensate for header \n", 611 | " df = pd.read_csv(filename, index_col=None,skiprows=skip,header=0)\n", 612 | " df= df[(df==0).sum(axis=1)/len(df.columns) <= 0.50]\n", 613 | " x=df.iloc[:,:]\n", 614 | "#y=frame[frame.columns[-1]]\n", 615 | " sum1=0\n", 616 | " count=0\n", 617 | " for model in models:\n", 618 | " count=count+1\n", 619 | " y_pred=model.predict(x)\n", 620 | " sum1=sum1+y_pred.mean()\n", 621 | " base=os.path.basename(filename)\n", 622 | " x=os.path.splitext(base)[0]\n", 623 | " x = x.split(\"_\")\n", 624 | " k=sum1/count\n", 625 | " if k<10:\n", 626 | " y_p=np.append(y_p,[0])\n", 627 | " y_pid=np.append(y_pid,[k])\n", 628 | " else:\n", 629 | " y_p=np.append(y_p,[1])\n", 630 | " y_pid=np.append(y_pid,[k])\n", 631 | "y_p\n", 632 | "filename=r'C:\\Users\\Ashutosh\\Downloads\\full_test_split.csv'\n", 633 | "#filename=r'E:\\Book.csv'\n", 634 | "dframe = pd.read_csv(filename, index_col=None,header=0)\n", 635 | "y_test=dframe[['PHQ_Binary']]\n", 636 | "y_test\n", 637 | "print (\"Accuracy when using SVR is \", accuracy_score(y_test,y_p.round(), normalize=True)*100)" 638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "execution_count": 30, 643 | "metadata": {}, 644 | "outputs": [ 645 | { 646 | "name": "stdout", 647 | "output_type": "stream", 648 | "text": [ 649 | "RMSE using SVR with sigmoid kernal is 5.413690938785482\n" 650 | ] 651 | } 652 | ], 653 | "source": [ 654 | "from sklearn.metrics import mean_squared_error\n", 655 | "from math import sqrt\n", 656 | "\n", 657 | "rms = sqrt(mean_squared_error(y_test, y_pid))\n", 658 | "print (\"RMSE using SVR with sigmoid kernal is \",rms) " 659 | ] 660 | }, 661 | { 662 | "cell_type": "code", 663 | "execution_count": 32, 664 | "metadata": {}, 665 | "outputs": [ 666 | { 667 | "name": "stdout", 668 | "output_type": "stream", 669 | "text": [ 670 | "Mean_Absolute_Error using SVR with sigmoid kernal is: 5.394384554821085\n" 671 | ] 672 | } 673 | ], 674 | "source": [ 675 | "from sklearn.metrics import mean_absolute_error\n", 676 | "a=mean_absolute_error(y_test, y_pid)\n", 677 | "print(\"Mean_Absolute_Error using SVR with sigmoid kernal is:\",a)" 678 | ] 679 | }, 680 | { 681 | "cell_type": "code", 682 | "execution_count": null, 683 | "metadata": {}, 684 | "outputs": [], 685 | "source": [] 686 | } 687 | ], 688 | "metadata": { 689 | "kernelspec": { 690 | "display_name": "Python 3", 691 | "language": "python", 692 | "name": "python3" 693 | }, 694 | "language_info": { 695 | "codemirror_mode": { 696 | "name": "ipython", 697 | "version": 3 698 | }, 699 | "file_extension": ".py", 700 | "mimetype": "text/x-python", 701 | "name": "python", 702 | "nbconvert_exporter": "python", 703 | "pygments_lexer": "ipython3", 704 | "version": "3.7.3" 705 | } 706 | }, 707 | "nbformat": 4, 708 | "nbformat_minor": 2 709 | } 710 | --------------------------------------------------------------------------------