├── .gitignore ├── README.md ├── conda_env.yaml ├── data_dir_struct.txt ├── fixed-time-horizon-prediction ├── Exp_1.2 │ ├── Bondville_3hour.py │ ├── Bondville_4hour.py │ ├── Boulder_3hour.py │ ├── Boulder_4hour.py │ ├── Desert_Rock_2hour.py │ ├── Desert_Rock_3hour.py │ ├── Desert_Rock_4hour.py │ ├── Exp_1.2_Bondville_2hour.py │ ├── Exp_1.2_Boulder_2hour.py │ ├── Exp_1.2_FortPeck_2hour.py │ ├── Exp_1.2_GoodwinCreek_2hour.py │ ├── Exp_1.2_PenState_2hour.py │ ├── Exp_1.3_DesertRock_3hour.py │ ├── Fort_Peck_3hour.py │ ├── Fort_Peck_4hour.py │ ├── Goodwin_Creek_2hour.py │ ├── Goodwin_Creek_3hour.py │ ├── Goodwin_Creek_4hour.py │ ├── Penn_State_2hour.py │ ├── Penn_State_3hour.py │ ├── Penn_State_4hour.py │ ├── RNN_Fort_Peck.py │ ├── RNN_Sioux_Falls.py │ ├── Sioux_Falls_2hour.py │ ├── Sioux_Falls_3hour.py │ └── Sioux_Falls_4hour.py ├── Exp_1.2_Bondville_2hour.ipynb ├── Exp_1.2_Boulder_2hour.ipynb ├── Exp_1.2_Desert_Rock_2hour.ipynb ├── Exp_1.2_FortPeck_2hour.ipynb ├── Exp_1.2_GoodwinCreek_2hour.ipynb ├── Exp_1.2_PenState_2hour.ipynb ├── Exp_1.2_SiouxFalls_2hour.ipynb ├── Exp_1.3_Bondville_3hour.ipynb ├── Exp_1_RNN_Bondville.ipynb ├── Exp_1_RNN_Boulder.ipynb ├── Exp_1_RNN_Desert_Rock.ipynb ├── Exp_1_RNN_Fort_Peck.ipynb ├── Exp_1_RNN_Goodwin_Creek.ipynb ├── Exp_1_RNN_Penn_State.ipynb └── Exp_1_RNN_Sioux_Falls.ipynb ├── multi-time-horizon-prediction ├── Exp_2.1_multi-time-scale_All_Locations.ipynb ├── Exp_2.1_multi-time-scale_All_Locations.py ├── Exp_2.1_multi-time-scale_Bondville_2009.py ├── Exp_2.1_multi-time-scale_Bondville_2015.py ├── Exp_2.1_multi-time-scale_Bondville_2016.py ├── Exp_2.1_multi-time-scale_Bondville_2017.py ├── Exp_2.1_multi-time-scale_Boulder_2009.py ├── Exp_2.1_multi-time-scale_Boulder_2015.py ├── Exp_2.1_multi-time-scale_Boulder_2016.py ├── Exp_2.1_multi-time-scale_Boulder_2017.py ├── Exp_2.1_multi-time-scale_Desert_Rock_2009.py ├── Exp_2.1_multi-time-scale_Desert_Rock_2015.py ├── Exp_2.1_multi-time-scale_Desert_Rock_2016.py ├── Exp_2.1_multi-time-scale_Desert_Rock_2017.py ├── Exp_2.1_multi-time-scale_Fort_Peck_2009.py ├── Exp_2.1_multi-time-scale_Fort_Peck_2015.py ├── Exp_2.1_multi-time-scale_Fort_Peck_2016.py ├── Exp_2.1_multi-time-scale_Fort_Peck_2017.py ├── Exp_2.1_multi-time-scale_Goodwin_Creek_2009.py ├── Exp_2.1_multi-time-scale_Goodwin_Creek_2015.py ├── Exp_2.1_multi-time-scale_Goodwin_Creek_2016.py ├── Exp_2.1_multi-time-scale_Goodwin_Creek_2017.py ├── Exp_2.1_multi-time-scale_Penn_State_2009.py ├── Exp_2.1_multi-time-scale_Penn_State_2015.py ├── Exp_2.1_multi-time-scale_Penn_State_2016.py ├── Exp_2.1_multi-time-scale_Penn_State_2017.py ├── Exp_2.1_multi-time-scale_Sioux_Falls_2009.py ├── Exp_2.1_multi-time-scale_Sioux_Falls_2015.py ├── Exp_2.1_multi-time-scale_Sioux_Falls_2016.py └── Exp_2.1_multi-time-scale_Sioux_Falls_2017.py └── multi-tscale-slim.yaml /.gitignore: -------------------------------------------------------------------------------- 1 | *.jpg 2 | *.zip 3 | .ipynb_checkpoints 4 | *Test.ipynb 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Multi-time Horizon Solar Forecasting using Recurrent Neural Networks 2 | 3 | This repository contains code to reproduce the results published in the ["Multi-time-horizon Solar Forecasting Using Recurrent Neural Network"](https://arxiv.org/abs/1807.05459) paper. 
In addition, an LSTM implementation for multi-time-horizon solar forecasting is available in a separate repository: ["PyTorch implementation of LSTM Model for Multi-time-horizon Solar Forecasting"](https://github.com/sakshi-mishra/LSTM_Solar_Forecasting). 4 | 5 | ## Conda environment for running the code 6 | 7 | A conda environment file is provided for convenience. Assuming you have the Anaconda Python distribution available on your computer, you can create a new conda environment with the necessary packages using the following command: 8 | 9 | `conda env create -f multi-tscale-slim.yaml -n "multi_time_horizon"` 10 | 11 | ## Predictions with fixed time horizon 12 | The Jupyter Notebooks in [fixed-time-horizon-prediction](fixed-time-horizon-prediction) walk through the experiments on forecasting solar irradiance on a fixed-time-horizon basis, as described in Section V.A of the [paper](https://arxiv.org/abs/1807.05459). 13 | 14 | Predictions are made for seven different sites, so there is one Jupyter Notebook per site for the 1-hour ahead forecast: 15 | * [fixed-time-horizon-prediction/Exp_1_RNN_Bondville.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Bondville.ipynb): code to train and predict for the Bondville location, 1-hour ahead forecast 16 | * [fixed-time-horizon-prediction/Exp_1_RNN_Boulder.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Boulder.ipynb): code to train and predict for the Boulder location, 1-hour ahead forecast 17 | * [fixed-time-horizon-prediction/Exp_1_RNN_Desert_Rock.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Desert_Rock.ipynb): code to train and predict for the Desert Rock location, 1-hour ahead forecast 18 | * [fixed-time-horizon-prediction/Exp_1_RNN_Fort_Peck.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Fort_Peck.ipynb): code to train and predict for the Fort Peck location, 1-hour ahead forecast 19 | * [fixed-time-horizon-prediction/Exp_1_RNN_Goodwin_Creek.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Goodwin_Creek.ipynb): code to train and predict for the Goodwin Creek location, 1-hour ahead forecast 20 | * [fixed-time-horizon-prediction/Exp_1_RNN_Penn_State.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Penn_State.ipynb): code to train and predict for the Penn State location, 1-hour ahead forecast 21 | * [fixed-time-horizon-prediction/Exp_1_RNN_Sioux_Falls.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Sioux_Falls.ipynb): code to train and predict for the Sioux Falls location, 1-hour ahead forecast 22 | 23 | 2-hour ahead forecast Jupyter Notebooks for all seven locations: 24 | * [fixed-time-horizon-prediction/Exp_1.2_Bondville_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_Bondville_2hour.ipynb): code to train and predict for the Bondville location, 2-hour ahead forecast 25 | * [fixed-time-horizon-prediction/Exp_1.2_Boulder_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_Boulder_2hour.ipynb): code to train and predict for the Boulder location, 2-hour ahead forecast 26 | * [fixed-time-horizon-prediction/Exp_1.2_Desert_Rock_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_Desert_Rock_2hour.ipynb): code to train and predict for the Desert Rock location, 2-hour ahead forecast 27 | * [fixed-time-horizon-prediction/Exp_1.2_FortPeck_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_FortPeck_2hour.ipynb): code to train and predict for the Fort Peck location, 2-hour ahead forecast 28 | * [fixed-time-horizon-prediction/Exp_1.2_GoodwinCreek_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_GoodwinCreek_2hour.ipynb): code to train and predict for the Goodwin Creek location, 2-hour ahead forecast 29 | * 
[fixed-time-horizon-prediction/Exp_1.2_PenState_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_PenState_2hour.ipynb): code to train and predict for the Penn State location, 2-hour ahead forecast 30 | * [fixed-time-horizon-prediction/Exp_1.2_SiouxFalls_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_SiouxFalls_2hour.ipynb): code to train and predict for the Sioux Falls location, 2-hour ahead forecast 31 | 32 | 3-hour ahead forecast Jupyter Notebook for the Bondville location: 33 | * [fixed-time-horizon-prediction/Exp_1.3_Bondville_3hour.ipynb](fixed-time-horizon-prediction/Exp_1.3_Bondville_3hour.ipynb): code to train and predict for the Bondville location, 3-hour ahead forecast 34 | 35 | #### The [fixed-time-horizon-prediction/Exp_1.2](fixed-time-horizon-prediction/Exp_1.2) folder contains the .py versions of the Jupyter Notebooks listed above, along with additional .py files for 3-hour ahead and 4-hour ahead forecasts for all seven locations. 36 | 37 | ## Predictions with multi-time horizon 38 | 39 | The Python scripts in [multi-time-horizon-prediction](multi-time-horizon-prediction) implement the experiments on forecasting solar irradiance on a multi-time-horizon basis, as described in Section V.B of the [paper](https://arxiv.org/abs/1807.05459). 40 | 41 | The models are trained on years 2010 and 2011. Predictions are made for years 2009, 2015, 2016 and 2017 at all seven locations. With four test years and seven locations, there are 4*7 = 28 .py files for training and prediction. 42 | 43 | The Jupyter Notebook [multi-time-horizon-prediction/Exp_2.1_multi-time-scale_All_Locations.ipynb](multi-time-horizon-prediction/Exp_2.1_multi-time-scale_All_Locations.ipynb) contains the code for predicting solar irradiance for all locations and all test years (2009, 2015, 2016 and 2017). 44 | 45 | ### Training/Testing Data 46 | 47 | The training and testing data need to be downloaded from the [NOAA FTP server](ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad/) for each location/site. You can use GNU wget to automate the download process; for example, a command along the lines of `wget -r -np ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad/<site>/<year>/` (with the site directory and year filled in) mirrors one site-year at a time. The scripts assume that the data is in the *data* folder as per the structure outlined in the [data_dir_struct.txt](data_dir_struct.txt) file. 48 | 49 | If you face any issues running the code or reproducing the results, create an issue on this repo. 
Contributions are welcome too :) 50 | 51 | ## Citing 52 | If you find this work useful for your research, please cite the paper: 53 | 54 | ```bibtex 55 | @misc{1807.05459, 56 | Author = {Sakshi Mishra and Praveen Palanisamy}, 57 | Title = {Multi-time-horizon Solar Forecasting Using Recurrent Neural Network}, 58 | Year = {2018}, 59 | Eprint = {arXiv:1807.05459}, 60 | } 61 | ``` 62 | -------------------------------------------------------------------------------- /conda_env.yaml: -------------------------------------------------------------------------------- 1 | name: multi_time_horizon 2 | channels: 3 | - defaults 4 | dependencies: 5 | - ca-certificates=2018.03.07=0 6 | - certifi=2018.4.16=py36_0 7 | - libedit=3.1.20170329=h6b74fdf_2 8 | - libffi=3.2.1=hd88cf55_4 9 | - libgcc-ng=7.2.0=hdf63c60_3 10 | - libstdcxx-ng=7.2.0=hdf63c60_3 11 | - ncurses=6.1=hf484d3e_0 12 | - openssl=1.0.2o=h20670df_0 13 | - pip=10.0.1=py36_0 14 | - python=3.6.6=hc3d631a_0 15 | - readline=7.0=ha6073c6_4 16 | - setuptools=39.2.0=py36_0 17 | - sqlite=3.24.0=h84994c4_0 18 | - tk=8.6.7=hc745277_3 19 | - wheel=0.31.1=py36_0 20 | - xz=5.2.4=h14c3975_4 21 | - zlib=1.2.11=ha838bed_2 22 | - pip: 23 | - cycler==0.10.0 24 | - kiwisolver==1.0.1 25 | - matplotlib==2.2.2 26 | - numexpr==2.6.6 27 | - numpy==1.15.0 28 | - pandas==0.23.3 29 | - pvlib==0.5.2 30 | - pyparsing==2.2.0 31 | - python-dateutil==2.7.3 32 | - pytz==2018.5 33 | - scipy==1.1.0 34 | - seaborn==0.9.0 35 | - six==1.11.0 36 | - tables==3.4.4 37 | -------------------------------------------------------------------------------- /data_dir_struct.txt: -------------------------------------------------------------------------------- 1 | data 2 | ├── Bondville 3 | │   ├── Exp_1_test 4 | │   │   ├── 2009 5 | │   │   ├── 2015 6 | │   │   ├── 2016 7 | │   │   └── 2017 8 | │   └── Exp_1_train 9 | ├── Boulder 10 | │   ├── Exp_1_test 11 | │   │   ├── 2009 12 | │   │   ├── 2015 13 | │   │   ├── 2016 14 | │   │   └── 2017 15 | │   └── Exp_1_train 16 | ├── Desert_Rock 17 | │   ├── Exp_1_test 18 | │   │   ├── 2009 19 | │   │   ├── 2015 20 | │   │   ├── 2016 21 | │   │   └── 2017 22 | │   └── Exp_1_train 23 | ├── Fort_Peck 24 | │   ├── Exp_1_test 25 | │   │   ├── 2009 26 | │   │   ├── 2015 27 | │   │   ├── 2016 28 | │   │   └── 2017 29 | │   └── Exp_1_train 30 | ├── Goodwin_Creek 31 | │   ├── Exp_1_test 32 | │   │   ├── 2009 33 | │   │   ├── 2015 34 | │   │   ├── 2016 35 | │   │   └── 2017 36 | │   └── Exp_1_train 37 | ├── Penn_State 38 | │   ├── Exp_1_test 39 | │   │   ├── 2009 40 | │   │   ├── 2015 41 | │   │   ├── 2016 42 | │   │   └── 2017 43 | │   └── Exp_1_train 44 | └── Sioux_Falls 45 | ├── Exp_1_test 46 | │   ├── 2009 47 | │   ├── 2015 48 | │   ├── 2016 49 | │   └── 2017 50 | └── Exp_1_train -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Exp_1.2_FortPeck_2hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from 
pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | ftp = Location(48,-106.449, 'US/Mountain', 630.0216, 'Fort Peck') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=ftp.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=ftp.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[7]: 51 | 52 | cs_2009 = ftp.get_clearsky(times2009) 53 | cs_2010and2011 = ftp.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[8]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[9]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[10]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[11]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[12]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[13]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[14]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[15]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. 
(K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. (K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[16]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[17]: 161 | 162 | path = r'./data/Fort_Peck/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[18]: 172 | 173 | path = r'./data/Fort_Peck/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[21]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[ ]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[ ]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[ ]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[ ]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[ ]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[ ]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[ ]: 237 | 238 | missing_data_indices_test = np.where((df_test <= 
-9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[ ]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[ ]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[ ]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[ ]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[ ]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[ ]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[ ]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[ ]: 300 | 301 | len(train) 302 | 303 | 304 | # In[ ]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[ ]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[ ]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[8]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[ ]: 336 | 337 | len(test) 338 | 339 | 340 | # In[ ]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[ ]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs 
clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | plt.savefig('Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[ ]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], 
axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 473 | # In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-2) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-2) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - 
df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = (df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[117]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[118]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
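        # With batch_first=True, nn.RNN returns `out` with shape
        # (batch_size, seq_dim, hidden_dim) -- (100, 1, 15) for this model -- so
        # out[:, -1, :] below selects the hidden state of the last time step,
        # a (100, 15) tensor that the linear readout maps to (100, 1).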
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[119]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | n_iter += 1 764 | 765 | 766 | # In[145]: 767 | 768 | print(len(test_loss)) 769 | #plt.plot(test_loss) 770 | plt.plot(train_loss,'-') 771 | #plt.ylim([0.000,0.99]) 772 | 773 | 774 | # In[146]: 775 | 776 | plt.plot(test_loss,'r') 777 | 778 | 779 | # #### Demornamization 780 | 781 | # In[161]: 782 | 783 | rmse = np.sqrt(mse) 784 | 785 | 786 | # In[243]: 787 | 788 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 789 | 790 | 791 | # In[244]: 792 | 793 | print("rmse_denorm",rmse_denorm) 794 | 795 | 796 | # In[259]: 797 | 798 | print(df_new_test['Kt'].describe()) 799 | 800 | 801 | # ### Saving train and test losses to a csv 802 | 803 | # In[ ]: 804 | 805 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss}, columns=['Train Loss']) 806 | df_trainLoss.to_csv('RNN Paper Results/Exp1_2_FortPeck_TrainLoss.csv') 807 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss}, columns=['Test Loss']) 808 | df_testLoss.to_csv('RNN Paper Results/Exp1_2_FortPeck_TestLoss.csv') 809 | 810 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Exp_1.2_GoodwinCreek_2hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | gwc = Location(34.2487,-89.8925, 'US/Central', 98, 'Goodwin Creek') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=gwc.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=gwc.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[7]: 51 | 52 | cs_2009 = gwc.get_clearsky(times2009) 53 | cs_2010and2011 = gwc.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[8]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[9]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[10]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 
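# Note: Timestamp.to_datetime() (used above) is deprecated and removed in newer
# pandas releases; if it raises an AttributeError in your environment, the same
# year/month/day/hour/min columns can be built with the vectorized .dt accessor,
# e.g. cs_2009['year'] = cs_2009['index'].dt.year, and likewise for the other fields.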
80 | 81 | # In[11]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[12]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[13]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[14]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[15]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[16]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[17]: 161 | 162 | path = r'.\\data\\Goodwin_Creek\\Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[18]: 172 | 173 | path = r'.\\data\\Goodwin_Creek\\Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[19]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[20]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[21]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[22]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[23]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[24]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[25]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[26]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[27]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 
245 | # For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[28]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[29]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[30]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[31]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[32]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[33]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[34]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[35]: 300 | 301 | len(train) 302 | 303 | 304 | # In[36]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[37]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[38]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[39]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[40]: 336 | 337 | len(test) 338 | 339 | 340 | # In[41]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[42]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[43]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[44]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[45]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[46]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[47]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[48]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[49]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[50]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[51]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 
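# (The test-set groupby keys below omit 'year' because the Exp_1 test data covers a
#  single year, 2009, whereas the training data spans both 2010 and 2011.)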
473 | # In[52]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[53]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[54]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[55]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[56]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-2) 529 | 530 | 531 | # In[57]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[58]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[59]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-2) 550 | 551 | 552 | # In[60]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[61]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[62]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = (df_new_test - 
df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[63]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[64]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[65]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[66]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[67]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[68]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[69]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[70]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
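        # Same shapes as in the Fort Peck script: `out` is (batch_size, seq_dim, hidden_dim),
        # so out[:, -1, :] below is the last time step's hidden state, i.e. (100, 15).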
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[71]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[72]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | n_iter += 1 763 | 764 | 765 | # In[73]: 766 | 767 | print(len(test_loss)) 768 | #plt.plot(test_loss) 769 | plt.plot(train_loss,'-') 770 | #plt.ylim([0.000,0.99]) 771 | 772 | 773 | # In[74]: 774 | 775 | plt.plot(test_loss,'r') 776 | 777 | 778 | # #### Demornamization 779 | 780 | # In[75]: 781 | 782 | rmse = np.sqrt(mse) 783 | 784 | 785 | # In[76]: 786 | 787 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 788 | 789 | 790 | # In[77]: 791 | 792 | rmse_denorm 793 | 794 | 795 | # In[78]: 796 | 797 | df_new_test['Kt'].describe() 798 | 799 | 800 | # ### Saving train and test losses to a csv 801 | 802 | # In[79]: 803 | 804 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss,'iteration':train_iter}, columns=['Train Loss','iteration']) 805 | df_trainLoss.to_csv('RNN Paper Results/Exp1_GoodwinCreek_TrainLoss.csv') 806 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss,'iteration':test_iter}, columns=['Test Loss','iteration']) 807 | df_testLoss.to_csv('RNN Paper Results/Exp1_GoodwinCreek_TestLoss.csv') 808 | 809 | 810 | # In[ ]: 811 | 812 | 813 | 814 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Exp_1.2_PenState_2hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[7]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[8]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[9]: 25 | 26 | get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[5]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[10]: 38 | 39 | pns = Location(40.798,-77.859, 'US/Eastern', 351.74, 'Penn State') 40 | 41 | 42 | # In[11]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=pns.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=pns.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[12]: 51 | 52 | cs_2009 = pns.get_clearsky(times2009) 53 | cs_2010and2011 = pns.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[13]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[14]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[15]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda 
x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[16]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[17]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[18]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[19]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[20]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[21]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[22]: 161 | 162 | path = r'.\\data\\Penn_State\\Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[23]: 172 | 173 | path = r'.\\data\\Penn_State\\Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[24]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[25]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[26]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[27]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[28]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[29]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[30]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[31]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # 
For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[32]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[33]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[34]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[35]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[36]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[37]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[38]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[39]: 300 | 301 | len(train) 302 | 303 | 304 | # In[40]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[41]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[42]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[43]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[44]: 336 | 337 | len(test) 338 | 339 | 340 | # In[45]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[46]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[47]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[48]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[49]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[50]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[51]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[52]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[53]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[54]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[55]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[56]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[57]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[58]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 
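# Added note: the per-column groupby calls in the next cell can be collapsed into a
# single aggregation. The snippet below is only an equivalent sketch for readability,
# not part of the original pipeline; `feature_cols` and `df_new_test_alt` are
# illustrative names introduced here.

feature_cols = ['zen','dw_solar','uw_solar','direct_n','diffuse','dw_ir',
                'dw_casetemp','dw_dometemp','uw_ir','uw_casetemp','uw_dometemp',
                'uvb','par','netsolar','netir','totalnet','temp','rh',
                'windspd','winddir','pressure','ghi','Kt']
df_new_test_alt = df_test.groupby(['month','day','hour'])[feature_cols].mean()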
473 | # In[59]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[60]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[61]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[62]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[63]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-2) 529 | 530 | 531 | # In[64]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[65]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[66]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-2) 550 | 551 | 552 | # In[67]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[68]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[69]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = (df_new_test - 
df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[70]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[71]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[72]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[73]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[74]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[75]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[76]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[77]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
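        # Added note: out is (batch_size, seq_dim, hidden_dim), i.e. (100, 1, 15)
        # with the hyperparameters below; out[:, -1, :] keeps only the last time step.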
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[78]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[79]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | n_iter += 1 763 | 764 | 765 | # In[80]: 766 | 767 | print(len(test_loss)) 768 | #plt.plot(test_loss) 769 | plt.plot(train_loss,'-') 770 | #plt.ylim([0.000,0.99]) 771 | 772 | 773 | # In[81]: 774 | 775 | plt.plot(test_loss,'r') 776 | 777 | 778 | # #### Demornamization 779 | 780 | # In[82]: 781 | 782 | rmse = np.sqrt(mse) 783 | 784 | 785 | # In[83]: 786 | 787 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 788 | 789 | 790 | # In[84]: 791 | 792 | rmse_denorm 793 | 794 | 795 | # In[85]: 796 | 797 | df_new_test['Kt'].describe() 798 | 799 | 800 | # ### Saving train and test losses to a csv 801 | 802 | # In[86]: 803 | 804 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss, 'iteration': train_iter}, columns=['Train Loss','iteration']) 805 | df_trainLoss.to_csv('RNN Paper Results/Exp1_PennState_TrainLoss.csv') 806 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss, 'iteration': test_iter}, columns=['Test Loss','iteration']) 807 | df_testLoss.to_csv('RNN Paper Results/Exp1_PenState_TestLoss.csv') 808 | 809 | 810 | # In[ ]: 811 | 812 | 813 | 814 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Fort_Peck_3hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | ftp = Location(48,-106.449, 'US/Mountain', 630.0216, 'Fort Peck') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=ftp.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=ftp.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[7]: 51 | 52 | cs_2009 = ftp.get_clearsky(times2009) 53 | cs_2010and2011 = ftp.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[8]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[9]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[10]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | 
cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[11]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[12]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[13]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[14]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[15]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[16]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[17]: 161 | 162 | path = r'./data/Fort_Peck/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[18]: 172 | 173 | path = r'./data/Fort_Peck/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[21]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[ ]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[ ]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[ ]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[ ]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[ ]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[ ]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[ ]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # For the rows 
with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[ ]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[ ]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[ ]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[ ]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[ ]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[ ]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[ ]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[ ]: 300 | 301 | len(train) 302 | 303 | 304 | # In[ ]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[ ]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[ ]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[8]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[ ]: 336 | 337 | len(test) 338 | 339 | 340 | # In[ ]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[ ]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('./RNN Paper Results/Exp1_2/Fort_Peck/3hour_Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('./RNN Paper Results/Exp1_2/Fort_Peck/3hour_Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[ ]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### 
Groupdata - test dataframe 472 | 473 | # In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-3) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-3) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = 
(df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[117]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[118]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
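        # Added note: out is (batch_size, seq_dim, hidden_dim), i.e. (100, 1, 15)
        # here; out[:, -1, :] selects the last time step before the linear readout.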
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[119]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | n_iter += 1 764 | 765 | 766 | # In[145]: 767 | 768 | print(len(test_loss)) 769 | #plt.plot(test_loss) 770 | plt.plot(train_loss,'-') 771 | #plt.ylim([0.000,0.99]) 772 | 773 | 774 | # In[146]: 775 | 776 | plt.plot(test_loss,'r') 777 | 778 | 779 | # #### Demornamization 780 | 781 | # In[161]: 782 | 783 | rmse = np.sqrt(mse) 784 | 785 | 786 | # In[243]: 787 | 788 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 789 | 790 | 791 | # In[244]: 792 | 793 | print("rmse_denorm",rmse_denorm) 794 | 795 | 796 | # In[259]: 797 | 798 | print(df_new_test['Kt'].describe()) 799 | 800 | 801 | # ### Saving train and test losses to a csv 802 | 803 | # In[ ]: 804 | 805 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss}, columns=['Train Loss']) 806 | df_trainLoss.to_csv('./RNN Paper Results/Exp1_2/Fort_Peck/3hour_FortPeck_TrainLoss.csv') 807 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss}, columns=['Test Loss']) 808 | df_testLoss.to_csv('./RNN Paper Results/Exp1_2/Fort_Peck/3hour_FortPeck_TestLoss.csv') 809 | 810 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Fort_Peck_4hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | ftp = Location(48,-106.449, 'US/Mountain', 630.0216, 'Fort Peck') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=ftp.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=ftp.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[7]: 51 | 52 | cs_2009 = ftp.get_clearsky(times2009) 53 | cs_2010and2011 = ftp.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[8]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[9]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[10]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = 
cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[11]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[12]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[13]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[14]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[15]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[16]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[17]: 161 | 162 | path = r'./data/Fort_Peck/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[18]: 172 | 173 | path = r'./data/Fort_Peck/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[21]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[ ]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[ ]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[ ]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[ ]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[ ]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[ ]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[ ]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # For the rows 
with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[ ]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[ ]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[ ]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[ ]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[ ]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[ ]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[ ]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[ ]: 300 | 301 | len(train) 302 | 303 | 304 | # In[ ]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[ ]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[ ]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[8]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[ ]: 336 | 337 | len(test) 338 | 339 | 340 | # In[ ]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[ ]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('./RNN Paper Results/Exp1_2/Fort_Peck/4hour_Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('./RNN Paper Results/Exp1_2/Fort_Peck/4hour_Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[ ]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### 
Groupdata - test dataframe 472 | 473 | # In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-4) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-4) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = 
(df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[117]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[118]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
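        # Added clarifying note (not in the original notebook): with batch_first=True,
        # out has shape (batch_size, seq_dim, hidden_dim), i.e. (100, 1, 15) for the
        # batch_size=100 / seq_dim=1 / hidden_dim=15 configuration used below, so
        # out[:, -1, :] is (100, 15) rather than (100, 100).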
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[119]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | n_iter += 1 764 | 765 | 766 | # In[145]: 767 | 768 | print(len(test_loss)) 769 | #plt.plot(test_loss) 770 | plt.plot(train_loss,'-') 771 | #plt.ylim([0.000,0.99]) 772 | 773 | 774 | # In[146]: 775 | 776 | plt.plot(test_loss,'r') 777 | 778 | 779 | # #### Demornamization 780 | 781 | # In[161]: 782 | 783 | rmse = np.sqrt(mse) 784 | 785 | 786 | # In[243]: 787 | 788 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 789 | 790 | 791 | # In[244]: 792 | 793 | print("rmse_denorm",rmse_denorm) 794 | 795 | 796 | # In[259]: 797 | 798 | print(df_new_test['Kt'].describe()) 799 | 800 | 801 | # ### Saving train and test losses to a csv 802 | 803 | # In[ ]: 804 | 805 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss}, columns=['Train Loss']) 806 | df_trainLoss.to_csv('./RNN Paper Results/Exp1_2/Fort_Peck/4hour_FortPeck_TrainLoss.csv') 807 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss}, columns=['Test Loss']) 808 | df_testLoss.to_csv('./RNN Paper Results/Exp1_2/Fort_Peck/4hour_FortPeck_TestLoss.csv') 809 | 810 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/RNN_Fort_Peck.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | ftp = Location(48,-106.449, 'US/Mountain', 630.0216, 'Fort Peck') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=ftp.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=ftp.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[9]: 51 | 52 | cs_2009 = ftp.get_clearsky(times2009) 53 | cs_2010and2011 = ftp.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[10]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[11]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[12]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = 
cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[13]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[14]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[15]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[16]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[17]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[18]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[19]: 161 | 162 | path = r'./data/Fort_Peck/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[20]: 172 | 173 | path = r'./data/Fort_Peck/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[21]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[26]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[25]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[24]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[ ]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[ ]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[ ]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[ ]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # For the 
rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[ ]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[ ]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[ ]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[ ]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[ ]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[ ]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[ ]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[ ]: 300 | 301 | len(train) 302 | 303 | 304 | # In[ ]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[ ]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[ ]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[8]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[ ]: 336 | 337 | len(test) 338 | 339 | 340 | # In[ ]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[ ]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | plt.savefig('Figure 3', 
bbox_inches='tight') 384 | #plt.show() 385 | 386 | 387 | 388 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 389 | 390 | # In[ ]: 391 | 392 | df_train = df_train[df_train['ghi']!=0] 393 | df_test = df_test[df_test['ghi']!=0] 394 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 395 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 396 | 397 | 398 | # In[ ]: 399 | 400 | df_train.reset_index(inplace=True) 401 | df_test.reset_index(inplace=True) 402 | 403 | 404 | # In[ ]: 405 | 406 | print("test Kt max: "+str(df_test['Kt'].max())) 407 | print("test Kt min: "+str(df_test['Kt'].min())) 408 | print("test Kt mean: "+str(df_test['Kt'].mean())) 409 | print("\n") 410 | print("train Kt max: "+str(df_train['Kt'].max())) 411 | print("train Kt min: "+str(df_train['Kt'].min())) 412 | print("train Kt mean: "+str(df_train['Kt'].mean())) 413 | 414 | 415 | # In[ ]: 416 | 417 | plt.plot(df_train['Kt']) 418 | #plt.show() 419 | 420 | 421 | # In[ ]: 422 | 423 | plt.plot(df_test['Kt']) 424 | #plt.show() 425 | 426 | 427 | # In[ ]: 428 | 429 | df_train= df_train[df_train['Kt']< 5000] 430 | df_train= df_train[df_train['Kt']> -1000] 431 | df_test= df_test[df_test['Kt']< 5000] 432 | df_test= df_test[df_test['Kt']> -1000] 433 | 434 | 435 | # #### Group the data (train dataframe) 436 | 437 | # In[ ]: 438 | 439 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 440 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 441 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 442 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 443 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 444 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 445 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 446 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 447 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 448 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 449 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 450 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 451 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 452 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 453 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 454 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 455 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 456 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 457 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 458 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 459 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 460 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 461 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 462 | 463 | 464 | # In[ ]: 465 | 466 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 467 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 468 | 469 | 470 | # In[ ]: 471 | 472 | df_new_train.head() 473 | 474 | 475 | # #### 
Groupdata - test dataframe 476 | 477 | # In[ ]: 478 | 479 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 480 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 481 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 482 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 483 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 484 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 485 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 486 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 487 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 488 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 489 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 490 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 491 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 492 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 493 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 494 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 495 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 496 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 497 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 498 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 499 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 500 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 501 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 502 | 503 | 504 | # In[ ]: 505 | 506 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 507 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 508 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 509 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 510 | 511 | 512 | # In[ ]: 513 | 514 | df_new_test.loc[2].xs(17,level='day') 515 | 516 | 517 | # ### Shifting Kt values to make 1 hour ahead forecast 518 | 519 | # #### Train dataset 520 | 521 | # In[ ]: 522 | 523 | levels_index= [] 524 | for m in df_new_train.index.levels: 525 | levels_index.append(m) 526 | 527 | 528 | # In[ ]: 529 | 530 | for i in levels_index[0]: 531 | for j in levels_index[1]: 532 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-1) 533 | 534 | 535 | # In[ ]: 536 | 537 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 538 | 539 | 540 | # #### Test dataset 541 | 542 | # In[ ]: 543 | 544 | levels_index2= [] 545 | for m in df_new_test.index.levels: 546 | levels_index2.append(m) 547 | 548 | 549 | # In[ ]: 550 | 551 | for i in levels_index2[0]: 552 | for j in levels_index2[1]: 553 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-1) 554 | 555 | 556 | # In[ ]: 557 | 558 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 559 | 560 | 561 | # In[ ]: 562 | 563 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 564 | 565 | 566 | # ### Normalize train and test dataframe 567 | 568 | # In[ ]: 569 | 570 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 571 | test_norm = 
(df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 572 | 573 | 574 | # In[ ]: 575 | 576 | train_norm.reset_index(inplace=True,drop=True) 577 | test_norm.reset_index(inplace=True,drop=True) 578 | 579 | 580 | # ### Making train and test sets with train_norm and test_norm 581 | 582 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 583 | 584 | # In[89]: 585 | 586 | from fractions import gcd 587 | gcd(train_norm.shape[0],test_norm.shape[0]) 588 | 589 | 590 | # In[ ]: 591 | 592 | import math 593 | def roundup(x): 594 | return int(math.ceil(x / 100.0)) * 100 595 | 596 | 597 | # In[ ]: 598 | 599 | train_lim = roundup(train_norm.shape[0]) 600 | test_lim = roundup(test_norm.shape[0]) 601 | 602 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 603 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 604 | 605 | train_norm = train_norm.append(train_random) 606 | test_norm = test_norm.append(test_random) 607 | 608 | 609 | # In[ ]: 610 | 611 | X1 = train_norm.drop('Kt',axis=1) 612 | y1 = train_norm['Kt'] 613 | 614 | X2 = test_norm.drop('Kt',axis=1) 615 | y2 = test_norm['Kt'] 616 | 617 | 618 | # In[ ]: 619 | 620 | print("X1_train shape is {}".format(X1.shape)) 621 | print("y1_train shape is {}".format(y1.shape)) 622 | print("X2_test shape is {}".format(X2.shape)) 623 | print("y2_test shape is {}".format(y2.shape)) 624 | 625 | 626 | # In[ ]: 627 | 628 | X_train = np.array(X1) 629 | y_train = np.array(y1) 630 | X_test = np.array(X2) 631 | y_test = np.array(y2) 632 | 633 | 634 | # ### start of RNN 635 | 636 | # In[117]: 637 | 638 | import torch 639 | import torch.nn as nn 640 | import torchvision.transforms as transforms 641 | import torchvision.datasets as dsets 642 | from torch.autograd import Variable 643 | 644 | 645 | # In[118]: 646 | 647 | class RNNModel(nn.Module): 648 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 649 | super(RNNModel, self).__init__() 650 | #Hidden Dimension 651 | self.hidden_dim = hidden_dim 652 | 653 | # Number of hidden layers 654 | self.layer_dim = layer_dim 655 | 656 | #Building the RNN 657 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 658 | 659 | # Readout layer 660 | self.fc = nn.Linear(hidden_dim, output_dim) 661 | 662 | def forward(self, x): 663 | # Initializing the hidden state with zeros 664 | # (layer_dim, batch_size, hidden_dim) 665 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 666 | 667 | #One time step (the last one perhaps?) 668 | out, hn = self.rnn(x, h0) 669 | 670 | # Indexing hidden state of the last time step 671 | # out.size() --> ?? 
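        # Added clarifying note (not in the original notebook): out contains the hidden
        # state for every time step; with seq_dim = 1 here, out[:, -1, :] is just the final
        # hidden state of shape (batch_size, hidden_dim), which the linear readout layer
        # maps to a single predicted Kt value per sample.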
672 | #out[:,-1,:] --> is it going to be 100,100 673 | out = self.fc(out[:,-1,:]) 674 | # out.size() --> 100,1 675 | return out 676 | 677 | 678 | 679 | # In[119]: 680 | 681 | # Instantiating Model Class 682 | input_dim = 22 683 | hidden_dim = 15 684 | layer_dim = 1 685 | output_dim = 1 686 | batch_size = 100 687 | 688 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 689 | 690 | # Instantiating Loss Class 691 | criterion = nn.MSELoss() 692 | 693 | # Instantiate Optimizer Class 694 | learning_rate = 0.001 695 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 696 | 697 | # converting numpy array to torch tensor 698 | X_train = torch.from_numpy(X_train) 699 | y_train = torch.from_numpy(y_train) 700 | X_test = torch.from_numpy(X_test) 701 | y_test = torch.from_numpy(y_test) 702 | 703 | # initializing lists to store losses over epochs: 704 | train_loss = [] 705 | test_loss = [] 706 | train_iter = [] 707 | test_iter = [] 708 | 709 | 710 | # In[ ]: 711 | 712 | # Training the model 713 | seq_dim = 1 714 | 715 | n_iter =0 716 | num_samples = len(X_train) 717 | test_samples = len(X_test) 718 | batch_size = 100 719 | num_epochs = 1000 720 | feat_dim = X_train.shape[1] 721 | 722 | X_train = X_train.type(torch.FloatTensor) 723 | y_train = y_train.type(torch.FloatTensor) 724 | X_test = X_test.type(torch.FloatTensor) 725 | y_test = y_test.type(torch.FloatTensor) 726 | 727 | for epoch in range(num_epochs): 728 | for i in range(0, int(num_samples/batch_size -1)): 729 | 730 | 731 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 732 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 733 | 734 | #print("Kt_value={}".format(Kt_value)) 735 | 736 | optimizer.zero_grad() 737 | 738 | outputs = model(features) 739 | #print("outputs ={}".format(outputs)) 740 | 741 | loss = criterion(outputs, Kt_value) 742 | 743 | train_loss.append(loss.data[0]) 744 | train_iter.append(n_iter) 745 | 746 | #print("loss = {}".format(loss)) 747 | loss.backward() 748 | 749 | optimizer.step() 750 | 751 | n_iter += 1 752 | 753 | if n_iter%100 == 0: 754 | for i in range(0,int(test_samples/batch_size -1)): 755 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 756 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 757 | 758 | outputs = model(features) 759 | 760 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 761 | 762 | test_iter.append(n_iter) 763 | test_loss.append(mse) 764 | 765 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 766 | 767 | 768 | 769 | # In[145]: 770 | 771 | print(len(test_loss)) 772 | #plt.plot(test_loss) 773 | plt.plot(train_loss,'-') 774 | #plt.ylim([0.000,0.99]) 775 | 776 | 777 | # In[146]: 778 | 779 | plt.plot(test_loss,'r') 780 | 781 | 782 | # #### Demornamization 783 | 784 | # In[161]: 785 | 786 | rmse = np.sqrt(mse) 787 | 788 | 789 | # In[243]: 790 | 791 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 792 | 793 | 794 | # In[244]: 795 | 796 | print("rmse_denorm=",rmse_denorm) 797 | 798 | 799 | # In[259]: 800 | 801 | print(df_new_test['Kt'].describe()) 802 | 803 | 804 | # ### Saving train and test losses to a csv 805 | 806 | # In[ ]: 807 | 808 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss}, columns=['Train Loss']) 809 | df_trainLoss.to_csv('RNN Paper Results/Exp1_FortPeck_TrainLoss.csv') 810 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss}, columns=['Test Loss']) 811 | df_testLoss.to_csv('RNN Paper Results/Exp1_FortPeck_TestLoss.csv') 812 | 813 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/RNN_Sioux_Falls.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[25]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[26]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[27]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[28]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[29]: 38 | 39 | sif = Location(43.544,-96.73, 'US/Central', 448.086, 'Sioux Falls') 40 | 41 | 42 | # In[30]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=sif.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=sif.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[32]: 51 | 52 | cs_2009 = sif.get_clearsky(times2009) 53 | cs_2010and2011 = sif.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[33]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[34]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[35]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[36]: 82 
| 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[37]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[38]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[39]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[40]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[41]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[42]: 161 | 162 | path = r'./data/Sioux_Falls/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[43]: 172 | 173 | path = r'./data/Sioux_Falls/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[44]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[45]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[46]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[47]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[48]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[49]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[50]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[51]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[52]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[53]: 244 | 245 | # 
For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[54]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[55]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[56]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[57]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[58]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[59]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[60]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[61]: 300 | 301 | len(train) 302 | 303 | 304 | # In[62]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[63]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[64]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[65]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[66]: 336 | 337 | len(test) 338 | 339 | 340 | # In[67]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[68]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[69]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[70]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[71]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[72]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 473 | # 
In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-1) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-1) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = (df_new_test - df_new_test.mean()) / 
(df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[ ]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[ ]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[ ]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
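        # note: with batch_first=True, out.size() is (batch_size, seq_dim, hidden_dim),
        # i.e. (100, 1, 15) for the hyperparameters set below, so out[:,-1,:] has shape (100, 15)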
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[ ]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | n_iter += 1 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | 764 | 765 | # In[ ]: 766 | 767 | print(len(test_loss)) 768 | #plt.plot(test_loss) 769 | plt.plot(train_loss,'-') 770 | #plt.ylim([0.000,0.99]) 771 | 772 | 773 | # In[ ]: 774 | 775 | plt.plot(test_loss,'r') 776 | 777 | 778 | # #### Demornamization 779 | 780 | # In[ ]: 781 | 782 | rmse = np.sqrt(mse) 783 | 784 | 785 | # In[ ]: 786 | 787 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 788 | 789 | 790 | # In[ ]: 791 | 792 | print("rmse_denorm=",rmse_denorm) 793 | 794 | 795 | # In[ ]: 796 | 797 | print(df_new_test['Kt'].describe()) 798 | 799 | 800 | # ### Saving train and test losses to a csv 801 | 802 | # In[ ]: 803 | 804 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss, 'iteration':train_iter}, columns=['Train Loss','iteration']) 805 | df_trainLoss.to_csv('RNN Paper Results/Exp1_SiouxFalls_TrainLoss.csv') 806 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss, 'iteration':test_iter}, columns=['Test Loss','iteration']) 807 | df_testLoss.to_csv('RNN Paper Results/Exp1_SiouxFalls_TestLoss.csv') 808 | 809 | 810 | # In[ ]: 811 | 812 | 813 | 814 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Sioux_Falls_2hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[25]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[26]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[27]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[28]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[29]: 38 | 39 | sif = Location(43.544,-96.73, 'US/Central', 448.086, 'Sioux Falls') 40 | 41 | 42 | # In[30]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=sif.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=sif.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[32]: 51 | 52 | cs_2009 = sif.get_clearsky(times2009) 53 | cs_2010and2011 = sif.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[33]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[34]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[35]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = 
cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[36]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[37]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[38]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[39]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[40]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[41]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[42]: 161 | 162 | path = r'./data/Sioux_Falls/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[43]: 172 | 173 | path = r'./data/Sioux_Falls/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[44]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[45]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[46]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[47]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[48]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[49]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[50]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[51]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[52]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[53]: 244 | 245 | # 
For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[54]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[55]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[56]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[57]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[58]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[59]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[60]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[61]: 300 | 301 | len(train) 302 | 303 | 304 | # In[62]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[63]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[64]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[65]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[66]: 336 | 337 | len(test) 338 | 339 | 340 | # In[67]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[68]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[69]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[70]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('./RNN Paper Results/Exp1_2/Sioux_Falls/2hour_Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[71]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear 
Sky GHI (Watts/m^2)') 383 | plt.savefig('./RNN Paper Results/Exp1_2/Sioux_Falls/2hour_Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[72]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | 
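# note: df_new_train now holds hourly means of every field, indexed by (year, month, day, hour);
# 'Kt' is the clear-sky index that is shifted below to form the forecast target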
df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 473 | # In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-2) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-2) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - 
df_new_train.min()) 567 | test_norm = (df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[ ]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[ ]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[ ]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
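        # note: with batch_first=True, out.size() is (batch_size, seq_dim, hidden_dim),
        # i.e. (100, 1, 15) given the hyperparameters below, so out[:,-1,:] has shape (100, 15)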
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[ ]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | n_iter += 1 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | 764 | 765 | # In[ ]: 766 | 767 | print(len(test_loss)) 768 | #plt.plot(test_loss) 769 | plt.plot(train_loss,'-') 770 | #plt.ylim([0.000,0.99]) 771 | 772 | 773 | # In[ ]: 774 | 775 | plt.plot(test_loss,'r') 776 | 777 | 778 | # #### Demornamization 779 | 780 | # In[ ]: 781 | 782 | rmse = np.sqrt(mse) 783 | 784 | 785 | # In[ ]: 786 | 787 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 788 | 789 | 790 | # In[ ]: 791 | 792 | print("rmse_denorm=",rmse_denorm) 793 | 794 | 795 | # In[ ]: 796 | 797 | print(df_new_test['Kt'].describe()) 798 | 799 | 800 | # ### Saving train and test losses to a csv 801 | 802 | # In[ ]: 803 | 804 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss, 'iteration':train_iter}, columns=['Train Loss','iteration']) 805 | df_trainLoss.to_csv('./RNN Paper Results/Exp1_2/Sioux_Falls/2hour_SiouxFalls_TrainLoss.csv') 806 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss, 'iteration':test_iter}, columns=['Test Loss','iteration']) 807 | df_testLoss.to_csv('./RNN Paper Results/Exp1_2/Sioux_Falls/2hour_TestLoss.csv') 808 | 809 | 810 | # In[ ]: 811 | 812 | 813 | 814 | -------------------------------------------------------------------------------- /multi-tscale-slim.yaml: -------------------------------------------------------------------------------- 1 | name: multi-tscale 2 | channels: 3 | - defaults 4 | dependencies: 5 | - blas=1.0 6 | - ca-certificates=2019.1.23 7 | - certifi=2018.11.29 8 | - cffi=1.11.5 9 | - cudatoolkit=9.0 10 | - freetype=2.9.1 11 | - intel-openmp=2019.1 12 | - jpeg=9b 13 | - libedit=3.1.20170329 14 | - libffi=3.2.1 15 | - libgcc-ng=8.2.0 16 | - libgfortran-ng=7.3.0 17 | - libpng=1.6.36 18 | - libstdcxx-ng=8.2.0 19 | - libtiff=4.0.10 20 | - mkl=2018.0.3 21 | - mkl_fft=1.0.6 22 | - mkl_random=1.0.1 23 | - nccl=1.3.5 24 | - ncurses=6.1 25 | - ninja=1.8.2 26 | - numpy=1.15.4 27 | - numpy-base=1.15.4 28 | - olefile=0.46 29 | - openssl=1.1.1a 30 | - pandas=0.24.1 31 | - pillow=5.4.1 32 | - pip=10.0.1 33 | - pycparser=2.19 34 | - python=3.6.8 35 | - python-dateutil=2.7.5 36 | - pytorch=0.4.1 37 | - pytz=2018.9 38 | - readline=7.0 39 | - setuptools=39.2.0 40 | - six=1.12.0 41 | - sqlite=3.26.0 42 | - tk=8.6.8 43 | - torchvision=0.2.1 44 | - wheel=0.31.1 45 | - xz=5.2.4 46 | - zlib=1.2.11 47 | - zstd=1.3.7 48 | - pip: 49 | - atomicwrites==1.2.0 50 | - attrs==18.1.0 51 | - bleach==1.5.0 52 | - chardet==3.0.4 53 | - click==6.7 54 | - cmake==3.12.0 55 | - colorama==0.3.9 56 | - coverage==4.5.1 57 | - cycler==0.10.0 58 | - cython==0.29.2 59 | - decorator==4.3.0 60 | - dill==0.2.8.2 61 | - filelock==3.0.10 62 | - funcsigs==1.0.2 63 | - future==0.16.0 64 | - html5lib==0.9999999 65 | - hyperopt==0.1.1 66 | - idna==2.7 67 | - kiwisolver==1.0.1 68 | - lz4==2.1.0 69 | - markdown==2.6.11 70 | - matplotlib==2.2.2 71 | - more-itertools==4.3.0 72 | - networkx==2.1 73 | - nose2==0.8.0 74 | - numexpr==2.6.6 75 | - pluggy==0.7.1 76 | - psutil==5.4.7 77 | - pvlib==0.5.2 78 | - py==1.5.4 79 | - py-trees==0.8.3 80 | - pymongo==3.7.1 81 | - pyparsing==2.2.0 82 | - pytest==3.7.3 83 | - pyyaml==3.13 84 | - pyzmq==17.1.2 85 | - requests==2.19.1 86 | - scipy==1.1.0 87 | - seaborn==0.9.0 88 | - tables==3.4.4 89 | - torch==0.4.1 90 | - tqdm==4.26.0 91 | - urllib3==1.23 92 | - werkzeug==0.14.1 93 | - zmq==0.0.0 94 | --------------------------------------------------------------------------------