├── .gitignore ├── README.md ├── conda_env.yaml ├── data_dir_struct.txt ├── fixed-time-horizon-prediction ├── Exp_1.2 │ ├── Bondville_3hour.py │ ├── Bondville_4hour.py │ ├── Boulder_3hour.py │ ├── Boulder_4hour.py │ ├── Desert_Rock_2hour.py │ ├── Desert_Rock_3hour.py │ ├── Desert_Rock_4hour.py │ ├── Exp_1.2_Bondville_2hour.py │ ├── Exp_1.2_Boulder_2hour.py │ ├── Exp_1.2_FortPeck_2hour.py │ ├── Exp_1.2_GoodwinCreek_2hour.py │ ├── Exp_1.2_PenState_2hour.py │ ├── Exp_1.3_DesertRock_3hour.py │ ├── Fort_Peck_3hour.py │ ├── Fort_Peck_4hour.py │ ├── Goodwin_Creek_2hour.py │ ├── Goodwin_Creek_3hour.py │ ├── Goodwin_Creek_4hour.py │ ├── Penn_State_2hour.py │ ├── Penn_State_3hour.py │ ├── Penn_State_4hour.py │ ├── RNN_Fort_Peck.py │ ├── RNN_Sioux_Falls.py │ ├── Sioux_Falls_2hour.py │ ├── Sioux_Falls_3hour.py │ └── Sioux_Falls_4hour.py ├── Exp_1.2_Bondville_2hour.ipynb ├── Exp_1.2_Boulder_2hour.ipynb ├── Exp_1.2_Desert_Rock_2hour.ipynb ├── Exp_1.2_FortPeck_2hour.ipynb ├── Exp_1.2_GoodwinCreek_2hour.ipynb ├── Exp_1.2_PenState_2hour.ipynb ├── Exp_1.2_SiouxFalls_2hour.ipynb ├── Exp_1.3_Bondville_3hour.ipynb ├── Exp_1_RNN_Bondville.ipynb ├── Exp_1_RNN_Boulder.ipynb ├── Exp_1_RNN_Desert_Rock.ipynb ├── Exp_1_RNN_Fort_Peck.ipynb ├── Exp_1_RNN_Goodwin_Creek.ipynb ├── Exp_1_RNN_Penn_State.ipynb └── Exp_1_RNN_Sioux_Falls.ipynb ├── multi-time-horizon-prediction ├── Exp_2.1_multi-time-scale_All_Locations.ipynb ├── Exp_2.1_multi-time-scale_All_Locations.py ├── Exp_2.1_multi-time-scale_Bondville_2009.py ├── Exp_2.1_multi-time-scale_Bondville_2015.py ├── Exp_2.1_multi-time-scale_Bondville_2016.py ├── Exp_2.1_multi-time-scale_Bondville_2017.py ├── Exp_2.1_multi-time-scale_Boulder_2009.py ├── Exp_2.1_multi-time-scale_Boulder_2015.py ├── Exp_2.1_multi-time-scale_Boulder_2016.py ├── Exp_2.1_multi-time-scale_Boulder_2017.py ├── Exp_2.1_multi-time-scale_Desert_Rock_2009.py ├── Exp_2.1_multi-time-scale_Desert_Rock_2015.py ├── Exp_2.1_multi-time-scale_Desert_Rock_2016.py ├── Exp_2.1_multi-time-scale_Desert_Rock_2017.py ├── Exp_2.1_multi-time-scale_Fort_Peck_2009.py ├── Exp_2.1_multi-time-scale_Fort_Peck_2015.py ├── Exp_2.1_multi-time-scale_Fort_Peck_2016.py ├── Exp_2.1_multi-time-scale_Fort_Peck_2017.py ├── Exp_2.1_multi-time-scale_Goodwin_Creek_2009.py ├── Exp_2.1_multi-time-scale_Goodwin_Creek_2015.py ├── Exp_2.1_multi-time-scale_Goodwin_Creek_2016.py ├── Exp_2.1_multi-time-scale_Goodwin_Creek_2017.py ├── Exp_2.1_multi-time-scale_Penn_State_2009.py ├── Exp_2.1_multi-time-scale_Penn_State_2015.py ├── Exp_2.1_multi-time-scale_Penn_State_2016.py ├── Exp_2.1_multi-time-scale_Penn_State_2017.py ├── Exp_2.1_multi-time-scale_Sioux_Falls_2009.py ├── Exp_2.1_multi-time-scale_Sioux_Falls_2015.py ├── Exp_2.1_multi-time-scale_Sioux_Falls_2016.py └── Exp_2.1_multi-time-scale_Sioux_Falls_2017.py └── multi-tscale-slim.yaml /.gitignore: -------------------------------------------------------------------------------- 1 | *.jpg 2 | *.zip 3 | .ipynb_checkpoints 4 | *Test.ipynb 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Multi-time Horizon Solar Forecasting using Recurrent Neural Networks 2 | 3 | This repository contains code to reproduce the results published in the ["Multi-time-horizon Solar Forecasting Using Recurrent Neural Network"](https://arxiv.org/abs/1807.05459) paper. 
In addition, an LSTM implementation for multi-time-horizon solar forecasting is available in a separate repository: ["PyTorch implementation of LSTM Model for Multi-time-horizon Solar Forecasting"](https://github.com/sakshi-mishra/LSTM_Solar_Forecasting). 4 | 5 | ## Conda environment for running the code 6 | 7 | A conda environment file is provided for convenience. Assuming you have the Anaconda Python distribution available on your computer, you can create a new conda environment with the necessary packages using the following command: 8 | 9 | `conda env create -f multi-tscale-slim.yaml -n "multi_time_horizon"` 10 | 11 | ## Predictions with fixed time horizon 12 | The Jupyter Notebooks in [fixed-time-horizon-prediction](fixed-time-horizon-prediction) walk through the experiments on forecasting solar irradiance on a fixed-time-horizon basis, as described in Section V.A of the [paper](https://arxiv.org/abs/1807.05459). 13 | 14 | Predictions are made for seven different sites, so there is one Jupyter Notebook per site for the 1-hour ahead forecast: 15 | * [fixed-time-horizon-prediction/Exp_1_RNN_Bondville.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Bondville.ipynb): code to train and predict for the Bondville location, 1-hour ahead forecast 16 | * [fixed-time-horizon-prediction/Exp_1_RNN_Boulder.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Boulder.ipynb): code to train and predict for the Boulder location, 1-hour ahead forecast 17 | * [fixed-time-horizon-prediction/Exp_1_RNN_Desert_Rock.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Desert_Rock.ipynb): code to train and predict for the Desert Rock location, 1-hour ahead forecast 18 | * [fixed-time-horizon-prediction/Exp_1_RNN_Fort_Peck.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Fort_Peck.ipynb): code to train and predict for the Fort Peck location, 1-hour ahead forecast 19 | * [fixed-time-horizon-prediction/Exp_1_RNN_Goodwin_Creek.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Goodwin_Creek.ipynb): code to train and predict for the Goodwin Creek location, 1-hour ahead forecast 20 | * [fixed-time-horizon-prediction/Exp_1_RNN_Penn_State.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Penn_State.ipynb): code to train and predict for the Penn State location, 1-hour ahead forecast 21 | * [fixed-time-horizon-prediction/Exp_1_RNN_Sioux_Falls.ipynb](fixed-time-horizon-prediction/Exp_1_RNN_Sioux_Falls.ipynb): code to train and predict for the Sioux Falls location, 1-hour ahead forecast 22 | 23 | 2-hour ahead forecast Jupyter Notebooks for all seven locations: 24 | * [fixed-time-horizon-prediction/Exp_1.2_Bondville_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_Bondville_2hour.ipynb): code to train and predict for the Bondville location, 2-hour ahead forecast 25 | * [fixed-time-horizon-prediction/Exp_1.2_Boulder_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_Boulder_2hour.ipynb): code to train and predict for the Boulder location, 2-hour ahead forecast 26 | * [fixed-time-horizon-prediction/Exp_1.2_Desert_Rock_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_Desert_Rock_2hour.ipynb): code to train and predict for the Desert Rock location, 2-hour ahead forecast 27 | * [fixed-time-horizon-prediction/Exp_1.2_FortPeck_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_FortPeck_2hour.ipynb): code to train and predict for the Fort Peck location, 2-hour ahead forecast 28 | * [fixed-time-horizon-prediction/Exp_1.2_GoodwinCreek_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_GoodwinCreek_2hour.ipynb): code to train and predict for the Goodwin Creek location, 2-hour ahead forecast 29 | * 
[fixed-time-horizon-prediction/Exp_1.2_PenState_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_PenState_2hour.ipynb): code to train and predict for the Penn State location, 2-hour ahead forecast 30 | * [fixed-time-horizon-prediction/Exp_1.2_SiouxFalls_2hour.ipynb](fixed-time-horizon-prediction/Exp_1.2_SiouxFalls_2hour.ipynb): code to train and predict for the Sioux Falls location, 2-hour ahead forecast 31 | 32 | 3-hour ahead forecast Jupyter Notebook for the Bondville location: 33 | * [fixed-time-horizon-prediction/Exp_1.3_Bondville_3hour.ipynb](fixed-time-horizon-prediction/Exp_1.3_Bondville_3hour.ipynb): code to train and predict for the Bondville location, 3-hour ahead forecast 34 | 35 | #### The [fixed-time-horizon-prediction/Exp_1.2](fixed-time-horizon-prediction/Exp_1.2) folder contains the .py versions of the Jupyter Notebooks listed above, along with additional .py files for 3-hour ahead and 4-hour ahead forecasts for all seven locations. 36 | 37 | ## Predictions with multi-time horizon 38 | 39 | The Python scripts in [multi-time-horizon-prediction](multi-time-horizon-prediction) implement the experiments on forecasting solar irradiance on a multi-time-horizon basis, as described in Section V.B of the [paper](https://arxiv.org/abs/1807.05459). 40 | 41 | The models are trained on years 2010 and 2011. Predictions are made for years 2009, 2015, 2016 and 2017 at all seven locations. With four test years and seven locations, there are 4*7 = 28 .py files for training and prediction. 42 | 43 | The Jupyter Notebook [multi-time-horizon-prediction/Exp_2.1_multi-time-scale_All_Locations.ipynb](multi-time-horizon-prediction/Exp_2.1_multi-time-scale_All_Locations.ipynb) contains the code for predicting solar irradiance for all locations and all test years (2009, 2015, 2016 and 2017). 44 | 45 | ### Training/Testing Data 46 | 47 | The training and testing data need to be downloaded from the [NOAA FTP server](ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad/) for each location/site. You can use GNU wget to automate the download process; for example, a command along the lines of `wget -r -np ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad/<site>/<year>/` (with the site directory and year filled in) mirrors one site-year at a time. The scripts assume that the data is in the *data* folder as per the structure outlined in the [data_dir_struct.txt](data_dir_struct.txt) file. 48 | 49 | If you face any issues running the code or reproducing the results, create an issue on this repo. 
Contributions are welcome too :) 50 | 51 | ## Citing 52 | If you find this work useful for your research, please cite the paper: 53 | 54 | ```bibtex 55 | @misc{1807.05459, 56 | Author = {Sakshi Mishra and Praveen Palanisamy}, 57 | Title = {Multi-time-horizon Solar Forecasting Using Recurrent Neural Network}, 58 | Year = {2018}, 59 | Eprint = {arXiv:1807.05459}, 60 | } 61 | ``` 62 | -------------------------------------------------------------------------------- /conda_env.yaml: -------------------------------------------------------------------------------- 1 | name: multi_time_horizon 2 | channels: 3 | - defaults 4 | dependencies: 5 | - ca-certificates=2018.03.07=0 6 | - certifi=2018.4.16=py36_0 7 | - libedit=3.1.20170329=h6b74fdf_2 8 | - libffi=3.2.1=hd88cf55_4 9 | - libgcc-ng=7.2.0=hdf63c60_3 10 | - libstdcxx-ng=7.2.0=hdf63c60_3 11 | - ncurses=6.1=hf484d3e_0 12 | - openssl=1.0.2o=h20670df_0 13 | - pip=10.0.1=py36_0 14 | - python=3.6.6=hc3d631a_0 15 | - readline=7.0=ha6073c6_4 16 | - setuptools=39.2.0=py36_0 17 | - sqlite=3.24.0=h84994c4_0 18 | - tk=8.6.7=hc745277_3 19 | - wheel=0.31.1=py36_0 20 | - xz=5.2.4=h14c3975_4 21 | - zlib=1.2.11=ha838bed_2 22 | - pip: 23 | - cycler==0.10.0 24 | - kiwisolver==1.0.1 25 | - matplotlib==2.2.2 26 | - numexpr==2.6.6 27 | - numpy==1.15.0 28 | - pandas==0.23.3 29 | - pvlib==0.5.2 30 | - pyparsing==2.2.0 31 | - python-dateutil==2.7.3 32 | - pytz==2018.5 33 | - scipy==1.1.0 34 | - seaborn==0.9.0 35 | - six==1.11.0 36 | - tables==3.4.4 37 | -------------------------------------------------------------------------------- /data_dir_struct.txt: -------------------------------------------------------------------------------- 1 | data 2 | ├── Bondville 3 | │   ├── Exp_1_test 4 | │   │   ├── 2009 5 | │   │   ├── 2015 6 | │   │   ├── 2016 7 | │   │   └── 2017 8 | │   └── Exp_1_train 9 | ├── Boulder 10 | │   ├── Exp_1_test 11 | │   │   ├── 2009 12 | │   │   ├── 2015 13 | │   │   ├── 2016 14 | │   │   └── 2017 15 | │   └── Exp_1_train 16 | ├── Desert_Rock 17 | │   ├── Exp_1_test 18 | │   │   ├── 2009 19 | │   │   ├── 2015 20 | │   │   ├── 2016 21 | │   │   └── 2017 22 | │   └── Exp_1_train 23 | ├── Fort_Peck 24 | │   ├── Exp_1_test 25 | │   │   ├── 2009 26 | │   │   ├── 2015 27 | │   │   ├── 2016 28 | │   │   └── 2017 29 | │   └── Exp_1_train 30 | ├── Goodwin_Creek 31 | │   ├── Exp_1_test 32 | │   │   ├── 2009 33 | │   │   ├── 2015 34 | │   │   ├── 2016 35 | │   │   └── 2017 36 | │   └── Exp_1_train 37 | ├── Penn_State 38 | │   ├── Exp_1_test 39 | │   │   ├── 2009 40 | │   │   ├── 2015 41 | │   │   ├── 2016 42 | │   │   └── 2017 43 | │   └── Exp_1_train 44 | └── Sioux_Falls 45 | ├── Exp_1_test 46 | │   ├── 2009 47 | │   ├── 2015 48 | │   ├── 2016 49 | │   └── 2017 50 | └── Exp_1_train -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Exp_1.2_FortPeck_2hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from 
pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | ftp = Location(48,-106.449, 'US/Mountain', 630.0216, 'Fort Peck') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=ftp.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=ftp.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[7]: 51 | 52 | cs_2009 = ftp.get_clearsky(times2009) 53 | cs_2010and2011 = ftp.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[8]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[9]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[10]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[11]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[12]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[13]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[14]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[15]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. 
(K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. (K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[16]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[17]: 161 | 162 | path = r'./data/Fort_Peck/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[18]: 172 | 173 | path = r'./data/Fort_Peck/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[21]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[ ]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[ ]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[ ]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[ ]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[ ]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[ ]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[ ]: 237 | 238 | missing_data_indices_test = np.where((df_test <= 
-9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[ ]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[ ]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[ ]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[ ]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[ ]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[ ]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[ ]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[ ]: 300 | 301 | len(train) 302 | 303 | 304 | # In[ ]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[ ]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[ ]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[8]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[ ]: 336 | 337 | len(test) 338 | 339 | 340 | # In[ ]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[ ]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs 
clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | plt.savefig('Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[ ]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], 
axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 473 | # In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-2) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-2) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - 
df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = (df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[117]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[118]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
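        # With batch_first=True, nn.RNN returns `out` with shape
        # (batch_size, seq_dim, hidden_dim) -- (100, 1, 15) for this model -- so
        # out[:, -1, :] below selects the hidden state of the last time step,
        # a (100, 15) tensor that the linear readout maps to (100, 1).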
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[119]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | n_iter += 1 764 | 765 | 766 | # In[145]: 767 | 768 | print(len(test_loss)) 769 | #plt.plot(test_loss) 770 | plt.plot(train_loss,'-') 771 | #plt.ylim([0.000,0.99]) 772 | 773 | 774 | # In[146]: 775 | 776 | plt.plot(test_loss,'r') 777 | 778 | 779 | # #### Demornamization 780 | 781 | # In[161]: 782 | 783 | rmse = np.sqrt(mse) 784 | 785 | 786 | # In[243]: 787 | 788 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 789 | 790 | 791 | # In[244]: 792 | 793 | print("rmse_denorm",rmse_denorm) 794 | 795 | 796 | # In[259]: 797 | 798 | print(df_new_test['Kt'].describe()) 799 | 800 | 801 | # ### Saving train and test losses to a csv 802 | 803 | # In[ ]: 804 | 805 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss}, columns=['Train Loss']) 806 | df_trainLoss.to_csv('RNN Paper Results/Exp1_2_FortPeck_TrainLoss.csv') 807 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss}, columns=['Test Loss']) 808 | df_testLoss.to_csv('RNN Paper Results/Exp1_2_FortPeck_TestLoss.csv') 809 | 810 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Exp_1.2_GoodwinCreek_2hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | gwc = Location(34.2487,-89.8925, 'US/Central', 98, 'Goodwin Creek') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=gwc.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=gwc.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[7]: 51 | 52 | cs_2009 = gwc.get_clearsky(times2009) 53 | cs_2010and2011 = gwc.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[8]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[9]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[10]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 
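# Note: Timestamp.to_datetime() (used above) is deprecated and removed in newer
# pandas releases; if it raises an AttributeError in your environment, the same
# year/month/day/hour/min columns can be built with the vectorized .dt accessor,
# e.g. cs_2009['year'] = cs_2009['index'].dt.year, and likewise for the other fields.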
80 | 81 | # In[11]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[12]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[13]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[14]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[15]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[16]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[17]: 161 | 162 | path = r'.\\data\\Goodwin_Creek\\Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[18]: 172 | 173 | path = r'.\\data\\Goodwin_Creek\\Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[19]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[20]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[21]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[22]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[23]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[24]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[25]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[26]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[27]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 
245 | # For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[28]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[29]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[30]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[31]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[32]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[33]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[34]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[35]: 300 | 301 | len(train) 302 | 303 | 304 | # In[36]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[37]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[38]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[39]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[40]: 336 | 337 | len(test) 338 | 339 | 340 | # In[41]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[42]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[43]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[44]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[45]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[46]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[47]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[48]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[49]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[50]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[51]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 
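# (The test-set groupby keys below omit 'year' because the Exp_1 test data covers a
#  single year, 2009, whereas the training data spans both 2010 and 2011.)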
473 | # In[52]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[53]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[54]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[55]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[56]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-2) 529 | 530 | 531 | # In[57]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[58]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[59]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-2) 550 | 551 | 552 | # In[60]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[61]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[62]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = (df_new_test - 
df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[63]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[64]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[65]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[66]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[67]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[68]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[69]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[70]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
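        # Same shapes as in the Fort Peck script: `out` is (batch_size, seq_dim, hidden_dim),
        # so out[:, -1, :] below is the last time step's hidden state, i.e. (100, 15).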
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[71]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[72]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | n_iter += 1 763 | 764 | 765 | # In[73]: 766 | 767 | print(len(test_loss)) 768 | #plt.plot(test_loss) 769 | plt.plot(train_loss,'-') 770 | #plt.ylim([0.000,0.99]) 771 | 772 | 773 | # In[74]: 774 | 775 | plt.plot(test_loss,'r') 776 | 777 | 778 | # #### Demornamization 779 | 780 | # In[75]: 781 | 782 | rmse = np.sqrt(mse) 783 | 784 | 785 | # In[76]: 786 | 787 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 788 | 789 | 790 | # In[77]: 791 | 792 | rmse_denorm 793 | 794 | 795 | # In[78]: 796 | 797 | df_new_test['Kt'].describe() 798 | 799 | 800 | # ### Saving train and test losses to a csv 801 | 802 | # In[79]: 803 | 804 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss,'iteration':train_iter}, columns=['Train Loss','iteration']) 805 | df_trainLoss.to_csv('RNN Paper Results/Exp1_GoodwinCreek_TrainLoss.csv') 806 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss,'iteration':test_iter}, columns=['Test Loss','iteration']) 807 | df_testLoss.to_csv('RNN Paper Results/Exp1_GoodwinCreek_TestLoss.csv') 808 | 809 | 810 | # In[ ]: 811 | 812 | 813 | 814 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Exp_1.2_PenState_2hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[7]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[8]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[9]: 25 | 26 | get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[5]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[10]: 38 | 39 | pns = Location(40.798,-77.859, 'US/Eastern', 351.74, 'Penn State') 40 | 41 | 42 | # In[11]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=pns.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=pns.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[12]: 51 | 52 | cs_2009 = pns.get_clearsky(times2009) 53 | cs_2010and2011 = pns.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[13]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[14]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[15]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda 
x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[16]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[17]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[18]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[19]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[20]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[21]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[22]: 161 | 162 | path = r'.\\data\\Penn_State\\Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[23]: 172 | 173 | path = r'.\\data\\Penn_State\\Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[24]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[25]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[26]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[27]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[28]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[29]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[30]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[31]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # 
For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[32]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[33]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[34]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[35]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[36]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[37]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[38]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[39]: 300 | 301 | len(train) 302 | 303 | 304 | # In[40]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[41]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[42]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[43]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[44]: 336 | 337 | len(test) 338 | 339 | 340 | # In[45]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[46]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[47]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[48]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[49]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[50]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[51]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[52]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[53]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[54]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[55]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[56]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[57]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[58]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 
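# Added note: the per-column groupby calls in the next cell can be collapsed into a
# single aggregation. The snippet below is only an equivalent sketch for readability,
# not part of the original pipeline; `feature_cols` and `df_new_test_alt` are
# illustrative names introduced here.

feature_cols = ['zen','dw_solar','uw_solar','direct_n','diffuse','dw_ir',
                'dw_casetemp','dw_dometemp','uw_ir','uw_casetemp','uw_dometemp',
                'uvb','par','netsolar','netir','totalnet','temp','rh',
                'windspd','winddir','pressure','ghi','Kt']
df_new_test_alt = df_test.groupby(['month','day','hour'])[feature_cols].mean()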
473 | # In[59]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[60]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[61]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[62]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[63]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-2) 529 | 530 | 531 | # In[64]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[65]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[66]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-2) 550 | 551 | 552 | # In[67]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[68]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[69]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = (df_new_test - 
df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[70]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[71]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[72]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[73]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[74]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[75]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[76]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[77]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
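        # Added note: out is (batch_size, seq_dim, hidden_dim), i.e. (100, 1, 15)
        # with the hyperparameters below; out[:, -1, :] keeps only the last time step.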
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[78]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[79]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | n_iter += 1 763 | 764 | 765 | # In[80]: 766 | 767 | print(len(test_loss)) 768 | #plt.plot(test_loss) 769 | plt.plot(train_loss,'-') 770 | #plt.ylim([0.000,0.99]) 771 | 772 | 773 | # In[81]: 774 | 775 | plt.plot(test_loss,'r') 776 | 777 | 778 | # #### Demornamization 779 | 780 | # In[82]: 781 | 782 | rmse = np.sqrt(mse) 783 | 784 | 785 | # In[83]: 786 | 787 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 788 | 789 | 790 | # In[84]: 791 | 792 | rmse_denorm 793 | 794 | 795 | # In[85]: 796 | 797 | df_new_test['Kt'].describe() 798 | 799 | 800 | # ### Saving train and test losses to a csv 801 | 802 | # In[86]: 803 | 804 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss, 'iteration': train_iter}, columns=['Train Loss','iteration']) 805 | df_trainLoss.to_csv('RNN Paper Results/Exp1_PennState_TrainLoss.csv') 806 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss, 'iteration': test_iter}, columns=['Test Loss','iteration']) 807 | df_testLoss.to_csv('RNN Paper Results/Exp1_PenState_TestLoss.csv') 808 | 809 | 810 | # In[ ]: 811 | 812 | 813 | 814 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Fort_Peck_3hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | ftp = Location(48,-106.449, 'US/Mountain', 630.0216, 'Fort Peck') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=ftp.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=ftp.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[7]: 51 | 52 | cs_2009 = ftp.get_clearsky(times2009) 53 | cs_2010and2011 = ftp.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[8]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[9]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[10]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | 
cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[11]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[12]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[13]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[14]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[15]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[16]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[17]: 161 | 162 | path = r'./data/Fort_Peck/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[18]: 172 | 173 | path = r'./data/Fort_Peck/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[21]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[ ]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[ ]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[ ]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[ ]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[ ]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[ ]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[ ]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # For the rows 
with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[ ]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[ ]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[ ]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[ ]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[ ]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[ ]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[ ]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[ ]: 300 | 301 | len(train) 302 | 303 | 304 | # In[ ]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[ ]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[ ]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[8]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[ ]: 336 | 337 | len(test) 338 | 339 | 340 | # In[ ]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[ ]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('./RNN Paper Results/Exp1_2/Fort_Peck/3hour_Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('./RNN Paper Results/Exp1_2/Fort_Peck/3hour_Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[ ]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### 
Groupdata - test dataframe 472 | 473 | # In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-3) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-3) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = 
(df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[117]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[118]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
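        # Added note: out is (batch_size, seq_dim, hidden_dim), i.e. (100, 1, 15)
        # here; out[:, -1, :] selects the last time step before the linear readout.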
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[119]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | n_iter += 1 764 | 765 | 766 | # In[145]: 767 | 768 | print(len(test_loss)) 769 | #plt.plot(test_loss) 770 | plt.plot(train_loss,'-') 771 | #plt.ylim([0.000,0.99]) 772 | 773 | 774 | # In[146]: 775 | 776 | plt.plot(test_loss,'r') 777 | 778 | 779 | # #### Demornamization 780 | 781 | # In[161]: 782 | 783 | rmse = np.sqrt(mse) 784 | 785 | 786 | # In[243]: 787 | 788 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 789 | 790 | 791 | # In[244]: 792 | 793 | print("rmse_denorm",rmse_denorm) 794 | 795 | 796 | # In[259]: 797 | 798 | print(df_new_test['Kt'].describe()) 799 | 800 | 801 | # ### Saving train and test losses to a csv 802 | 803 | # In[ ]: 804 | 805 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss}, columns=['Train Loss']) 806 | df_trainLoss.to_csv('./RNN Paper Results/Exp1_2/Fort_Peck/3hour_FortPeck_TrainLoss.csv') 807 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss}, columns=['Test Loss']) 808 | df_testLoss.to_csv('./RNN Paper Results/Exp1_2/Fort_Peck/3hour_FortPeck_TestLoss.csv') 809 | 810 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Fort_Peck_4hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | ftp = Location(48,-106.449, 'US/Mountain', 630.0216, 'Fort Peck') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=ftp.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=ftp.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[7]: 51 | 52 | cs_2009 = ftp.get_clearsky(times2009) 53 | cs_2010and2011 = ftp.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[8]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[9]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[10]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = 
cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[11]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[12]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[13]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[14]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[15]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[16]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[17]: 161 | 162 | path = r'./data/Fort_Peck/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[18]: 172 | 173 | path = r'./data/Fort_Peck/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[21]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[ ]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[ ]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[ ]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[ ]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[ ]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[ ]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[ ]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # For the rows 
with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[ ]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[ ]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[ ]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[ ]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[ ]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[ ]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[ ]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[ ]: 300 | 301 | len(train) 302 | 303 | 304 | # In[ ]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[ ]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[ ]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[8]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[ ]: 336 | 337 | len(test) 338 | 339 | 340 | # In[ ]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[ ]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('./RNN Paper Results/Exp1_2/Fort_Peck/4hour_Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('./RNN Paper Results/Exp1_2/Fort_Peck/4hour_Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[ ]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### 
Groupdata - test dataframe 472 | 473 | # In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-4) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-4) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = 
(df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[89]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[117]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[118]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
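        # Added clarifying note (not in the original notebook): with batch_first=True,
        # out has shape (batch_size, seq_dim, hidden_dim), i.e. (100, 1, 15) for the
        # batch_size=100 / seq_dim=1 / hidden_dim=15 configuration used below, so
        # out[:, -1, :] is (100, 15) rather than (100, 100).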
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[119]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | n_iter += 1 764 | 765 | 766 | # In[145]: 767 | 768 | print(len(test_loss)) 769 | #plt.plot(test_loss) 770 | plt.plot(train_loss,'-') 771 | #plt.ylim([0.000,0.99]) 772 | 773 | 774 | # In[146]: 775 | 776 | plt.plot(test_loss,'r') 777 | 778 | 779 | # #### Demornamization 780 | 781 | # In[161]: 782 | 783 | rmse = np.sqrt(mse) 784 | 785 | 786 | # In[243]: 787 | 788 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 789 | 790 | 791 | # In[244]: 792 | 793 | print("rmse_denorm",rmse_denorm) 794 | 795 | 796 | # In[259]: 797 | 798 | print(df_new_test['Kt'].describe()) 799 | 800 | 801 | # ### Saving train and test losses to a csv 802 | 803 | # In[ ]: 804 | 805 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss}, columns=['Train Loss']) 806 | df_trainLoss.to_csv('./RNN Paper Results/Exp1_2/Fort_Peck/4hour_FortPeck_TrainLoss.csv') 807 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss}, columns=['Test Loss']) 808 | df_testLoss.to_csv('./RNN Paper Results/Exp1_2/Fort_Peck/4hour_FortPeck_TestLoss.csv') 809 | 810 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/RNN_Fort_Peck.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[2]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[3]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[4]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[5]: 38 | 39 | ftp = Location(48,-106.449, 'US/Mountain', 630.0216, 'Fort Peck') 40 | 41 | 42 | # In[6]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=ftp.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=ftp.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[9]: 51 | 52 | cs_2009 = ftp.get_clearsky(times2009) 53 | cs_2010and2011 = ftp.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[10]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[11]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[12]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = 
cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[13]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[14]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[15]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[16]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[17]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[18]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[19]: 161 | 162 | path = r'./data/Fort_Peck/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[20]: 172 | 173 | path = r'./data/Fort_Peck/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[21]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[26]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[25]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[24]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[ ]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[ ]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[ ]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[ ]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[ ]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[ ]: 244 | 245 | # For the 
rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[ ]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[ ]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[ ]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[ ]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[ ]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[ ]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[ ]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[ ]: 300 | 301 | len(train) 302 | 303 | 304 | # In[ ]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[ ]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[ ]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[8]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[ ]: 336 | 337 | len(test) 338 | 339 | 340 | # In[ ]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[ ]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[ ]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[ ]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[ ]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | plt.savefig('Figure 3', 
bbox_inches='tight') 384 | #plt.show() 385 | 386 | 387 | 388 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 389 | 390 | # In[ ]: 391 | 392 | df_train = df_train[df_train['ghi']!=0] 393 | df_test = df_test[df_test['ghi']!=0] 394 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 395 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 396 | 397 | 398 | # In[ ]: 399 | 400 | df_train.reset_index(inplace=True) 401 | df_test.reset_index(inplace=True) 402 | 403 | 404 | # In[ ]: 405 | 406 | print("test Kt max: "+str(df_test['Kt'].max())) 407 | print("test Kt min: "+str(df_test['Kt'].min())) 408 | print("test Kt mean: "+str(df_test['Kt'].mean())) 409 | print("\n") 410 | print("train Kt max: "+str(df_train['Kt'].max())) 411 | print("train Kt min: "+str(df_train['Kt'].min())) 412 | print("train Kt mean: "+str(df_train['Kt'].mean())) 413 | 414 | 415 | # In[ ]: 416 | 417 | plt.plot(df_train['Kt']) 418 | #plt.show() 419 | 420 | 421 | # In[ ]: 422 | 423 | plt.plot(df_test['Kt']) 424 | #plt.show() 425 | 426 | 427 | # In[ ]: 428 | 429 | df_train= df_train[df_train['Kt']< 5000] 430 | df_train= df_train[df_train['Kt']> -1000] 431 | df_test= df_test[df_test['Kt']< 5000] 432 | df_test= df_test[df_test['Kt']> -1000] 433 | 434 | 435 | # #### Group the data (train dataframe) 436 | 437 | # In[ ]: 438 | 439 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 440 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 441 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 442 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 443 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 444 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 445 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 446 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 447 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 448 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 449 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 450 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 451 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 452 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 453 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 454 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 455 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 456 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 457 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 458 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 459 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 460 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 461 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 462 | 463 | 464 | # In[ ]: 465 | 466 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 467 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 468 | 469 | 470 | # In[ ]: 471 | 472 | df_new_train.head() 473 | 474 | 475 | # #### 
Groupdata - test dataframe 476 | 477 | # In[ ]: 478 | 479 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 480 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 481 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 482 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 483 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 484 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 485 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 486 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 487 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 488 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 489 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 490 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 491 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 492 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 493 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 494 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 495 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 496 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 497 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 498 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 499 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 500 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 501 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 502 | 503 | 504 | # In[ ]: 505 | 506 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 507 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 508 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 509 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 510 | 511 | 512 | # In[ ]: 513 | 514 | df_new_test.loc[2].xs(17,level='day') 515 | 516 | 517 | # ### Shifting Kt values to make 1 hour ahead forecast 518 | 519 | # #### Train dataset 520 | 521 | # In[ ]: 522 | 523 | levels_index= [] 524 | for m in df_new_train.index.levels: 525 | levels_index.append(m) 526 | 527 | 528 | # In[ ]: 529 | 530 | for i in levels_index[0]: 531 | for j in levels_index[1]: 532 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-1) 533 | 534 | 535 | # In[ ]: 536 | 537 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 538 | 539 | 540 | # #### Test dataset 541 | 542 | # In[ ]: 543 | 544 | levels_index2= [] 545 | for m in df_new_test.index.levels: 546 | levels_index2.append(m) 547 | 548 | 549 | # In[ ]: 550 | 551 | for i in levels_index2[0]: 552 | for j in levels_index2[1]: 553 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-1) 554 | 555 | 556 | # In[ ]: 557 | 558 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 559 | 560 | 561 | # In[ ]: 562 | 563 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 564 | 565 | 566 | # ### Normalize train and test dataframe 567 | 568 | # In[ ]: 569 | 570 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 571 | test_norm = 
(df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 572 | 573 | 574 | # In[ ]: 575 | 576 | train_norm.reset_index(inplace=True,drop=True) 577 | test_norm.reset_index(inplace=True,drop=True) 578 | 579 | 580 | # ### Making train and test sets with train_norm and test_norm 581 | 582 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 583 | 584 | # In[89]: 585 | 586 | from fractions import gcd 587 | gcd(train_norm.shape[0],test_norm.shape[0]) 588 | 589 | 590 | # In[ ]: 591 | 592 | import math 593 | def roundup(x): 594 | return int(math.ceil(x / 100.0)) * 100 595 | 596 | 597 | # In[ ]: 598 | 599 | train_lim = roundup(train_norm.shape[0]) 600 | test_lim = roundup(test_norm.shape[0]) 601 | 602 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 603 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 604 | 605 | train_norm = train_norm.append(train_random) 606 | test_norm = test_norm.append(test_random) 607 | 608 | 609 | # In[ ]: 610 | 611 | X1 = train_norm.drop('Kt',axis=1) 612 | y1 = train_norm['Kt'] 613 | 614 | X2 = test_norm.drop('Kt',axis=1) 615 | y2 = test_norm['Kt'] 616 | 617 | 618 | # In[ ]: 619 | 620 | print("X1_train shape is {}".format(X1.shape)) 621 | print("y1_train shape is {}".format(y1.shape)) 622 | print("X2_test shape is {}".format(X2.shape)) 623 | print("y2_test shape is {}".format(y2.shape)) 624 | 625 | 626 | # In[ ]: 627 | 628 | X_train = np.array(X1) 629 | y_train = np.array(y1) 630 | X_test = np.array(X2) 631 | y_test = np.array(y2) 632 | 633 | 634 | # ### start of RNN 635 | 636 | # In[117]: 637 | 638 | import torch 639 | import torch.nn as nn 640 | import torchvision.transforms as transforms 641 | import torchvision.datasets as dsets 642 | from torch.autograd import Variable 643 | 644 | 645 | # In[118]: 646 | 647 | class RNNModel(nn.Module): 648 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 649 | super(RNNModel, self).__init__() 650 | #Hidden Dimension 651 | self.hidden_dim = hidden_dim 652 | 653 | # Number of hidden layers 654 | self.layer_dim = layer_dim 655 | 656 | #Building the RNN 657 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 658 | 659 | # Readout layer 660 | self.fc = nn.Linear(hidden_dim, output_dim) 661 | 662 | def forward(self, x): 663 | # Initializing the hidden state with zeros 664 | # (layer_dim, batch_size, hidden_dim) 665 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 666 | 667 | #One time step (the last one perhaps?) 668 | out, hn = self.rnn(x, h0) 669 | 670 | # Indexing hidden state of the last time step 671 | # out.size() --> ?? 
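        # Added clarifying note (not in the original notebook): out contains the hidden
        # state for every time step; with seq_dim = 1 here, out[:, -1, :] is just the final
        # hidden state of shape (batch_size, hidden_dim), which the linear readout layer
        # maps to a single predicted Kt value per sample.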
672 | #out[:,-1,:] --> is it going to be 100,100 673 | out = self.fc(out[:,-1,:]) 674 | # out.size() --> 100,1 675 | return out 676 | 677 | 678 | 679 | # In[119]: 680 | 681 | # Instantiating Model Class 682 | input_dim = 22 683 | hidden_dim = 15 684 | layer_dim = 1 685 | output_dim = 1 686 | batch_size = 100 687 | 688 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 689 | 690 | # Instantiating Loss Class 691 | criterion = nn.MSELoss() 692 | 693 | # Instantiate Optimizer Class 694 | learning_rate = 0.001 695 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 696 | 697 | # converting numpy array to torch tensor 698 | X_train = torch.from_numpy(X_train) 699 | y_train = torch.from_numpy(y_train) 700 | X_test = torch.from_numpy(X_test) 701 | y_test = torch.from_numpy(y_test) 702 | 703 | # initializing lists to store losses over epochs: 704 | train_loss = [] 705 | test_loss = [] 706 | train_iter = [] 707 | test_iter = [] 708 | 709 | 710 | # In[ ]: 711 | 712 | # Training the model 713 | seq_dim = 1 714 | 715 | n_iter =0 716 | num_samples = len(X_train) 717 | test_samples = len(X_test) 718 | batch_size = 100 719 | num_epochs = 1000 720 | feat_dim = X_train.shape[1] 721 | 722 | X_train = X_train.type(torch.FloatTensor) 723 | y_train = y_train.type(torch.FloatTensor) 724 | X_test = X_test.type(torch.FloatTensor) 725 | y_test = y_test.type(torch.FloatTensor) 726 | 727 | for epoch in range(num_epochs): 728 | for i in range(0, int(num_samples/batch_size -1)): 729 | 730 | 731 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 732 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 733 | 734 | #print("Kt_value={}".format(Kt_value)) 735 | 736 | optimizer.zero_grad() 737 | 738 | outputs = model(features) 739 | #print("outputs ={}".format(outputs)) 740 | 741 | loss = criterion(outputs, Kt_value) 742 | 743 | train_loss.append(loss.data[0]) 744 | train_iter.append(n_iter) 745 | 746 | #print("loss = {}".format(loss)) 747 | loss.backward() 748 | 749 | optimizer.step() 750 | 751 | n_iter += 1 752 | 753 | if n_iter%100 == 0: 754 | for i in range(0,int(test_samples/batch_size -1)): 755 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 756 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 757 | 758 | outputs = model(features) 759 | 760 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 761 | 762 | test_iter.append(n_iter) 763 | test_loss.append(mse) 764 | 765 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 766 | 767 | 768 | 769 | # In[145]: 770 | 771 | print(len(test_loss)) 772 | #plt.plot(test_loss) 773 | plt.plot(train_loss,'-') 774 | #plt.ylim([0.000,0.99]) 775 | 776 | 777 | # In[146]: 778 | 779 | plt.plot(test_loss,'r') 780 | 781 | 782 | # #### Demornamization 783 | 784 | # In[161]: 785 | 786 | rmse = np.sqrt(mse) 787 | 788 | 789 | # In[243]: 790 | 791 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 792 | 793 | 794 | # In[244]: 795 | 796 | print("rmse_denorm=",rmse_denorm) 797 | 798 | 799 | # In[259]: 800 | 801 | print(df_new_test['Kt'].describe()) 802 | 803 | 804 | # ### Saving train and test losses to a csv 805 | 806 | # In[ ]: 807 | 808 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss}, columns=['Train Loss']) 809 | df_trainLoss.to_csv('RNN Paper Results/Exp1_FortPeck_TrainLoss.csv') 810 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss}, columns=['Test Loss']) 811 | df_testLoss.to_csv('RNN Paper Results/Exp1_FortPeck_TestLoss.csv') 812 | 813 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/RNN_Sioux_Falls.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[25]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[26]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[27]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[28]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[29]: 38 | 39 | sif = Location(43.544,-96.73, 'US/Central', 448.086, 'Sioux Falls') 40 | 41 | 42 | # In[30]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=sif.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=sif.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[32]: 51 | 52 | cs_2009 = sif.get_clearsky(times2009) 53 | cs_2010and2011 = sif.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[33]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[34]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[35]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[36]: 82 
| 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[37]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[38]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[39]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[40]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[41]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[42]: 161 | 162 | path = r'./data/Sioux_Falls/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[43]: 172 | 173 | path = r'./data/Sioux_Falls/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[44]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[45]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[46]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[47]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[48]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[49]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[50]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[51]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[52]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[53]: 244 | 245 | # 
For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[54]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[55]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[56]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[57]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[58]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[59]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[60]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[61]: 300 | 301 | len(train) 302 | 303 | 304 | # In[62]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[63]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[64]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[65]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[66]: 336 | 337 | len(test) 338 | 339 | 340 | # In[67]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[68]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[69]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[70]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[71]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear Sky GHI (Watts/m^2)') 383 | 
plt.savefig('Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[72]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 473 | # 
In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-1) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-1) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - df_new_train.min()) 567 | test_norm = (df_new_test - df_new_test.mean()) / 
(df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[ ]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[ ]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[ ]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
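        # note: with batch_first=True, out.size() is (batch_size, seq_dim, hidden_dim),
        # i.e. (100, 1, 15) for the hyperparameters set below, so out[:,-1,:] has shape (100, 15)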
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[ ]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | n_iter += 1 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | 764 | 765 | # In[ ]: 766 | 767 | print(len(test_loss)) 768 | #plt.plot(test_loss) 769 | plt.plot(train_loss,'-') 770 | #plt.ylim([0.000,0.99]) 771 | 772 | 773 | # In[ ]: 774 | 775 | plt.plot(test_loss,'r') 776 | 777 | 778 | # #### Demornamization 779 | 780 | # In[ ]: 781 | 782 | rmse = np.sqrt(mse) 783 | 784 | 785 | # In[ ]: 786 | 787 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 788 | 789 | 790 | # In[ ]: 791 | 792 | print("rmse_denorm=",rmse_denorm) 793 | 794 | 795 | # In[ ]: 796 | 797 | print(df_new_test['Kt'].describe()) 798 | 799 | 800 | # ### Saving train and test losses to a csv 801 | 802 | # In[ ]: 803 | 804 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss, 'iteration':train_iter}, columns=['Train Loss','iteration']) 805 | df_trainLoss.to_csv('RNN Paper Results/Exp1_SiouxFalls_TrainLoss.csv') 806 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss, 'iteration':test_iter}, columns=['Test Loss','iteration']) 807 | df_testLoss.to_csv('RNN Paper Results/Exp1_SiouxFalls_TestLoss.csv') 808 | 809 | 810 | # In[ ]: 811 | 812 | 813 | 814 | -------------------------------------------------------------------------------- /fixed-time-horizon-prediction/Exp_1.2/Sioux_Falls_2hour.py: -------------------------------------------------------------------------------- 1 | 2 | # coding: utf-8 3 | 4 | # In[25]: 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import datetime 9 | import glob 10 | import os.path 11 | from pandas.compat import StringIO 12 | 13 | 14 | # ### NREL Bird Model implementation: for obtaining clear sky GHI 15 | 16 | # In[26]: 17 | 18 | import itertools 19 | import matplotlib.pyplot as plt 20 | import pandas as pd 21 | import seaborn as sns 22 | 23 | 24 | # In[27]: 25 | 26 | #get_ipython().magic('matplotlib inline') 27 | sns.set_color_codes() 28 | 29 | 30 | # In[28]: 31 | 32 | import pvlib 33 | from pvlib import clearsky, atmosphere 34 | from pvlib.location import Location 35 | 36 | 37 | # In[29]: 38 | 39 | sif = Location(43.544,-96.73, 'US/Central', 448.086, 'Sioux Falls') 40 | 41 | 42 | # In[30]: 43 | 44 | times2009 = pd.DatetimeIndex(start='2009-01-01', end='2010-01-01', freq='1min', 45 | tz=sif.tz) # 12 months of 2009 - For testing 46 | times2010and2011 = pd.DatetimeIndex(start='2010-01-01', end='2012-01-01', freq='1min', 47 | tz=sif.tz) # 24 months of 2010 and 2011 - For training 48 | 49 | 50 | # In[32]: 51 | 52 | cs_2009 = sif.get_clearsky(times2009) 53 | cs_2010and2011 = sif.get_clearsky(times2010and2011) # ineichen with climatology table by default 54 | #cs_2011 = bvl.get_clearsky(times2011) 55 | 56 | 57 | # In[33]: 58 | 59 | cs_2009.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 60 | cs_2010and2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 61 | #cs_2011.drop(['dni','dhi'],axis=1, inplace=True) #updating the same dataframe by dropping two columns 62 | 63 | 64 | # In[34]: 65 | 66 | cs_2009.reset_index(inplace=True) 67 | cs_2010and2011.reset_index(inplace=True) 68 | #cs_2011.reset_index(inplace=True) 69 | 70 | 71 | # In[35]: 72 | 73 | cs_2009['index']=cs_2009['index'].apply(lambda x:x.to_datetime()) 74 | cs_2009['year'] = cs_2009['index'].apply(lambda x:x.year) 75 | cs_2009['month'] = cs_2009['index'].apply(lambda x:x.month) 76 | cs_2009['day'] = cs_2009['index'].apply(lambda x:x.day) 77 | cs_2009['hour'] = 
cs_2009['index'].apply(lambda x:x.hour) 78 | cs_2009['min'] = cs_2009['index'].apply(lambda x:x.minute) 79 | 80 | 81 | # In[36]: 82 | 83 | cs_2010and2011['index']=cs_2010and2011['index'].apply(lambda x:x.to_datetime()) 84 | cs_2010and2011['year'] = cs_2010and2011['index'].apply(lambda x:x.year) 85 | cs_2010and2011['month'] = cs_2010and2011['index'].apply(lambda x:x.month) 86 | cs_2010and2011['day'] = cs_2010and2011['index'].apply(lambda x:x.day) 87 | cs_2010and2011['hour'] = cs_2010and2011['index'].apply(lambda x:x.hour) 88 | cs_2010and2011['min'] = cs_2010and2011['index'].apply(lambda x:x.minute) 89 | 90 | 91 | # In[37]: 92 | 93 | print(cs_2009.shape) 94 | print(cs_2010and2011.shape) 95 | #print(cs_2011.shape) 96 | 97 | 98 | # In[38]: 99 | 100 | cs_2009.drop(cs_2009.index[-1], inplace=True) 101 | cs_2010and2011.drop(cs_2010and2011.index[-1], inplace=True) 102 | #cs_2011.drop(cs_2011.index[-1], inplace=True) 103 | 104 | 105 | # In[39]: 106 | 107 | print(cs_2009.shape) 108 | print(cs_2010and2011.shape) 109 | #print(cs_2011.shape) 110 | 111 | 112 | # In[40]: 113 | 114 | cs_2010and2011.head() 115 | 116 | 117 | # ### Import files from each year in a separate dataframe 118 | 119 | # 120 | # - year integer year, i.e., 1995 121 | # - jday integer Julian day (1 through 365 [or 366]) 122 | # - month integer number of the month (1-12) 123 | # - day integer day of the month(1-31) 124 | # - hour integer hour of the day (0-23) 125 | # - min integer minute of the hour (0-59) 126 | # - dt real decimal time (hour.decimalminutes, e.g., 23.5 = 2330) 127 | # - zen real solar zenith angle (degrees) 128 | # - dw_solar real downwelling global solar (Watts m^-2) 129 | # - uw_solar real upwelling global solar (Watts m^-2) 130 | # - direct_n real direct-normal solar (Watts m^-2) 131 | # - diffuse real downwelling diffuse solar (Watts m^-2) 132 | # - dw_ir real downwelling thermal infrared (Watts m^-2) 133 | # - dw_casetemp real downwelling IR case temp. (K) 134 | # - dw_dometemp real downwelling IR dome temp. (K) 135 | # - uw_ir real upwelling thermal infrared (Watts m^-2) 136 | # - uw_casetemp real upwelling IR case temp. (K) 137 | # - uw_dometemp real upwelling IR dome temp. 
(K) 138 | # - uvb real global UVB (milliWatts m^-2) 139 | # - par real photosynthetically active radiation (Watts m^-2) 140 | # - netsolar real net solar (dw_solar - uw_solar) (Watts m^-2) 141 | # - netir real net infrared (dw_ir - uw_ir) (Watts m^-2) 142 | # - totalnet real net radiation (netsolar+netir) (Watts m^-2) 143 | # - temp real 10-meter air temperature (?C) 144 | # - rh real relative humidity (%) 145 | # - windspd real wind speed (ms^-1) 146 | # - winddir real wind direction (degrees, clockwise from north) 147 | # - pressure real station pressure (mb) 148 | # 149 | 150 | # In[41]: 151 | 152 | cols = ['year', 'jday', 'month', 'day','hour','min','dt','zen','dw_solar','dw_solar_QC','uw_solar', 153 | 'uw_solar_QC', 'direct_n','direct_n_QC','diffuse', 'diffuse_QC', 'dw_ir', 'dw_ir_QC', 'dw_casetemp', 154 | 'dw_casetemp_QC', 'dw_dometemp','dw_dometemp_QC','uw_ir', 'uw_ir_QC', 'uw_casetemp','uw_casetemp_QC', 155 | 'uw_dometemp','uw_dometemp_QC','uvb','uvb_QC','par','par_QC','netsolar','netsolar_QC','netir','netir_QC', 156 | 'totalnet','totalnet_QC','temp','temp_QC','rh','rh_QC','windspd','windspd_QC','winddir','winddir_QC', 157 | 'pressure','pressure_QC'] 158 | 159 | 160 | # In[42]: 161 | 162 | path = r'./data/Sioux_Falls/Exp_1_train' 163 | all_files = glob.glob(path + "/*.dat") 164 | all_files.sort() 165 | 166 | df_big_train = pd.concat([pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 167 | index_col=False,header=None, names=cols) for f in all_files],ignore_index=True) 168 | df_big_train.shape 169 | 170 | 171 | # In[43]: 172 | 173 | path = r'./data/Sioux_Falls/Exp_1_test' 174 | all_files = glob.glob(path + "/*.dat") 175 | all_files.sort() 176 | 177 | df_big_test = pd.concat((pd.read_csv(f, skipinitialspace = True, quotechar = '"',skiprows=(2),delimiter=' ', 178 | index_col=False,header=None, names=cols) for f in all_files),ignore_index=True) 179 | df_big_test.shape 180 | 181 | 182 | # In[44]: 183 | 184 | df_big_test[df_big_test['dw_solar']==-9999.9].shape 185 | 186 | 187 | # ### Merging Clear Sky GHI And the big dataframe 188 | 189 | # In[45]: 190 | 191 | df_train = pd.merge(df_big_train, cs_2010and2011, on=['year','month','day','hour','min']) 192 | df_train.shape 193 | 194 | 195 | # In[46]: 196 | 197 | df_test = pd.merge(df_big_test, cs_2009, on=['year','month','day','hour','min']) 198 | df_test.shape 199 | 200 | 201 | # In[47]: 202 | 203 | df_train.drop(['index'],axis=1, inplace=True) #updating the same dataframe by dropping the index columns from clear sky model 204 | df_test.drop(['index'], axis=1, inplace=True) 205 | 206 | 207 | # In[48]: 208 | 209 | df_train.shape 210 | 211 | 212 | # ### Managing missing values 213 | 214 | # In[49]: 215 | 216 | # Resetting index 217 | df_train.reset_index(drop=True, inplace=True) 218 | df_test.reset_index(drop=True, inplace=True) 219 | 220 | 221 | # In[50]: 222 | 223 | # Dropping rows with two or more -9999.9 values in columns 224 | 225 | 226 | # In[51]: 227 | 228 | # Step1: Get indices of all rows with 2 or more -999 229 | missing_data_indices = np.where((df_train <=-9999.9).apply(sum, axis=1)>=2)[0] 230 | # Step2: Drop those indices 231 | df_train.drop(missing_data_indices, axis=0, inplace=True) 232 | # Checking that the rows are dropped 233 | df_train.shape 234 | 235 | 236 | # In[52]: 237 | 238 | missing_data_indices_test = np.where((df_test <= -9999.9).apply(sum, axis=1)>=2)[0] 239 | df_test.drop(missing_data_indices_test, axis=0, inplace=True) 240 | df_test.shape 241 | 242 | 243 | # In[53]: 244 | 245 | # 
For the rows with only one cell as -9999.9, replacing this cell with the mean of the column 246 | 247 | 248 | # #### First resetting index after dropping rows in the previous part of the code 249 | 250 | # In[54]: 251 | 252 | # 2nd time - Reseting Index 253 | df_train.reset_index(drop=True, inplace=True) 254 | df_test.reset_index(drop=True, inplace=True) 255 | 256 | 257 | # In[55]: 258 | 259 | one_miss_train_idx = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 260 | 261 | 262 | # In[56]: 263 | 264 | len(one_miss_train_idx) 265 | 266 | 267 | # In[57]: 268 | 269 | df_train.shape 270 | 271 | 272 | # In[58]: 273 | 274 | col_names = df_train.columns 275 | from collections import defaultdict 276 | stats = defaultdict(int) 277 | total_single_missing_values = 0 278 | for name in col_names: 279 | col_mean = df_train[~(df_train[name] == -9999.9)][name].mean() 280 | missing_indices = np.where((df_train[name] == -9999.9)) 281 | stats[name] = len(missing_indices[0]) 282 | df_train[name].loc[missing_indices] = col_mean 283 | total_single_missing_values += sum(df_train[name] == -9999.9) 284 | 285 | 286 | 287 | # In[59]: 288 | 289 | df_col_min = df_train.apply(min, axis=0) 290 | df_col_max = df_train.apply(max, axis =0) 291 | #print(df_col_min, df_col_max) 292 | 293 | 294 | # In[60]: 295 | 296 | train = np.where((df_train <=-9999.9).apply(sum, axis=1)==1)[0] 297 | 298 | 299 | # In[61]: 300 | 301 | len(train) 302 | 303 | 304 | # In[62]: 305 | 306 | # doing the same thing on test dataset 307 | 308 | 309 | # In[63]: 310 | 311 | one_miss_test_idx = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 312 | len(one_miss_test_idx) 313 | 314 | 315 | # In[64]: 316 | 317 | col_names_test = df_test.columns 318 | from collections import defaultdict 319 | stats_test = defaultdict(int) 320 | total_single_missing_values_test = 0 321 | for name in col_names_test: 322 | col_mean = df_test[~(df_test[name] == -9999.9)][name].mean() 323 | missing_indices = np.where((df_test[name] == -9999.9)) 324 | stats_test[name] = len(missing_indices[0]) 325 | df_test[name].loc[missing_indices] = col_mean 326 | total_single_missing_values_test += sum(df_test[name] == -9999.9) 327 | 328 | 329 | 330 | # In[65]: 331 | 332 | test = np.where((df_test <=-9999.9).apply(sum, axis=1)==1)[0] 333 | 334 | 335 | # In[66]: 336 | 337 | len(test) 338 | 339 | 340 | # In[67]: 341 | 342 | df_train.shape 343 | 344 | 345 | # In[68]: 346 | 347 | df_test.shape 348 | 349 | 350 | # ### Exploratory Data Analysis 351 | 352 | # In[69]: 353 | 354 | dw_solar_everyday = df_test.groupby(['jday'])['dw_solar'].mean() 355 | ghi_everyday = df_test.groupby(['jday'])['ghi'].mean() 356 | j_day = df_test['jday'].unique() 357 | 358 | 359 | # In[70]: 360 | 361 | fig = plt.figure() 362 | 363 | axes1 = fig.add_axes([0.1,0.1,0.8,0.8]) 364 | #axes2 = fig.add_axes([0.1,0.1,0.8,0.8]) 365 | 366 | axes1.scatter(j_day,dw_solar_everyday,label='Observed dw_solar',color='red') 367 | axes1.scatter(j_day, ghi_everyday, label='Clear Sky GHI',color='green') 368 | 369 | axes1.set_xlabel('Days') 370 | axes1.set_ylabel('Solar Irradiance (Watts /m^2)') 371 | axes1.set_title('Solar Irradiance - Test Year 2009') 372 | axes1.legend(loc='best') 373 | 374 | fig.savefig('./RNN Paper Results/Exp1_2/Sioux_Falls/2hour_Figure 2.jpg', bbox_inches = 'tight') 375 | 376 | 377 | # In[71]: 378 | 379 | sns.jointplot(x=dw_solar_everyday,y=ghi_everyday,kind='reg') 380 | #plt.title('observed dw_solar vs clear sky ghi') 381 | plt.xlabel('Observed global downwelling solar (Watts/m^2)') 382 | plt.ylabel('Clear 
Sky GHI (Watts/m^2)') 383 | plt.savefig('./RNN Paper Results/Exp1_2/Sioux_Falls/2hour_Figure 3', bbox_inches='tight') 384 | 385 | 386 | # ### making the Kt (clear sky index at time t) column by first removing rows with ghi==0 387 | 388 | # In[72]: 389 | 390 | df_train = df_train[df_train['ghi']!=0] 391 | df_test = df_test[df_test['ghi']!=0] 392 | df_train['Kt'] = df_train['dw_solar']/df_train['ghi'] 393 | df_test['Kt'] = df_test['dw_solar']/df_test['ghi'] 394 | 395 | 396 | # In[ ]: 397 | 398 | df_train.reset_index(inplace=True) 399 | df_test.reset_index(inplace=True) 400 | 401 | 402 | # In[ ]: 403 | 404 | print("test Kt max: "+str(df_test['Kt'].max())) 405 | print("test Kt min: "+str(df_test['Kt'].min())) 406 | print("test Kt mean: "+str(df_test['Kt'].mean())) 407 | print("\n") 408 | print("train Kt max: "+str(df_train['Kt'].max())) 409 | print("train Kt min: "+str(df_train['Kt'].min())) 410 | print("train Kt mean: "+str(df_train['Kt'].mean())) 411 | 412 | 413 | # In[ ]: 414 | 415 | plt.plot(df_train['Kt']) 416 | 417 | 418 | # In[ ]: 419 | 420 | plt.plot(df_test['Kt']) 421 | 422 | 423 | # In[ ]: 424 | 425 | df_train= df_train[df_train['Kt']< 5000] 426 | df_train= df_train[df_train['Kt']> -1000] 427 | df_test= df_test[df_test['Kt']< 5000] 428 | df_test= df_test[df_test['Kt']> -1000] 429 | 430 | 431 | # #### Group the data (train dataframe) 432 | 433 | # In[ ]: 434 | 435 | zen = df_train.groupby(['year','month','day','hour'])['zen'].mean() 436 | dw_solar = df_train.groupby(['year','month','day','hour'])['dw_solar'].mean() 437 | uw_solar = df_train.groupby(['year','month','day','hour'])['uw_solar'].mean() 438 | direct_n = df_train.groupby(['year','month','day','hour'])['direct_n'].mean() 439 | diffuse = df_train.groupby(['year','month','day','hour'])['diffuse'].mean() 440 | dw_ir = df_train.groupby(['year','month','day','hour'])['dw_ir'].mean() 441 | dw_casetemp = df_train.groupby(['year','month','day','hour'])['dw_casetemp'].mean() 442 | dw_dometemp = df_train.groupby(['year','month','day','hour'])['dw_dometemp'].mean() 443 | uw_ir = df_train.groupby(['year','month','day','hour'])['uw_ir'].mean() 444 | uw_casetemp = df_train.groupby(['year','month','day','hour'])['uw_casetemp'].mean() 445 | uw_dometemp = df_train.groupby(['year','month','day','hour'])['uw_dometemp'].mean() 446 | uvb = df_train.groupby(['year','month','day','hour'])['uvb'].mean() 447 | par = df_train.groupby(['year','month','day','hour'])['par'].mean() 448 | netsolar = df_train.groupby(['year','month','day','hour'])['netsolar'].mean() 449 | netir = df_train.groupby(['year','month','day','hour'])['netir'].mean() 450 | totalnet = df_train.groupby(['year','month','day','hour'])['totalnet'].mean() 451 | temp = df_train.groupby(['year','month','day','hour'])['temp'].mean() 452 | rh = df_train.groupby(['year','month','day','hour'])['rh'].mean() 453 | windspd = df_train.groupby(['year','month','day','hour'])['windspd'].mean() 454 | winddir = df_train.groupby(['year','month','day','hour'])['winddir'].mean() 455 | pressure = df_train.groupby(['year','month','day','hour'])['pressure'].mean() 456 | ghi = df_train.groupby(['year','month','day','hour'])['ghi'].mean() 457 | Kt = df_train.groupby(['year','month','day','hour'])['Kt'].mean() 458 | 459 | 460 | # In[ ]: 461 | 462 | df_new_train = pd.concat([zen,dw_solar,uw_solar,direct_n,diffuse,dw_ir,dw_casetemp,dw_dometemp,uw_ir,uw_casetemp,uw_dometemp, 463 | uvb,par,netsolar,netir,totalnet,temp,rh,windspd,winddir,pressure,ghi,Kt], axis=1) 464 | 465 | 466 | # In[ ]: 467 | 468 | 
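# note: df_new_train now holds hourly means of every field, indexed by (year, month, day, hour);
# 'Kt' is the clear-sky index that is shifted below to form the forecast target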
df_new_train.head() 469 | 470 | 471 | # #### Groupdata - test dataframe 472 | 473 | # In[ ]: 474 | 475 | test_zen = df_test.groupby(['month','day','hour'])['zen'].mean() 476 | test_dw_solar = df_test.groupby(['month','day','hour'])['dw_solar'].mean() 477 | test_uw_solar = df_test.groupby(['month','day','hour'])['uw_solar'].mean() 478 | test_direct_n = df_test.groupby(['month','day','hour'])['direct_n'].mean() 479 | test_diffuse = df_test.groupby(['month','day','hour'])['diffuse'].mean() 480 | test_dw_ir = df_test.groupby(['month','day','hour'])['dw_ir'].mean() 481 | test_dw_casetemp = df_test.groupby(['month','day','hour'])['dw_casetemp'].mean() 482 | test_dw_dometemp = df_test.groupby(['month','day','hour'])['dw_dometemp'].mean() 483 | test_uw_ir = df_test.groupby(['month','day','hour'])['uw_ir'].mean() 484 | test_uw_casetemp = df_test.groupby(['month','day','hour'])['uw_casetemp'].mean() 485 | test_uw_dometemp = df_test.groupby(['month','day','hour'])['uw_dometemp'].mean() 486 | test_uvb = df_test.groupby(['month','day','hour'])['uvb'].mean() 487 | test_par = df_test.groupby(['month','day','hour'])['par'].mean() 488 | test_netsolar = df_test.groupby(['month','day','hour'])['netsolar'].mean() 489 | test_netir = df_test.groupby(['month','day','hour'])['netir'].mean() 490 | test_totalnet = df_test.groupby(['month','day','hour'])['totalnet'].mean() 491 | test_temp = df_test.groupby(['month','day','hour'])['temp'].mean() 492 | test_rh = df_test.groupby(['month','day','hour'])['rh'].mean() 493 | test_windspd = df_test.groupby(['month','day','hour'])['windspd'].mean() 494 | test_winddir = df_test.groupby(['month','day','hour'])['winddir'].mean() 495 | test_pressure = df_test.groupby(['month','day','hour'])['pressure'].mean() 496 | test_ghi = df_test.groupby(['month','day','hour'])['ghi'].mean() 497 | test_Kt = df_test.groupby(['month','day','hour'])['Kt'].mean() 498 | 499 | 500 | # In[ ]: 501 | 502 | df_new_test = pd.concat([test_zen,test_dw_solar,test_uw_solar,test_direct_n,test_diffuse,test_dw_ir, 503 | test_dw_casetemp,test_dw_dometemp,test_uw_ir,test_uw_casetemp,test_uw_dometemp, 504 | test_uvb,test_par,test_netsolar,test_netir,test_totalnet,test_temp,test_rh, 505 | test_windspd,test_winddir,test_pressure,test_ghi,test_Kt], axis=1) 506 | 507 | 508 | # In[ ]: 509 | 510 | df_new_test.loc[2].xs(17,level='day') 511 | 512 | 513 | # ### Shifting Kt values to make 1 hour ahead forecast 514 | 515 | # #### Train dataset 516 | 517 | # In[ ]: 518 | 519 | levels_index= [] 520 | for m in df_new_train.index.levels: 521 | levels_index.append(m) 522 | 523 | 524 | # In[ ]: 525 | 526 | for i in levels_index[0]: 527 | for j in levels_index[1]: 528 | df_new_train.loc[i].loc[j]['Kt'] = df_new_train.loc[i].loc[j]['Kt'].shift(-2) 529 | 530 | 531 | # In[ ]: 532 | 533 | df_new_train = df_new_train[~(df_new_train['Kt'].isnull())] 534 | 535 | 536 | # #### Test dataset 537 | 538 | # In[ ]: 539 | 540 | levels_index2= [] 541 | for m in df_new_test.index.levels: 542 | levels_index2.append(m) 543 | 544 | 545 | # In[ ]: 546 | 547 | for i in levels_index2[0]: 548 | for j in levels_index2[1]: 549 | df_new_test.loc[i].loc[j]['Kt'] = df_new_test.loc[i].loc[j]['Kt'].shift(-2) 550 | 551 | 552 | # In[ ]: 553 | 554 | df_new_test = df_new_test[~(df_new_test['Kt'].isnull())] 555 | 556 | 557 | # In[ ]: 558 | 559 | df_new_test[df_new_test['Kt']==df_new_test['Kt'].max()] 560 | 561 | 562 | # ### Normalize train and test dataframe 563 | 564 | # In[ ]: 565 | 566 | train_norm = (df_new_train - df_new_train.mean()) / (df_new_train.max() - 
df_new_train.min()) 567 | test_norm = (df_new_test - df_new_test.mean()) / (df_new_test.max() - df_new_test.min()) 568 | 569 | 570 | # In[ ]: 571 | 572 | train_norm.reset_index(inplace=True,drop=True) 573 | test_norm.reset_index(inplace=True,drop=True) 574 | 575 | 576 | # ### Making train and test sets with train_norm and test_norm 577 | 578 | # #### finding the gcf (greatest common factor) of train and test dataset's length and chop off the extra rows to make it divisible with the batchsize 579 | 580 | # In[ ]: 581 | 582 | from fractions import gcd 583 | gcd(train_norm.shape[0],test_norm.shape[0]) 584 | 585 | 586 | # In[ ]: 587 | 588 | import math 589 | def roundup(x): 590 | return int(math.ceil(x / 100.0)) * 100 591 | 592 | 593 | # In[ ]: 594 | 595 | train_lim = roundup(train_norm.shape[0]) 596 | test_lim = roundup(test_norm.shape[0]) 597 | 598 | train_random = train_norm.sample(train_lim-train_norm.shape[0]) 599 | test_random = test_norm.sample(test_lim-test_norm.shape[0]) 600 | 601 | train_norm = train_norm.append(train_random) 602 | test_norm = test_norm.append(test_random) 603 | 604 | 605 | # In[ ]: 606 | 607 | X1 = train_norm.drop('Kt',axis=1) 608 | y1 = train_norm['Kt'] 609 | 610 | X2 = test_norm.drop('Kt',axis=1) 611 | y2 = test_norm['Kt'] 612 | 613 | 614 | # In[ ]: 615 | 616 | print("X1_train shape is {}".format(X1.shape)) 617 | print("y1_train shape is {}".format(y1.shape)) 618 | print("X2_test shape is {}".format(X2.shape)) 619 | print("y2_test shape is {}".format(y2.shape)) 620 | 621 | 622 | # In[ ]: 623 | 624 | X_train = np.array(X1) 625 | y_train = np.array(y1) 626 | X_test = np.array(X2) 627 | y_test = np.array(y2) 628 | 629 | 630 | # ### start of RNN 631 | 632 | # In[ ]: 633 | 634 | import torch 635 | import torch.nn as nn 636 | import torchvision.transforms as transforms 637 | import torchvision.datasets as dsets 638 | from torch.autograd import Variable 639 | 640 | 641 | # In[ ]: 642 | 643 | class RNNModel(nn.Module): 644 | def __init__(self, input_dim, hidden_dim, layer_dim, output_dim): 645 | super(RNNModel, self).__init__() 646 | #Hidden Dimension 647 | self.hidden_dim = hidden_dim 648 | 649 | # Number of hidden layers 650 | self.layer_dim = layer_dim 651 | 652 | #Building the RNN 653 | self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu') 654 | 655 | # Readout layer 656 | self.fc = nn.Linear(hidden_dim, output_dim) 657 | 658 | def forward(self, x): 659 | # Initializing the hidden state with zeros 660 | # (layer_dim, batch_size, hidden_dim) 661 | h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) 662 | 663 | #One time step (the last one perhaps?) 664 | out, hn = self.rnn(x, h0) 665 | 666 | # Indexing hidden state of the last time step 667 | # out.size() --> ?? 
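        # note: with batch_first=True, out.size() is (batch_size, seq_dim, hidden_dim),
        # i.e. (100, 1, 15) given the hyperparameters below, so out[:,-1,:] has shape (100, 15)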
668 | #out[:,-1,:] --> is it going to be 100,100 669 | out = self.fc(out[:,-1,:]) 670 | # out.size() --> 100,1 671 | return out 672 | 673 | 674 | 675 | # In[ ]: 676 | 677 | # Instantiating Model Class 678 | input_dim = 22 679 | hidden_dim = 15 680 | layer_dim = 1 681 | output_dim = 1 682 | batch_size = 100 683 | 684 | model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim) 685 | 686 | # Instantiating Loss Class 687 | criterion = nn.MSELoss() 688 | 689 | # Instantiate Optimizer Class 690 | learning_rate = 0.001 691 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) 692 | 693 | # converting numpy array to torch tensor 694 | X_train = torch.from_numpy(X_train) 695 | y_train = torch.from_numpy(y_train) 696 | X_test = torch.from_numpy(X_test) 697 | y_test = torch.from_numpy(y_test) 698 | 699 | # initializing lists to store losses over epochs: 700 | train_loss = [] 701 | test_loss = [] 702 | train_iter = [] 703 | test_iter = [] 704 | 705 | 706 | # In[ ]: 707 | 708 | # Training the model 709 | seq_dim = 1 710 | 711 | n_iter =0 712 | num_samples = len(X_train) 713 | test_samples = len(X_test) 714 | batch_size = 100 715 | num_epochs = 1000 716 | feat_dim = X_train.shape[1] 717 | 718 | X_train = X_train.type(torch.FloatTensor) 719 | y_train = y_train.type(torch.FloatTensor) 720 | X_test = X_test.type(torch.FloatTensor) 721 | y_test = y_test.type(torch.FloatTensor) 722 | 723 | for epoch in range(num_epochs): 724 | for i in range(0, int(num_samples/batch_size -1)): 725 | 726 | 727 | features = Variable(X_train[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 728 | Kt_value = Variable(y_train[i*batch_size:(i+1)*batch_size]) 729 | 730 | #print("Kt_value={}".format(Kt_value)) 731 | 732 | optimizer.zero_grad() 733 | 734 | outputs = model(features) 735 | #print("outputs ={}".format(outputs)) 736 | 737 | loss = criterion(outputs, Kt_value) 738 | 739 | train_loss.append(loss.data[0]) 740 | train_iter.append(n_iter) 741 | 742 | #print("loss = {}".format(loss)) 743 | loss.backward() 744 | 745 | optimizer.step() 746 | 747 | n_iter += 1 748 | 749 | if n_iter%100 == 0: 750 | for i in range(0,int(test_samples/batch_size -1)): 751 | features = Variable(X_test[i*batch_size:(i+1)*batch_size, :]).view(-1, seq_dim, feat_dim) 752 | Kt_test = Variable(y_test[i*batch_size:(i+1)*batch_size]) 753 | 754 | outputs = model(features) 755 | 756 | mse = np.sqrt(np.mean((Kt_test.data.numpy() - outputs.data.numpy().squeeze())**2)/num_samples) 757 | 758 | test_iter.append(n_iter) 759 | test_loss.append(mse) 760 | 761 | print('Epoch: {} Iteration: {}. Train_MSE: {}. 
Test_MSE: {}'.format(epoch, n_iter, loss.data[0], mse)) 762 | 763 | 764 | 765 | # In[ ]: 766 | 767 | print(len(test_loss)) 768 | #plt.plot(test_loss) 769 | plt.plot(train_loss,'-') 770 | #plt.ylim([0.000,0.99]) 771 | 772 | 773 | # In[ ]: 774 | 775 | plt.plot(test_loss,'r') 776 | 777 | 778 | # #### Demornamization 779 | 780 | # In[ ]: 781 | 782 | rmse = np.sqrt(mse) 783 | 784 | 785 | # In[ ]: 786 | 787 | rmse_denorm = (rmse * (df_new_test['Kt'].max() - df_new_test['Kt'].min()))+ df_new_test['Kt'].mean() 788 | 789 | 790 | # In[ ]: 791 | 792 | print("rmse_denorm=",rmse_denorm) 793 | 794 | 795 | # In[ ]: 796 | 797 | print(df_new_test['Kt'].describe()) 798 | 799 | 800 | # ### Saving train and test losses to a csv 801 | 802 | # In[ ]: 803 | 804 | df_trainLoss = pd.DataFrame(data={'Train Loss':train_loss, 'iteration':train_iter}, columns=['Train Loss','iteration']) 805 | df_trainLoss.to_csv('./RNN Paper Results/Exp1_2/Sioux_Falls/2hour_SiouxFalls_TrainLoss.csv') 806 | df_testLoss = pd.DataFrame(data={'Test Loss':test_loss, 'iteration':test_iter}, columns=['Test Loss','iteration']) 807 | df_testLoss.to_csv('./RNN Paper Results/Exp1_2/Sioux_Falls/2hour_TestLoss.csv') 808 | 809 | 810 | # In[ ]: 811 | 812 | 813 | 814 | -------------------------------------------------------------------------------- /multi-tscale-slim.yaml: -------------------------------------------------------------------------------- 1 | name: multi-tscale 2 | channels: 3 | - defaults 4 | dependencies: 5 | - blas=1.0 6 | - ca-certificates=2019.1.23 7 | - certifi=2018.11.29 8 | - cffi=1.11.5 9 | - cudatoolkit=9.0 10 | - freetype=2.9.1 11 | - intel-openmp=2019.1 12 | - jpeg=9b 13 | - libedit=3.1.20170329 14 | - libffi=3.2.1 15 | - libgcc-ng=8.2.0 16 | - libgfortran-ng=7.3.0 17 | - libpng=1.6.36 18 | - libstdcxx-ng=8.2.0 19 | - libtiff=4.0.10 20 | - mkl=2018.0.3 21 | - mkl_fft=1.0.6 22 | - mkl_random=1.0.1 23 | - nccl=1.3.5 24 | - ncurses=6.1 25 | - ninja=1.8.2 26 | - numpy=1.15.4 27 | - numpy-base=1.15.4 28 | - olefile=0.46 29 | - openssl=1.1.1a 30 | - pandas=0.24.1 31 | - pillow=5.4.1 32 | - pip=10.0.1 33 | - pycparser=2.19 34 | - python=3.6.8 35 | - python-dateutil=2.7.5 36 | - pytorch=0.4.1 37 | - pytz=2018.9 38 | - readline=7.0 39 | - setuptools=39.2.0 40 | - six=1.12.0 41 | - sqlite=3.26.0 42 | - tk=8.6.8 43 | - torchvision=0.2.1 44 | - wheel=0.31.1 45 | - xz=5.2.4 46 | - zlib=1.2.11 47 | - zstd=1.3.7 48 | - pip: 49 | - atomicwrites==1.2.0 50 | - attrs==18.1.0 51 | - bleach==1.5.0 52 | - chardet==3.0.4 53 | - click==6.7 54 | - cmake==3.12.0 55 | - colorama==0.3.9 56 | - coverage==4.5.1 57 | - cycler==0.10.0 58 | - cython==0.29.2 59 | - decorator==4.3.0 60 | - dill==0.2.8.2 61 | - filelock==3.0.10 62 | - funcsigs==1.0.2 63 | - future==0.16.0 64 | - html5lib==0.9999999 65 | - hyperopt==0.1.1 66 | - idna==2.7 67 | - kiwisolver==1.0.1 68 | - lz4==2.1.0 69 | - markdown==2.6.11 70 | - matplotlib==2.2.2 71 | - more-itertools==4.3.0 72 | - networkx==2.1 73 | - nose2==0.8.0 74 | - numexpr==2.6.6 75 | - pluggy==0.7.1 76 | - psutil==5.4.7 77 | - pvlib==0.5.2 78 | - py==1.5.4 79 | - py-trees==0.8.3 80 | - pymongo==3.7.1 81 | - pyparsing==2.2.0 82 | - pytest==3.7.3 83 | - pyyaml==3.13 84 | - pyzmq==17.1.2 85 | - requests==2.19.1 86 | - scipy==1.1.0 87 | - seaborn==0.9.0 88 | - tables==3.4.4 89 | - torch==0.4.1 90 | - tqdm==4.26.0 91 | - urllib3==1.23 92 | - werkzeug==0.14.1 93 | - zmq==0.0.0 94 | --------------------------------------------------------------------------------