├── .gitignore ├── LICENSE ├── README.md ├── buoypy ├── __init__.py ├── buoypy.py └── get_data.py ├── figures ├── historic.png ├── historic_range.png └── realtime.png ├── scripts └── buoy_data_analysis.ipynb └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | *.pyc 3 | *.db 4 | *.egg-info/ 5 | *.zip 6 | *.ipynb_checkpoints 7 | __pycache__/ 8 | .ipynb_checkpoints 9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Ariel Rokem, The University of Washington eScience Institute 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
-------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
1 | # DEPRECATED: See [seebuoy](https://github.com/nickc1/seebuoy) for NDBC data.
2 | 
3 | 
4 | buoypy
5 | ========
6 | 
7 | Functions to query the [NDBC](http://www.ndbc.noaa.gov/).
8 | 
9 | Returns pandas dataframes for the wave parameters.
10 | 
11 | Data descriptions - [Link](http://www.ndbc.noaa.gov/measdes.shtml)
12 | 
13 | 
14 | # Real Time - Last 45 Days
15 | 
16 | The real time data for all of the buoys can be found at:
17 | http://www.ndbc.noaa.gov/data/realtime2/
18 | 
19 | The realtime data provided is:
20 | 
21 | |File |Description
22 | |-------------|-----------------------
23 | |.data_spec | Raw Spectral Wave Data
24 | |.spec | Spectral Wave Summary Data
25 | |.swdir | Spectral Wave Data (alpha1)
26 | |.swdir2 | Spectral Wave Data (alpha2)
27 | |.swr1 | Spectral Wave Data (r1)
28 | |.swr2 | Spectral Wave Data (r2)
29 | |.txt | Standard Meteorological Data
30 | 
31 | The data headers for each of the files are provided below.
32 | 
33 | ##### .data_spec
34 | 
35 | |YY |MM |DD |hh |mm |Sep_Freq |spec_1 |(freq_1) |spec_2 |(freq_2) |spec_3 |(freq_3) |...
36 | |---|---|---|---|---|---------|-------|---------|-------|---------|-------|---------|---
37 | 
38 | ##### .spec
39 | 
40 | |YY |MM |DD |hh |mm |WVHT |SwH |SwP |WWH |WWP |SwD |WWD |STEEPNESS |APD |MWD
41 | |---|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|-----------|-----|---
42 | |yr |mo |dy |hr |mn |m |m |sec |m |sec |- |degT | - |sec |degT
43 | 
44 | ##### .swdir
45 | 
46 | |YY |MM |DD |hh |mm |alpha1_1 |(freq_1) |alpha1_2 |(freq_2) |alpha1_3 |(freq_3) |...
47 | |---|---|---|---|---|---------|---------|---------|---------|---------|---------|---
48 | 
49 | ##### .swdir2
50 | 
51 | |YY |MM |DD |hh |mm |alpha2_1 |(freq_1) |alpha2_2 |(freq_2) |alpha2_3 |(freq_3) |...
52 | |---|---|---|---|---|---------|---------|---------|---------|---------|---------|---
53 | 
54 | ##### .swr1
55 | 
56 | |YY |MM |DD |hh |mm |r1_1 |(freq_1) |r1_2 |(freq_2) |r1_3 |(freq_3) |...
57 | |---|---|---|---|---|-----|---------|-----|---------|-----|---------|---
58 | 
59 | ##### .swr2
60 | 
61 | |YY |MM |DD |hh |mm |r2_1 |(freq_1) |r2_2 |(freq_2) |r2_3 |(freq_3) |...
62 | |---|---|---|---|---|-----|---------|-----|---------|-----|---------|---
63 | 
64 | 
65 | ##### .txt
66 | 
67 | |YY |MM |DD |hh |mm |WDIR |WSPD |GST |WVHT |DPD |APD |MWD |PRES |ATMP |WTMP |DEWP |VIS |PTDY |TIDE
68 | |---|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----
69 | |yr |mo |dy |hr |mn |degT |m/s |m/s |m |sec |sec |degT |hPa |degC |degC |degC |nmi |hPa |ft
70 | 
71 | 
72 | 
73 | | Method | Description |
74 | | ----------- |------------------------------ |
75 | | data_spec | raw spectral wave data |
76 | | spec | spectral wave summaries |
77 | | swdir | spectral wave data (alpha1) |
78 | | swdir2 | spectral wave data (alpha2) |
79 | | swr1 | spectral wave data (r1) |
80 | | swr2 | spectral wave data (r2) |
81 | | txt | standard meteorological data |
82 | 
83 | 
84 | # Examples
85 | 
86 | ```python
87 | import buoypy as bp
88 | 
89 | buoy = 41108 # Wilmington Harbor
90 | B = bp.realtime(buoy)
91 | 
92 | df = B.txt()
93 | 
94 | # plotting (assumes matplotlib.pyplot as plt and seaborn as sns are imported)
95 | fig,ax = plt.subplots(2,sharex=True)
96 | df.WVHT.plot(ax=ax[0])
97 | ax[0].set_ylabel('Wave Height (m)',fontsize=14)
98 | 
99 | df.DPD.plot(ax=ax[1])
100 | ax[1].set_ylabel('Dominant Period (sec)',fontsize=14)
101 | ax[1].set_xlabel('')
102 | sns.despine()
103 | ```
104 | 
105 | ![buoypy realtime](/figures/realtime.png)
106 | 
107 | 
108 | # Historic Data - All information from a buoy
109 | 
110 | Buoys come online in different years; this aims to grab all of the available data. Currently, only retrieval of the standard meteorological data is implemented.
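Under the hood, the standard meteorological files are whitespace-delimited text whose first two rows hold column names and units (see the table below), with `MM` marking missing values. A minimal sketch of that parsing, using made-up sample rows rather than a live NDBC download:

```python
import io
import pandas as pd

# Made-up sample rows in the standard meteorological layout:
# first row = column names, second row = units, then one observation per row.
sample = """#YY  MM DD hh mm WDIR WSPD GST WVHT  DPD  APD MWD   PRES ATMP WTMP DEWP VIS TIDE
#yr  mo dy hr mn degT  m/s m/s    m  sec  sec degT    hPa degC degC degC nmi   ft
2016 02 04 17 42  180  5.0 7.0  1.6  7.1  5.3  169 1015.2 20.1 21.3   MM  MM   MM
2016 02 04 16 42  170  4.0 6.0  1.7  7.7  5.4  174 1015.0 20.0 21.2   MM  MM   MM
"""

# Read, drop the units row, then build a datetime index from the five
# date columns -- the same steps buoypy performs on the downloaded file.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", na_values="MM")
df = df.rename(columns={"#YY": "YY"}).drop(0)     # row 0 holds units
stamp = df.YY + " " + df.MM + " " + df.DD + " " + df.hh + " " + df.mm
df.index = pd.to_datetime(stamp, format="%Y %m %d %H %M")
df = df.drop(columns=["YY", "MM", "DD", "hh", "mm"]).astype(float)
```

This leaves a float dataframe with a datetime index and `NaN` where the buoy reported `MM`.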
111 | 
112 | The historic data provided is:
113 | 
114 | |Description
115 | |-----------------------
116 | | Standard Meteorological Data
117 | 
118 | 
119 | |YY |MM |DD |hh |mm |WDIR |WSPD |GST |WVHT |DPD |APD |MWD |PRES |ATMP |WTMP |DEWP |VIS |TIDE
120 | |---|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----
121 | |yr |mo |dy |hr |mn |degT |m/s |m/s |m |sec |sec |degT |hPa |degC |degC |degC |nmi |ft
122 | 
123 | 
124 | 
125 | ```python
126 | import buoypy as bp
127 | 
128 | buoy = 41108
129 | year = 2014
130 | 
131 | H = bp.historic_data(buoy,year)
132 | df = H.get_stand_meteo()
133 | 
134 | # plotting (assumes matplotlib.pyplot as plt and seaborn as sns are imported)
135 | fig,ax = plt.subplots(2,sharex=True)
136 | df.WVHT.plot(ax=ax[0])
137 | ax[0].set_ylabel('Wave Height (m)',fontsize=14)
138 | 
139 | df.DPD.plot(ax=ax[1])
140 | ax[1].set_ylabel('Dominant Period (sec)',fontsize=14)
141 | ax[1].set_xlabel('')
142 | sns.despine()
143 | ```
144 | 
145 | ![buoypy historic](/figures/historic.png)
146 | 
147 | Notice that the buoy was offline from late April 2014 to mid-August 2014.
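Outages like that one show up as large jumps between consecutive timestamps. A small hypothetical helper (not part of buoypy) that locates such gaps in any DatetimeIndex, demonstrated on a synthetic index resembling the outage above:

```python
import numpy as np
import pandas as pd

def find_gaps(index, max_gap):
    """Return (last-seen, next-seen) timestamp pairs wherever consecutive
    observations are further apart than max_gap (e.g. '2D' for two days)."""
    idx = pd.DatetimeIndex(index).sort_values()
    deltas = idx.to_series().diff()          # spacing between observations
    pos = np.flatnonzero(deltas > pd.Timedelta(max_gap))
    return [(idx[i - 1], idx[i]) for i in pos]

# Synthetic daily index with a single outage, roughly like the plot above.
idx = pd.date_range("2014-01-01", "2014-04-28", freq="D").append(
    pd.date_range("2014-08-15", "2014-12-31", freq="D"))
gaps = find_gaps(idx, max_gap="2D")
```

Running this on the `df.index` returned by `get_stand_meteo` would report the buoy's offline windows directly instead of eyeballing the plot.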
148 | 
149 | 
150 | # Historic Range - Grab data from a range of years
151 | 
152 | ```python
153 | 
154 | import buoypy as bp
155 | buoy = 41108
156 | year = None # ignored when a year_range is given
157 | year_range = (2010,2018)
158 | 
159 | H = bp.historic_data(buoy,year,year_range)
160 | X = H.get_all_stand_meteo()
161 | 
162 | # plotting (assumes matplotlib.pyplot as plt and seaborn as sns are imported)
163 | fig,ax = plt.subplots(2,sharex=True)
164 | X.WVHT.plot(ax=ax[0])
165 | ax[0].set_ylabel('Wave Height (m)',fontsize=14)
166 | X.DPD.plot(ax=ax[1])
167 | ax[1].set_ylabel('Dominant Period (sec)',fontsize=14)
168 | ax[1].set_xlabel('')
169 | sns.despine()
170 | ```
171 | 
172 | ![buoypy historic range](/figures/historic_range.png)
173 | 
-------------------------------------------------------------------------------- /buoypy/__init__.py: --------------------------------------------------------------------------------
1 | from .buoypy import *
-------------------------------------------------------------------------------- /buoypy/buoypy.py: --------------------------------------------------------------------------------
1 | """
2 | By Nick Cortale
3 | nickc1.github.io
4 | 
5 | Functions to query the NDBC (http://www.ndbc.noaa.gov/).
6 | 
7 | The realtime data for all of their buoys can be found at:
8 | http://www.ndbc.noaa.gov/data/realtime2/
9 | 
10 | Info about all of NOAA's data can be found at:
11 | http://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf
12 | 
13 | What all the values mean:
14 | http://www.ndbc.noaa.gov/measdes.shtml
15 | 
16 | Each buoy has the data:
17 | 
18 | File Parameters
19 | ---- ----------
20 | .data_spec Raw Spectral Wave Data
21 | .ocean Oceanographic Data
22 | .spec Spectral Wave Summary Data
23 | .supl Supplemental Measurements Data
24 | .swdir Spectral Wave Data (alpha1)
25 | .swdir2 Spectral Wave Data (alpha2)
26 | .swr1 Spectral Wave Data (r1)
27 | .swr2 Spectral Wave Data (r2)
28 | .txt Standard Meteorological Data
29 | 
30 | 
31 | 
32 | Example:
33 | import buoypy as bp
34 | 
35 | # Get the last 45 days of data
36 | rt = bp.realtime(41013) # Frying Pan Shoals buoy
37 | wave_data = rt.spec() # get the spectral wave summary data
38 | 
39 | wave_data.head()
40 | 
41 | Out[7]:
42 | WVHT SwH SwP WWH WWP SwD WWD STEEPNESS APD MWD
43 | 2016-02-04 17:42:00 1.6 1.3 7.1 0.9 4.5 S S STEEP 5.3 169
44 | 2016-02-04 16:42:00 1.7 1.5 7.7 0.9 5.0 S S STEEP 5.4 174
45 | 2016-02-04 15:41:00 2.0 0.0 NaN 2.0 7.1 NaN S STEEP 5.3 174
46 | 2016-02-04 14:41:00 2.0 1.2 7.7 1.5 5.9 SSE SSE STEEP 5.5 167
47 | 2016-02-04 13:41:00 2.0 1.7 7.1 0.9 4.8 S SSE STEEP 5.7 175
48 | 
49 | 
50 | 
51 | TODO:
52 | Make functions with except statements always spit out the same
53 | column headings.
54 | 
55 | """
56 | 
57 | import pandas as pd
58 | import numpy as np
59 | import datetime
60 | 
61 | class realtime:
62 | 
63 | def __init__(self, buoy):
64 | 
65 | self.link = 'http://www.ndbc.noaa.gov/data/realtime2/{}'.format(buoy)
66 | 
67 | def data_spec(self):
68 | """
69 | Get the raw spectral wave data from the buoy. The separation
70 | frequency is dropped to keep the data clean.
71 | 72 | Parameters 73 | ---------- 74 | buoy : string 75 | Buoy number ex: '41013' is off wilmington, nc 76 | 77 | Returns 78 | ------- 79 | df : pandas dataframe (date, frequency) 80 | data frame containing the raw spectral data. index is the date 81 | and the columns are each of the frequencies 82 | 83 | """ 84 | 85 | link = "{}.{}".format(self.link, 'data_spec') 86 | 87 | #combine the first five date columns YY MM DD hh mm and make index 88 | df = pd.read_csv(link, delim_whitespace=True, skiprows=1, header=None, 89 | parse_dates=[[0,1,2,3,4]], index_col=0) 90 | 91 | 92 | #convert the dates to datetimes 93 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 94 | 95 | specs = df.iloc[:,1::2] 96 | freqs = df.iloc[0,2::2] 97 | 98 | specs.columns=freqs 99 | 100 | #remove the parenthesis from the column index 101 | specs.columns = [cname.replace('(','').replace(')','') 102 | for cname in specs.columns] 103 | 104 | return specs 105 | 106 | 107 | def ocean(self): 108 | """ 109 | Retrieve oceanic data. 
For the buoys explored, 110 | O2%, O2PPM, CLCON, TURB, PH, EH were always NaNs 111 | 112 | 113 | Returns 114 | ------- 115 | df : pandas dataframe 116 | Index is the date and columns are: 117 | DEPTH m 118 | OTMP degc 119 | COND mS/cm 120 | SAL PSU 121 | O2% % 122 | 02PPM ppm 123 | CLCON ug/l 124 | TURB FTU 125 | PH - 126 | EH mv 127 | 128 | """ 129 | 130 | link = "{}.{}".format(self.link, 'ocean') 131 | 132 | #combine the first five date columns YY MM DD hh mm and make index 133 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 134 | parse_dates=[[0,1,2,3,4]], index_col=0) 135 | 136 | #units are in the second row drop them 137 | df.drop(df.index[0], inplace=True) 138 | 139 | #convert the dates to datetimes 140 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 141 | 142 | #convert to floats 143 | cols = ['DEPTH','OTMP','COND','SAL'] 144 | df[cols] = df[cols].astype(float) 145 | 146 | 147 | return df 148 | 149 | 150 | def spec(self): 151 | """ 152 | Get the spectral wave data from the ndbc. Something is wrong with 153 | the data for this parameter. The columns seem to change randomly. 154 | Refreshing the data page will yield different column names from 155 | minute to minute. 156 | 157 | parameters 158 | ---------- 159 | buoy : string 160 | Buoy number ex: '41013' is off wilmington, nc 161 | 162 | Returns 163 | ------- 164 | df : pandas dataframe 165 | data frame containing the spectral data. 
index is the date 166 | and the columns are: 167 | 168 | HO, SwH, SwP, WWH, WWP, SwD, WWD, STEEPNESS, AVP, MWD 169 | 170 | OR 171 | 172 | WVHT SwH SwP WWH WWP SwD WWD STEEPNESS APD MWD 173 | 174 | 175 | """ 176 | 177 | link = "{}.{}".format(self.link, 'spec') 178 | 179 | #combine the first five date columns YY MM DD hh mm and make index 180 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 181 | parse_dates=[[0,1,2,3,4]], index_col=0) 182 | 183 | try: 184 | #units are in the second row drop them 185 | #df.columns = df.columns + '('+ df.iloc[0] + ')' 186 | df.drop(df.index[0], inplace=True) 187 | 188 | #convert the dates to datetimes 189 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 190 | 191 | #convert to floats 192 | cols = ['WVHT','SwH','SwP','WWH','WWP','APD','MWD'] 193 | df[cols] = df[cols].astype(float) 194 | except: 195 | 196 | #convert the dates to datetimes 197 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 198 | 199 | #convert to floats 200 | cols = ['H0','SwH','SwP','WWH','WWP','AVP','MWD'] 201 | df[cols] = df[cols].astype(float) 202 | 203 | 204 | return df 205 | 206 | 207 | 208 | def supl(self): 209 | """ 210 | Get supplemental data 211 | 212 | Returns 213 | ------- 214 | data frame containing the spectral data. 
index is the date 215 | and the columns are: 216 | 217 | PRES hpa 218 | PTIME hhmm 219 | WSPD m/s 220 | WDIR degT 221 | WTIME hhmm 222 | 223 | 224 | """ 225 | 226 | link = "{}.{}".format(self.link, 'supl') 227 | 228 | #combine the first five date columns YY MM DD hh mm and make index 229 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 230 | parse_dates=[[0,1,2,3,4]], index_col=0) 231 | 232 | #units are in the second row drop them 233 | df.drop(df.index[0], inplace=True) 234 | 235 | #convert the dates to datetimes 236 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 237 | 238 | #convert to floats 239 | cols = ['PRES','PTIME','WSPD','WDIR','WTIME'] 240 | df[cols] = df[cols].astype(float) 241 | 242 | return df 243 | 244 | 245 | def swdir(self): 246 | """ 247 | Spectral wave data for alpha 1. 248 | 249 | Returns 250 | ------- 251 | 252 | specs : pandas dataframe 253 | Index is the date and the columns are the spectrum. Values in 254 | the table indicate how much energy is at each spectrum. 255 | """ 256 | 257 | 258 | link = "{}.{}".format(self.link, 'swdir') 259 | 260 | #combine the first five date columns YY MM DD hh mm and make index 261 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,na_values=999, 262 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0) 263 | 264 | #convert the dates to datetimes 265 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 266 | 267 | specs = df.iloc[:,0::2] 268 | freqs = df.iloc[0,1::2] 269 | 270 | specs.columns=freqs 271 | 272 | #remove the parenthesis from the column index 273 | specs.columns = [cname.replace('(','').replace(')','') 274 | for cname in specs.columns] 275 | 276 | return specs 277 | 278 | def swdir2(self): 279 | """ 280 | Spectral wave data for alpha 2. 281 | 282 | Returns 283 | ------- 284 | 285 | specs : pandas dataframe 286 | Index is the date and the columns are the spectrum. Values in 287 | the table indicate how much energy is at each spectrum. 
288 | """ 289 | 290 | link = "{}.{}".format(self.link, 'swdir2') 291 | 292 | #combine the first five date columns YY MM DD hh mm and make index 293 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1, 294 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0) 295 | 296 | #convert the dates to datetimes 297 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 298 | 299 | specs = df.iloc[:,0::2] 300 | freqs = df.iloc[0,1::2] 301 | 302 | specs.columns=freqs 303 | 304 | #remove the parenthesis from the column index 305 | specs.columns = [cname.replace('(','').replace(')','') 306 | for cname in specs.columns] 307 | 308 | return specs 309 | 310 | def swr1(self): 311 | """ 312 | Spectral wave data for r1. 313 | 314 | Returns 315 | ------- 316 | 317 | specs : pandas dataframe 318 | Index is the date and the columns are the spectrum. Values in 319 | the table indicate how much energy is at each spectrum. 320 | """ 321 | 322 | 323 | 324 | link = "{}.{}".format(self.link, 'swr1') 325 | #combine the first five date columns YY MM DD hh mm and make index 326 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1, 327 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0) 328 | 329 | #convert the dates to datetimes 330 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 331 | 332 | specs = df.iloc[:,0::2] 333 | freqs = df.iloc[0,1::2] 334 | 335 | specs.columns=freqs 336 | 337 | #remove the parenthesis from the column index 338 | specs.columns = [cname.replace('(','').replace(')','') 339 | for cname in specs.columns] 340 | 341 | return specs 342 | 343 | def swr2(self): 344 | """ 345 | Spectral wave data for r2. 346 | 347 | Returns 348 | ------- 349 | 350 | specs : pandas dataframe 351 | Index is the date and the columns are the spectrum. Values in 352 | the table indicate how much energy is at each spectrum. 
353 | """ 354 | 355 | 356 | link = "{}.{}".format(self.link, 'swr2') 357 | 358 | #combine the first five date columns YY MM DD hh mm and make index 359 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1, 360 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0) 361 | 362 | #convert the dates to datetimes 363 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 364 | 365 | specs = df.iloc[:,0::2] 366 | freqs = df.iloc[0,1::2] 367 | 368 | specs.columns=freqs 369 | 370 | #remove the parenthesis from the column index 371 | specs.columns = [cname.replace('(','').replace(')','') 372 | for cname in specs.columns] 373 | 374 | return specs 375 | 376 | def txt(self): 377 | """ 378 | Retrieve standard Meteorological data. NDBC seems to be updating 379 | the data with different column names, so this metric can return 380 | two possible data frames with different column names: 381 | 382 | Returns 383 | ------- 384 | 385 | df : pandas dataframe 386 | Index is the date and the columns can be: 387 | 388 | ['WDIR','WSPD','GST','WVHT','DPD','APD','MWD', 389 | 'PRES','ATMP','WTMP','DEWP','VIS','PTDY','TIDE'] 390 | 391 | or 392 | 393 | ['WD','WSPD','GST','WVHT','DPD','APD','MWD','BARO', 394 | 'ATMP','WTMP','DEWP','VIS','PTDY','TIDE'] 395 | 396 | """ 397 | 398 | link = "{}.{}".format(self.link, 'txt') 399 | #combine the first five date columns YY MM DD hh mm and make index 400 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 401 | parse_dates=[[0,1,2,3,4]], index_col=0) 402 | 403 | try: 404 | #first column is units, so drop it 405 | df.drop(df.index[0], inplace=True) 406 | #convert the dates to datetimes 407 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 408 | 409 | #convert to floats 410 | cols = ['WDIR','WSPD','GST','WVHT','DPD','APD','MWD', 411 | 'PRES','ATMP','WTMP','DEWP','VIS','PTDY','TIDE'] 412 | df[cols] = df[cols].astype(float) 413 | except: 414 | 415 | #convert the dates to datetimes 416 | df.index = 
pd.to_datetime(df.index,format="%Y %m %d %H %M")
417 | 
418 | #convert to floats
419 | cols = ['WD','WSPD','GST','WVHT','DPD','APD','MWD','BARO',
420 | 'ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
421 | df[cols] = df[cols].astype(float)
422 | df.index.name='Date'
423 | return df
424 | 
425 | ################################################
426 | ################################################
427 | 
428 | class historic_data:
429 | 
430 | def __init__(self, buoy, year, year_range=None):
431 | self.buoy, self.year, self.year_range = buoy, year, year_range
432 | link = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
433 | link += '{}h{}.txt.gz&dir=data/historical/'.format(buoy, year)
434 | self.link = link
435 | 
436 | def get_stand_meteo(self,link = None):
437 | '''
438 | Standard Meteorological Data. The data header changed in 2007, hence
439 | the if statement below.
440 | 
441 | 
442 | 
443 | WDIR Wind direction (degrees clockwise from true N)
444 | WSPD Wind speed (m/s) averaged over an eight-minute period
445 | GST Peak 5 or 8 second gust speed (m/s)
446 | WVHT Significant wave height (meters) is calculated as
447 | the average of the highest one-third of all of the
448 | wave heights during the 20-minute sampling period.
449 | DPD Dominant wave period (seconds) is the period with the maximum wave energy.
450 | APD Average wave period (seconds) of all waves during the 20-minute period.
451 | MWD The direction from which the waves at the dominant period (DPD) are coming
452 | (degrees clockwise from true N).
453 | PRES Sea level pressure (hPa).
454 | ATMP Air temperature (Celsius).
455 | WTMP Sea surface temperature (Celsius).
456 | DEWP Dewpoint temperature (Celsius).
457 | VIS Station visibility (nautical miles).
458 | PTDY Pressure tendency (hPa).
459 | TIDE The water level in feet above or below Mean Lower Low Water (MLLW).
460 | '''
461 | if link is None:
462 | link = self.link + 'stdmet/'
463 | 
464 | #combine the first five date columns YY MM DD hh mm and make index
465 | df = pd.read_csv(link, header=0, delim_whitespace=True, dtype=object,
466 | na_values=[99,999,9999,99.,999.,9999.])
467 | 
468 | 
469 | #2007 and on format
470 | if df.iloc[0,0] == '#yr':
471 | 
472 | 
473 | df = df.rename(columns={'#YY': 'YY'}) #get rid of hash
474 | 
475 | #make the indices
476 | 
477 | df.drop(0, inplace=True) #first row is units, so drop it
478 | 
479 | d = df.YY + ' ' + df.MM + ' ' + df.DD + ' ' + df.hh + ' ' + df.mm
480 | ind = pd.to_datetime(d, format="%Y %m %d %H %M")
481 | 
482 | df.index = ind
483 | 
484 | #drop useless columns and rename the ones we want
485 | df.drop(['YY','MM','DD','hh','mm'], axis=1, inplace=True)
486 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD',
487 | 'PRES', 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
488 | 
489 | 
490 | #2000 through 2006 format
491 | else:
492 | date_str = df.YYYY + ' ' + df.MM + ' ' + df.DD + ' ' + df.hh
493 | 
494 | ind = pd.to_datetime(date_str, format="%Y %m %d %H")
495 | 
496 | df.index = ind
497 | 
498 | #some data has a minute column and some doesn't
499 | 
500 | if 'mm' in df.columns:
501 | df.drop(['YYYY','MM','DD','hh','mm'], axis=1, inplace=True)
502 | else:
503 | df.drop(['YYYY','MM','DD','hh'], axis=1, inplace=True)
504 | 
505 | 
506 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD',
507 | 'MWD', 'PRES', 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
508 | 
509 | 
510 | # all data should be floats
511 | df = df.astype('float')
512 | 
513 | return df
514 | 
515 | def get_all_stand_meteo(self):
516 | """
517 | Retrieves all the standard meteorological data by calling get_stand_meteo.
518 | It also checks that the requested years are available, since data is not
519 | available for the same years at all the buoys.
520 | 
521 | Returns
522 | -------
523 | df : pandas dataframe
524 | Contains all the data from all the years that were specified
525 | in year_range.
526 | """
527 | 
528 | start,stop = self.year_range
529 | from urllib.request import urlopen # stdlib replacement for py2's urllib2
530 | #see what is on the NDBC so we only pull the years that are available
531 | links = []
532 | for ii in range(start,stop+1):
533 | 
534 | base = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
535 | end = '.txt.gz&dir=data/historical/stdmet/'
536 | link = base + str(self.buoy) + 'h' + str(ii) + end
537 | 
538 | try:
539 | urlopen(link)
540 | links.append(link)
541 | 
542 | except Exception:
543 | print(str(ii) + ' not in records')
544 | 
545 | #need to also retrieve Jan, Feb, Mar, etc.
546 | month = ['Jan','Feb','Mar','Apr','May','Jun',
547 | 'Jul','Aug','Sep','Oct','Nov','Dec']
548 | k = [1,2,3,4,5,6,7,8,9,'a','b','c'] #for the links
549 | 
550 | for ii in range(len(month)):
551 | mid = '.txt.gz&dir=data/stdmet/'
552 | link = base + str(self.buoy) + str(k[ii]) + '2016' + mid + str(month[ii]) + '/'
553 | 
554 | try:
555 | urlopen(link)
556 | links.append(link)
557 | 
558 | except Exception:
559 | print(str(month[ii]) + ' 2016 not in records')
560 | print(link)
561 | 
562 | 
563 | # start grabbing some data
564 | df = pd.DataFrame() #initialize empty df
565 | 
566 | for L in links:
567 | 
568 | new_df = self.get_stand_meteo(link=L)
569 | print('Link : ' + L)
570 | df = pd.concat([df, new_df]) #DataFrame.append was removed in pandas 2.0
571 | 
572 | return df
573 | 
574 | 
575 | class write_data(historic_data):
576 | 
577 | def __init__(self, buoy, year, year_range,db_name = 'buoydata.db'):
578 | self.buoy = buoy
579 | self.year = year
580 | self.year_range=year_range
581 | self.db_name = db_name
582 | 
583 | def write_all_stand_meteo(self):
584 | """
585 | Write the standard meteorological data to the database. See get_all_stand_meteo
586 | (in the historic_data class) for a description of the data.
587 | 
588 | Returns
589 | -------
590 | True on success. The standard meteorological data is appended to a
591 | table named '<buoy>_buoy' in the sqlite database db_name.
592 | 
593 | 
594 | """
595 | 
596 | from sqlalchemy import create_engine # database connection
597 | df = self.get_all_stand_meteo()
598 | 
599 | #write the df to disk
600 | disk_engine = create_engine('sqlite:///' + self.db_name)
601 | 
602 | table_name = str(self.buoy) + '_buoy'
603 | df.to_sql(table_name,disk_engine,if_exists='append')
604 | #keep only the newest row for each timestamp ("index" is the column to_sql writes)
605 | disk_engine.execute("""DELETE FROM '{0}' WHERE rowid NOT IN
606 | (SELECT max(rowid) FROM '{0}' GROUP BY "index")""".format(table_name))
607 | 
608 | print(str(self.buoy) + ' written to database: ' + str(self.db_name))
609 | 
610 | 
611 | return True
612 | 
613 | 
614 | class read_data:
615 | """
616 | Reads the data from the setup database
617 | """
618 | 
619 | def __init__(self, buoy, year_range=None):
620 | self.buoy = buoy
621 | self.year_range = year_range
622 | self.disk_eng = 'sqlite:///buoydata.db'
623 | 
624 | 
625 | def get_stand_meteo(self):
626 | from sqlalchemy import create_engine
627 | disk_engine = create_engine(self.disk_eng)
628 | 
629 | 
630 | df = pd.read_sql_query("SELECT * FROM '{}'".format(str(self.buoy) + '_buoy'), disk_engine)
631 | 
632 | #give it a datetime index since it was stripped by sqlite
633 | df.index = pd.to_datetime(df['index'])
634 | df.index.name='date'
635 | df.drop('index',axis=1,inplace=True)
636 | 
637 | if self.year_range:
638 | print("""this is not implemented in SQL. Could be slow.
639 | Get out while you can!!!""" )
640 | 
641 | start,stop = self.year_range
642 | begin = df.index.searchsorted(datetime.datetime(start, 1, 1))
643 | end = df.index.searchsorted(datetime.datetime(stop, 12, 31))
644 | df = df.iloc[begin:end] #searchsorted returns positions, so use iloc (.ix is gone)
645 | 
646 | 
647 | 
648 | return df
649 | 
-------------------------------------------------------------------------------- /buoypy/get_data.py: --------------------------------------------------------------------------------
1 | """
2 | By Nick Cortale
3 | nickc1.github.io
4 | 
5 | Functions to query the NDBC (http://www.ndbc.noaa.gov/).
6 | 
7 | The realtime data for all of their buoys can be found at:
8 | http://www.ndbc.noaa.gov/data/realtime2/
9 | 
10 | Info about all of NOAA's data can be found at:
11 | http://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf
12 | 
13 | What all the values mean:
14 | http://www.ndbc.noaa.gov/measdes.shtml
15 | 
16 | Each buoy has the data:
17 | 
18 | File Parameters
19 | ---- ----------
20 | .data_spec Raw Spectral Wave Data
21 | .ocean Oceanographic Data
22 | .spec Spectral Wave Summary Data
23 | .supl Supplemental Measurements Data
24 | .swdir Spectral Wave Data (alpha1)
25 | .swdir2 Spectral Wave Data (alpha2)
26 | .swr1 Spectral Wave Data (r1)
27 | .swr2 Spectral Wave Data (r2)
28 | .txt Standard Meteorological Data
29 | 
30 | 
31 | 
32 | Example:
33 | import buoypy as bp
34 | 
35 | # Get the last 45 days of data
36 | rt = bp.realtime(41013) # Frying Pan Shoals buoy
37 | wave_data = rt.get_spec() # get the spectral wave summary data
38 | 
39 | wave_data.head()
40 | 
41 | Out[7]:
42 | WVHT SwH SwP WWH WWP SwD WWD STEEPNESS APD MWD
43 | 2016-02-04 17:42:00 1.6 1.3 7.1 0.9 4.5 S S STEEP 5.3 169
44 | 2016-02-04 16:42:00 1.7 1.5 7.7 0.9 5.0 S S STEEP 5.4 174
45 | 2016-02-04 15:41:00 2.0 0.0 NaN 2.0 7.1 NaN S STEEP 5.3 174
46 | 2016-02-04 14:41:00 2.0 1.2 7.7 1.5 5.9 SSE SSE STEEP 5.5 167
47 | 2016-02-04 13:41:00 2.0 1.7 7.1 0.9 4.8 S SSE STEEP 5.7 175
48 | 
49 | 
50 | 
51 | TODO:
52 | Make functions with except statements
always spit out the same 53 | column headings. 54 | 55 | """ 56 | 57 | import pandas as pd 58 | import numpy as np 59 | import urllib2 60 | from sqlalchemy import create_engine # database connection 61 | import datetime 62 | 63 | class formatter: 64 | """ 65 | Correctly formats the data contained in the link into a 66 | pandas dataframe. 67 | """ 68 | 69 | def __init__(self,link): 70 | self.link = link 71 | 72 | def format_stand_meteo(self): 73 | """ 74 | Format the standard Meteorological data. 75 | """ 76 | 77 | df = pd.read_csv(self.link,delim_whitespace=True, 78 | na_values=[99,999,9999,99.,999.,9999.]) 79 | 80 | #2007 and on format 81 | if df.iloc[0,0] =='#yr': 82 | 83 | 84 | df = df.rename(columns={'#YY': 'YY'}) #get rid of hash 85 | 86 | #make the indices 87 | date_str = df.YY + ' ' + df.MM+ ' ' + df.DD + ' ' + df.hh + ' ' + df.mm 88 | df.drop(0,inplace=True) #first row is units, so drop them 89 | ind = pd.to_datetime(date_str.drop(0),format="%Y %m %d %H %M") 90 | 91 | df.index = ind 92 | 93 | #drop useless columns and rename the ones we want 94 | df.drop(['YY','MM','DD','hh','mm'],axis=1,inplace=True) 95 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 96 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE'] 97 | 98 | 99 | #before 2006 to 2000 100 | else: 101 | date_str = df.YYYY.astype('str') + ' ' + df.MM.astype('str') + \ 102 | ' ' + df.DD.astype('str') + ' ' + df.hh.astype('str') 103 | 104 | ind = pd.to_datetime(date_str,format="%Y %m %d %H") 105 | 106 | df.index = ind 107 | 108 | #drop useless columns and rename the ones we want 109 | ####################### 110 | '''FIX MEEEEE!!!!!!! 
111 | Get rid of the try except 112 | some have minute column''' 113 | 114 | #this is hacky and bad 115 | try: 116 | df.drop(['YYYY','MM','DD','hh','mm'],axis=1,inplace=True) 117 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 118 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE'] 119 | 120 | except: 121 | df.drop(['YYYY','MM','DD','hh'],axis=1,inplace=True) 122 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 123 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE'] 124 | 125 | 126 | # all data should be floats 127 | df = df.astype('float') 128 | nvals = [99,999,9999,99.0,999.0,9999.0] 129 | df.replace(nvals,np.nan,inplace=True) 130 | 131 | return df 132 | 133 | ################################################ 134 | ################################################ 135 | 136 | class realtime: 137 | """ 138 | Retrieves the last 45 days worth of data for a specific buoy. 139 | Realtime data is formatted a little different from all the other data. 140 | 141 | 142 | """ 143 | 144 | def __init__(self, buoy): 145 | self.buoy = buoy 146 | 147 | def get_data_spec(self): 148 | """ 149 | Get the raw spectral wave data from the buoy. The seperation 150 | frequency is dropped to keep the data clean. 151 | 152 | Parameters 153 | ---------- 154 | buoy : string 155 | Buoy number ex: '41013' is off wilmington, nc 156 | 157 | Returns 158 | ------- 159 | df : pandas dataframe (date, frequency) 160 | data frame containing the raw spectral data. index is the date 161 | and the columns are each of the frequencies 162 | 163 | """ 164 | 165 | params = 'data_spec' 166 | base = 'http://www.ndbc.noaa.gov/data/realtime2/' 167 | link = base + str(self.buoy) + '.' 
+ params 168 | 169 | #combine the first five date columns YY MM DD hh mm and make index 170 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,header=None, 171 | parse_dates=[[0,1,2,3,4]], index_col=0) 172 | 173 | 174 | #convert the dates to datetimes 175 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 176 | 177 | specs = df.iloc[:,1::2] 178 | freqs = df.iloc[0,2::2] 179 | 180 | specs.columns=freqs 181 | 182 | #remove the parenthesis from the column index 183 | specs.columns = [cname.replace('(','').replace(')','') 184 | for cname in specs.columns] 185 | 186 | return specs 187 | 188 | 189 | def get_ocean(self): 190 | """ 191 | Retrieve oceanic data. For the buoys explored, 192 | O2%, O2PPM, CLCON, TURB, PH, EH were always NaNs 193 | 194 | 195 | Returns 196 | ------- 197 | df : pandas dataframe 198 | Index is the date and columns are: 199 | DEPTH m 200 | OTMP degc 201 | COND mS/cm 202 | SAL PSU 203 | O2% % 204 | 02PPM ppm 205 | CLCON ug/l 206 | TURB FTU 207 | PH - 208 | EH mv 209 | 210 | """ 211 | 212 | params = 'ocean' 213 | base = 'http://www.ndbc.noaa.gov/data/realtime2/' 214 | link = base + str(self.buoy) + '.' + params 215 | 216 | #combine the first five date columns YY MM DD hh mm and make index 217 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 218 | parse_dates=[[0,1,2,3,4]], index_col=0) 219 | 220 | #units are in the second row drop them 221 | df.drop(df.index[0], inplace=True) 222 | 223 | #convert the dates to datetimes 224 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 225 | 226 | #convert to floats 227 | cols = ['DEPTH','OTMP','COND','SAL'] 228 | df[cols] = df[cols].astype(float) 229 | 230 | 231 | return df 232 | 233 | 234 | def get_spec(self): 235 | """ 236 | Get the spectral wave data from the ndbc. Something is wrong with 237 | the data for this parameter. The columns seem to change randomly. 238 | Refreshing the data page will yield different column names from 239 | minute to minute. 
240 | 
241 | Parameters
242 | ----------
243 | buoy : string
244 | Buoy number, e.g. '41013' is off Wilmington, NC.
245 | 
246 | Returns
247 | -------
248 | df : pandas dataframe
249 | data frame containing the spectral data. index is the date
250 | and the columns are:
251 | 
252 | H0, SwH, SwP, WWH, WWP, SwD, WWD, STEEPNESS, AVP, MWD
253 | 
254 | OR
255 | 
256 | WVHT SwH SwP WWH WWP SwD WWD STEEPNESS APD MWD
257 | 
258 | 
259 | """
260 | 
261 | params = 'spec'
262 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
263 | link = base + str(self.buoy) + '.' + params
264 | 
265 | #combine the first five date columns YY MM DD hh mm and make index
266 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM',
267 | parse_dates=[[0,1,2,3,4]], index_col=0)
268 | 
269 | try:
270 | #units are in the second row, so drop them
271 | #df.columns = df.columns + '('+ df.iloc[0] + ')'
272 | df.drop(df.index[0], inplace=True)
273 | 
274 | #convert the dates to datetimes
275 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
276 | 
277 | #convert to floats
278 | cols = ['WVHT','SwH','SwP','WWH','WWP','APD','MWD']
279 | df[cols] = df[cols].astype(float)
280 | except KeyError: #the alternate H0-style column names
281 | 
282 | #convert the dates to datetimes
283 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
284 | 
285 | #convert to floats
286 | cols = ['H0','SwH','SwP','WWH','WWP','AVP','MWD']
287 | df[cols] = df[cols].astype(float)
288 | 
289 | 
290 | return df
291 | 
292 | 
293 | 
294 | def get_supl(self):
295 | """
296 | Get supplemental data
297 | 
298 | Returns
299 | -------
300 | data frame containing the supplemental data. index is the date
301 | and the columns are:
302 | 
303 | PRES hpa
304 | PTIME hhmm
305 | WSPD m/s
306 | WDIR degT
307 | WTIME hhmm
308 | 
309 | 
310 | """
311 | params = 'supl'
312 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
313 | link = base + str(self.buoy) + '.'
+ params
314 | 
315 | #combine the first five date columns YY MM DD hh mm and make index
316 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM',
317 | parse_dates=[[0,1,2,3,4]], index_col=0)
318 | 
319 | #units are in the second row, so drop them
320 | df.drop(df.index[0], inplace=True)
321 | 
322 | #convert the dates to datetimes
323 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
324 | 
325 | #convert to floats
326 | cols = ['PRES','PTIME','WSPD','WDIR','WTIME']
327 | df[cols] = df[cols].astype(float)
328 | 
329 | return df
330 | 
331 | 
332 | def get_swdir(self):
333 | """
334 | Spectral wave data for alpha1 (mean wave direction).
335 | 
336 | Returns
337 | -------
338 | 
339 | specs : pandas dataframe
340 | Index is the date and the columns are the frequency bins.
341 | Values give the alpha1 estimate at each frequency.
342 | """
343 | 
344 | 
345 | params = 'swdir'
346 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
347 | link = base + str(self.buoy) + '.' + params
348 | 
349 | #combine the first five date columns YY MM DD hh mm and make index
350 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,na_values=999,
351 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0)
352 | 
353 | #convert the dates to datetimes
354 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
355 | 
356 | specs = df.iloc[:,0::2]
357 | freqs = df.iloc[0,1::2]
358 | 
359 | specs.columns = freqs
360 | 
361 | #remove the parentheses from the column index
362 | specs.columns = [cname.replace('(','').replace(')','')
363 | for cname in specs.columns]
364 | 
365 | return specs
366 | 
367 | def get_swdir2(self):
368 | """
369 | Spectral wave data for alpha2 (principal wave direction).
370 | 
371 | Returns
372 | -------
373 | 
374 | specs : pandas dataframe
375 | Index is the date and the columns are the frequency bins.
376 | Values give the alpha2 estimate at each frequency.
377 | """
378 | params = 'swdir2'
379 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
380 | link = base + str(self.buoy) + '.' + params
381 | 
382 | #combine the first five date columns YY MM DD hh mm and make index
383 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,
384 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0)
385 | 
386 | #convert the dates to datetimes
387 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
388 | 
389 | specs = df.iloc[:,0::2]
390 | freqs = df.iloc[0,1::2]
391 | 
392 | specs.columns = freqs
393 | 
394 | #remove the parentheses from the column index
395 | specs.columns = [cname.replace('(','').replace(')','')
396 | for cname in specs.columns]
397 | 
398 | return specs
399 | 
400 | def get_swr1(self):
401 | """
402 | Spectral wave data for r1 (first normalized directional coefficient).
403 | 
404 | Returns
405 | -------
406 | 
407 | specs : pandas dataframe
408 | Index is the date and the columns are the frequency bins.
409 | Values give the r1 estimate at each frequency.
410 | """
411 | 
412 | 
413 | params = 'swr1'
414 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
415 | link = base + str(self.buoy) + '.' + params
416 | 
417 | #combine the first five date columns YY MM DD hh mm and make index
418 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,
419 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0)
420 | 
421 | #convert the dates to datetimes
422 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
423 | 
424 | specs = df.iloc[:,0::2]
425 | freqs = df.iloc[0,1::2]
426 | 
427 | specs.columns = freqs
428 | 
429 | #remove the parentheses from the column index
430 | specs.columns = [cname.replace('(','').replace(')','')
431 | for cname in specs.columns]
432 | 
433 | return specs
434 | 
435 | def get_swr2(self):
436 | """
437 | Spectral wave data for r2 (second normalized directional coefficient).
438 | 
439 | Returns
440 | -------
441 | 
442 | specs : pandas dataframe
443 | Index is the date and the columns are the frequency bins.
444 | Values give the r2 estimate at each frequency.
445 | """
446 | 
447 | params = 'swr2'
448 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
449 | link = base + str(self.buoy) + '.' + params
450 | 
451 | #combine the first five date columns YY MM DD hh mm and make index
452 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,
453 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0)
454 | 
455 | #convert the dates to datetimes
456 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
457 | 
458 | specs = df.iloc[:,0::2]
459 | freqs = df.iloc[0,1::2]
460 | 
461 | specs.columns = freqs
462 | 
463 | #remove the parentheses from the column index
464 | specs.columns = [cname.replace('(','').replace(')','')
465 | for cname in specs.columns]
466 | 
467 | return specs
468 | 
469 | def get_txt(self):
470 | """
471 | Retrieve Standard Meteorological data. NDBC seems to be updating
472 | the data with different column names, so this method can return
473 | two possible data frames with different column names:
474 | 
475 | Returns
476 | -------
477 | 
478 | df : pandas dataframe
479 | Index is the date and the columns can be:
480 | 
481 | ['WDIR','WSPD','GST','WVHT','DPD','APD','MWD',
482 | 'PRES','ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
483 | 
484 | or
485 | 
486 | ['WD','WSPD','GST','WVHT','DPD','APD','MWD','BARO',
487 | 'ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
488 | 
489 | """
490 | 
491 | params = 'txt'
492 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
493 | link = base + str(self.buoy) + '.'
+ params
494 | 
495 | #combine the first five date columns YY MM DD hh mm and make index
496 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM',
497 | parse_dates=[[0,1,2,3,4]], index_col=0)
498 | 
499 | try:
500 | #first row is units, so drop it
501 | df.drop(df.index[0], inplace=True)
502 | #convert the dates to datetimes
503 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
504 | 
505 | #convert to floats
506 | cols = ['WDIR','WSPD','GST','WVHT','DPD','APD','MWD',
507 | 'PRES','ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
508 | df[cols] = df[cols].astype(float)
509 | except KeyError: #the alternate WD/BARO-style column names
510 | 
511 | #convert the dates to datetimes
512 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
513 | 
514 | #convert to floats
515 | cols = ['WD','WSPD','GST','WVHT','DPD','APD','MWD','BARO',
516 | 'ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
517 | df[cols] = df[cols].astype(float)
518 | return df
519 | 
520 | ################################################
521 | ################################################
522 | 
523 | class get_months(formatter):
524 | """
525 | Before a year is complete, the NDBC stores its data monthly.
526 | This class retrieves all of that monthly data.
527 | """
528 | 
529 | def __init__(self, buoy, year=None):
530 | self.buoy = buoy
531 | self.year = year
532 | 
533 | def get_stand_meteo(self):
534 | #see what is on the NDBC so we only pull the years that are available
535 | links = []
536 | 
537 | #need to also retrieve jan, feb, march, etc.
538 | month = ['Jan','Feb','Mar','Apr','May','Jun',
539 | 'Jul','Aug','Sep','Oct','Nov','Dec']
540 | k = [1,2,3,4,5,6,7,8,9,'a','b','c'] #for the links
541 | 
542 | #NDBC sometimes lags the new months in January and February
543 | #Might need to define a year on init
544 | if not self.year:
545 | self.year = str(datetime.date.today().year)
546 | 
547 | if datetime.date.today().month <= 2:
548 | print("using " + self.year + " to get the months. Might be wrong!")
549 | 
550 | #for constructing links
551 | base = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
552 | base2 = 'http://www.ndbc.noaa.gov/data/stdmet/'
553 | mid = '.txt.gz&dir=data/stdmet/'
554 | 
555 | for ii in range(len(month)):
556 | 
557 | #links can come in 2 formats
558 | link = base + str(self.buoy) + str(k[ii]) + self.year + mid + str(month[ii]) +'/'
559 | link2 = base2 + month[ii] + '/' + str(self.buoy) + '.txt'
560 | 
561 | try:
562 | urllib2.urlopen(link)
563 | links.append(link)
564 | 
565 | except:
566 | print(str(month[ii]) + ' ' + self.year + ' not in records')
567 | print(link)
568 | 
569 | #need to try the second link
570 | try:
571 | urllib2.urlopen(link2)
572 | links.append(link2)
573 | print(link2 + ' was found in records')
574 | except:
575 | pass
576 | 
577 | 
578 | # start grabbing some data
579 | df=pd.DataFrame()
580 | 
581 | for L in links:
582 | self.link=L
583 | new_df = self.format_stand_meteo()
584 | print('Link : ' + L)
585 | df = df.append(new_df)
586 | 
587 | return df
588 | 
589 | ################################################
590 | ################################################
591 | 
592 | class get_historic(formatter):
593 | 
594 | def __init__(self, buoy, year, year_range=None):
595 | self.buoy = buoy
596 | self.year = year
597 | self.year_range = year_range
598 | 
599 | def hist_stand_meteo(self,link = None):
600 | '''
601 | Standard Meteorological Data. Data header was changed in 2007. Thus
602 | the need for the if statement below.
603 | 
604 | 
605 | WDIR Wind direction (degrees clockwise from true N)
606 | WSPD Wind speed (m/s) averaged over an eight-minute period
607 | GST Peak 5 or 8 second gust speed (m/s)
608 | WVHT Significant wave height (meters) is calculated as
609 | the average of the highest one-third of all of the
610 | wave heights during the 20-minute sampling period.
611 | DPD Dominant wave period (seconds) is the period with the maximum wave energy.
612 | APD Average wave period (seconds) of all waves during the 20-minute period.
613 | MWD The direction from which the waves at the dominant period (DPD) are coming.
614 | (degrees clockwise from true N)
615 | PRES Sea level pressure (hPa).
616 | ATMP Air temperature (Celsius).
617 | WTMP Sea surface temperature (Celsius).
618 | DEWP Dewpoint temperature
619 | VIS Station visibility (nautical miles).
620 | PTDY Pressure Tendency
621 | TIDE The water level in feet above or below Mean Lower Low Water (MLLW).
622 | '''
623 | 
624 | 
625 | if not link:
626 | base = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
627 | link = base + str(self.buoy) + 'h' + str(self.year) + '.txt.gz&dir=data/historical/stdmet/'
628 | 
629 | #combine the first five date columns YY MM DD hh and make index
630 | df = pd.read_csv(link,delim_whitespace=True,na_values=[99,999,9999,99.,999.,9999.])
631 | 
632 | #2007 and on format
633 | if df.iloc[0,0] == '#yr':
634 | 
635 | 
636 | df = df.rename(columns={'#YY': 'YY'}) #get rid of hash
637 | 
638 | #make the indices
639 | date_str = df.YY + ' ' + df.MM+ ' ' + df.DD + ' ' + df.hh + ' ' + df.mm
640 | df.drop(0,inplace=True) #first row is units, so drop them
641 | ind = pd.to_datetime(date_str.drop(0),format="%Y %m %d %H %M")
642 | 
643 | df.index = ind
644 | 
645 | #drop useless columns and rename the ones we want
646 | df.drop(['YY','MM','DD','hh','mm'],axis=1,inplace=True)
647 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES',
648 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
649 | 
650 | 
651 | #2006 and earlier format
652 | else:
653 | date_str = df.YYYY.astype('str') + ' ' + df.MM.astype('str') + \
654 | ' ' + df.DD.astype('str') + ' ' + df.hh.astype('str')
655 | 
656 | ind = pd.to_datetime(date_str,format="%Y %m %d %H")
657 | 
658 | df.index = ind
659 | 
660 | #drop useless columns and rename the ones we want
661 | #######################
662 | '''FIXME: get rid of the try/except;
663 | some files have a minute column'''
664 | 
665 | 
666 | #this is hacky and bad
667 | try:
668 | df.drop(['YYYY','MM','DD','hh','mm'],axis=1,inplace=True)
669 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES',
670 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
671 | 
672 | except KeyError: #no minute column to drop
673 | df.drop(['YYYY','MM','DD','hh'],axis=1,inplace=True)
674 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES',
675 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
676 | 
677 | 
678 | # all data should be floats
679 | df = df.astype('float')
680 | nvals = [99,999,9999,99.0,999.0,9999.0]
681 | df.replace(nvals,np.nan,inplace=True)
682 | 
683 | return df
684 | 
685 | ################################################
686 | ################################################
687 | 
688 | 
689 | class makecall(get_historic,get_months):
690 | 
691 | def __init__(self, buoy, year_range):
692 | self.buoy = buoy
693 | self.year_range = year_range
694 | 
695 | def get_all_stand_meteo(self):
696 | """
697 | Retrieves all the standard meteorological data. It also checks
698 | to make sure that the years that were requested are available.
699 | Data is not available for the same years at all the buoys.
700 | 
701 | Returns
702 | -------
703 | df : pandas dataframe
704 | Contains all the data from all the years that were specified
705 | in year_range.
706 | """
707 | 
708 | start_yr,stop_yr = self.year_range
709 | 
710 | #see what is on the NDBC so we only pull the years that are available
711 | links = []
712 | for ii in range(start_yr,stop_yr+1):
713 | 
714 | base = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
715 | end = '.txt.gz&dir=data/historical/stdmet/'
716 | link = base + str(self.buoy) + 'h' + str(ii) + end
717 | 
718 | try:
719 | urllib2.urlopen(link)
720 | links.append(link)
721 | 
722 | except:
723 | print(str(ii) + ' not in records')
724 | 
725 | #need to also retrieve jan, feb, march, etc.
726 | month = ['Jan','Feb','Mar','Apr','May','Jun',
727 | 'Jul','Aug','Sep','Oct','Nov','Dec']
728 | k = [1,2,3,4,5,6,7,8,9,'a','b','c'] #for the links
729 | yr = str(datetime.date.today().year) #monthly files are for the current year
730 | for ii in range(len(month)):
731 | mid = '.txt.gz&dir=data/stdmet/'
732 | link = base + str(self.buoy) + str(k[ii]) + yr + mid + str(month[ii]) +'/'
733 | 
734 | try:
735 | urllib2.urlopen(link)
736 | links.append(link)
737 | 
738 | except:
739 | print(str(month[ii]) + ' ' + yr + ' not in records')
740 | print(link)
741 | 
742 | 
743 | # start grabbing some data
744 | df=pd.DataFrame() #initialize empty df
745 | 
746 | for L in links:
747 | 
748 | new_df = self.hist_stand_meteo(link=L)
749 | print('Link : ' + L)
750 | df = df.append(new_df)
751 | 
752 | return df
753 | 
754 | 
755 | 
756 | 
757 | 
758 | 
759 | 
760 | 
761 | 
762 | 
763 | 
764 | 
765 | 
766 | 
767 | 
768 | 
769 | #
--------------------------------------------------------------------------------
/figures/historic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickc1/buoypy/64912437e2b07dfaccddb6cd4d53b66c168a97b8/figures/historic.png
--------------------------------------------------------------------------------
/figures/historic_range.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickc1/buoypy/64912437e2b07dfaccddb6cd4d53b66c168a97b8/figures/historic_range.png
--------------------------------------------------------------------------------
/figures/realtime.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickc1/buoypy/64912437e2b07dfaccddb6cd4d53b66c168a97b8/figures/realtime.png
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 | 
3 | setup(
4 | name='buoypy',
5 | author='Nick Cortale',
6 | 
version='0.0.1',
7 | description='buoypy scrapes data from the National Data Buoy Center into pandas dataframes.',
8 | packages=['buoypy']
9 | 
10 | )
--------------------------------------------------------------------------------
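Every scraper method in the package repeats the same parsing recipe: combine the `YY MM DD hh mm` columns into a datetime index, treat `'MM'` as missing, and convert NDBC sentinel values (99/999/9999) to NaN. The sketch below demonstrates that recipe offline against a tiny invented two-row sample (the numbers are made up, not real buoy data), building the index explicitly rather than via the nested-list `parse_dates` trick used in the package, which recent pandas releases no longer accept:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Invented sample in the NDBC realtime .txt layout; 'MM' and 999 mean missing.
sample = StringIO(
    "YY MM DD hh mm WSPD WVHT\n"
    "2015 01 01 00 00 5.0 1.2\n"
    "2015 01 01 01 00 MM 999\n"
)

# Read everything as strings first so the date columns can be concatenated.
raw = pd.read_csv(sample, sep=r"\s+", na_values="MM", dtype=str)

# Combine the five date columns into a single datetime index.
idx = pd.to_datetime(
    raw["YY"] + " " + raw["MM"] + " " + raw["DD"] + " "
    + raw["hh"] + " " + raw["mm"],
    format="%Y %m %d %H %M",
)

# Drop the date columns, cast the rest to floats, and attach the index.
df = (
    raw.drop(columns=["YY", "MM", "DD", "hh", "mm"])
       .astype(float)
       .set_index(idx)
)

# NDBC sentinel values mean "no reading"; map them to NaN.
df = df.replace([99.0, 999.0, 9999.0], np.nan)
```

The same three steps back every `get_*` method above; the live code only differs in which URL it reads and which columns it casts.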