├── .gitignore ├── LICENSE ├── README.md ├── buoypy ├── __init__.py ├── buoypy.py └── get_data.py ├── figures ├── historic.png ├── historic_range.png └── realtime.png ├── scripts └── buoy_data_analysis.ipynb └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | *.pyc 3 | *.db 4 | *.egg-info/ 5 | *.zip 6 | *.ipynb_checkpoints 7 | __pycache__/ 8 | .ipynb_checkpoints 9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Ariel Rokem, The University of Washington eScience Institute 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
-------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
1 | # DEPRECATED: See [seebuoy](https://github.com/nickc1/seebuoy) for NDBC data.
2 | 
3 | 
4 | buoypy
5 | ========
6 | 
7 | Functions to query the [NDBC](http://www.ndbc.noaa.gov/).
8 | 
9 | Returns pandas dataframes for the wave parameters.
10 | 
11 | Data descriptions - [Link](http://www.ndbc.noaa.gov/measdes.shtml)
12 | 
13 | 
14 | # Real Time - Last 45 Days
15 | 
16 | The real time data for all of the buoys can be found at:
17 | http://www.ndbc.noaa.gov/data/realtime2/
18 | 
19 | The realtime data provided is:
20 | 
21 | |File |Description
22 | |-------------|-----------------------
23 | |.data_spec | Raw Spectral Wave Data
24 | |.spec | Spectral Wave Summary Data
25 | |.swdir | Spectral Wave Data (alpha1)
26 | |.swdir2 | Spectral Wave Data (alpha2)
27 | |.swr1 | Spectral Wave Data (r1)
28 | |.swr2 | Spectral Wave Data (r2)
29 | |.txt | Standard Meteorological Data
30 | 
31 | The data headers for each of the files are provided below.
32 | 
33 | ##### .data_spec
34 | 
35 | |YY |MM |DD |hh |mm |Sep_Freq |spec_1 |(freq_1) |spec_2 |(freq_2) |spec_3 |(freq_3) |...
36 | |---|---|---|---|---|---------|-------|---------|-------|---------|-------|---------|---
37 | 
38 | ##### .spec
39 | 
40 | |YY |MM |DD |hh |mm |WVHT |SwH |SwP |WWH |WWP |SwD |WWD |STEEPNESS |APD |MWD
41 | |---|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|-----------|-----|---
42 | |yr |mo |dy |hr |mn |m |m |sec |m |sec |- |degT | - |sec |degT
43 | 
44 | ##### .swdir
45 | 
46 | |YY |MM |DD |hh |mm |alpha1_1 |(freq_1) |alpha1_2 |(freq_2) |alpha1_3 |(freq_3) |...
47 | |---|---|---|---|---|---------|---------|---------|---------|---------|---------|---
48 | 
49 | ##### .swdir2
50 | 
51 | |YY |MM |DD |hh |mm |alpha2_1 |(freq_1) |alpha2_2 |(freq_2) |alpha2_3 |(freq_3) |...
52 | |---|---|---|---|---|---------|---------|---------|---------|---------|---------|---
53 | 
54 | ##### .swr1
55 | 
56 | |YY |MM |DD |hh |mm |r1_1 |(freq_1) |r1_2 |(freq_2) |r1_3 |(freq_3) |...
57 | |---|---|---|---|---|-----|---------|-----|---------|-----|---------|---
58 | 
59 | ##### .swr2
60 | 
61 | |YY |MM |DD |hh |mm |r2_1 |(freq_1) |r2_2 |(freq_2) |r2_3 |(freq_3) |...
62 | |---|---|---|---|---|-----|---------|-----|---------|-----|---------|---
63 | 
64 | 
65 | ##### .txt
66 | 
67 | |YY |MM |DD |hh |mm |WDIR |WSPD |GST |WVHT |DPD |APD |MWD |PRES |ATMP |WTMP |DEWP |VIS |PTDY |TIDE
68 | |---|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----
69 | |yr |mo |dy |hr |mn |degT |m/s |m/s |m |sec |sec |degT |hPa |degC |degC |degC |nmi |hPa |ft
70 | 
71 | 
72 | 
73 | | Method | Description |
74 | | ----------- |------------------------------ |
75 | | data_spec | raw spectral wave data |
76 | | spec | spectral wave summaries |
77 | | swdir | spectral wave data (alpha1) |
78 | | swdir2 | spectral wave data (alpha2) |
79 | | swr1 | spectral wave data (r1) |
80 | | swr2 | spectral wave data (r2) |
81 | | txt | standard meteorological data |
82 | 
83 | 
84 | # Examples
85 | 
86 | ```python
87 | import buoypy as bp
88 | 
89 | buoy = 41108 # Wilmington Harbor
90 | B = bp.realtime(buoy)
91 | 
92 | df = B.txt()
93 | 
94 | # plotting (assumes matplotlib.pyplot as plt and seaborn as sns are imported)
95 | fig,ax = plt.subplots(2,sharex=True)
96 | df.WVHT.plot(ax=ax[0])
97 | ax[0].set_ylabel('Wave Height (m)',fontsize=14)
98 | 
99 | df.DPD.plot(ax=ax[1])
100 | ax[1].set_ylabel('Dominant Period (sec)',fontsize=14)
101 | ax[1].set_xlabel('')
102 | sns.despine()
103 | ```
104 | 
105 | ![buoypy realtime](/figures/realtime.png)
106 | 
107 | 
108 | # Historic Data - All information from a buoy
109 | 
110 | Buoys come online in different years; this aims to grab all of the available data. Currently, only retrieval of the standard meteorological data is implemented.
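Under the hood, the standard meteorological files are whitespace-delimited text whose first two rows hold column names and units (see the table below), with `MM` marking missing values. A minimal sketch of that parsing, using made-up sample rows rather than a live NDBC download:

```python
import io
import pandas as pd

# Made-up sample rows in the standard meteorological layout:
# first row = column names, second row = units, then one observation per row.
sample = """#YY  MM DD hh mm WDIR WSPD GST WVHT  DPD  APD MWD   PRES ATMP WTMP DEWP VIS TIDE
#yr  mo dy hr mn degT  m/s m/s    m  sec  sec degT    hPa degC degC degC nmi   ft
2016 02 04 17 42  180  5.0 7.0  1.6  7.1  5.3  169 1015.2 20.1 21.3   MM  MM   MM
2016 02 04 16 42  170  4.0 6.0  1.7  7.7  5.4  174 1015.0 20.0 21.2   MM  MM   MM
"""

# Read, drop the units row, then build a datetime index from the five
# date columns -- the same steps buoypy performs on the downloaded file.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", na_values="MM")
df = df.rename(columns={"#YY": "YY"}).drop(0)     # row 0 holds units
stamp = df.YY + " " + df.MM + " " + df.DD + " " + df.hh + " " + df.mm
df.index = pd.to_datetime(stamp, format="%Y %m %d %H %M")
df = df.drop(columns=["YY", "MM", "DD", "hh", "mm"]).astype(float)
```

This leaves a float dataframe with a datetime index and `NaN` where the buoy reported `MM`.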
111 | 
112 | The historic data provided is:
113 | 
114 | |Description
115 | |-----------------------
116 | | Standard Meteorological Data
117 | 
118 | 
119 | |YY |MM |DD |hh |mm |WDIR |WSPD |GST |WVHT |DPD |APD |MWD |PRES |ATMP |WTMP |DEWP |VIS |TIDE
120 | |---|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----
121 | |yr |mo |dy |hr |mn |degT |m/s |m/s |m |sec |sec |degT |hPa |degC |degC |degC |nmi |ft
122 | 
123 | 
124 | 
125 | ```python
126 | import buoypy as bp
127 | 
128 | buoy = 41108
129 | year = 2014
130 | 
131 | H = bp.historic_data(buoy,year)
132 | df = H.get_stand_meteo()
133 | 
134 | # plotting (assumes matplotlib.pyplot as plt and seaborn as sns are imported)
135 | fig,ax = plt.subplots(2,sharex=True)
136 | df.WVHT.plot(ax=ax[0])
137 | ax[0].set_ylabel('Wave Height (m)',fontsize=14)
138 | 
139 | df.DPD.plot(ax=ax[1])
140 | ax[1].set_ylabel('Dominant Period (sec)',fontsize=14)
141 | ax[1].set_xlabel('')
142 | sns.despine()
143 | ```
144 | 
145 | ![buoypy historic](/figures/historic.png)
146 | 
147 | Notice that the buoy was offline from late April 2014 to mid-August 2014.
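Outages like that one show up as large jumps between consecutive timestamps. A small hypothetical helper (not part of buoypy) that locates such gaps in any DatetimeIndex, demonstrated on a synthetic index resembling the outage above:

```python
import numpy as np
import pandas as pd

def find_gaps(index, max_gap):
    """Return (last-seen, next-seen) timestamp pairs wherever consecutive
    observations are further apart than max_gap (e.g. '2D' for two days)."""
    idx = pd.DatetimeIndex(index).sort_values()
    deltas = idx.to_series().diff()          # spacing between observations
    pos = np.flatnonzero(deltas > pd.Timedelta(max_gap))
    return [(idx[i - 1], idx[i]) for i in pos]

# Synthetic daily index with a single outage, roughly like the plot above.
idx = pd.date_range("2014-01-01", "2014-04-28", freq="D").append(
    pd.date_range("2014-08-15", "2014-12-31", freq="D"))
gaps = find_gaps(idx, max_gap="2D")
```

Running this on the `df.index` returned by `get_stand_meteo` would report the buoy's offline windows directly instead of eyeballing the plot.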
148 | 
149 | 
150 | # Historic Range - Grab data from a range of years
151 | 
152 | ```python
153 | 
154 | import buoypy as bp
155 | buoy = 41108
156 | year = None # ignored when a year_range is given
157 | year_range = (2010,2018)
158 | 
159 | H = bp.historic_data(buoy,year,year_range)
160 | X = H.get_all_stand_meteo()
161 | 
162 | # plotting (assumes matplotlib.pyplot as plt and seaborn as sns are imported)
163 | fig,ax = plt.subplots(2,sharex=True)
164 | X.WVHT.plot(ax=ax[0])
165 | ax[0].set_ylabel('Wave Height (m)',fontsize=14)
166 | X.DPD.plot(ax=ax[1])
167 | ax[1].set_ylabel('Dominant Period (sec)',fontsize=14)
168 | ax[1].set_xlabel('')
169 | sns.despine()
170 | ```
171 | 
172 | ![buoypy historic range](/figures/historic_range.png)
173 | 
-------------------------------------------------------------------------------- /buoypy/__init__.py: --------------------------------------------------------------------------------
1 | from .buoypy import *
-------------------------------------------------------------------------------- /buoypy/buoypy.py: --------------------------------------------------------------------------------
1 | """
2 | By Nick Cortale
3 | nickc1.github.io
4 | 
5 | Functions to query the NDBC (http://www.ndbc.noaa.gov/).
6 | 
7 | The realtime data for all of their buoys can be found at:
8 | http://www.ndbc.noaa.gov/data/realtime2/
9 | 
10 | Info about all of NOAA's data can be found at:
11 | http://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf
12 | 
13 | What all the values mean:
14 | http://www.ndbc.noaa.gov/measdes.shtml
15 | 
16 | Each buoy has the data:
17 | 
18 | File Parameters
19 | ---- ----------
20 | .data_spec Raw Spectral Wave Data
21 | .ocean Oceanographic Data
22 | .spec Spectral Wave Summary Data
23 | .supl Supplemental Measurements Data
24 | .swdir Spectral Wave Data (alpha1)
25 | .swdir2 Spectral Wave Data (alpha2)
26 | .swr1 Spectral Wave Data (r1)
27 | .swr2 Spectral Wave Data (r2)
28 | .txt Standard Meteorological Data
29 | 
30 | 
31 | 
32 | Example:
33 | import buoypy as bp
34 | 
35 | # Get the last 45 days of data
36 | rt = bp.realtime(41013) # Frying Pan Shoals buoy
37 | wave_data = rt.spec() # get the spectral wave summary data
38 | 
39 | wave_data.head()
40 | 
41 | Out[7]:
42 | WVHT SwH SwP WWH WWP SwD WWD STEEPNESS APD MWD
43 | 2016-02-04 17:42:00 1.6 1.3 7.1 0.9 4.5 S S STEEP 5.3 169
44 | 2016-02-04 16:42:00 1.7 1.5 7.7 0.9 5.0 S S STEEP 5.4 174
45 | 2016-02-04 15:41:00 2.0 0.0 NaN 2.0 7.1 NaN S STEEP 5.3 174
46 | 2016-02-04 14:41:00 2.0 1.2 7.7 1.5 5.9 SSE SSE STEEP 5.5 167
47 | 2016-02-04 13:41:00 2.0 1.7 7.1 0.9 4.8 S SSE STEEP 5.7 175
48 | 
49 | 
50 | 
51 | TODO:
52 | Make functions with except statements always spit out the same
53 | column headings.
54 | 
55 | """
56 | 
57 | import pandas as pd
58 | import numpy as np
59 | import datetime
60 | 
61 | class realtime:
62 | 
63 | def __init__(self, buoy):
64 | 
65 | self.link = 'http://www.ndbc.noaa.gov/data/realtime2/{}'.format(buoy)
66 | 
67 | def data_spec(self):
68 | """
69 | Get the raw spectral wave data from the buoy. The separation
70 | frequency is dropped to keep the data clean.
71 | 72 | Parameters 73 | ---------- 74 | buoy : string 75 | Buoy number ex: '41013' is off wilmington, nc 76 | 77 | Returns 78 | ------- 79 | df : pandas dataframe (date, frequency) 80 | data frame containing the raw spectral data. index is the date 81 | and the columns are each of the frequencies 82 | 83 | """ 84 | 85 | link = "{}.{}".format(self.link, 'data_spec') 86 | 87 | #combine the first five date columns YY MM DD hh mm and make index 88 | df = pd.read_csv(link, delim_whitespace=True, skiprows=1, header=None, 89 | parse_dates=[[0,1,2,3,4]], index_col=0) 90 | 91 | 92 | #convert the dates to datetimes 93 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 94 | 95 | specs = df.iloc[:,1::2] 96 | freqs = df.iloc[0,2::2] 97 | 98 | specs.columns=freqs 99 | 100 | #remove the parenthesis from the column index 101 | specs.columns = [cname.replace('(','').replace(')','') 102 | for cname in specs.columns] 103 | 104 | return specs 105 | 106 | 107 | def ocean(self): 108 | """ 109 | Retrieve oceanic data. 
For the buoys explored, 110 | O2%, O2PPM, CLCON, TURB, PH, EH were always NaNs 111 | 112 | 113 | Returns 114 | ------- 115 | df : pandas dataframe 116 | Index is the date and columns are: 117 | DEPTH m 118 | OTMP degc 119 | COND mS/cm 120 | SAL PSU 121 | O2% % 122 | 02PPM ppm 123 | CLCON ug/l 124 | TURB FTU 125 | PH - 126 | EH mv 127 | 128 | """ 129 | 130 | link = "{}.{}".format(self.link, 'ocean') 131 | 132 | #combine the first five date columns YY MM DD hh mm and make index 133 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 134 | parse_dates=[[0,1,2,3,4]], index_col=0) 135 | 136 | #units are in the second row drop them 137 | df.drop(df.index[0], inplace=True) 138 | 139 | #convert the dates to datetimes 140 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 141 | 142 | #convert to floats 143 | cols = ['DEPTH','OTMP','COND','SAL'] 144 | df[cols] = df[cols].astype(float) 145 | 146 | 147 | return df 148 | 149 | 150 | def spec(self): 151 | """ 152 | Get the spectral wave data from the ndbc. Something is wrong with 153 | the data for this parameter. The columns seem to change randomly. 154 | Refreshing the data page will yield different column names from 155 | minute to minute. 156 | 157 | parameters 158 | ---------- 159 | buoy : string 160 | Buoy number ex: '41013' is off wilmington, nc 161 | 162 | Returns 163 | ------- 164 | df : pandas dataframe 165 | data frame containing the spectral data. 
index is the date 166 | and the columns are: 167 | 168 | HO, SwH, SwP, WWH, WWP, SwD, WWD, STEEPNESS, AVP, MWD 169 | 170 | OR 171 | 172 | WVHT SwH SwP WWH WWP SwD WWD STEEPNESS APD MWD 173 | 174 | 175 | """ 176 | 177 | link = "{}.{}".format(self.link, 'spec') 178 | 179 | #combine the first five date columns YY MM DD hh mm and make index 180 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 181 | parse_dates=[[0,1,2,3,4]], index_col=0) 182 | 183 | try: 184 | #units are in the second row drop them 185 | #df.columns = df.columns + '('+ df.iloc[0] + ')' 186 | df.drop(df.index[0], inplace=True) 187 | 188 | #convert the dates to datetimes 189 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 190 | 191 | #convert to floats 192 | cols = ['WVHT','SwH','SwP','WWH','WWP','APD','MWD'] 193 | df[cols] = df[cols].astype(float) 194 | except: 195 | 196 | #convert the dates to datetimes 197 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 198 | 199 | #convert to floats 200 | cols = ['H0','SwH','SwP','WWH','WWP','AVP','MWD'] 201 | df[cols] = df[cols].astype(float) 202 | 203 | 204 | return df 205 | 206 | 207 | 208 | def supl(self): 209 | """ 210 | Get supplemental data 211 | 212 | Returns 213 | ------- 214 | data frame containing the spectral data. 
index is the date 215 | and the columns are: 216 | 217 | PRES hpa 218 | PTIME hhmm 219 | WSPD m/s 220 | WDIR degT 221 | WTIME hhmm 222 | 223 | 224 | """ 225 | 226 | link = "{}.{}".format(self.link, 'supl') 227 | 228 | #combine the first five date columns YY MM DD hh mm and make index 229 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 230 | parse_dates=[[0,1,2,3,4]], index_col=0) 231 | 232 | #units are in the second row drop them 233 | df.drop(df.index[0], inplace=True) 234 | 235 | #convert the dates to datetimes 236 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 237 | 238 | #convert to floats 239 | cols = ['PRES','PTIME','WSPD','WDIR','WTIME'] 240 | df[cols] = df[cols].astype(float) 241 | 242 | return df 243 | 244 | 245 | def swdir(self): 246 | """ 247 | Spectral wave data for alpha 1. 248 | 249 | Returns 250 | ------- 251 | 252 | specs : pandas dataframe 253 | Index is the date and the columns are the spectrum. Values in 254 | the table indicate how much energy is at each spectrum. 255 | """ 256 | 257 | 258 | link = "{}.{}".format(self.link, 'swdir') 259 | 260 | #combine the first five date columns YY MM DD hh mm and make index 261 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,na_values=999, 262 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0) 263 | 264 | #convert the dates to datetimes 265 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 266 | 267 | specs = df.iloc[:,0::2] 268 | freqs = df.iloc[0,1::2] 269 | 270 | specs.columns=freqs 271 | 272 | #remove the parenthesis from the column index 273 | specs.columns = [cname.replace('(','').replace(')','') 274 | for cname in specs.columns] 275 | 276 | return specs 277 | 278 | def swdir2(self): 279 | """ 280 | Spectral wave data for alpha 2. 281 | 282 | Returns 283 | ------- 284 | 285 | specs : pandas dataframe 286 | Index is the date and the columns are the spectrum. Values in 287 | the table indicate how much energy is at each spectrum. 
288 | """ 289 | 290 | link = "{}.{}".format(self.link, 'swdir2') 291 | 292 | #combine the first five date columns YY MM DD hh mm and make index 293 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1, 294 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0) 295 | 296 | #convert the dates to datetimes 297 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 298 | 299 | specs = df.iloc[:,0::2] 300 | freqs = df.iloc[0,1::2] 301 | 302 | specs.columns=freqs 303 | 304 | #remove the parenthesis from the column index 305 | specs.columns = [cname.replace('(','').replace(')','') 306 | for cname in specs.columns] 307 | 308 | return specs 309 | 310 | def swr1(self): 311 | """ 312 | Spectral wave data for r1. 313 | 314 | Returns 315 | ------- 316 | 317 | specs : pandas dataframe 318 | Index is the date and the columns are the spectrum. Values in 319 | the table indicate how much energy is at each spectrum. 320 | """ 321 | 322 | 323 | 324 | link = "{}.{}".format(self.link, 'swr1') 325 | #combine the first five date columns YY MM DD hh mm and make index 326 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1, 327 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0) 328 | 329 | #convert the dates to datetimes 330 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 331 | 332 | specs = df.iloc[:,0::2] 333 | freqs = df.iloc[0,1::2] 334 | 335 | specs.columns=freqs 336 | 337 | #remove the parenthesis from the column index 338 | specs.columns = [cname.replace('(','').replace(')','') 339 | for cname in specs.columns] 340 | 341 | return specs 342 | 343 | def swr2(self): 344 | """ 345 | Spectral wave data for r2. 346 | 347 | Returns 348 | ------- 349 | 350 | specs : pandas dataframe 351 | Index is the date and the columns are the spectrum. Values in 352 | the table indicate how much energy is at each spectrum. 
353 | """ 354 | 355 | 356 | link = "{}.{}".format(self.link, 'swr2') 357 | 358 | #combine the first five date columns YY MM DD hh mm and make index 359 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1, 360 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0) 361 | 362 | #convert the dates to datetimes 363 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 364 | 365 | specs = df.iloc[:,0::2] 366 | freqs = df.iloc[0,1::2] 367 | 368 | specs.columns=freqs 369 | 370 | #remove the parenthesis from the column index 371 | specs.columns = [cname.replace('(','').replace(')','') 372 | for cname in specs.columns] 373 | 374 | return specs 375 | 376 | def txt(self): 377 | """ 378 | Retrieve standard Meteorological data. NDBC seems to be updating 379 | the data with different column names, so this metric can return 380 | two possible data frames with different column names: 381 | 382 | Returns 383 | ------- 384 | 385 | df : pandas dataframe 386 | Index is the date and the columns can be: 387 | 388 | ['WDIR','WSPD','GST','WVHT','DPD','APD','MWD', 389 | 'PRES','ATMP','WTMP','DEWP','VIS','PTDY','TIDE'] 390 | 391 | or 392 | 393 | ['WD','WSPD','GST','WVHT','DPD','APD','MWD','BARO', 394 | 'ATMP','WTMP','DEWP','VIS','PTDY','TIDE'] 395 | 396 | """ 397 | 398 | link = "{}.{}".format(self.link, 'txt') 399 | #combine the first five date columns YY MM DD hh mm and make index 400 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 401 | parse_dates=[[0,1,2,3,4]], index_col=0) 402 | 403 | try: 404 | #first column is units, so drop it 405 | df.drop(df.index[0], inplace=True) 406 | #convert the dates to datetimes 407 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 408 | 409 | #convert to floats 410 | cols = ['WDIR','WSPD','GST','WVHT','DPD','APD','MWD', 411 | 'PRES','ATMP','WTMP','DEWP','VIS','PTDY','TIDE'] 412 | df[cols] = df[cols].astype(float) 413 | except: 414 | 415 | #convert the dates to datetimes 416 | df.index = 
pd.to_datetime(df.index,format="%Y %m %d %H %M")
417 | 
418 | #convert to floats
419 | cols = ['WD','WSPD','GST','WVHT','DPD','APD','MWD','BARO',
420 | 'ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
421 | df[cols] = df[cols].astype(float)
422 | df.index.name='Date'
423 | return df
424 | 
425 | ################################################
426 | ################################################
427 | 
428 | class historic_data:
429 | 
430 | def __init__(self, buoy, year, year_range=None):
431 | self.buoy, self.year, self.year_range = buoy, year, year_range
432 | link = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
433 | link += '{}h{}.txt.gz&dir=data/historical/'.format(buoy, year)
434 | self.link = link
435 | 
436 | def get_stand_meteo(self,link = None):
437 | '''
438 | Standard Meteorological Data. The data header changed in 2007, hence
439 | the if statement below.
440 | 
441 | 
442 | 
443 | WDIR Wind direction (degrees clockwise from true N)
444 | WSPD Wind speed (m/s) averaged over an eight-minute period
445 | GST Peak 5 or 8 second gust speed (m/s)
446 | WVHT Significant wave height (meters) is calculated as
447 | the average of the highest one-third of all of the
448 | wave heights during the 20-minute sampling period.
449 | DPD Dominant wave period (seconds) is the period with the maximum wave energy.
450 | APD Average wave period (seconds) of all waves during the 20-minute period.
451 | MWD The direction from which the waves at the dominant period (DPD) are coming
452 | (degrees clockwise from true N).
453 | PRES Sea level pressure (hPa).
454 | ATMP Air temperature (Celsius).
455 | WTMP Sea surface temperature (Celsius).
456 | DEWP Dewpoint temperature (Celsius).
457 | VIS Station visibility (nautical miles).
458 | PTDY Pressure tendency (hPa).
459 | TIDE The water level in feet above or below Mean Lower Low Water (MLLW).
460 | '''
461 | if link is None:
462 | link = self.link + 'stdmet/'
463 | 
464 | #combine the first five date columns YY MM DD hh mm and make index
465 | df = pd.read_csv(link, header=0, delim_whitespace=True, dtype=object,
466 | na_values=[99,999,9999,99.,999.,9999.])
467 | 
468 | 
469 | #2007 and on format
470 | if df.iloc[0,0] == '#yr':
471 | 
472 | 
473 | df = df.rename(columns={'#YY': 'YY'}) #get rid of hash
474 | 
475 | #make the indices
476 | 
477 | df.drop(0, inplace=True) #first row is units, so drop it
478 | 
479 | d = df.YY + ' ' + df.MM + ' ' + df.DD + ' ' + df.hh + ' ' + df.mm
480 | ind = pd.to_datetime(d, format="%Y %m %d %H %M")
481 | 
482 | df.index = ind
483 | 
484 | #drop useless columns and rename the ones we want
485 | df.drop(['YY','MM','DD','hh','mm'], axis=1, inplace=True)
486 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD',
487 | 'PRES', 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
488 | 
489 | 
490 | #2000 through 2006 format
491 | else:
492 | date_str = df.YYYY + ' ' + df.MM + ' ' + df.DD + ' ' + df.hh
493 | 
494 | ind = pd.to_datetime(date_str, format="%Y %m %d %H")
495 | 
496 | df.index = ind
497 | 
498 | #some data has a minute column and some doesn't
499 | 
500 | if 'mm' in df.columns:
501 | df.drop(['YYYY','MM','DD','hh','mm'], axis=1, inplace=True)
502 | else:
503 | df.drop(['YYYY','MM','DD','hh'], axis=1, inplace=True)
504 | 
505 | 
506 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD',
507 | 'MWD', 'PRES', 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
508 | 
509 | 
510 | # all data should be floats
511 | df = df.astype('float')
512 | 
513 | return df
514 | 
515 | def get_all_stand_meteo(self):
516 | """
517 | Retrieves all the standard meteorological data by calling get_stand_meteo.
518 | It also checks that the requested years are available, since data is not
519 | available for the same years at all the buoys.
520 | 
521 | Returns
522 | -------
523 | df : pandas dataframe
524 | Contains all the data from all the years that were specified
525 | in year_range.
526 | """
527 | 
528 | start,stop = self.year_range
529 | from urllib.request import urlopen # stdlib replacement for py2's urllib2
530 | #see what is on the NDBC so we only pull the years that are available
531 | links = []
532 | for ii in range(start,stop+1):
533 | 
534 | base = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
535 | end = '.txt.gz&dir=data/historical/stdmet/'
536 | link = base + str(self.buoy) + 'h' + str(ii) + end
537 | 
538 | try:
539 | urlopen(link)
540 | links.append(link)
541 | 
542 | except Exception:
543 | print(str(ii) + ' not in records')
544 | 
545 | #need to also retrieve Jan, Feb, Mar, etc.
546 | month = ['Jan','Feb','Mar','Apr','May','Jun',
547 | 'Jul','Aug','Sep','Oct','Nov','Dec']
548 | k = [1,2,3,4,5,6,7,8,9,'a','b','c'] #for the links
549 | 
550 | for ii in range(len(month)):
551 | mid = '.txt.gz&dir=data/stdmet/'
552 | link = base + str(self.buoy) + str(k[ii]) + '2016' + mid + str(month[ii]) + '/'
553 | 
554 | try:
555 | urlopen(link)
556 | links.append(link)
557 | 
558 | except Exception:
559 | print(str(month[ii]) + ' 2016 not in records')
560 | print(link)
561 | 
562 | 
563 | # start grabbing some data
564 | df = pd.DataFrame() #initialize empty df
565 | 
566 | for L in links:
567 | 
568 | new_df = self.get_stand_meteo(link=L)
569 | print('Link : ' + L)
570 | df = pd.concat([df, new_df]) #DataFrame.append was removed in pandas 2.0
571 | 
572 | return df
573 | 
574 | 
575 | class write_data(historic_data):
576 | 
577 | def __init__(self, buoy, year, year_range,db_name = 'buoydata.db'):
578 | self.buoy = buoy
579 | self.year = year
580 | self.year_range=year_range
581 | self.db_name = db_name
582 | 
583 | def write_all_stand_meteo(self):
584 | """
585 | Write the standard meteorological data to the database. See get_all_stand_meteo
586 | (in the historic_data class) for a description of the data.
587 | 
588 | Returns
589 | -------
590 | True on success. The standard meteorological data is appended to a
591 | table named '<buoy>_buoy' in the sqlite database db_name.
592 | 
593 | 
594 | """
595 | 
596 | from sqlalchemy import create_engine # database connection
597 | df = self.get_all_stand_meteo()
598 | 
599 | #write the df to disk
600 | disk_engine = create_engine('sqlite:///' + self.db_name)
601 | 
602 | table_name = str(self.buoy) + '_buoy'
603 | df.to_sql(table_name,disk_engine,if_exists='append')
604 | #keep only the newest row for each timestamp ("index" is the column to_sql writes)
605 | disk_engine.execute("""DELETE FROM '{0}' WHERE rowid NOT IN
606 | (SELECT max(rowid) FROM '{0}' GROUP BY "index")""".format(table_name))
607 | 
608 | print(str(self.buoy) + ' written to database: ' + str(self.db_name))
609 | 
610 | 
611 | return True
612 | 
613 | 
614 | class read_data:
615 | """
616 | Reads the data from the setup database
617 | """
618 | 
619 | def __init__(self, buoy, year_range=None):
620 | self.buoy = buoy
621 | self.year_range = year_range
622 | self.disk_eng = 'sqlite:///buoydata.db'
623 | 
624 | 
625 | def get_stand_meteo(self):
626 | from sqlalchemy import create_engine
627 | disk_engine = create_engine(self.disk_eng)
628 | 
629 | 
630 | df = pd.read_sql_query("SELECT * FROM '{}'".format(str(self.buoy) + '_buoy'), disk_engine)
631 | 
632 | #give it a datetime index since it was stripped by sqlite
633 | df.index = pd.to_datetime(df['index'])
634 | df.index.name='date'
635 | df.drop('index',axis=1,inplace=True)
636 | 
637 | if self.year_range:
638 | print("""this is not implemented in SQL. Could be slow.
639 | Get out while you can!!!""" )
640 | 
641 | start,stop = self.year_range
642 | begin = df.index.searchsorted(datetime.datetime(start, 1, 1))
643 | end = df.index.searchsorted(datetime.datetime(stop, 12, 31))
644 | df = df.iloc[begin:end] #searchsorted returns positions, so use iloc (.ix is gone)
645 | 
646 | 
647 | 
648 | return df
649 | 
-------------------------------------------------------------------------------- /buoypy/get_data.py: --------------------------------------------------------------------------------
1 | """
2 | By Nick Cortale
3 | nickc1.github.io
4 | 
5 | Functions to query the NDBC (http://www.ndbc.noaa.gov/).
6 | 
7 | The realtime data for all of their buoys can be found at:
8 | http://www.ndbc.noaa.gov/data/realtime2/
9 | 
10 | Info about all of NOAA's data can be found at:
11 | http://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf
12 | 
13 | What all the values mean:
14 | http://www.ndbc.noaa.gov/measdes.shtml
15 | 
16 | Each buoy has the data:
17 | 
18 | File Parameters
19 | ---- ----------
20 | .data_spec Raw Spectral Wave Data
21 | .ocean Oceanographic Data
22 | .spec Spectral Wave Summary Data
23 | .supl Supplemental Measurements Data
24 | .swdir Spectral Wave Data (alpha1)
25 | .swdir2 Spectral Wave Data (alpha2)
26 | .swr1 Spectral Wave Data (r1)
27 | .swr2 Spectral Wave Data (r2)
28 | .txt Standard Meteorological Data
29 | 
30 | 
31 | 
32 | Example:
33 | import buoypy as bp
34 | 
35 | # Get the last 45 days of data
36 | rt = bp.realtime(41013) # Frying Pan Shoals buoy
37 | wave_data = rt.get_spec() # get the spectral wave summary data
38 | 
39 | wave_data.head()
40 | 
41 | Out[7]:
42 | WVHT SwH SwP WWH WWP SwD WWD STEEPNESS APD MWD
43 | 2016-02-04 17:42:00 1.6 1.3 7.1 0.9 4.5 S S STEEP 5.3 169
44 | 2016-02-04 16:42:00 1.7 1.5 7.7 0.9 5.0 S S STEEP 5.4 174
45 | 2016-02-04 15:41:00 2.0 0.0 NaN 2.0 7.1 NaN S STEEP 5.3 174
46 | 2016-02-04 14:41:00 2.0 1.2 7.7 1.5 5.9 SSE SSE STEEP 5.5 167
47 | 2016-02-04 13:41:00 2.0 1.7 7.1 0.9 4.8 S SSE STEEP 5.7 175
48 | 
49 | 
50 | 
51 | TODO:
52 | Make functions with except statements
always spit out the same 53 | column headings. 54 | 55 | """ 56 | 57 | import pandas as pd 58 | import numpy as np 59 | import urllib2 60 | from sqlalchemy import create_engine # database connection 61 | import datetime 62 | 63 | class formatter: 64 | """ 65 | Correctly formats the data contained in the link into a 66 | pandas dataframe. 67 | """ 68 | 69 | def __init__(self,link): 70 | self.link = link 71 | 72 | def format_stand_meteo(self): 73 | """ 74 | Format the standard Meteorological data. 75 | """ 76 | 77 | df = pd.read_csv(self.link,delim_whitespace=True, 78 | na_values=[99,999,9999,99.,999.,9999.]) 79 | 80 | #2007 and on format 81 | if df.iloc[0,0] =='#yr': 82 | 83 | 84 | df = df.rename(columns={'#YY': 'YY'}) #get rid of hash 85 | 86 | #make the indices 87 | date_str = df.YY + ' ' + df.MM+ ' ' + df.DD + ' ' + df.hh + ' ' + df.mm 88 | df.drop(0,inplace=True) #first row is units, so drop them 89 | ind = pd.to_datetime(date_str.drop(0),format="%Y %m %d %H %M") 90 | 91 | df.index = ind 92 | 93 | #drop useless columns and rename the ones we want 94 | df.drop(['YY','MM','DD','hh','mm'],axis=1,inplace=True) 95 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 96 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE'] 97 | 98 | 99 | #before 2006 to 2000 100 | else: 101 | date_str = df.YYYY.astype('str') + ' ' + df.MM.astype('str') + \ 102 | ' ' + df.DD.astype('str') + ' ' + df.hh.astype('str') 103 | 104 | ind = pd.to_datetime(date_str,format="%Y %m %d %H") 105 | 106 | df.index = ind 107 | 108 | #drop useless columns and rename the ones we want 109 | ####################### 110 | '''FIX MEEEEE!!!!!!! 
111 | Get rid of the try except 112 | some have minute column''' 113 | 114 | #this is hacky and bad 115 | try: 116 | df.drop(['YYYY','MM','DD','hh','mm'],axis=1,inplace=True) 117 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 118 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE'] 119 | 120 | except: 121 | df.drop(['YYYY','MM','DD','hh'],axis=1,inplace=True) 122 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 123 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE'] 124 | 125 | 126 | # all data should be floats 127 | df = df.astype('float') 128 | nvals = [99,999,9999,99.0,999.0,9999.0] 129 | df.replace(nvals,np.nan,inplace=True) 130 | 131 | return df 132 | 133 | ################################################ 134 | ################################################ 135 | 136 | class realtime: 137 | """ 138 | Retrieves the last 45 days worth of data for a specific buoy. 139 | Realtime data is formatted a little different from all the other data. 140 | 141 | 142 | """ 143 | 144 | def __init__(self, buoy): 145 | self.buoy = buoy 146 | 147 | def get_data_spec(self): 148 | """ 149 | Get the raw spectral wave data from the buoy. The seperation 150 | frequency is dropped to keep the data clean. 151 | 152 | Parameters 153 | ---------- 154 | buoy : string 155 | Buoy number ex: '41013' is off wilmington, nc 156 | 157 | Returns 158 | ------- 159 | df : pandas dataframe (date, frequency) 160 | data frame containing the raw spectral data. index is the date 161 | and the columns are each of the frequencies 162 | 163 | """ 164 | 165 | params = 'data_spec' 166 | base = 'http://www.ndbc.noaa.gov/data/realtime2/' 167 | link = base + str(self.buoy) + '.' 
+ params 168 | 169 | #combine the first five date columns YY MM DD hh mm and make index 170 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,header=None, 171 | parse_dates=[[0,1,2,3,4]], index_col=0) 172 | 173 | 174 | #convert the dates to datetimes 175 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 176 | 177 | specs = df.iloc[:,1::2] 178 | freqs = df.iloc[0,2::2] 179 | 180 | specs.columns=freqs 181 | 182 | #remove the parenthesis from the column index 183 | specs.columns = [cname.replace('(','').replace(')','') 184 | for cname in specs.columns] 185 | 186 | return specs 187 | 188 | 189 | def get_ocean(self): 190 | """ 191 | Retrieve oceanic data. For the buoys explored, 192 | O2%, O2PPM, CLCON, TURB, PH, EH were always NaNs 193 | 194 | 195 | Returns 196 | ------- 197 | df : pandas dataframe 198 | Index is the date and columns are: 199 | DEPTH m 200 | OTMP degc 201 | COND mS/cm 202 | SAL PSU 203 | O2% % 204 | 02PPM ppm 205 | CLCON ug/l 206 | TURB FTU 207 | PH - 208 | EH mv 209 | 210 | """ 211 | 212 | params = 'ocean' 213 | base = 'http://www.ndbc.noaa.gov/data/realtime2/' 214 | link = base + str(self.buoy) + '.' + params 215 | 216 | #combine the first five date columns YY MM DD hh mm and make index 217 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM', 218 | parse_dates=[[0,1,2,3,4]], index_col=0) 219 | 220 | #units are in the second row drop them 221 | df.drop(df.index[0], inplace=True) 222 | 223 | #convert the dates to datetimes 224 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M") 225 | 226 | #convert to floats 227 | cols = ['DEPTH','OTMP','COND','SAL'] 228 | df[cols] = df[cols].astype(float) 229 | 230 | 231 | return df 232 | 233 | 234 | def get_spec(self): 235 | """ 236 | Get the spectral wave data from the ndbc. Something is wrong with 237 | the data for this parameter. The columns seem to change randomly. 238 | Refreshing the data page will yield different column names from 239 | minute to minute. 
240 | 
241 | Parameters
242 | ----------
243 | buoy : string
244 | Buoy number, e.g. '41013' is off Wilmington, NC.
245 | 
246 | Returns
247 | -------
248 | df : pandas dataframe
249 | data frame containing the spectral data. index is the date
250 | and the columns are:
251 | 
252 | H0, SwH, SwP, WWH, WWP, SwD, WWD, STEEPNESS, AVP, MWD
253 | 
254 | OR
255 | 
256 | WVHT SwH SwP WWH WWP SwD WWD STEEPNESS APD MWD
257 | 
258 | 
259 | """
260 | 
261 | params = 'spec'
262 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
263 | link = base + str(self.buoy) + '.' + params
264 | 
265 | #combine the first five date columns YY MM DD hh mm and make index
266 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM',
267 | parse_dates=[[0,1,2,3,4]], index_col=0)
268 | 
269 | try:
270 | #units are in the second row, so drop them
271 | #df.columns = df.columns + '('+ df.iloc[0] + ')'
272 | df.drop(df.index[0], inplace=True)
273 | 
274 | #convert the dates to datetimes
275 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
276 | 
277 | #convert to floats
278 | cols = ['WVHT','SwH','SwP','WWH','WWP','APD','MWD']
279 | df[cols] = df[cols].astype(float)
280 | except KeyError: #the alternate H0-style column names
281 | 
282 | #convert the dates to datetimes
283 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
284 | 
285 | #convert to floats
286 | cols = ['H0','SwH','SwP','WWH','WWP','AVP','MWD']
287 | df[cols] = df[cols].astype(float)
288 | 
289 | 
290 | return df
291 | 
292 | 
293 | 
294 | def get_supl(self):
295 | """
296 | Get supplemental data
297 | 
298 | Returns
299 | -------
300 | data frame containing the supplemental data. index is the date
301 | and the columns are:
302 | 
303 | PRES hpa
304 | PTIME hhmm
305 | WSPD m/s
306 | WDIR degT
307 | WTIME hhmm
308 | 
309 | 
310 | """
311 | params = 'supl'
312 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
313 | link = base + str(self.buoy) + '.'
+ params
314 | 
315 | #combine the first five date columns YY MM DD hh mm and make index
316 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM',
317 | parse_dates=[[0,1,2,3,4]], index_col=0)
318 | 
319 | #units are in the second row, so drop them
320 | df.drop(df.index[0], inplace=True)
321 | 
322 | #convert the dates to datetimes
323 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
324 | 
325 | #convert to floats
326 | cols = ['PRES','PTIME','WSPD','WDIR','WTIME']
327 | df[cols] = df[cols].astype(float)
328 | 
329 | return df
330 | 
331 | 
332 | def get_swdir(self):
333 | """
334 | Spectral wave data for alpha1 (mean wave direction).
335 | 
336 | Returns
337 | -------
338 | 
339 | specs : pandas dataframe
340 | Index is the date and the columns are the frequency bins.
341 | Values give the alpha1 estimate at each frequency.
342 | """
343 | 
344 | 
345 | params = 'swdir'
346 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
347 | link = base + str(self.buoy) + '.' + params
348 | 
349 | #combine the first five date columns YY MM DD hh mm and make index
350 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,na_values=999,
351 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0)
352 | 
353 | #convert the dates to datetimes
354 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
355 | 
356 | specs = df.iloc[:,0::2]
357 | freqs = df.iloc[0,1::2]
358 | 
359 | specs.columns = freqs
360 | 
361 | #remove the parentheses from the column index
362 | specs.columns = [cname.replace('(','').replace(')','')
363 | for cname in specs.columns]
364 | 
365 | return specs
366 | 
367 | def get_swdir2(self):
368 | """
369 | Spectral wave data for alpha2 (principal wave direction).
370 | 
371 | Returns
372 | -------
373 | 
374 | specs : pandas dataframe
375 | Index is the date and the columns are the frequency bins.
376 | Values give the alpha2 estimate at each frequency.
377 | """
378 | params = 'swdir2'
379 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
380 | link = base + str(self.buoy) + '.' + params
381 | 
382 | #combine the first five date columns YY MM DD hh mm and make index
383 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,
384 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0)
385 | 
386 | #convert the dates to datetimes
387 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
388 | 
389 | specs = df.iloc[:,0::2]
390 | freqs = df.iloc[0,1::2]
391 | 
392 | specs.columns = freqs
393 | 
394 | #remove the parentheses from the column index
395 | specs.columns = [cname.replace('(','').replace(')','')
396 | for cname in specs.columns]
397 | 
398 | return specs
399 | 
400 | def get_swr1(self):
401 | """
402 | Spectral wave data for r1 (first normalized directional coefficient).
403 | 
404 | Returns
405 | -------
406 | 
407 | specs : pandas dataframe
408 | Index is the date and the columns are the frequency bins.
409 | Values give the r1 estimate at each frequency.
410 | """
411 | 
412 | 
413 | params = 'swr1'
414 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
415 | link = base + str(self.buoy) + '.' + params
416 | 
417 | #combine the first five date columns YY MM DD hh mm and make index
418 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,
419 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0)
420 | 
421 | #convert the dates to datetimes
422 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
423 | 
424 | specs = df.iloc[:,0::2]
425 | freqs = df.iloc[0,1::2]
426 | 
427 | specs.columns = freqs
428 | 
429 | #remove the parentheses from the column index
430 | specs.columns = [cname.replace('(','').replace(')','')
431 | for cname in specs.columns]
432 | 
433 | return specs
434 | 
435 | def get_swr2(self):
436 | """
437 | Spectral wave data for r2 (second normalized directional coefficient).
438 | 
439 | Returns
440 | -------
441 | 
442 | specs : pandas dataframe
443 | Index is the date and the columns are the frequency bins.
444 | Values give the r2 estimate at each frequency.
445 | """
446 | 
447 | params = 'swr2'
448 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
449 | link = base + str(self.buoy) + '.' + params
450 | 
451 | #combine the first five date columns YY MM DD hh mm and make index
452 | df = pd.read_csv(link,delim_whitespace=True,skiprows=1,
453 | header=None, parse_dates=[[0,1,2,3,4]], index_col=0)
454 | 
455 | #convert the dates to datetimes
456 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
457 | 
458 | specs = df.iloc[:,0::2]
459 | freqs = df.iloc[0,1::2]
460 | 
461 | specs.columns = freqs
462 | 
463 | #remove the parentheses from the column index
464 | specs.columns = [cname.replace('(','').replace(')','')
465 | for cname in specs.columns]
466 | 
467 | return specs
468 | 
469 | def get_txt(self):
470 | """
471 | Retrieve Standard Meteorological data. NDBC seems to be updating
472 | the data with different column names, so this method can return
473 | two possible data frames with different column names:
474 | 
475 | Returns
476 | -------
477 | 
478 | df : pandas dataframe
479 | Index is the date and the columns can be:
480 | 
481 | ['WDIR','WSPD','GST','WVHT','DPD','APD','MWD',
482 | 'PRES','ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
483 | 
484 | or
485 | 
486 | ['WD','WSPD','GST','WVHT','DPD','APD','MWD','BARO',
487 | 'ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
488 | 
489 | """
490 | 
491 | params = 'txt'
492 | base = 'http://www.ndbc.noaa.gov/data/realtime2/'
493 | link = base + str(self.buoy) + '.'
+ params
494 | 
495 | #combine the first five date columns YY MM DD hh mm and make index
496 | df = pd.read_csv(link, delim_whitespace=True, na_values='MM',
497 | parse_dates=[[0,1,2,3,4]], index_col=0)
498 | 
499 | try:
500 | #first row is units, so drop it
501 | df.drop(df.index[0], inplace=True)
502 | #convert the dates to datetimes
503 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
504 | 
505 | #convert to floats
506 | cols = ['WDIR','WSPD','GST','WVHT','DPD','APD','MWD',
507 | 'PRES','ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
508 | df[cols] = df[cols].astype(float)
509 | except KeyError: #the alternate WD/BARO-style column names
510 | 
511 | #convert the dates to datetimes
512 | df.index = pd.to_datetime(df.index,format="%Y %m %d %H %M")
513 | 
514 | #convert to floats
515 | cols = ['WD','WSPD','GST','WVHT','DPD','APD','MWD','BARO',
516 | 'ATMP','WTMP','DEWP','VIS','PTDY','TIDE']
517 | df[cols] = df[cols].astype(float)
518 | return df
519 | 
520 | ################################################
521 | ################################################
522 | 
523 | class get_months(formatter):
524 | """
525 | Before a year is complete, the NDBC stores its data monthly.
526 | This class retrieves all of that monthly data.
527 | """
528 | 
529 | def __init__(self, buoy, year=None):
530 | self.buoy = buoy
531 | self.year = year
532 | 
533 | def get_stand_meteo(self):
534 | #see what is on the NDBC so we only pull the years that are available
535 | links = []
536 | 
537 | #need to also retrieve jan, feb, march, etc.
538 | month = ['Jan','Feb','Mar','Apr','May','Jun',
539 | 'Jul','Aug','Sep','Oct','Nov','Dec']
540 | k = [1,2,3,4,5,6,7,8,9,'a','b','c'] #for the links
541 | 
542 | #NDBC sometimes lags the new months in January and February
543 | #Might need to define a year on init
544 | if not self.year:
545 | self.year = str(datetime.date.today().year)
546 | 
547 | if datetime.date.today().month <= 2:
548 | print("using " + self.year + " to get the months. Might be wrong!")
549 | 
550 | #for constructing links
551 | base = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
552 | base2 = 'http://www.ndbc.noaa.gov/data/stdmet/'
553 | mid = '.txt.gz&dir=data/stdmet/'
554 | 
555 | for ii in range(len(month)):
556 | 
557 | #links can come in 2 formats
558 | link = base + str(self.buoy) + str(k[ii]) + self.year + mid + str(month[ii]) +'/'
559 | link2 = base2 + month[ii] + '/' + str(self.buoy) + '.txt'
560 | 
561 | try:
562 | urllib2.urlopen(link)
563 | links.append(link)
564 | 
565 | except:
566 | print(str(month[ii]) + ' ' + self.year + ' not in records')
567 | print(link)
568 | 
569 | #need to try the second link
570 | try:
571 | urllib2.urlopen(link2)
572 | links.append(link2)
573 | print(link2 + ' was found in records')
574 | except:
575 | pass
576 | 
577 | 
578 | # start grabbing some data
579 | df=pd.DataFrame()
580 | 
581 | for L in links:
582 | self.link=L
583 | new_df = self.format_stand_meteo()
584 | print('Link : ' + L)
585 | df = df.append(new_df)
586 | 
587 | return df
588 | 
589 | ################################################
590 | ################################################
591 | 
592 | class get_historic(formatter):
593 | 
594 | def __init__(self, buoy, year, year_range=None):
595 | self.buoy = buoy
596 | self.year = year
597 | self.year_range = year_range
598 | 
599 | def hist_stand_meteo(self,link = None):
600 | '''
601 | Standard Meteorological Data. Data header was changed in 2007. Thus
602 | the need for the if statement below.
603 | 
604 | 
605 | WDIR Wind direction (degrees clockwise from true N)
606 | WSPD Wind speed (m/s) averaged over an eight-minute period
607 | GST Peak 5 or 8 second gust speed (m/s)
608 | WVHT Significant wave height (meters) is calculated as
609 | the average of the highest one-third of all of the
610 | wave heights during the 20-minute sampling period.
611 | DPD Dominant wave period (seconds) is the period with the maximum wave energy.
612 | APD Average wave period (seconds) of all waves during the 20-minute period.
613 | MWD The direction from which the waves at the dominant period (DPD) are coming.
614 | (degrees clockwise from true N)
615 | PRES Sea level pressure (hPa).
616 | ATMP Air temperature (Celsius).
617 | WTMP Sea surface temperature (Celsius).
618 | DEWP Dewpoint temperature
619 | VIS Station visibility (nautical miles).
620 | PTDY Pressure Tendency
621 | TIDE The water level in feet above or below Mean Lower Low Water (MLLW).
622 | '''
623 | 
624 | 
625 | if not link:
626 | base = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
627 | link = base + str(self.buoy) + 'h' + str(self.year) + '.txt.gz&dir=data/historical/stdmet/'
628 | 
629 | #combine the first five date columns YY MM DD hh and make index
630 | df = pd.read_csv(link,delim_whitespace=True,na_values=[99,999,9999,99.,999.,9999.])
631 | 
632 | #2007 and on format
633 | if df.iloc[0,0] == '#yr':
634 | 
635 | 
636 | df = df.rename(columns={'#YY': 'YY'}) #get rid of hash
637 | 
638 | #make the indices
639 | date_str = df.YY + ' ' + df.MM+ ' ' + df.DD + ' ' + df.hh + ' ' + df.mm
640 | df.drop(0,inplace=True) #first row is units, so drop them
641 | ind = pd.to_datetime(date_str.drop(0),format="%Y %m %d %H %M")
642 | 
643 | df.index = ind
644 | 
645 | #drop useless columns and rename the ones we want
646 | df.drop(['YY','MM','DD','hh','mm'],axis=1,inplace=True)
647 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES',
648 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
649 | 
650 | 
651 | #2006 and earlier format
652 | else:
653 | date_str = df.YYYY.astype('str') + ' ' + df.MM.astype('str') + \
654 | ' ' + df.DD.astype('str') + ' ' + df.hh.astype('str')
655 | 
656 | ind = pd.to_datetime(date_str,format="%Y %m %d %H")
657 | 
658 | df.index = ind
659 | 
660 | #drop useless columns and rename the ones we want
661 | #######################
662 | '''FIXME: get rid of the try/except;
663 | some files have a minute column'''
664 | 
665 | 
666 | #this is hacky and bad
667 | try:
668 | df.drop(['YYYY','MM','DD','hh','mm'],axis=1,inplace=True)
669 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES',
670 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
671 | 
672 | except KeyError: #no minute column to drop
673 | df.drop(['YYYY','MM','DD','hh'],axis=1,inplace=True)
674 | df.columns = ['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES',
675 | 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
676 | 
677 | 
678 | # all data should be floats
679 | df = df.astype('float')
680 | nvals = [99,999,9999,99.0,999.0,9999.0]
681 | df.replace(nvals,np.nan,inplace=True)
682 | 
683 | return df
684 | 
685 | ################################################
686 | ################################################
687 | 
688 | 
689 | class makecall(get_historic,get_months):
690 | 
691 | def __init__(self, buoy, year_range):
692 | self.buoy = buoy
693 | self.year_range = year_range
694 | 
695 | def get_all_stand_meteo(self):
696 | """
697 | Retrieves all the standard meteorological data. It also checks
698 | to make sure that the years that were requested are available.
699 | Data is not available for the same years at all the buoys.
700 | 
701 | Returns
702 | -------
703 | df : pandas dataframe
704 | Contains all the data from all the years that were specified
705 | in year_range.
706 | """
707 | 
708 | start_yr,stop_yr = self.year_range
709 | 
710 | #see what is on the NDBC so we only pull the years that are available
711 | links = []
712 | for ii in range(start_yr,stop_yr+1):
713 | 
714 | base = 'http://www.ndbc.noaa.gov/view_text_file.php?filename='
715 | end = '.txt.gz&dir=data/historical/stdmet/'
716 | link = base + str(self.buoy) + 'h' + str(ii) + end
717 | 
718 | try:
719 | urllib2.urlopen(link)
720 | links.append(link)
721 | 
722 | except:
723 | print(str(ii) + ' not in records')
724 | 
725 | #need to also retrieve jan, feb, march, etc.
726 | month = ['Jan','Feb','Mar','Apr','May','Jun',
727 | 'Jul','Aug','Sep','Oct','Nov','Dec']
728 | k = [1,2,3,4,5,6,7,8,9,'a','b','c'] #for the links
729 | yr = str(datetime.date.today().year) #monthly files are for the current year
730 | for ii in range(len(month)):
731 | mid = '.txt.gz&dir=data/stdmet/'
732 | link = base + str(self.buoy) + str(k[ii]) + yr + mid + str(month[ii]) +'/'
733 | 
734 | try:
735 | urllib2.urlopen(link)
736 | links.append(link)
737 | 
738 | except:
739 | print(str(month[ii]) + ' ' + yr + ' not in records')
740 | print(link)
741 | 
742 | 
743 | # start grabbing some data
744 | df=pd.DataFrame() #initialize empty df
745 | 
746 | for L in links:
747 | 
748 | new_df = self.hist_stand_meteo(link=L)
749 | print('Link : ' + L)
750 | df = df.append(new_df)
751 | 
752 | return df
753 | 
754 | 
755 | 
756 | 
757 | 
758 | 
759 | 
760 | 
761 | 
762 | 
763 | 
764 | 
765 | 
766 | 
767 | 
768 | 
769 | #
--------------------------------------------------------------------------------
/figures/historic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickc1/buoypy/64912437e2b07dfaccddb6cd4d53b66c168a97b8/figures/historic.png
--------------------------------------------------------------------------------
/figures/historic_range.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickc1/buoypy/64912437e2b07dfaccddb6cd4d53b66c168a97b8/figures/historic_range.png
--------------------------------------------------------------------------------
/figures/realtime.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nickc1/buoypy/64912437e2b07dfaccddb6cd4d53b66c168a97b8/figures/realtime.png
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 | 
3 | setup(
4 | name='buoypy',
5 | author='Nick Cortale',
6 | 
version='0.0.1',
7 | description='buoypy scrapes data from the National Data Buoy Center into pandas dataframes.',
8 | packages=['buoypy']
9 | 
10 | )
--------------------------------------------------------------------------------
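Every scraper method in the package repeats the same parsing recipe: combine the `YY MM DD hh mm` columns into a datetime index, treat `'MM'` as missing, and convert NDBC sentinel values (99/999/9999) to NaN. The sketch below demonstrates that recipe offline against a tiny invented two-row sample (the numbers are made up, not real buoy data), building the index explicitly rather than via the nested-list `parse_dates` trick used in the package, which recent pandas releases no longer accept:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Invented sample in the NDBC realtime .txt layout; 'MM' and 999 mean missing.
sample = StringIO(
    "YY MM DD hh mm WSPD WVHT\n"
    "2015 01 01 00 00 5.0 1.2\n"
    "2015 01 01 01 00 MM 999\n"
)

# Read everything as strings first so the date columns can be concatenated.
raw = pd.read_csv(sample, sep=r"\s+", na_values="MM", dtype=str)

# Combine the five date columns into a single datetime index.
idx = pd.to_datetime(
    raw["YY"] + " " + raw["MM"] + " " + raw["DD"] + " "
    + raw["hh"] + " " + raw["mm"],
    format="%Y %m %d %H %M",
)

# Drop the date columns, cast the rest to floats, and attach the index.
df = (
    raw.drop(columns=["YY", "MM", "DD", "hh", "mm"])
       .astype(float)
       .set_index(idx)
)

# NDBC sentinel values mean "no reading"; map them to NaN.
df = df.replace([99.0, 999.0, 9999.0], np.nan)
```

The same three steps back every `get_*` method above; the live code only differs in which URL it reads and which columns it casts.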