├── ATB.md
├── BUTTER.md
├── BuildingsBench.md
├── ImperialValleyDarkFiber
│   ├── DarkFiber.md
│   └── DarkFiber_Tutorial_Notebook.ipynb
├── NREL_Building_Stock
│   ├── Individual_Building_Data.md
│   └── Query_ComStock_Athena.md
├── NSO.md
├── NSRDB.md
├── PVROOFTOPS.md
├── PVROOFTOPS_PR.md
├── PoroTomo
│   ├── PoroTomo.md
│   ├── PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_SEGY.ipynb
│   ├── PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hdf5.ipynb
│   ├── PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hsds.ipynb
│   └── README.md
├── SMART-DS
│   ├── Readme.md
│   ├── Readme.pdf
│   └── figures
│       ├── AUS
│       │   └── all_labels.PNG
│       ├── CYME
│       │   ├── import_timeseries.PNG
│       │   ├── import_voltvar.PNG
│       │   ├── importing.PNG
│       │   ├── networks.PNG
│       │   ├── simplified_view.PNG
│       │   ├── simplified_view_zoomed.PNG
│       │   ├── substation.PNG
│       │   └── timeseries_results.PNG
│       ├── GIS
│       │   ├── layer_examples.PNG
│       │   └── missing_layers.png
│       ├── GSO
│       │   ├── all_labels.PNG
│       │   └── all_labels2.PNG
│       ├── OpenDSS
│       │   ├── feeder.PNG
│       │   ├── monitor_current.PNG
│       │   ├── monitor_kva.PNG
│       │   ├── profile.PNG
│       │   └── running_dss.PNG
│       ├── SAF
│       │   └── all_labels.png
│       ├── SFO
│       │   ├── downtown_labels.PNG
│       │   ├── east_labels.PNG
│       │   ├── north_labels.PNG
│       │   └── south_labels.PNG
│       ├── analysis
│       │   ├── pu_voltages_histogram.png
│       │   └── pu_voltages_percentiles.png
│       └── load_curves
│           ├── total_load_200.png
│           └── total_load_244.png
├── Sup3rCC.md
├── Template.md
├── TrackingtheSun.md
├── UMCM_Hurricanes.md
├── US_Wave.md
├── WINDToolkit.md
├── dGen.md
├── pvdaq.md
└── windAIBench.md

/ATB.md:
--------------------------------------------------------------------------------

# Annual Technology Baseline (ATB)

## Description

The NREL Annual Technology Baseline (ATB) provides a consistent set of technology cost and performance data for energy analysis. This dataset was developed with funding from the U.S. Department of Energy's Office of Energy Efficiency & Renewable Energy.

To inform electric and transportation sector analysis in the United States, each year NREL provides a robust set of modeling input assumptions for energy technologies and a diverse set of potential electricity generation futures or modeling scenarios (Standard Scenarios).

The ATB is a populated framework to identify technology-specific cost and performance parameters or other investment decision metrics across a range of fuel price conditions, as well as site-specific conditions, for electric generation technologies at present and with projections through 2050.

## Model

The purpose of the ATB is to provide CAPEX, O&M, and capacity factor estimates for the Base Year and future year projections representing three levels of technical innovation (conservative, moderate, and advanced) for use in electric sector models.

The R&D Only cases are intended to reflect fundamental technology changes over time — not short-term market variations in pricing, not changes in interest rates or other project finance elements, and not macroeconomic influences such as commodity price fluctuations. These cases attempt to estimate the potential effects of technology innovation across the renewable electricity generation technologies under comparable levels of probability. This is inherently uncertain.

The Market + Policies Case approximates the costs of electricity generation plants with Independent Power Producer financial terms, covering the energy component of electric system planning and operation.
Important items that are not included in these costs limit the validity of comparisons across technologies. A table on the ATB website summarizes these limitations, identifies other analyses, tools, and data sets that are more complete sources for these items, and suggests applications that are affected by these limitations of the ATB.

See [technical limitations on the ATB website](https://atb.nrel.gov/electricity/2024/technical_limitations) for more detailed information.

## Directory structure

The CSV files summarize, in database-friendly form, the capital expenditures, operations expenditures, and capacity factor, as well as the financial assumptions and the levelized cost of energy, for each technology. They are reformatted from the summary section of the spreadsheet, which documents the underlying calculations and data. The same data is also available in the Apache Parquet format.

The files are stored by type and then by year. The file types are parquet and csv. The files can be accessed in the [Department of Energy's Open Energy Data Initiative (OEDI) in the Registry of Open Data on AWS](https://registry.opendata.aws/oedi-data-lake/), or in the [bucket viewer](https://data.openei.org/s3_viewer?bucket=oedi-data-lake&prefix=ATB%2F).

- `s3://oedi-data-lake/ATB/electricity`

## Vintage

Annual data for 2015 through the previous year can be found at [https://atb.nrel.gov/archive](https://atb.nrel.gov/archive).

The current year data can be found at [https://atb.nrel.gov/data](https://atb.nrel.gov/data).

## Data Format

The most recent annual data is provided in CSV and Apache Parquet format. The data structure is as follows:

Column | Type | Description
-- | -- | --
`atb_year` | bigint | year of ATB publication
`core_metric_key` | string | concatenated unique key
`core_metric_parameter` | string | technology and cost performance parameters
`core_metric_case` | string | financial case (R&D or Market)
`crpyears` | bigint | cost recovery period, years
`technology` | string | technology
`technology_alias` | string | technology alias
`techdetail` | string | technology-specific classifications and sub-groups
`techdetail2` | string | technology-specific classifications and sub-groups
`resourcedetail` | string | resource-specific classifications and sub-groups
`display_name` | string | technology-specific classifications and sub-groups for use in Tableau
`default` | string | default or subgroup technology
`scale` | string | null, utility, commercial, or residential
`maturity` | string | nascent or mature
`scenario` | string | moderate, conservative, or advanced
`core_metric_variable` | string | projected year
`units` | string | units
`value` | double | value

## Python Examples

```python
import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir='s3://<your-bucket>/<staging-prefix>/',  # user-defined staging directory
    region_name='us-west-2',
    work_group='<your-workgroup>'  # specify a workgroup if one exists
)

df = pd.read_sql(
    "SELECT DISTINCT technology, techdetail "
    "FROM oedi_atb.atb_electricity_parquet_2024 "
    "WHERE techdetail <> '*' "
    "ORDER BY technology, techdetail;",
    conn
)
```
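If you prefer to skip Athena, the files can also be read directly from the data lake. The sketch below uses s3fs and pandas with anonymous access; the exact object key under `ATB/electricity` is illustrative, so list the prefix first and substitute a real path.

```python
import s3fs
import pandas as pd

# The OEDI data lake bucket is public, so anonymous access is sufficient.
fs = s3fs.S3FileSystem(anon=True)

# Files are organized by type and then by year -- list the prefix to find
# the exact key for the year you want.
print(fs.ls("oedi-data-lake/ATB/electricity/parquet"))

# Hypothetical key for illustration; substitute one of the paths listed above.
key = "oedi-data-lake/ATB/electricity/parquet/2024/ATBe.parquet"
df = pd.read_parquet(f"s3://{key}", storage_options={"anon": True})
print(df[["technology", "core_metric_parameter", "value"]].head())
```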
OEDI has created a set of tools to facilitate access to open energy data sets, including ATB. Please visit the [open-data-access-tools documentation page](https://openedi.github.io/open-data-access-tools/) for more info. You can find Jupyter notebook examples that show how to use the tools in our [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples).

## References

Please cite the most relevant publication below when referencing this dataset:

1) NREL (National Renewable Energy Laboratory). 2024. "2024 Annual Technology Baseline." Golden, CO: National Renewable Energy Laboratory. [https://atb.nrel.gov/](https://atb.nrel.gov/).

## Errata

Documentation for the ATB errata can be found on the ATB website:

[https://atb.nrel.gov/electricity/2024/errata](https://atb.nrel.gov/electricity/2024/errata)

The data in the S3 bucket is maintained as the most up-to-date version of the parquet and CSV files for the years provided.

## Disclaimer and Attribution

DISCLAIMER AGREEMENT

These detailed electricity generation technology cost and performance data ("Data") are provided by the National Renewable Energy Laboratory ("NREL"), which is operated by the Alliance for Sustainable Energy LLC ("Alliance") for the U.S. Department of Energy (the "DOE").

It is recognized that disclosure of these Data are provided under the following conditions and warnings: (1) these Data have been prepared for reference purposes only; (2) these Data consist of forecasts, estimates, or assumptions made on a best-efforts basis, based upon expectations of current and future conditions at the time they were developed; and (3) these Data were prepared with existing information and are subject to change without notice.

The user understands that DOE/NREL/ALLIANCE are not obligated to provide the user with any support, consulting, training or assistance of any kind with regard to the use of the Data or to provide the user with any updates, revisions or new versions thereof. DOE, NREL, and ALLIANCE do not guarantee or endorse any results generated by use of the Data, and user is entirely responsible for the results and any reliance on the results or the Data in general.

USER AGREES TO INDEMNIFY DOE/NREL/ALLIANCE AND ITS SUBSIDIARIES, AFFILIATES, OFFICERS, AGENTS, AND EMPLOYEES AGAINST ANY CLAIM OR DEMAND, INCLUDING REASONABLE ATTORNEYS' FEES, RELATED TO USER'S USE OF THE DATA. THE DATA ARE PROVIDED BY DOE/NREL/ALLIANCE "AS IS," AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL DOE/NREL/ALLIANCE BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER, INCLUDING BUT NOT LIMITED TO CLAIMS ASSOCIATED WITH THE LOSS OF DATA OR PROFITS, THAT MAY RESULT FROM AN ACTION IN CONTRACT, NEGLIGENCE OR OTHER TORTIOUS CLAIM THAT ARISES OUT OF OR IN CONNECTION WITH THE ACCESS, USE OR PERFORMANCE OF THE DATA.

--------------------------------------------------------------------------------
/BuildingsBench.md:
--------------------------------------------------------------------------------

# BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting

## Description

The BuildingsBench datasets consist of:

- Buildings-900K: A large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
- 7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.

Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. The specific gap it fills is the lack of large-scale and diverse time series datasets of sufficient size for studying pretraining and finetuning with scalable machine learning models. Buildings-900K consists of synthetically generated energy consumption time series. It is derived from the NREL End-Use Load Profiles (EULP) dataset (see the link to this dataset in the references below). However, the EULP was not originally developed for the purpose of STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation." Similar to the EULP, Buildings-900K is a collection of Parquet files, and it follows nearly the same Parquet dataset organization as the EULP. As it only contains a single energy consumption time series per building, it is much smaller (~110 GB).

BuildingsBench also provides an evaluation benchmark that is a collection of various open source residential and commercial real building energy consumption datasets. The evaluation datasets, which are provided alongside Buildings-900K, are collections of CSV files which contain annual energy consumption. The size of the evaluation datasets altogether is less than 1 GB.

## Directory structure

A README file providing details about how the data is stored and describing the organization of the datasets can be found within each data lake version under BuildingsBench.

- `BuildingsBench/`
  - `Buildings-900K/end-use-load-profiles-for-us-building-stock/2021/`: The Buildings-900K pretraining and validation data.
    - `comstock_amy2018_release_1/`
      - `timeseries_individual_buildings/`
        - `by_puma_midwest`
          - `upgrade=0`
            - `puma={puma_id}/*.parquet`
            - `...`
        - `by_puma_northeast`
        - `by_puma_south`
        - `by_puma_west`
      - `weather/`
        - `amy2018/`
          - `{puma_id}_2018.csv`
          - ...
      - `metadata/`
        - `metadata.parquet`
    - `...`: Other datasets
  - `BDG-2/`: Building Data Genome Project 2 BuildingsBench evaluation data *with outliers removed*.
    - `{building_id}={year}.csv`: The .csv files for the BDG-2 dataset.
    - `...`: Other buildings
  - `...`: Other evaluation datasets (Borealis, Electricity, etc.)
  - `buildingsbench_with_outliers`: The BuildingsBench evaluation data *with outliers*.
    - `BDG-2/`: Building Data Genome Project 2 BuildingsBench evaluation data *with outliers*.
      - `{building_id}={year}.csv`: The .csv files for the BDG-2 dataset.
      - `...`: Other buildings
    - `...`: Other evaluation datasets (Borealis, Electricity, etc.)
  - `LICENSES/`: Licenses for each evaluation dataset redistributed in BuildingsBench.
  - `metadata/`: Metadata for the evaluation suite.
    - `benchmark.toml`: Metadata for the benchmark. For each dataset, we specify:
      - `building_type`: `residential` or `commercial`.
      - `latlon`: a list of two floats representing the location of the building(s).
      - `conus_location`: The name of the county or city in the U.S. where the building is located, or a county/city in the U.S. of similar climate to the building's true location.
      - `actual_location`: County/city where the building actually is located. This will be different from `conus_location` when the building is outside of the CONUS. These values are for book-keeping and can be set to dummy values.
      - `url`: The URL where the dataset was obtained from.
    - `building_years.txt`: List of .csv files included in the benchmark. Each line is of the form `{dataset}/{building_id}={year}.csv`.
    - `withheld_pumas.tsv`: List of PUMAs withheld from the training/validation set of Buildings-900K, which we use as synthetic test data.
    - `map_of_pumas_in_census_region*.csv`: Maps PUMA IDs to their geographical centroid (lat/lon).
    - `spatial_tract_lookup_table.csv`: Mapping between census tract identifiers and other geographies.
    - `list_oov.py`: Python script to generate a list of buildings that are OOV for the Buildings-900K tokenizer.
    - `oov.txt`: List of buildings that are OOV for the Buildings-900K tokenizer.
    - `transfer_learning_commercial_buildings.txt`: List of 100 commercial buildings from the benchmark we use for evaluating transfer learning.
    - `transfer_learning_residential_buildings.txt`: List of 100 residential buildings from the benchmark we use for evaluating transfer learning.
    - `transfer_learning_hyperparameter_tuning.txt`: List of 2 held-out buildings (1 commercial, 1 residential) that can be used for hyperparameter tuning.
    - `train*.idx`: Index files for fast dataloading of Buildings-900K. This file uncompressed is 16 GB.
    - `val*.idx`: Index files for fast dataloading of Buildings-900K.
    - `transforms`: Directory for storing data transform info.

## Data Format

### Parquet file format

The pretraining dataset Buildings-900K is stored as a collection of PUMA-level parquet files. Each parquet file in Buildings-900K is stored in a directory named after a unique PUMA ID (`puma={puma_id}/*.parquet`). The first column is the timestamp, in the format `YYYY-MM-DD HH:MM:SS`, and each subsequent column is the energy consumption in kWh for a different building in that PUMA. These columns are named by building id. The parquet files are compressed with snappy.
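As a quick illustration of this layout, the sketch below loads one PUMA's parquet file with pandas. The path, PUMA ID, and file name are placeholders; point them at wherever your copy of the dataset lives.

```python
import pandas as pd

# Placeholder path, PUMA id, and file name -- adjust to your local copy
# or S3 mirror of the dataset.
path = (
    "BuildingsBench/Buildings-900K/end-use-load-profiles-for-us-building-stock/2021/"
    "comstock_amy2018_release_1/timeseries_individual_buildings/by_puma_midwest/"
    "upgrade=0/puma=G17031/part-0.parquet"
)

df = pd.read_parquet(path)                       # snappy compression is handled automatically
df["timestamp"] = pd.to_datetime(df["timestamp"])  # YYYY-MM-DD HH:MM:SS
df = df.set_index("timestamp")

# Every remaining column is one building's load series in kWh, named by building id.
building_ids = df.columns.tolist()
print(len(building_ids), df[building_ids[0]].head())
```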
### CSV file format

Most CSV files in the benchmark are named `building_id=year.csv` and correspond to a single building's energy consumption time series. The first column is the timestamp (the Pandas index), in the format `YYYY-MM-DD HH:MM:SS`, and the second column is the energy consumption in kWh.

Certain datasets have multiple buildings in a single file. In this case, the first column is the timestamp (the Pandas index), and each subsequent column is the energy consumption in kWh for a different building. These columns are named by building id.

## Code Examples

For dataset quick start and other tutorials see the [BuildingsBench Github Tutorials](https://github.com/NREL/BuildingsBench/tree/main/tutorials).

## References

- [NeurIPS paper](https://arxiv.org/abs/2307.00142) providing additional information on the datasets along with the analyses conducted with BuildingsBench.
- [End-Use Load Profiles (EULP)](https://data.openei.org/submissions/4520), from which Buildings-900K is derived.
- Additional information about the parquet file format can be found [here](https://parquet.apache.org/).

Users of the BuildingsBench data should please cite:

- Emami, Patrick, Graf, Peter. BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting. United States: N.p., 31 Dec, 2018. Web. doi: 10.25984/1986147.

--------------------------------------------------------------------------------
/ImperialValleyDarkFiber/DarkFiber.md:
--------------------------------------------------------------------------------

# Imperial Valley Dark Fiber Project Continuous DAS Data

## Description

Whereas permanent seismic networks are sparsely distributed, this dense-array seismic dataset, with a gauge length of 10 m and a sampling rate of 500 Hz, was continuously acquired on a ~28 km long segment of unused fiber-optic cable buried in the ground for telecommunication, using a novel technology called “Distributed Acoustic Sensing” (DAS). The objective is to demonstrate dark fiber DAS as a tool for basin-scale geothermal exploration and monitoring.

The included DAS data were recorded during two days at the beginning of the project. The study area, Imperial Valley in Southern California, is a sedimentary basin characterized by intense seismicity and faulting, high heat flow, and deformation over a broad area in a transtensional tectonic regime; it hosts multiple producing geothermal fields and is believed to host other hidden geothermal resources. In particular, the dark fiber DAS array passes close to the Brawley geothermal field, which was a hidden geothermal resource prior to its discovery. Therefore, the DAS dataset provides new, exciting research opportunities in the studies of seismology, tectonics, seismic exploration, and basin-scale geothermal exploration and monitoring.

This dataset is continuous array seismic data acquired using distributed acoustic sensing (DAS) and can be used for all applications and methods in seismology that can use DAS data. This dataset can be used for analysis of ambient seismic noise and earthquakes in the Imperial Valley. This data can provide insights into sources of ambient seismic noise, subsurface seismic velocity structure, short-term spatiotemporal variations in the subsurface, source properties, detection and monitoring of earthquakes, and seismic-wave propagation in the Imperial Valley and similar sedimentary basins.

## Directory structure

This dataset contains continuous raw DAS data acquired over a period of two days (November 12-13, 2020) at the beginning of the Imperial Valley Dark Fiber Project. It consists of 2,880 files of approximately 400 MB each, each labeled with the start-time naming scheme DF__UTC_YYYYMMDD_HHMMSS.SSS.h5.

## Data Format

The DAS data files are 1-minute segments of strain rate in HDF5 format. The technical specifications of the DAS data acquisition are: 6912 channels, 4 m channel spacing, 10 m gauge length, and a 500 Hz sample rate. The data in each HDF5 file is a 2D array of dimensions 6912 channels x 30000 time samples, with int16 datatype, under the dataset name “Acoustic”. The number of attributes or headers is 83. The timestamp is in the header “GPSTimeStamp”.
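To make the format concrete, here is a minimal h5py sketch for reading one file. The file name is hypothetical (it follows the naming scheme above), and whether the `GPSTimeStamp` header sits on the file or on the `Acoustic` dataset is an assumption the code checks for rather than relies on.

```python
import h5py

# Hypothetical file name following the DF__UTC_YYYYMMDD_HHMMSS.SSS.h5 scheme.
fname = "DF__UTC_20201112_000000.000.h5"

with h5py.File(fname, "r") as f:
    das = f["Acoustic"]                      # int16, 6912 channels x 30000 samples
    print(das.shape, das.dtype)

    # Headers are stored as HDF5 attributes; check the dataset first,
    # then fall back to file-level attributes.
    attrs = dict(das.attrs) or dict(f.attrs)
    print(attrs.get("GPSTimeStamp"))

    # Slice one minute of strain-rate data for a few channels.
    chunk = das[1000:1010, :]
```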
## Code Examples

For a tutorial on accessing and using this data, please see the following link:

- https://github.com/openEDI/documentation/blob/main/ImperialValleyDarkFiber/DarkFiber_Tutorial_Notebook.ipynb

## References

For additional information about the objective of this data, users can reference the following article:

- Ajo-Franklin, J., et al. The Imperial Valley Dark Fiber Project: Toward Seismic Studies Using DAS and Telecom Infrastructure for Geothermal Applications. United States. https://doi.org/10.1785/0220220072

Additional information about the HDF5 file format can be found [here](https://support.hdfgroup.org/HDF5/doc/H5.format.html).

Users of the Dark Fiber DAS data should use the following citation:

- Ajo-Franklin, Jonathan, Dobson, Patrick, and Rodriguez Tribaldos, Veronica. Imperial Valley Dark Fiber Project Continuous DAS Data. United States: N.p., 10 Nov, 2020. Web. https://gdr.openei.org/submissions/1499.

--------------------------------------------------------------------------------
/NREL_Building_Stock/Individual_Building_Data.md:
--------------------------------------------------------------------------------

# Downloading individual building data files from ComStock results

For many use cases, the goal is to aggregate the timeseries energy data from many buildings in a given region, building type, etc. The methods for doing that are documented in [TODO link to query document]. Those methods rely on AWS Athena because of the sheer size of the data to be aggregated.

Other use cases may require access to the individual building timeseries data. This document describes how that data is stored and how to access it.

## Requirements

An AWS account is necessary to follow this tutorial.

## Data location

The ComStock dataset is stored in a publicly-accessible Amazon S3 bucket. To access this bucket:

1. Log in to AWS
2. Visit the [ComStock data bucket](https://s3.console.aws.amazon.com/s3/buckets/nrel-pds-building-stock?region=us-west-2&tab=objects)

This should take you to an interface where you can browse through the directories and look through the data.
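Alternatively, you can browse the bucket from code. Below is a small sketch using anonymous access with Boto3; the prefix matches the data organization described in the next section.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# The bucket is public, so unsigned (anonymous) requests are sufficient.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

resp = s3.list_objects_v2(
    Bucket="nrel-pds-building-stock",
    Prefix="comstock/athena/2020/comstock_v1/",
    Delimiter="/",
)
for prefix in resp.get("CommonPrefixes", []):
    print(prefix["Prefix"])
```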
## Data organization

The tree below includes the full `comstock/athena/...` prefix used by the example URIs later in this document:

```
nrel-pds-building-stock
├── comstock
├──── athena                    # root directory
├────── 2020                    # year the dataset was published
├──────── comstock_v1           # name of the dataset
├────────── metadata            # building characteristics and annual energy data
│            ├── fast1_metadata.parquet    # read all files to get all buildings
│            ├── fast2_metadata.parquet
│            └── ...
├────────── climate_zone        # timeseries data, partitioned by climate zone
│            ├── upgrade=0
│            ├───── climate_zone=1A
│            ├──────── 100022-0.parquet    # buildingID-upgradeID
│            ├──────── 100052-0.parquet
│            ├──────── 10006-0.parquet
│            └──────── ...
├────────── state               # same timeseries data, partitioned by state
│            ├── upgrade=0
│            ├───── state=01
│            ├──────── 100022-0.parquet
│            ├──────── 100052-0.parquet
│            ├──────── 10006-0.parquet
└            └──────── ...
```

## Finding specific buildings

In order to find specific buildings, first the metadata files should be downloaded and parsed. The list of buildings can be filtered by various characteristics, and a list of building IDs can be generated. Either the state or climate_zone should be determined for each building ID.

Once the list of building IDs and corresponding state or climate_zone is ready, the individual files can be retrieved from either the `/state` or `/climate_zone` directories. The timeseries profiles in these directories are identical; they are duplicated for query optimization. The full URI to a file will look like:

```
s3://nrel-pds-building-stock/comstock/athena/2020/comstock_v1/state/upgrade=0/state=01/100094-0.parquet
```

## Programmatic download of files

The files can be downloaded programmatically using a Python library such as [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html), or by using a similar library in another programming language. A sketch of the full workflow is shown below.
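The hedged sketch below strings the steps together with Boto3 and pandas. The `fast1_metadata.parquet` name comes from the tree above, the filter columns follow the Athena table schema in the next document, and the assumption that the metadata `state` column holds FIPS codes (as in the Athena query examples) is ours.

```python
import boto3
import pandas as pd
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))  # public bucket
bucket = "nrel-pds-building-stock"
root = "comstock/athena/2020/comstock_v1"

# 1. Grab one metadata file (read all fastN files to cover every building).
s3.download_file(bucket, f"{root}/metadata/fast1_metadata.parquet",
                 "fast1_metadata.parquet")
meta = pd.read_parquet("fast1_metadata.parquet")

# 2. Filter to buildings of interest -- here, baseline Medium Offices in
#    Wisconsin (assumes `state` holds FIPS codes, per the Athena examples).
subset = meta[(meta["in.building_type"] == "MediumOffice")
              & (meta["state"] == "55")
              & (meta["upgrade"] == 0)]

# 3. Build each key from the state partition and download the timeseries.
for bldg_id in subset["bldg_id"].head(5):   # first 5 buildings only
    key = f"{root}/state/upgrade=0/state=55/{bldg_id}-0.parquet"
    s3.download_file(bucket, key, f"{bldg_id}-0.parquet")
```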
--------------------------------------------------------------------------------
/NREL_Building_Stock/Query_ComStock_Athena.md:
--------------------------------------------------------------------------------

# Running queries on ComStock using AWS Athena

ComStock data can be accessed and downloaded using [AWS Athena](https://aws.amazon.com/athena/). Here we will show how to query the data for multiple building types, US regions, and upgrades.

## Requirements

An AWS account is necessary to run the code in this tutorial.

## Setting up AWS Athena with ComStock data for querying

First of all, we need to log into AWS and go to the [AWS Athena Console](https://console.aws.amazon.com/athena/home).

NOTE: To run these queries, go to the `Query Editor` tab of the Athena interface, copy and paste the query into the `New query` box, and click the `Run query` button.

### Setting up the database

As a first step, we will create a database in which the data we want to query will live - if you already have a database you want to create the tables in, or are fine with the tables being created in a default database, skip this step.

```sql
CREATE DATABASE comstock
```

### Creating the metadata table

Next we need to create the data tables that will be used to query the data. We will create two tables, one with building characteristics and annual energy data, and another with the time series energy data.

First we will create the metadata table, which will contain building characteristics and annual energy data. This query should take less than 1 minute to run.

Run the following SQL in the Athena `Query Editor` tab:

```sql
CREATE EXTERNAL TABLE `comstock_v1_metadata`(
  `applicability` boolean,
  `bldg_id` bigint,
  `climate_zone` string,
  `in.aspect_ratio` double,
  `in.building_type` string,
  `in.climate_zone` string,
  `in.code_when_built` string,
  `in.cooling_fuel` string,
  `in.current_envelope_code` string,
  `in.current_exterior_lighting_code` string,
  `in.current_hvac_code` string,
  `in.current_interior_equipment_code` string,
  `in.current_interior_lighting_code` string,
  `in.floor_height` double,
  `in.heating_fuel` string,
  `in.hvac_delivery_type` string,
  `in.hvac_system_type` string,
  `in.number_of_stories` double,
  `in.rotation` double,
  `in.sqft` double,
  `in.water_systems_fuel` string,
  `in.weather_station` string,
  `in.weekday_opening_time` string,
  `in.weekday_operating_hours` string,
  `in.weekend_opening_time` string,
  `in.weekend_operating_hours` string,
  `out.electricity.cooling.energy_consumption` double,
  `out.electricity.cooling.energy_consumption_intensity` double,
  `out.electricity.cooling.energy_savings` double,
  `out.electricity.cooling.energy_savings_intensity` double,
  `out.electricity.exterior_lighting.energy_consumption` double,
  `out.electricity.exterior_lighting.energy_consumption_intensity` double,
  `out.electricity.exterior_lighting.energy_savings` double,
  `out.electricity.exterior_lighting.energy_savings_intensity` double,
  `out.electricity.fans.energy_consumption` double,
  `out.electricity.fans.energy_consumption_intensity` double,
  `out.electricity.fans.energy_savings` double,
  `out.electricity.fans.energy_savings_intensity` double,
  `out.electricity.heat_recovery.energy_consumption` double,
  `out.electricity.heat_recovery.energy_consumption_intensity` double,
  `out.electricity.heat_recovery.energy_savings` double,
  `out.electricity.heat_recovery.energy_savings_intensity` double,
  `out.electricity.heat_rejection.energy_consumption` double,
  `out.electricity.heat_rejection.energy_consumption_intensity` double,
  `out.electricity.heat_rejection.energy_savings` double,
  `out.electricity.heat_rejection.energy_savings_intensity` double,
  `out.electricity.heating.energy_consumption` double,
  `out.electricity.heating.energy_consumption_intensity` double,
  `out.electricity.heating.energy_savings` double,
  `out.electricity.heating.energy_savings_intensity` double,
  `out.electricity.humidification.energy_consumption` double,
  `out.electricity.humidification.energy_consumption_intensity` double,
  `out.electricity.humidification.energy_savings` double,
  `out.electricity.humidification.energy_savings_intensity` double,
  `out.electricity.interior_equipment.energy_consumption` double,
  `out.electricity.interior_equipment.energy_consumption_intensity` double,
  `out.electricity.interior_equipment.energy_savings` double,
  `out.electricity.interior_equipment.energy_savings_intensity` double,
  `out.electricity.interior_lighting.energy_consumption` double,
  `out.electricity.interior_lighting.energy_consumption_intensity` double,
  `out.electricity.interior_lighting.energy_savings` double,
  `out.electricity.interior_lighting.energy_savings_intensity` double,
  `out.electricity.peak_demand.energy_consumption` double,
  `out.electricity.peak_demand.energy_consumption_intensity` double,
  `out.electricity.peak_demand.energy_savings` double,
  `out.electricity.peak_demand.energy_savings_intensity` double,
  `out.electricity.pumps.energy_consumption` double,
  `out.electricity.pumps.energy_consumption_intensity` double,
  `out.electricity.pumps.energy_savings` double,
  `out.electricity.pumps.energy_savings_intensity` double,
  `out.electricity.refrigeration.energy_consumption` double,
  `out.electricity.refrigeration.energy_consumption_intensity` double,
  `out.electricity.refrigeration.energy_savings` double,
  `out.electricity.refrigeration.energy_savings_intensity` double,
  `out.electricity.total.energy_consumption` double,
  `out.electricity.total.energy_consumption_intensity` double,
  `out.electricity.total.energy_savings` double,
  `out.electricity.total.energy_savings_intensity` double,
  `out.electricity.water_systems.energy_consumption` double,
  `out.electricity.water_systems.energy_consumption_intensity` double,
  `out.electricity.water_systems.energy_savings` double,
  `out.electricity.water_systems.energy_savings_intensity` double,
  `out.natural_gas.cooling.energy_consumption` double,
  `out.natural_gas.cooling.energy_consumption_intensity` double,
  `out.natural_gas.cooling.energy_savings` double,
  `out.natural_gas.cooling.energy_savings_intensity` double,
  `out.natural_gas.heating.energy_consumption` double,
  `out.natural_gas.heating.energy_consumption_intensity` double,
  `out.natural_gas.heating.energy_savings` double,
  `out.natural_gas.heating.energy_savings_intensity` double,
  `out.natural_gas.interior_equipment.energy_consumption` double,
  `out.natural_gas.interior_equipment.energy_consumption_intensity` double,
  `out.natural_gas.interior_equipment.energy_savings` double,
  `out.natural_gas.interior_equipment.energy_savings_intensity` double,
  `out.natural_gas.total.energy_consumption` double,
  `out.natural_gas.total.energy_consumption_intensity` double,
  `out.natural_gas.total.energy_savings` double,
  `out.natural_gas.total.energy_savings_intensity` double,
  `out.natural_gas.water_systems.energy_consumption` double,
  `out.natural_gas.water_systems.energy_consumption_intensity` double,
  `out.natural_gas.water_systems.energy_savings` double,
  `out.natural_gas.water_systems.energy_savings_intensity` double,
  `out.other_fuel.heating.energy_consumption` double,
  `out.other_fuel.heating.energy_consumption_intensity` double,
  `out.other_fuel.heating.energy_savings` double,
  `out.other_fuel.heating.energy_savings_intensity` double,
  `out.other_fuel.interior_equipment.energy_consumption` double,
  `out.other_fuel.interior_equipment.energy_consumption_intensity` double,
  `out.other_fuel.interior_equipment.energy_savings` double,
  `out.other_fuel.interior_equipment.energy_savings_intensity` double,
  `out.other_fuel.total.energy_consumption` double,
  `out.other_fuel.total.energy_consumption_intensity` double,
  `out.other_fuel.total.energy_savings` double,
  `out.other_fuel.total.energy_savings_intensity` double,
  `out.other_fuel.water_systems.energy_consumption` double,
  `out.other_fuel.water_systems.energy_consumption_intensity` double,
  `out.other_fuel.water_systems.energy_savings` double,
  `out.other_fuel.water_systems.energy_savings_intensity` double,
  `out.site_energy.total.energy_consumption` double,
  `out.site_energy.total.energy_consumption_intensity` double,
  `out.site_energy.total.energy_savings` double,
  `out.site_energy.total.energy_savings_intensity` double,
  `state` string,
  `upgrade` bigint,
  `weight` double,
  `metadata_index` bigint,
  `in.applicable` boolean,
  `__index_level_0__` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'parquet.column.index.access'='true')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://nrel-pds-building-stock/comstock/athena/2020/comstock_v1/metadata/'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0',
  'CrawlerSchemaSerializerVersion'='1.0',
  'UPDATED_BY_CRAWLER'='vizstock_oedi_comstock_v1_metadata',
  'averageRecordSize'='170',
  'classification'='parquet',
  'compressionType'='none',
  'objectCount'='10',
  'recordCount'='30560553',
  'sizeKey'='5293679830',
  'typeOfData'='file')
```

The response from Athena when the command has successfully run should look like this:

![CreateMetadata](https://github.com/NREL/ComStock/blob/cbianchi/documentation/documentation/Screen%20Shot%202021-04-15%20at%208.44.36%20PM.png?raw=true)

### Creating the time series table

Next we will create the timeseries table. This query can take anywhere from 1 minute to approximately 14 hours to run, so we advise starting this before bed! The wide range depends on whether or not it has been run recently by someone else, in which case AWS appears to use a cache rather than redoing things.
```sql
CREATE EXTERNAL TABLE `comstock_v1_state`(
  `timestamp` timestamp,
  `bldg_id` bigint,
  `out.electricity.cooling.energy_consumption` double,
  `out.electricity.cooling.energy_consumption_intensity` double,
  `out.electricity.cooling.energy_savings` double,
  `out.electricity.cooling.energy_savings_intensity` double,
  `out.electricity.exterior_lighting.energy_consumption` double,
  `out.electricity.exterior_lighting.energy_consumption_intensity` double,
  `out.electricity.exterior_lighting.energy_savings` double,
  `out.electricity.exterior_lighting.energy_savings_intensity` double,
  `out.electricity.fans.energy_consumption` double,
  `out.electricity.fans.energy_consumption_intensity` double,
  `out.electricity.fans.energy_savings` double,
  `out.electricity.fans.energy_savings_intensity` double,
  `out.electricity.heat_recovery.energy_consumption` double,
  `out.electricity.heat_recovery.energy_consumption_intensity` double,
  `out.electricity.heat_recovery.energy_savings` double,
  `out.electricity.heat_recovery.energy_savings_intensity` double,
  `out.electricity.heat_rejection.energy_consumption` double,
  `out.electricity.heat_rejection.energy_consumption_intensity` double,
  `out.electricity.heat_rejection.energy_savings` double,
  `out.electricity.heat_rejection.energy_savings_intensity` double,
  `out.electricity.heating.energy_consumption` double,
  `out.electricity.heating.energy_consumption_intensity` double,
  `out.electricity.heating.energy_savings` double,
  `out.electricity.heating.energy_savings_intensity` double,
  `out.electricity.humidification.energy_consumption` double,
  `out.electricity.humidification.energy_consumption_intensity` double,
  `out.electricity.humidification.energy_savings` double,
  `out.electricity.humidification.energy_savings_intensity` double,
  `out.electricity.interior_equipment.energy_consumption` double,
  `out.electricity.interior_equipment.energy_consumption_intensity` double,
  `out.electricity.interior_equipment.energy_savings` double,
  `out.electricity.interior_equipment.energy_savings_intensity` double,
  `out.electricity.interior_lighting.energy_consumption` double,
  `out.electricity.interior_lighting.energy_consumption_intensity` double,
  `out.electricity.interior_lighting.energy_savings` double,
  `out.electricity.interior_lighting.energy_savings_intensity` double,
  `out.electricity.peak_demand.energy_consumption` double,
  `out.electricity.peak_demand.energy_consumption_intensity` double,
  `out.electricity.peak_demand.energy_savings` double,
  `out.electricity.peak_demand.energy_savings_intensity` double,
  `out.electricity.pumps.energy_consumption` double,
  `out.electricity.pumps.energy_consumption_intensity` double,
  `out.electricity.pumps.energy_savings` double,
  `out.electricity.pumps.energy_savings_intensity` double,
  `out.electricity.refrigeration.energy_consumption` double,
  `out.electricity.refrigeration.energy_consumption_intensity` double,
  `out.electricity.refrigeration.energy_savings` double,
  `out.electricity.refrigeration.energy_savings_intensity` double,
  `out.electricity.total.energy_consumption` double,
  `out.electricity.total.energy_consumption_intensity` double,
  `out.electricity.total.energy_savings` double,
  `out.electricity.total.energy_savings_intensity` double,
  `out.electricity.water_systems.energy_consumption` double,
  `out.electricity.water_systems.energy_consumption_intensity` double,
  `out.electricity.water_systems.energy_savings` double,
  `out.electricity.water_systems.energy_savings_intensity` double,
  `out.natural_gas.cooling.energy_consumption` double,
  `out.natural_gas.cooling.energy_consumption_intensity` double,
  `out.natural_gas.cooling.energy_savings` double,
  `out.natural_gas.cooling.energy_savings_intensity` double,
  `out.natural_gas.heating.energy_consumption` double,
  `out.natural_gas.heating.energy_consumption_intensity` double,
  `out.natural_gas.heating.energy_savings` double,
  `out.natural_gas.heating.energy_savings_intensity` double,
  `out.natural_gas.interior_equipment.energy_consumption` double,
  `out.natural_gas.interior_equipment.energy_consumption_intensity` double,
  `out.natural_gas.interior_equipment.energy_savings` double,
  `out.natural_gas.interior_equipment.energy_savings_intensity` double,
  `out.natural_gas.total.energy_consumption` double,
  `out.natural_gas.total.energy_consumption_intensity` double,
  `out.natural_gas.total.energy_savings` double,
  `out.natural_gas.total.energy_savings_intensity` double,
  `out.natural_gas.water_systems.energy_consumption` double,
  `out.natural_gas.water_systems.energy_consumption_intensity` double,
  `out.natural_gas.water_systems.energy_savings` double,
  `out.natural_gas.water_systems.energy_savings_intensity` double,
  `out.other_fuel.heating.energy_consumption` double,
  `out.other_fuel.heating.energy_consumption_intensity` double,
  `out.other_fuel.heating.energy_savings` double,
  `out.other_fuel.heating.energy_savings_intensity` double,
  `out.other_fuel.interior_equipment.energy_consumption` double,
  `out.other_fuel.interior_equipment.energy_consumption_intensity` double,
  `out.other_fuel.interior_equipment.energy_savings` double,
  `out.other_fuel.interior_equipment.energy_savings_intensity` double,
  `out.other_fuel.total.energy_consumption` double,
  `out.other_fuel.total.energy_consumption_intensity` double,
  `out.other_fuel.total.energy_savings` double,
  `out.other_fuel.total.energy_savings_intensity` double,
  `out.other_fuel.water_systems.energy_consumption` double,
  `out.other_fuel.water_systems.energy_consumption_intensity` double,
  `out.other_fuel.water_systems.energy_savings` double,
  `out.other_fuel.water_systems.energy_savings_intensity` double,
  `out.site_energy.total.energy_consumption` double,
  `out.site_energy.total.energy_consumption_intensity` double,
  `out.site_energy.total.energy_savings` double,
  `out.site_energy.total.energy_savings_intensity` double)
PARTITIONED BY (
  `upgrade` bigint,
  `state` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'parquet.column.index.access'='true')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://nrel-pds-building-stock/comstock/athena/2020/comstock_v1/state/'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0',
  'CrawlerSchemaSerializerVersion'='1.0',
  'UPDATED_BY_CRAWLER'='vizstock_oedi_comstock_v1_state',
  'averageRecordSize'='11',
  'classification'='parquet',
  'compressionType'='none',
  'objectCount'='30499408',
  'recordCount'='1068697644480',
  'sizeKey'='107871145911082',
  'typeOfData'='file')
```

You will likely be logged out of AWS while waiting for this to complete. To check whether the query has completed, click on the **History** tab in the Athena GUI and look for the query that begins `CREATE EXTERNAL TABLE comstock_v1_state(`; when the query status updates from running to completed, you are ready to proceed to the next step.

![CreateTable](https://github.com/NREL/ComStock/blob/cbianchi/documentation/documentation/Screen%20Shot%202021-04-15%20at%2011.03.03%20AM.png?raw=true)

### Establishing partitions on the time series table

We use partitions to make our queries against the time series table quicker and more cost efficient. By splitting up the table by `state` and `upgrade`, average query time decreases by ~48X and cost by even more. This works by having Athena mark which folders in S3 contain data relevant to each of the possible values for `state` or `upgrade`, and then only access those folders when the relevant value is requested. Guaranteeing that the partitions are all correctly instantiated requires one last query, which should take between one and four hours to complete:

```sql
MSCK REPAIR TABLE comstock_v1_state
```

Once again, you may need to use the **History** tab in the Athena GUI to check when this command completes.

## Running queries against ComStock

Now that the tables have been created, we can query the data through Athena.

### Core example query

Let's start with an example of a query that uses the metadata table and time series table to retrieve hourly data from a subset of the building simulations. In this example we'll get all available end-uses for a Medium Office in Wisconsin for the baseline case (no upgrade has been applied).
```sql
SELECT
  sum("comstock_v1_state"."out.electricity.cooling.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.cooling.energy_consumption",
  sum("comstock_v1_state"."out.electricity.exterior_lighting.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.exterior_lighting.energy_consumption",
  sum("comstock_v1_state"."out.electricity.fans.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.fans.energy_consumption",
  sum("comstock_v1_state"."out.electricity.heat_recovery.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.heat_recovery.energy_consumption",
  sum("comstock_v1_state"."out.electricity.heat_rejection.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.heat_rejection.energy_consumption",
  sum("comstock_v1_state"."out.electricity.heating.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.heating.energy_consumption",
  sum("comstock_v1_state"."out.electricity.humidification.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.humidification.energy_consumption",
  sum("comstock_v1_state"."out.electricity.interior_equipment.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.interior_equipment.energy_consumption",
  sum("comstock_v1_state"."out.electricity.interior_lighting.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.interior_lighting.energy_consumption",
  sum("comstock_v1_state"."out.electricity.peak_demand.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.peak_demand.energy_consumption",
  sum("comstock_v1_state"."out.electricity.pumps.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.pumps.energy_consumption",
  sum("comstock_v1_state"."out.electricity.refrigeration.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.refrigeration.energy_consumption",
  sum("comstock_v1_state"."out.electricity.total.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.total.energy_consumption",
  sum("comstock_v1_state"."out.electricity.water_systems.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.water_systems.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.cooling.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.cooling.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.heating.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.heating.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.interior_equipment.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.interior_equipment.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.total.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.total.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.water_systems.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.water_systems.energy_consumption",
  sum("comstock_v1_state"."out.site_energy.total.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.site_energy.total.energy_consumption",
"comstock_v1_metadata"."weight" 431 | ) AS "out.site_energy.total.energy_consumption", 432 | "comstock_v1_metadata"."state" AS "location", 433 | "comstock_v1_state"."upgrade" AS "comstock_v1_state_upgrade", 434 | month( 435 | "comstock_v1_state"."timestamp" 436 | ) AS "month", 437 | day( 438 | "comstock_v1_state"."timestamp" 439 | ) AS "day", 440 | hour( 441 | "comstock_v1_state"."timestamp" 442 | ) AS "hour" 443 | FROM 444 | "comstock_v1_state", 445 | "comstock_v1_metadata" 446 | WHERE 447 | "comstock_v1_state"."bldg_id" = "comstock_v1_metadata"."bldg_id" 448 | AND "comstock_v1_state"."state" IN ('55') 449 | AND "comstock_v1_state"."upgrade" = 0 450 | AND "comstock_v1_metadata"."upgrade" = 0 451 | AND "comstock_v1_metadata"."in.building_type" = 'MediumOffice' 452 | GROUP BY 453 | "comstock_v1_metadata"."state", 454 | "comstock_v1_state"."upgrade", 455 | month( 456 | "comstock_v1_state"."timestamp" 457 | ), 458 | day( 459 | "comstock_v1_state"."timestamp" 460 | ), 461 | hour( 462 | "comstock_v1_state"."timestamp" 463 | ) 464 | ORDER BY 465 | "comstock_v1_metadata"."state", 466 | "comstock_v1_state"."upgrade", 467 | month( 468 | "comstock_v1_state"."timestamp" 469 | ), 470 | day( 471 | "comstock_v1_state"."timestamp" 472 | ), 473 | hour( 474 | "comstock_v1_state"."timestamp" 475 | ) 476 | 477 | ``` 478 | 479 | ![QuerySimple](https://github.com/NREL/ComStock/blob/cbianchi/documentation/documentation/Screen%20Shot%202021-04-15%20at%208.55.34%20PM.png?raw=true) 480 | 481 | To download the returned data as a CSV file click the file icon highlighted in 482 | red in the screenshot above. 483 | 484 | Now let's demonstrate so ways of changing up the query: 485 | 486 | ### How to query data for other building types? 487 | 488 | All we have to do for this is change the following line in the `WHERE` clause 489 | in the example query: 490 | 491 | ```sql 492 | AND "comstock_v1_metadata"."in.building_type" = 'MediumOffice' 493 | ``` 494 | 495 | Instead of 'MediumOffice' we can use one any of the following building types: 496 | 497 | 'MediumOffice', 'LargeOffice', 'SecondarySchool', 'Hospital','Outpatient' 498 | 499 | ### How to query other states? 500 | 501 | All we have to do for this is change the following line in the `WHERE` clause 502 | in the example query: 503 | 504 | ```sql 505 | AND "comstock_v1_state"."state" IN ('55') 506 | ``` 507 | 508 | Instead of '55' (Wisconsin) we can use the FIPS code for another state, as 509 | listed here: 510 | https://www.nrcs.usda.gov/wps/portal/nrcs/detail/?cid=nrcs143_013696 511 | 512 | ### How to query an upgrade? 
Now let's demonstrate some ways of changing up the query:

### How to query data for other building types?

All we have to do for this is change the following line in the `WHERE` clause in the example query:

```sql
AND "comstock_v1_metadata"."in.building_type" = 'MediumOffice'
```

Instead of 'MediumOffice' we can use any of the following building types:

'MediumOffice', 'LargeOffice', 'SecondarySchool', 'Hospital', 'Outpatient'

### How to query other states?

All we have to do for this is change the following line in the `WHERE` clause in the example query:

```sql
AND "comstock_v1_state"."state" IN ('55')
```

Instead of '55' (Wisconsin) we can use the FIPS code for another state, as listed here: https://www.nrcs.usda.gov/wps/portal/nrcs/detail/?cid=nrcs143_013696

### How to query an upgrade?

All we have to do for this is change the following lines in the `WHERE` clause in the example query:

```sql
AND "comstock_v1_state"."upgrade" = 0
AND "comstock_v1_metadata"."upgrade" = 0
```

Instead of '0' we can use one of the following options:

```
0: Baseline
1: Upgrade Roof Insulation (R-19)
2: Upgrade Roof Insulation (R-30)
3: Upgrade Wall Insulation (R-13)
4: Upgrade Wall Insulation (R-30)
7: Add Window Film
8: Add Cool Roof
9: Add EIFS Wall Insulation
10: Add Electrochromic Windows (BleachedGlass)
11: Add Electrochromic Windows (TintedGlass)
12: Kitchen Exhaust Fan DCV
13: Upgrade Boiler (AFUE-81)
14: Upgrade Boiler (AFUE-83)
15: Upgrade Boiler (AFUE-94)
17: Upgrade Chiller (efficient)
18: Add Demand Control Ventilation
19: Add Economizer
20: Upgrade Furnace (AFUE-81)
21: Upgrade Furnace (AFUE-92)
22: Upgrade Furnace (AFUE-98)
23: Add Heat Recovery
24: Upgrade Motors
25: Add PTAC Controls
26: Upgrade Packaged Terminal Heat Pump (Code)
27: Upgrade Packaged Terminal Heat Pump (Efficient)
28: Upgrade Packaged Terminal Heat Pump (Highly Efficient)
29: Upgrade Packaged Terminal Air Conditioner (Code)
30: Upgrade Packaged Terminal Air Conditioner (Efficient)
31: Upgrade Packaged Terminal Air Conditioner (Highly Efficient)
32: Add VFD To Pumps
33: Upgrade RTU Air Source Heat Pump (IEER-13.3)
34: Upgrade RTU Air Source Heat Pump (IEER-15.0)
35: Upgrade RTU Air Source Heat Pump (IEER-16.5)
36: Upgrade RTU DX Air Conditioner (IEER-14.0)
37: Upgrade RTU DX Air Conditioner (IEER-15.5)
38: Upgrade RTU DX Air Conditioner (IEER-17.0)
39: Upgrade Split System DX Air Conditioner (SEER 14)
40: Upgrade Split System DX Air Conditioner (SEER 16)
41: Upgrade Split System DX Air Conditioner (SEER 18)
42: Upgrade Split System DX Air Conditioner (SEER 20)
45: Add Advanced Hybrid RTUs
46: Upgrade Air Filters
47: Add Brushless DC Compressor Motors
48: Reset Chilled Water Supply Temperature
49: Close Outdoor Air Dampers During Unoccupied Hours
50: Add Cold Climate Heat Pumps
51: Upgrade Duct Routing
52: Add Exhaust Fan Interlock
53: Reset Hot Water Temperature
55: Add Predictive Thermostats
56: Reset Supply Air
57: Add Thermoelastic Heat Pumps
58: Add Variable Speed Cooling Tower
61: Adjust Thermostat Setpoints
62: Upgrade Compact Lights
63: Add Lighting Occupancy Controls
64: Add Daylighting Controls
65: Upgrade High Bay Lights
66: Upgrade Linear Lights
67: Upgrade Outdoor Lights
68: Upgrade Specialty Lights
69: PC Power Management (Screen Saver)
70: PC Power Management (Desktop)
71: PC Power Management (Display)
72: PC Virtualization (Laptop)
73: PC Virtualization (Thin Client)
74: Advanced Power Strips (Power Strips)
75: Advanced Power Strips (Controllable Power Outlets)
76: Upgrade SWH Electric Storage Water Heater (Eff=0.88)
77: Upgrade SWH Electric Storage Water Heater (Eff=0.93)
78: Upgrade to Instantaneous Gas Water Heater
79: Upgrade SWH Gas Storage Water Heater (Eff=0.67)
80: Upgrade SWH Gas Storage Water Heater (Eff=0.70)
81: Upgrade SWH Gas Storage Water Heater (Eff=0.82)
82: Upgrade to Heat Pump Water Heater
83: Add Floating Heat Pressure Control
84: Add Refrigerated Walk-In Doorway Protection (Strip Curtain)
85: Add Refrigerated Walk-In Doorway Protection (Automatic Door Closer)
86: Add Refrigerated Walk-In Doorway Protection (Automatic Door Closer and Strip Curtain)
87: Upgrade Refrigerated Walk-In Motor (PSC)
88: Upgrade Refrigerated Walk-In Motor (ECM)
```

### How to change time resolution?

1. For changing the resolution to 15 minutes, all we have to do is modify the lines just before the `FROM` clause in the example query above as follows:

```sql
  month("comstock_v1_state"."timestamp") AS "month",
  day("comstock_v1_state"."timestamp") AS "day",
  hour("comstock_v1_state"."timestamp") AS "hour",
  minute("comstock_v1_state"."timestamp") AS "minute"
FROM
```

And then in the `GROUP BY` clause:

```sql
GROUP BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp"),
  day("comstock_v1_state"."timestamp"),
  hour("comstock_v1_state"."timestamp"),
  minute("comstock_v1_state"."timestamp")
```

And finally in the `ORDER BY` clause:

```sql
ORDER BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp"),
  day("comstock_v1_state"."timestamp"),
  hour("comstock_v1_state"."timestamp"),
  minute("comstock_v1_state"."timestamp")
```

2. For changing the resolution to 1 day, all we have to do is modify the lines just before the `FROM` clause in the example query above as follows:

```sql
  month("comstock_v1_state"."timestamp") AS "month",
  day("comstock_v1_state"."timestamp") AS "day"
FROM
```

And then in the `GROUP BY` clause:

```sql
GROUP BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp"),
  day("comstock_v1_state"."timestamp")
```

And finally in the `ORDER BY` clause:

```sql
ORDER BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp"),
  day("comstock_v1_state"."timestamp")
```

3. For changing the resolution to 1 month, all we have to do is modify the lines just before the `FROM` clause in the example query above as follows:

```sql
  month("comstock_v1_state"."timestamp") AS "month"
FROM
```

And then in the `GROUP BY` clause:

```sql
GROUP BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp")
```

And finally in the `ORDER BY` clause:

```sql
ORDER BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp")
```

### How to average data rather than sums?
737 |
738 | All we have to do for this is change the following, everywhere it appears
739 | in the `SELECT` clause:
740 |
741 | ```sql
742 | sum(
743 | ```
744 |
745 | It has to be replaced with:
746 | ```sql
747 | avg(
748 | ```
749 |
750 | ### How to preview the table to see all available columns?
751 |
752 | All we have to do for this is replace the whole example query above with
753 | the following:
754 |
755 | ```sql
756 | SELECT *
757 | FROM
758 | "comstock_v1_state",
759 | "comstock_v1_metadata"
760 | WHERE
761 | "comstock_v1_state"."bldg_id" = "comstock_v1_metadata"."bldg_id"
762 | AND "comstock_v1_state"."state" IN ('55')
763 | AND "comstock_v1_state"."upgrade" = 0
764 | AND "comstock_v1_metadata"."upgrade" = 0
765 | AND "comstock_v1_metadata"."in.building_type" = 'MediumOffice'
766 | limit 10
767 | ```
768 |
-------------------------------------------------------------------------------- /NSO.md: --------------------------------------------------------------------------------
1 | # High-Resolution Wind and Structural Loads Data measured on Parabolic Trough Solar Collectors at Nevada Solar One (NSO)
2 |
3 | ## Description
4 |
5 |
9 |
10 | This data set characterizes the complex wind conditions and resulting structural loads on full-scale, operational parabolic trough collectors.
11 | Over two years, NREL conducted comprehensive field measurements of the atmospheric turbulent wind conditions and the resulting structural wind loads on the parabolic troughs at the Nevada Solar One plant. The measurement set-up included meteorological masts and structural load sensors on four trough rows.
12 | Additionally, we commissioned a lidar scanning the horizontal plane over the trough field.
13 |
14 | Wind loading is a main contributor to the structural design costs of Concentrating Solar Power (CSP) collectors, such as heliostats and parabolic troughs. These structures must resist the mechanical forces generated by turbulent wind. At the same time, the reflector surfaces must exhibit the necessary rigidity to maintain their optimal optical performance in windy conditions.
15 | Studying wind-driven loads at a full-scale, fully operational CSP plant will provide insights into the wind impact on solar collector fields that are currently beyond the capabilities of wind tunnel tests or state-of-the-art simulations.
16 |
17 | By providing this first-of-its-kind data set to the CSP community, we aim to enhance the community's understanding of the wind loading experienced by CSP collector structures.
18 | The data set might be used, for example, to verify simulations or for comparisons to wind tunnel tests.
19 |
20 | ## Directory structure
21 |
23 |
24 |
25 | The directory structure is:
26 |
27 | NSO/&#13;
28 |   {}/           data set type: inflow_mast_1min, inflow_mast_20Hz, wake_masts_1min, wake_masts_20Hz, loads_1min, loads_20Hz, or lidar
29 |     year={}/       year
30 |      month={}/     month
31 |       day={}/       day
32 | 33 | 34 | The structure of the filenames is: 35 | 36 | Type_YYYY-MM-DD_00h_to_YYYY-MM-DD_00h.parquet
37 | YYYY-MM-DD: date of the daily file that contains data from 00:00 UTC to 24:00 UTC (each lidar file contains a single scan, which is shorter than a day, so the lidar file name also includes the hour and minute)&#13;
38 | Type: Inflow_mast_20Hz, Inflow_mast_1min, Wake_masts_20Hz, Wake_masts_1min, Loads_20Hz, Loads_1min, or Lidar
39 | 40 | 41 | Examples: 42 | 43 | NSO/inflow_mast_20Hz/year=2022/month=02/day=03/Inflow_Mast_20Hz_2022-02-03_00h_to_2022-02-03_23h.parquet
44 | NSO/lidar/year=2023/month=04/day=02/Lidar_2023-04-02_19-00-17_to_2023-04-02_19-00-17.parquet
45 |
46 |
47 | ## Data Format
48 |
49 |
52 |
53 | Data are stored in the Parquet file format. The variables and units in each dataset are listed in [1].
54 |
55 | ## Code Examples
56 |
57 |
58 |
59 | Example Python scripts to read the data are provided at the OEDI data lake.
60 |
61 | ## References
62 |
63 | [1] Egerer, U., Dana, S., Jager, D. et al. Wind and structural loads data measured on parabolic trough solar collectors at an operational power plant. Sci Data 11, 98 (2024). https://doi.org/10.1038/s41597-023-02896-4
64 |
-------------------------------------------------------------------------------- /NSRDB.md: --------------------------------------------------------------------------------
1 | # Solar Resource Data: National Solar Radiation Database (NSRDB)
2 |
3 | ## NSRDB
4 |
5 | The National Solar Radiation Database (NSRDB) is a serially complete collection
6 | of meteorological and solar irradiance data sets for the United States and a
7 | growing list of international locations for 1998-2017. The NSRDB provides
8 | foundational information to support U.S. Department of Energy programs,
9 | research, and the general public.
10 |
11 | The NSRDB provides time-series data at 30-minute resolution of the solar
12 | resource, averaged over surface cells of 0.038 degrees in both latitude and
13 | longitude, or nominally 4 km in size. The solar radiation values represent the
14 | resource available to solar energy systems. The data was created using cloud
15 | properties which are generated using the AVHRR Pathfinder Atmospheres-Extended
16 | (PATMOS-x) algorithms developed by the University of Wisconsin. The Fast
17 | All-sky Radiation Model for Solar applications (FARMS), in conjunction with the
18 | cloud properties and with aerosol optical depth (AOD) and precipitable water
19 | vapor (PWV) from ancillary sources, is used to estimate solar irradiance (GHI,
20 | DNI, and DHI). The Global Horizontal Irradiance (GHI) is computed for clear
21 | skies using the REST2 model. For cloud scenes identified by the cloud mask,
22 | FARMS is used to compute GHI. The Direct Normal Irradiance (DNI) for cloud
23 | scenes is then computed using the DISC model. The PATMOS-x model uses
24 | half-hourly radiance images in visible and infrared channels from the GOES
25 | series of geostationary weather satellites. Ancillary variables needed to run
26 | REST2 and FARMS (e.g., aerosol optical depth, precipitable water vapor, and
27 | albedo) are derived from the Modern Era-Retrospective Analysis (MERRA-2)
28 | dataset. Temperature and wind speed data are also derived from MERRA-2 and
29 | provided for use in SAM to compute PV generation. &#13;
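For orientation, the three irradiance components are tied together by the standard closure relation GHI = DNI x cos(θz) + DHI, where θz is the solar zenith angle. The snippet below is a minimal consistency-check sketch, not part of the NSRDB tooling; it assumes `ghi`, `dni`, `dhi`, and `zenith_deg` are NumPy arrays already extracted from an NSRDB file (e.g., with the `rex` examples later in this section):

```python
import numpy as np

def closure_residual(ghi, dni, dhi, zenith_deg):
    """Residual of the closure equation GHI = DNI * cos(zenith) + DHI.

    Inputs are arrays in physical units (W/m2 for the irradiances,
    degrees for the solar zenith angle); a residual near zero means
    the three components are mutually consistent.
    """
    cos_z = np.cos(np.radians(zenith_deg))
    return ghi - (dni * cos_z + dhi)
```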
30 |
31 | The following variables are provided by the NSRDB:
32 | - Irradiance:
33 |     - Global Horizontal (ghi)
34 |     - Direct Normal (dni)
35 |     - Diffuse (dhi)
36 | - Clear-sky Irradiance
37 | - Cloud Type
38 | - Dew Point
39 | - Temperature
40 | - Surface Albedo
41 | - Pressure
42 | - Relative Humidity
43 | - Solar Zenith Angle
44 | - Precipitable Water
45 | - Wind Direction
46 | - Wind Speed
47 | - Fill Flag
48 | - Angstrom wavelength exponent (alpha)
49 | - Aerosol optical depth (aod)
50 | - Aerosol asymmetry parameter (asymmetry)
51 | - Cloud optical depth (cld_opd_dcomp)
52 | - Cloud effective radius (cld_ref_dcomp)
53 | - cloud_press_acha
54 | - Reduced ozone vertical pathlength (ozone)
55 | - Aerosol single-scatter albedo (ssa)
56 |
57 |
58 | ## Directory structure
59 |
60 | Solar resource data is made available as a series of .h5 files corresponding to
61 | each year and can be found at s3://nrel-pds-nsrdb/v3/nsrdb_${year}.h5
62 |
63 | The NSRDB data is also available via HSDS at /nrel/nsrdb/nsrdb_${year}.h5
64 |
65 | For examples on setting up and using HSDS please see our [examples repository](https://github.com/nrel/hsds-examples)
66 |
67 | ## Data Format
68 |
69 | The data is provided in HDF5 (.h5) files separated by year. The
70 | variables mentioned above are provided in two-dimensional time-series arrays with
71 | dimensions (time x location). The temporal axis is defined by the `time_index`
72 | dataset, while the positional axis is defined by the `meta` dataset. For
73 | storage efficiency each variable has been scaled and stored as an integer. The
74 | scale factor is provided in the `psm_scale_factor` attribute. The units for
75 | the variable data are also provided as an attribute (`psm_units`).
76 |
77 | ## Python Examples
78 |
79 | Example scripts to extract solar resource data using Python are provided below:
80 |
81 | The easiest way to access and extract data is with the Resource eXtraction tool
82 | [`rex`](https://github.com/nrel/rex):
83 |
84 | ```python
85 | from rex import NSRDBX
86 |
87 | nsrdb_file = '/nrel/nsrdb/nsrdb_2010.h5'
88 | with NSRDBX(nsrdb_file, hsds=True) as f:
89 |     meta = f.meta
90 |     time_index = f.time_index
91 |     dni = f['dni']
92 | ```
93 |
94 | `rex` also allows easy extraction of the nearest site to a desired (lat, lon)
95 | location:
96 |
97 | ```python
98 | from rex import NSRDBX
99 |
100 | nsrdb_file = '/nrel/nsrdb/nsrdb_2010.h5'
101 | nrel = (39.741931, -105.169891)
102 | with NSRDBX(nsrdb_file, hsds=True) as f:
103 |     nrel_dni = f.get_lat_lon_df('dni', nrel)
104 | ```
105 |
106 | or to extract all sites in a given region:
107 |
108 | ```python
109 | from rex import NSRDBX
110 |
111 | nsrdb_file = '/nrel/nsrdb/nsrdb_2010.h5'
112 | state = 'Colorado'
113 | with NSRDBX(nsrdb_file, hsds=True) as f:
114 |     co_dni = f.get_region_df('dni', state, region_col='state')
115 | ```
116 |
117 | Lastly, `rex` can be used to extract all variables needed to run SAM at a given
118 | location:
119 |
120 | ```python
121 | from rex import NSRDBX
122 |
123 | nsrdb_file = '/nrel/nsrdb/nsrdb_2010.h5'
124 | nrel = (39.741931, -105.169891)
125 | with NSRDBX(nsrdb_file, hsds=True) as f:
126 |     nrel_sam_vars = f.get_SAM_df(nrel)
127 | ```
128 |
129 | If you would rather access the NSRDB data directly using h5pyd:
130 |
131 | ```python
132 | # Extract the average direct normal irradiance (dni)
133 | import h5pyd
134 | import pandas as pd
135 |
136 | # Open .h5 file
137 | with h5pyd.File('/nrel/nsrdb/nsrdb_2010.h5', mode='r') as f:
138 |     # Extract meta data and &#13;
convert from records array to DataFrame
139 |     meta = pd.DataFrame(f['meta'][...])
140 |     # dni dataset
141 |     dni = f['dni']
142 |     # Extract scale factor
143 |     scale_factor = dni.attrs['psm_scale_factor']
144 |     # Extract, average, and un-scale dni
145 |     mean_dni = dni[...].mean(axis=0) / scale_factor
146 |
147 | # Add mean DNI to meta data
148 | meta['Average DNI'] = mean_dni
149 | ```
150 |
151 | ```python
152 | # Extract time-series data for a single site
153 | import h5pyd
154 | import pandas as pd
155 |
156 | # Open .h5 file
157 | with h5pyd.File('/nrel/nsrdb/nsrdb_2010.h5', mode='r') as f:
158 |     # Extract time_index and convert to datetime
159 |     # NOTE: time_index is saved as byte-strings and must be decoded
160 |     time_index = pd.to_datetime(f['time_index'][...].astype(str))
161 |     # Initialize DataFrame to store time-series data
162 |     time_series = pd.DataFrame(index=time_index)
163 |     # Extract variables needed to compute generation from SAM:
164 |     for var in ['dni', 'dhi', 'air_temperature', 'wind_speed']:
165 |         # Get dataset
166 |         ds = f[var]
167 |         # Extract scale factor
168 |         scale_factor = ds.attrs['psm_scale_factor']
169 |         # Extract site 100 and add to DataFrame
170 |         time_series[var] = ds[:, 100] / scale_factor
171 | ```
172 |
173 | ## References
174 |
175 | For more information about the NSRDB please see the
176 | [website](https://nsrdb.nrel.gov/).
177 | Users of the NSRDB should please cite:
178 | - [Sengupta, M., Y. Xie, A. Lopez, A. Habte, G. Maclaurin, and J. Shelby. 2018. "The National Solar Radiation Data Base (NSRDB)." Renewable and Sustainable Energy Reviews 89 (June): 51-60.](https://www.sciencedirect.com/science/article/pii/S136403211830087X?via%3Dihub)
-------------------------------------------------------------------------------- /PVROOFTOPS.md: --------------------------------------------------------------------------------
1 | # PV Rooftops
2 |
3 | ## Description
4 |
5 | The National Renewable Energy Laboratory's (NREL) PV Rooftop Database (PVRDB) is a lidar-derived, geospatially-resolved dataset of suitable roof surfaces and their PV technical potential for 128 metropolitan regions in the United States. The source lidar data and building footprints were obtained by the U.S. Department of Homeland Security's Homeland Security Infrastructure Program for 2006-2014. Using GIS methods, NREL identified suitable roof surfaces based on their size, orientation, and shading parameters (Gagnon et al. 2016). Standard 2015 technical potential was then estimated for each plane using NREL's System Advisor Model.
6 |
7 | The PVRDB is downloadable by city and year of lidar collection. Five geospatial layers are available for each city and year: 1) the raster extent of the lidar collection, 2) buildings identified from the lidar data, 3) suitable developable planes for each building, 4) aspect values of the developable planes, and 5) the technical potential estimates of the developable planes.
8 |
9 | ## Data Format
10 |
11 | The PV Rooftops dataset is provided in Parquet format partitioned by city. &#13;
There are 4 core datasets stored in S3 partitioned by region(city)-year for downloads of single cities or to allow city-specific queries or queries across the dataset using Glue/Athena:
12 |
13 | /aspects
14 | field | data_type | description
15 | -- | -- | --
16 | `gid` | bigint |  
17 | `city` | string | city of source lidar dataset
18 | `state` | string | state of source lidar dataset
19 | `year` | bigint | year of source lidar dataset
20 | `bldg_fid` | bigint | building id
21 | `aspect` | bigint | aspect value
22 | `the_geom_96703` | string | projected geometry ([US Contiguous Albers Equal Area Conic - SRID 6703](https://spatialreference.org/ref/sr-org/6703/))
23 | `the_geom_4326` | string | geometry ([WGS 1984 - SRID 4326](https://spatialreference.org/ref/epsg/4326/))
24 | `region_id` | bigint |  
25 |
26 |
27 | /buildings
28 |
29 | field | data_type | description
30 | -- | -- | --
31 | `gid` | bigint |  
32 | `bldg_fid` | bigint | the building fid
33 | `the_geom_96703` | string | projected geometry ([US Contiguous Albers Equal Area Conic - SRID 6703](https://spatialreference.org/ref/sr-org/6703/))
34 | `the_geom_4326` | string | geometry ([WGS 1984 - SRID 4326](https://spatialreference.org/ref/epsg/4326/))
35 | `city` | string | the city of the source lidar data
36 | `state` | string | the state of the source lidar data
37 | `year` | bigint | the year of the source lidar data
38 | `region_id` | bigint |  
39 |
40 |
41 | /developable_planes
42 |
43 | field | data_type | description
44 | -- | -- | --
45 | `bldg_fid` | bigint | building ID associated with the developable plane
46 | `footprint_m2` | double | developable plane footprint area (m2)
47 | `slope` | bigint | slope value
48 | `flatarea_m2` | double | flat area of the developable plane (m2)
49 | `slopeconversion` | double | the slope conversion factor used to convert the flat area into the sloped area
50 | `slopearea_m2` | double | sloped area of the developable plane (m2)
51 | `zip` | string | zipcode
52 | `zip_perc` | double |  
53 | `aspect` | bigint | the aspect value of the developable plane
54 | `gid` | bigint | unique developable plane ID
55 | `city` | string | the city of the source lidar data
56 | `state` | string | the state of the source lidar data
57 | `year` | bigint | the year of the source lidar data
58 | `region_id` | bigint |  
59 | `the_geom_96703` | string | projected geometry ([US Contiguous Albers Equal Area Conic - SRID 6703](https://spatialreference.org/ref/sr-org/6703/))
60 | `the_geom_4326` | string | geometry ([WGS 1984 - SRID 4326](https://spatialreference.org/ref/epsg/4326/))
61 |
62 |
63 | /rasd
64 |
65 | field | data_type | description
66 | -- | -- | --
67 | `gid` | bigint | the unique geographic ID of the raster domain
68 | `the_geom_96703` | string | projected geometry ([US Contiguous Albers Equal Area Conic - SRID 6703](https://spatialreference.org/ref/sr-org/6703/))
69 | `the_geom_4326` | string | geometry ([WGS 1984 - SRID 4326](https://spatialreference.org/ref/epsg/4326/))
70 | `city` | string | the city of the source lidar data
71 | `state` | string | the state of the source lidar data
72 | `year` | bigint | the year of the source lidar data
73 | `region_id` | bigint |  
74 | `serial_id` | bigint |  
75 | `__index_level_0__` | bigint |  
76 |
77 |
78 | Within each core dataset there are partitions by city_state_year(YY) that can be queried directly via Athena or PrestoDB with relatively quick response times, or downloaded as a Parquet format data file. &#13;
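As a hedged sketch of such a partition-aware query (the table name matches the Python example later in this document, and the column names come from the `developable_planes` dictionary above; the exact values stored in the `city`, `state`, and `year` columns are assumptions here and should be confirmed first, e.g., with a `SELECT DISTINCT city, state, year` query):

```python
import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir='s3:///',  ##user defined staging directory
    region_name='us-west-2',
)

# Hypothetical example: summarize developable-plane area by aspect for a
# single city/state/year partition (filter values assumed for illustration).
query = """
    SELECT aspect, count(*) AS n_planes, sum(flatarea_m2) AS total_flat_m2
    FROM oedi.pv_rooftops_developable_planes
    WHERE city = 'dover' AND state = 'de' AND year = 2009
    GROUP BY aspect
    ORDER BY aspect
"""
df = pd.read_sql(query, conn)
```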
79 | 80 | Aspects Lookup: 81 | ``` 82 | 1 337.5 - 22.5 north 83 | 2 22.5 - 67.5 northeast 84 | 3 67.5 - 112.5 east 85 | 4 112.5 - 157.5 southeast 86 | 5 157.5 - 202.5 south 87 | 6 202.5 - 247.5 southwest 88 | 7 247.5 - 292.5 west 89 | 8 292.5 - 337.5 northwest 90 | 0 flat flat 91 | ``` 92 | 93 | Regions Lookup: 94 | ``` 95 | 1 Albany NY 2006-01-01 96 | 2 Albany NY 2013-01-01 97 | 3 Albuquerque NM 2006-01-01 98 | 4 Albuquerque NM 2012-01-01 99 | 5 Allentown PA 2006-01-01 100 | 6 Amarillo TX 2008-01-01 101 | 7 Anaheim CA 2010-01-01 102 | 8 Arnold MO 2006-01-01 103 | 9 Atlanta GA 2008-01-01 104 | 10 Atlanta GA 2013-01-01 105 | 11 Augusta GA 2010-01-01 106 | 12 Augusta ME 2008-01-01 107 | 13 Austin TX 2006-01-01 108 | 14 Austin TX 2012-01-01 109 | 15 Bakersfield CA 2010-01-01 110 | 16 Baltimore MD 2008-01-01 111 | 17 Baltimore MD 2013-01-01 112 | 18 Baton Rouge LA 2006-01-01 113 | 19 Baton Rouge LA 2012-01-01 114 | 20 Birmingham AL 2008-01-01 115 | 21 Bismarck ND 2008-01-01 116 | 22 Boise ID 2007-01-01 117 | 23 Boise ID 2013-01-01 118 | 24 Boulder CO 2014-01-01 119 | 25 Bridgeport CT 2006-01-01 120 | 26 Bridgeport CT 2013-01-01 121 | 27 Buffalo NY 2008-01-01 122 | 28 Carson City NV 2009-01-01 123 | 29 Charleston SC 2010-01-01 124 | 30 Charleston WV 2009-01-01 125 | 31 Charlotte NC 2006-01-01 126 | 32 Charlotte NC 2012-01-01 127 | 33 Cheyenne WY 2008-01-01 128 | 34 Chicago IL 2008-01-01 129 | 35 Chicago IL 2012-01-01 130 | 36 Cincinnati OH 2010-01-01 131 | 37 Cleveland OH 2012-01-01 132 | 38 Colorado Springs CO 2006-01-01 133 | 39 Colorado Springs CO 2013-01-01 134 | 40 Columbia SC 2009-01-01 135 | 41 Columbus GA 2009-01-01 136 | 42 Columbus OH 2006-01-01 137 | 43 Columbus OH 2012-01-01 138 | 44 Concord NH 2009-01-01 139 | 45 Corpus Christi TX 2012-01-01 140 | 46 Dayton OH 2006-01-01 141 | 47 Dayton OH 2012-01-01 142 | 48 Denver CO 2012-01-01 143 | 49 Des Moines IA 2010-01-01 144 | 50 Detroit MI 2012-01-01 145 | 51 Dover DE 2009-01-01 146 | 52 El Paso TX 2007-01-01 147 | 53 Flint MI 2009-01-01 148 | 54 Fort Wayne IN 2008-01-01 149 | 55 Frankfort KY 2012-01-01 150 | 56 Fresno CA 2006-01-01 151 | 57 Fresno CA 2013-01-01 152 | 58 Ft Belvoir DC 2012-01-01 153 | 59 Grand Rapids MI 2013-01-01 154 | 60 Greensboro NC 2009-01-01 155 | 61 Harrisburg PA 2009-01-01 156 | 62 Hartford CT 2006-01-01 157 | 63 Hartford CT 2013-01-01 158 | 64 Helena MT 2007-01-01 159 | 65 Helena MT 2013-01-01 160 | 66 Houston TX 2010-01-01 161 | 67 Huntsville AL 2009-01-01 162 | 68 Indianapolis IN 2006-01-01 163 | 69 Indianapolis IN 2012-01-01 164 | 70 Jackson MS 2007-01-01 165 | 71 Jacksonville FL 2010-01-01 166 | 72 Jefferson City MO 2008-01-01 167 | 73 Kansas City MO 2010-01-01 168 | 74 Kansas City MO 2012-01-01 169 | 75 LaGuardia JFK NY 2012-01-01 170 | 76 Lancaster PA 2010-01-01 171 | 77 Lansing MI 2007-01-01 172 | 78 Lansing MI 2013-01-01 173 | 79 Las Vegas NV 2009-01-01 174 | 80 Lexington KY 2012-01-01 175 | 81 Lincoln NE 2008-01-01 176 | 82 Little Rock AR 2008-01-01 177 | 83 Los Angeles CA 2007-01-01 178 | 84 Louisville KY 2006-01-01 179 | 85 Louisville KY 2012-01-01 180 | 86 Lubbock TX 2008-01-01 181 | 87 Madison WI 2010-01-01 182 | 88 Manhattan NY 2007-01-01 183 | 89 McAllen TX 2008-01-01 184 | 90 Miami FL 2009-01-01 185 | 91 Milwaukee WI 2007-01-01 186 | 92 Milwaukee WI 2013-01-01 187 | 93 Minneapolis MN 2007-01-01 188 | 94 Minneapolis MN 2012-01-01 189 | 95 Mission Viejo CA 2013-01-01 190 | 96 Mobile AL 2010-01-01 191 | 97 Modesto CA 2010-01-01 192 | 98 Montgomery AL 2007-01-01 193 | 99 Montpelier VT 2009-01-01 194 | 
100 Newark NJ 2007-01-01
195 | 101 New Haven CT 2007-01-01
196 | 102 New Haven CT 2013-01-01
197 | 103 New Orleans LA 2008-01-01
198 | 104 New Orleans LA 2012-01-01
199 | 105 New York NY 2005-01-01
200 | 106 New York NY 2013-01-01
201 | 107 Norfolk VA 2007-01-01
202 | 108 Oklahoma City OK 2007-01-01
203 | 109 Oklahoma City OK 2013-01-01
204 | 110 Olympia WA 2010-01-01
205 | 111 Omaha NE 2007-01-01
206 | 112 Omaha NE 2013-01-01
207 | 113 Orlando FL 2009-01-01
208 | 114 Oxnard CA 2010-01-01
209 | 115 Palm Bay FL 2010-01-01
210 | 116 Pensacola FL 2009-01-01
211 | 117 Philadelphia PA 2007-01-01
212 | 118 Pierre SD 2008-01-01
213 | 119 Pittsburgh PA 2004-01-01
214 | 120 Pittsburgh PA 2012-01-01
215 | 121 Portland OR 2012-01-01
216 | 122 Poughkeepsie NY 2012-01-01
217 | 123 Providence RI 2004-01-01
218 | 124 Providence RI 2012-01-01
219 | 125 Raleigh-Durham NC 2010-01-01
220 | 126 Reno NV 2007-01-01
221 | 127 Richmond VA 2008-01-01
222 | 128 Richmond VA 2013-01-01
223 | 129 Rochester NY 2008-01-01
224 | 130 Rochester NY 2014-01-01
225 | 131 Sacramento CA 2012-01-01
226 | 132 Salem OR 2008-01-01
227 | 133 Salt Lake City UT 2012-01-01
228 | 134 San Antonio TX 2008-01-01
229 | 135 San Antonio TX 2013-01-01
230 | 137 San Diego CA 2008-01-01
231 | 138 San Diego CA 2013-01-01
232 | 139 San Francisco CA 2013-01-01
233 | 140 Santa Fe NM 2009-01-01
234 | 141 Sarasota FL 2009-01-01
235 | 142 Scranton PA 2008-01-01
236 | 143 Seattle WA 2011-01-01
237 | 144 Shreveport LA 2008-01-01
238 | 145 Spokane WA 2008-01-01
239 | 146 Springfield IL 2009-01-01
240 | 147 Springfield MA 2007-01-01
241 | 148 Springfield MA 2013-01-01
242 | 149 St Louis MO 2008-01-01
243 | 150 St Louis MO 2013-01-01
244 | 151 Stockton CA 2010-01-01
245 | 152 Syracuse NY 2008-01-01
246 | 153 Tallahassee FL 2009-01-01
247 | 154 Tampa FL 2008-01-01
248 | 155 Toledo OH 2006-01-01
249 | 156 Toledo OH 2012-01-01
250 | 157 Topeka KS 2008-01-01
251 | 158 Trenton NJ 2008-01-01
252 | 159 Tucson AZ 2007-01-01
253 | 160 Tulsa OK 2008-01-01
254 | 161 Washington DC 2009-01-01
255 | 162 Washington DC 2012-01-01
256 | 163 Wichita KS 2012-01-01
257 | 164 Winston-Salem NC 2009-01-01
258 | 165 Worcester MA 2009-01-01
259 | 166 Youngstown OH 2008-01-01
260 | 167 Andrews AFB DC 2012-01-01
261 | 136 San Bernardino-Riverside CA 2012-01-01
262 | 168 Tampa FL 2013-01-01
263 | ```
264 |
265 | ## Model
266 |
267 | Coming Soon: Details on the PV Suitability Model.
268 |
269 | ## Directory structure
270 |
271 | The PV Rooftops dataset is made available in Parquet format on AWS in S3. The four main datasets are stored in individual folders, and each partition is stored in an individual subfolder within each directory.
272 |
273 | - `s3://oedi_pv_rooftops/`
274 |
275 | Main datasets
276 | /aspects
277 | /buildings
278 | /developable_planes
279 | /rasd
280 |
281 | Partitions
282 | /city_state_year i.e. &#13;
(/dover_de_09) 283 | 284 | 285 | ## Python Examples 286 | 287 | ```python 288 | 289 | import pandas as pd 290 | from pyathena import connect 291 | 292 | conn = connect( 293 | s3_staging_dir='s3:///', ##user defined staging directory 294 | region_name='us-west-2', 295 | work_group='' ##specify workgroup if exists 296 | ) 297 | 298 | df = pd.read_sql("SELECT * FROM oedi.pv_rooftops_developable_planes limit 8;",conn) 299 | ``` 300 | 301 | For jupyter notebook example see our notebook which includes partitions and data dictionary: [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples) 302 | 303 | 304 | ## References 305 | 306 | Main References: 307 | 1. [Rooftop Solar Photovoltaic Technical Potential in the United States: A Detailed Assessment](https://www.nrel.gov/docs/fy16osti/65298.pdf) 308 | 309 | 2. [Using GIS-based methods and lidar data to estimate rooftop solar technical potential in US cities](https://iopscience.iop.org/article/10.1088/1748-9326/aa7225/pdf) 310 | 311 | 3. [Estimating rooftop solar technical potential across the US using a combination of GIS-based methods, lidar data, and statistical modeling](https://iopscience.iop.org/article/10.1088/1748-9326/aaa554/pdf) 312 | 313 | 4. [Rooftop Photovoltaic Technical Potential in the United States](https://data.nrel.gov/submissions/121) 314 | 315 | 5. [U.S. PV-Suitable Rooftop Resources](https://data.nrel.gov/submissions/47) 316 | 317 | Related Reference: 318 | 319 | 1. [Rooftop Solar Technical Potential for Low-to-Moderate Income Households in the United States](https://www.nrel.gov/docs/fy18osti/70901.pdf) 320 | 321 | 2. [Rooftop Energy Potential of Low Income Communities in America REPLICA](https://data.nrel.gov/submissions/81) 322 | 323 | 3. [Puerto Rico Solar-for-All: LMI PV Rooftop Technical Potential and Solar Savings Potential](https://data.nrel.gov/submissions/144) 324 | 325 | 326 | ## Disclaimer and Attribution 327 | 328 | Copyright (c) 2020, Alliance for Sustainable Energy LLC, All rights reserved. 329 | 330 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 331 | 332 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 333 | 334 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 335 | 336 | * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 337 | 338 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
339 |
-------------------------------------------------------------------------------- /PVROOFTOPS_PR.md: --------------------------------------------------------------------------------
1 | # PV Rooftop Database - Puerto Rico (PVRDB-PR)
2 |
3 |
4 | ## Description
5 |
6 | The National Renewable Energy Laboratory's (NREL) PV Rooftop Database for Puerto Rico (PVRDB-PR) is a lidar-derived, geospatially-resolved dataset of suitable roof surfaces and their PV technical potential for virtually all buildings in Puerto Rico. The source lidar data were obtained from 2017 3cm NASA G-LiHT and 2015-2016 <0.35m USGS 3DEP data. The building footprints were obtained from Open Street Map Buildings and were collected in 2018. Using GIS methods, NREL identified suitable roof surfaces based on their size, orientation, and shading parameters ([Gagnon et al. 2016](https://www.nrel.gov/docs/fy16osti/65298.pdf)). Standard 2020 technical potential was then estimated for each plane using the NREL [System Advisor Model's (SAM) PVWatts v5 model](https://developer.nrel.gov/docs/solar/pvwatts/v5/) and solar irradiance from the NREL [National Solar Radiation Database PSM3 for 2017](https://nsrdb.nrel.gov/).
7 |
8 | The PVRDB-PR is downloadable by county for Puerto Rico. The data consists of PV suitable roof surfaces (aka "developable planes") for 96% of buildings in Puerto Rico. The developable planes contain orientation, area, spatial geometries, and technical potential details of each developable plane suitable for rooftop solar in Puerto Rico.
9 |
10 | The PV Rooftops dataset is provided in Parquet format partitioned by county name. The dataset can be queried directly via Athena or PrestoDB with relatively quick response time, or downloaded as a Parquet format data file from S3 (`s3://oedi-data-lake/pv-rooftop-pr/developable-planes`).
11 |
12 |
13 | ## Data Dictionary
14 |
15 | Column | Type | Description
16 | -- | -- | --
17 | `devp_gid` | bigint | The developable plane geographic ID (**unique/primary-key**).
18 | `bldg_plane_id` | integer | The plane ID unique to a given building (`bldg_fid`).
19 | `bldg_fid` | integer | The building feature ID (**foreign key**). The `bldg_fid` can be tagged to the HOTOSM building `fid` column to obtain the building feature and geometry.
20 | `bg_geoid` | text | The census block group GEOID. The `bg_geoid` is 12 characters long. The first 2 numbers represent the State FIPS, the next 3 are County FIPS, the next 6 are the Tract FIPS and the last 1 is the Block Group FIPS.
21 | `azimuth` | numeric | Plane azimuth angle (degrees). 0 values represent flat roofs.
22 | `tilt` | numeric | The panel sloped tilt (degrees). For flat roofs (azimuth=0), we assume a 15-degree panel tilt. For tilted roofs, we use the tilt of the roof plane.
23 | `flat_m2` | double precision | Flat "bird's-eye" area of the developable plane surface (meters-squared).
24 | `slope_degrees` | numeric | Slope of the developable roof plane in degrees.
25 | `slope_m2` | double precision | Slope area of the developable plane surface (meters-squared).
26 | `pitchmultiplier` | numeric | This is the multiplier value used to calculate the slope area from the flat roof area.
27 | `pct_shading` | numeric | The average diffuse shading value (%) from 4 representative days (summer and winter solstices, autumnal and vernal equinoxes).
28 | `area_derate_factor` | numeric | This is the area derate factor used to identify the PV module area from the total developable plane area. &#13;
Options include 0.70 for flat roofs and 0.98 for tilted roofs.
29 | `dc_ac_ratio` | numeric | The DC to AC ratio. We used the PVWatts default value of 1.2.
30 | `array_type` | integer | The PVWatts array type. We use Fixed-Tilt Rooftop PV (value=1) for all simulations.
31 | `losses` | numeric | Total system losses (%). We used the PVWatts default of 14.08% for all simulations.
32 | `module_type` | integer | The PV module type used in the PVWatts estimation. We used the "Standard" module type (value=0) for all simulations.
33 | `inverter_efficiency` | integer | Inverter efficiency at rated power (%). We used the SAM PVWatts default of 96% for all simulations.
34 | `annual_kwh` | double precision | Annual generation (kWh) potential of each array.
35 | `the_geom_4326` | USER-DEFINED | Geometry in WGS84 (SRID=4326)
36 | `the_geom_32620` | USER-DEFINED | Projected Geometry (SRID=32620)
37 | `county` | text | Census county name
38 |
39 | ### Azimuth Coded Values
40 |
41 | The azimuth values in PVRDB-PR are coded values which represent the median value of a range of azimuth degrees. The table below decodes the azimuth values with the value range and the ordinal direction.
42 |
43 | Azimuth | Azimuth Value Range | Ordinal Direction
44 | -- | -- | --
45 | 0 | Flat | Flat
46 | 45 | 22.5-67.5 | NE
47 | 90 | 67.5-112.5 | E
48 | 135 | 112.5-157.5 | SE
49 | 180 | 157.5-202.5 | S
50 | 225 | 202.5-247.5 | SW
51 | 270 | 247.5-292.5 | W
52 | 315 | 292.5-337.5 | NW
53 | 360 | 337.5-360; 0-22.5 | N
54 |
55 | ## Methods and Assumptions
56 |
57 | PVRDB-PR uses GIS methods similar to those of [PVRDB](https://registry.opendata.aws/nrel-oedi-pv-rooftops/), as described in [Gagnon et al. (2016)](https://www.nrel.gov/docs/fy16osti/65298.pdf). Unlike previous data versions, PVRDB-PR technical potential estimates are based on the updated assumptions of NREL's PV Rooftop Model v2.0, such as:
58 | - Power Density is assumed to be 182 W/m2 (compared to the 160 W/m2 assumption in the 2016 U.S. study).
59 | - North-facing planes are not excluded for PR; however, they were excluded from the 2016 U.S. study. In PR, 13% of the rooftop PV technical potential is on N-, NE-, or NW-facing planes.
60 | - The minimum size requirement for a developable plane is set to 1.62 m2, which is the average size required for one 250-watt solar panel. A building is considered suitable if it meets the other criteria and it has at least one plane large enough for a solar panel. In the 2016 U.S. study, a suitable building needed a minimum of 10 m2. If we applied the same >= 10m2 assumption to Puerto Rico, generation would be reduced by ~4 TWh for the total building potential (all residential buildings).
61 | - The shading assumption in this PR assessment was updated to apply percent shading directly at the developable plane level into the System Advisor Model (SAM) when calculating generation potential. For the U.S. assessment, % shading was used to screen potential planes, but it was not used directly at the plane level when processed in SAM to get the generation; instead, the SAM default of 3% was applied. This new approach is more accurate than previous estimates, but it results in a lower kWh/kW estimate for Puerto Rico compared to the U.S.
62 |
63 | ### Data Sources
64 |
65 | 1. [NASA G-LiHT](https://gliht.gsfc.nasa.gov/index.php?section=49): 3cm LiDAR data collected in spring of 2017. This data only has partial PR island coverage.
66 | 2. &#13;
[USGS 3DEP LiDAR](https://registry.opendata.aws/usgs-lidar/): <0.35 m nominal resolution LiDAR data collected in 2015 as part of the 2016 Commonwealth of Puerto Rico Project Lidar survey (UUID: `{C2C7A2AF-8228-4C10-8756-BA971DD63953}`). This data was used to fill in LiDAR coverage after G-LiHT data.
67 | 3. [OpenStreetMap HOT export of building footprints](https://data.humdata.org/dataset/hotosm_pri_buildings): building footprint data used to identify buildings from the LiDAR data. HOTOSM export on 10/01/2018.
68 |
69 | ### Assumptions for Building Suitability
70 |
71 | Requirement | Description
72 | -- | --
73 | Shading | Measured shading for four seasons and required an average of 80% unshaded surface
74 | Azimuth | All possible azimuths
75 | Tilt | Average surface tilt <= 60 degrees
76 | Minimum Area | >= 1.62 m2 (area required for a single solar panel)
77 |
78 | ### Assumptions for PV Performance Simulations
79 |
80 | PV System Characteristics | Value for Flat Roofs | Value for Tilted Roofs
81 | -- | -- | --
82 | Tilt | 15 degrees | Tilt of plane
83 | Ratio of module area to suitable roof area | 0.7 | 0.98
84 | Azimuth | 180 degrees (south facing) | Midpoint of azimuth class
85 | Module Power Density | 183 W/m2
86 | Total system losses | Varies (SAM defaults + individual surface % shading)
87 | Inverter efficiency | 96%
88 | DC-to-AC ratio | 1.2
89 |
90 | ## Python Connection Examples
91 |
92 | Athena data connection using PyAthena:
93 | ```python
94 |
95 | import pandas as pd
96 | from pyathena import connect
97 |
98 | conn = connect(
99 |     s3_staging_dir='s3:///tracking-the-sun', ##user defined staging directory
100 |     region_name='us-west-2',
101 |     work_group='' ##specify workgroup if exists
102 | )
103 | ```
104 |
105 | Example #1: Querying with a limit:
106 | ```python
107 | df = pd.read_sql("SELECT * FROM oedi.pvrdb_pr_developable_planes limit 8;", conn)
108 | ```
109 |
110 | Example #2: Querying for a specific county name:
111 | ```python
112 | df = pd.read_sql("SELECT * FROM oedi.pvrdb_pr_developable_planes WHERE county = 'San Juan' limit 8;", conn)
113 | ```
114 |
115 | Example #3: Querying for a specific county FIPS (FIPS=127):
116 | ```python
117 | df = pd.read_sql("SELECT * FROM oedi.pvrdb_pr_developable_planes WHERE bg_geoid LIKE '72127%' limit 8;", conn)
118 | ```
119 |
120 | For jupyter notebook example see our notebook which includes partitions and data dictionary:
121 | [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples)
122 |
123 | ## Related Links:
124 |
125 | 1. [Puerto Rico Solar-for-All: LMI PV Rooftop Technical Potential and Solar Savings Potential](https://data.nrel.gov/submissions/144)
126 | 2. [Solar-For-All Interactive Web Map](https://maps.nrel.gov/solar-for-all/)
127 | 3. [PVRDB](https://registry.opendata.aws/nrel-oedi-pv-rooftops/)
128 | 4. [Rooftop Solar Photovoltaic Technical Potential in the United States: A Detailed Assessment](https://www.nrel.gov/docs/fy16osti/65298.pdf)
129 | 5. [Using GIS-based methods and lidar data to estimate rooftop solar technical potential in US cities](https://iopscience.iop.org/article/10.1088/1748-9326/aa7225/pdf)
130 | 6. [Estimating rooftop solar technical potential across the US using a combination of GIS-based methods, lidar data, and statistical modeling](https://iopscience.iop.org/article/10.1088/1748-9326/aaa554/pdf)
131 | 7. [Rooftop Photovoltaic Technical Potential in the United States](https://data.nrel.gov/submissions/121)
132 | 8. [U.S. &#13;
PV-Suitable Rooftop Resources](https://data.nrel.gov/submissions/47)
133 | 9. [Rooftop Solar Technical Potential for Low-to-Moderate Income Households in the United States](https://www.nrel.gov/docs/fy18osti/70901.pdf)
134 | 10. [Rooftop Energy Potential of Low Income Communities in America REPLICA](https://data.nrel.gov/submissions/81)
135 | 11. [PVWattsV5 Documentation](https://pvwatts.nrel.gov/downloads/pvwattsv5.pdf)
136 |
137 |
138 | ## Disclaimer and Attribution
139 |
140 | Copyright (c) 2020, Alliance for Sustainable Energy LLC, All rights reserved.
141 |
142 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
143 |
144 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
145 |
146 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
147 |
148 | * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
149 |
150 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-------------------------------------------------------------------------------- /PoroTomo/PoroTomo.md: --------------------------------------------------------------------------------
1 | # PoroTomo
2 |
3 | The data were collected during March 2016 at the PoroTomo Natural Laboratory
4 | in Brady's Hot Springs, Nevada.
5 |
6 | Silixa’s iDAS (TM) was used for DAS data acquisition with 1.021 m channel
7 | spacing and a gauge length of 10 m. Data in this archive are in .sgy and .h5
8 | files with raw units (radians of optical phase change per time sample)
9 | (Miller et al.).
10 |
11 | The files in this dataset use SEG-Y-rev1 (see below for documentation). The
12 | data are also available in .h5 or HDF5 file format.
13 |
14 | Horizontal DAS (DASH) data collection began 3/8/2016, paused, and then started
15 | again on 3/11/2016 and ended 3/26/2016 using zigzag trenched fiber optic
16 | cables. Vertical DAS (DASV) data collection began 3/17/2016 and ended 3/28/2016
17 | using a fiber optic cable through the first 363 m of a vertical well.
18 |
19 | The resampled DASH data are Matlab files with data for the surface DAS (DASH)
20 | array deployed at the PoroTomo natural laboratory in March 2016. Each file
21 | contains 30 seconds of data for 8721 channels. These files have been
22 | resampled in time from the original data and have a sample rate of 100
23 | samples/second.
24 |
25 | The nodal seismometer data consists of continuous and windowed
26 | (to vibroseis sweep) SAC files. &#13;
Nodal data collection began between 3/6/2016
27 | and 3/11/2016 depending on the station, and ended between 3/26/2016 and
28 | 3/28/2016 also depending on the station. Station names, locations, start times,
29 | stop times, and orientations can be found in the nodal seismometer metadata
30 | linked below.
31 |
32 | ## DAS
33 |
34 | ### Directory structure
35 |
36 | The PoroTomo DAS data is available on AWS S3: s3://nrel-pds-porotomo/DAS/.
37 | The data is available in three formats:
38 |
39 | #### SEG-Y Format:
40 |
41 | Files are stored in daily directories labeled using the format YYYYMMDD.
42 | Individual file names include the date and time appended at the end in the
43 | format YYMMDDHHMMSS. Each file represents 30 s of data.
44 |
45 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASH:
46 | - size: ~1-2 GB each
47 | - shape: 8721 traces x 30000 samples/trace
48 |
49 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASH/Resampled:
50 | - format: MATLAB files created through resampling of SEG-Y files
51 | - size: ~0.19 MB
52 | - shape of 'data' object: 30000 npts x 8721 nch
53 |
54 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASV:
55 | - size: ~0.01-0.02 GB each
56 | - shape: 384 traces x 30000 samples/trace
57 |
58 | For examples on accessing the SEG-Y files please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_SEGY.ipynb)
59 |
60 | #### HDF5 (.h5) Format:
61 |
62 | Files are stored in daily directories labeled using the format YYYYMMDD.
63 | Individual file names include the date and time appended at the end in the
64 | format YYMMDDHHMMSS. Each file represents 30 s of data, with the exception of
65 | the .h5 files available via HSDS which represent 24 hrs of data.
66 |
67 | s3://nrel-pds-porotomo/DAS/H5/DASH:
68 | - size: ~1 GB
69 | - shape of 'das' variable: 8721 traces x 30000 samples/trace
70 |
71 | s3://nrel-pds-porotomo/DAS/H5/DASV:
72 | - size: ~0.04 GB
73 | - shape of 'das' variable: 384 traces x 30000 samples/trace
74 |
75 | For examples on accessing the HDF5 files please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hdf5.ipynb)
76 |
77 | #### HSDS Format: HDF5 from the cloud
78 |
79 | Files are stored in daily .h5 files.
80 |
81 | DASH:
82 | - Source .h5: s3://nrel-pds-porotomo/H5/DASH
83 | - HSDS: /nrel/porotomo/DASH
84 |
85 | DASV:
86 | - Source .h5: s3://nrel-pds-porotomo/H5/DASV
87 | - HSDS: /nrel/porotomo/DASV
88 |
89 | For examples on setting up and using HSDS please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hsds.ipynb)
90 |
91 | ### Data Format
92 |
93 | The following datasets are available in HDF5 and HSDS files:
94 |
95 | - channel: channel number (along cable)
96 | - crs: coordinate reference system
97 | - das: 2D array with das data (shape: t x channel)
98 | - t: time in µs with respect to start of survey
99 | - trace: enumerated integers over length of the trace
100 | - x: x position of channel
101 | - y: y position of channel
102 | - z: z position of channel
103 |
104 | ## Nodal Seismometer Data
105 |
106 | ### Directory Structure
107 |
108 | The PoroTomo Nodal Seismometer data is available on AWS S3:
109 | s3://nrel-pds-porotomo/Nodal/.
110 | The following data and metadata are available:
111 |
112 | #### Continuous Data
113 |
114 | SAC files of the continuous raw data from the nodal seismometers. &#13;
Data files are
115 | sorted into folders by seismometer station number. Note: no data recovered
116 | from stations 73 and 82.
117 |
118 | s3://nrel-pds-porotomo/Nodal/nodal_sac:
119 | - size: 45-173 MB
120 |
121 |
122 | #### Field Notes and Metadata
123 |
124 | PDF scans of field notes and metadata for nodal seismometers including
125 | instrument installation and recovery.
126 |
127 | s3://nrel-pds-porotomo/Nodal/nodal_metadata:
128 | - size: 1.3-15.5 MB
129 |
130 | #### P-Picks
131 |
132 | P-wave travel times auto-picked from cross-correlation waveforms. The files
133 | list the source time and location for a vibe sweep stack followed by the travel
134 | time to each nodal instrument and two means of pick quality assessment. See
135 | README.txt for details.
136 |
137 | s3://nrel-pds-porotomo/Nodal/nodal_analysis/p_picks:
138 | - size: 1.3-1.4 MB
139 |
140 | #### Sweep Data
141 |
142 | 29.8 second long SAC files of the raw nodal seismometer data starting 3.9
143 | seconds before the initiation of each vibroseis sweep extracted from continuous
144 | 24 hour files. Data files are sorted into folders by sweep number
145 | (see GDR Submission 826).
146 |
147 | s3://nrel-pds-porotomo/Nodal/nodal_sac_sweep:
148 | - size: 58.8 kB
149 |
150 | ## References:
151 |
152 | Users of the PoroTomo data should please cite:
153 | - [Miller, Douglas E., et al. “DAS and DTS at Brady Hot Springs: Observations about Coupling and Coupled Interpretations.” Semantic Scholar, 14 Feb. 2018](pdfs.semanticscholar.org/048f/419e3c2b4de348a7166b13cab3bc0d56afdc.pdf)
154 |
155 | Additional information regarding:
156 | - [SEG-Y-rev1 file structure](https://seg.org/Portals/0/SEG/News%20and%20Resources/Technical%20Standards/seg_y_rev1.pdf)
157 | - [.h5 file format](https://support.hdfgroup.org/HDF5/doc/H5.format.html)
158 | - [DAS Data](http://dx.doi.org/10.1093/gji/ggy102)
159 | - [PoroTomo Technical Report](https://www.osti.gov/servlets/purl/1499141)
160 | - [DAS and DTS Interpretation](https://pangea.stanford.edu/ERE/pdf/IGAstandard/SGW/2018/Miller.pdf)
161 | - [DTS and DAS Metadata](https://gdr.openei.org/submissions/825)
162 | - [DTS Data](https://gdr.openei.org/submissions/853)
163 | - [Nodal Seismometer Metadata](https://gdr.openei.org/submissions/826)
164 |
-------------------------------------------------------------------------------- /PoroTomo/README.md: --------------------------------------------------------------------------------
1 | # PoroTomo
2 |
3 | The data were collected during March 2016 at the PoroTomo Natural Laboratory
4 | in Brady's Hot Springs, Nevada.
5 |
6 | Silixa’s iDAS (TM) was used for DAS data acquisition with 1.021 m channel
7 | spacing and a gauge length of 10 m. Data in this archive are in .sgy and .h5
8 | files with raw units (radians of optical phase change per time sample)
9 | (Miller et al.).
10 |
11 | The files in this dataset use SEG-Y-rev1 (see below for documentation). The
12 | data are also available in .h5 or HDF5 file format.
13 |
14 | Horizontal DAS (DASH) data collection began 3/8/2016, paused, and then started
15 | again on 3/11/2016 and ended 3/26/2016 using zigzag trenched fiber optic
16 | cables. Vertical DAS (DASV) data collection began 3/17/2016 and ended 3/28/2016
17 | using a fiber optic cable through the first 363 m of a vertical well.
18 |
19 | The resampled DASH data are Matlab files with data for the surface DAS (DASH)
20 | array deployed at the PoroTomo natural laboratory in March 2016. Each file
21 | contains 30 seconds of data for 8721 channels. &#13;
These files have been
22 | resampled in time from the original data and have a sample rate of 100
23 | samples/second.
24 |
25 | The nodal seismometer data consists of continuous and windowed
26 | (to vibroseis sweep) SAC files. Nodal data collection began between 3/6/2016
27 | and 3/11/2016 depending on the station, and ended between 3/26/2016 and
28 | 3/28/2016 also depending on the station. Station names, locations, start times,
29 | stop times, and orientations can be found in the nodal seismometer metadata
30 | linked below.
31 |
32 | ## DAS
33 |
34 | ### Directory structure
35 |
36 | The PoroTomo DAS data is available on AWS S3: s3://nrel-pds-porotomo/DAS/.
37 | The data is available in three formats:
38 |
39 | #### SEG-Y Format:
40 |
41 | Files are stored in daily directories labeled using the format YYYYMMDD.
42 | Individual file names include the date and time appended at the end in the
43 | format YYMMDDHHMMSS. Each file represents 30 s of data.
44 |
45 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASH:
46 | - size: ~1-2 GB each
47 | - shape: 8721 traces x 30000 samples/trace
48 |
49 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASH/Resampled:
50 | - format: MATLAB files created through resampling of SEG-Y files
51 | - size: ~0.19 MB
52 | - shape of 'data' object: 30000 npts x 8721 nch
53 |
54 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASV:
55 | - size: ~0.01-0.02 GB each
56 | - shape: 384 traces x 30000 samples/trace
57 |
58 | For examples on accessing the SEG-Y files please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_SEGY.ipynb)
59 |
60 | #### HDF5 (.h5) Format:
61 |
62 | Files are stored in daily directories labeled using the format YYYYMMDD.
63 | Individual file names include the date and time appended at the end in the
64 | format YYMMDDHHMMSS. Each file represents 30 s of data, with the exception of
65 | the .h5 files available via HSDS which represent 24 hrs of data. &#13;
66 |
67 | s3://nrel-pds-porotomo/DAS/H5/DASH:
68 | - size: ~1 GB
69 | - shape of 'das' variable: 8721 traces x 30000 samples/trace
70 |
71 | s3://nrel-pds-porotomo/DAS/H5/DASV:
72 | - size: ~0.04 GB
73 | - shape of 'das' variable: 384 traces x 30000 samples/trace
74 |
75 | For examples on accessing the HDF5 files please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hdf5.ipynb)
76 |
77 | #### HSDS Format: HDF5 from the cloud
78 |
79 | Files are stored in daily .h5 files.
80 |
81 | DASH:
82 | - Source .h5: s3://nrel-pds-porotomo/H5/DASH
83 | - HSDS: /nrel/porotomo/DASH
84 |
85 | DASV:
86 | - Source .h5: s3://nrel-pds-porotomo/H5/DASV
87 | - HSDS: /nrel/porotomo/DASV
88 |
89 | For examples on setting up and using HSDS please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hsds.ipynb)
90 |
91 | ### Data Format
92 |
93 | The following datasets are available in HDF5 and HSDS files:
94 |
95 | - channel: channel number (along cable)
96 | - crs: coordinate reference system
97 | - das: 2D array with das data (shape: t x channel)
98 | - t: time in µs with respect to start of survey
99 | - trace: enumerated integers over length of the trace
100 | - x: x position of channel
101 | - y: y position of channel
102 | - z: z position of channel
103 |
104 | ## Nodal Seismometer Data
105 |
106 | ### Directory Structure
107 |
108 | The PoroTomo Nodal Seismometer data is available on AWS S3:
109 | s3://nrel-pds-porotomo/Nodal/.
110 | The following data and metadata are available:
111 |
112 | #### Continuous Data
113 |
114 | SAC files of the continuous raw data from the nodal seismometers. Data files are
115 | sorted into folders by seismometer station number. Note: no data recovered
116 | from stations 73 and 82.
117 |
118 | s3://nrel-pds-porotomo/Nodal/nodal_sac:
119 | - size: 45-173 MB
120 |
121 |
122 | #### Field Notes and Metadata
123 |
124 | PDF scans of field notes and metadata for nodal seismometers including
125 | instrument installation and recovery.
126 |
127 | s3://nrel-pds-porotomo/Nodal/nodal_metadata:
128 | - size: 1.3-15.5 MB
129 |
130 | #### P-Picks
131 |
132 | P-wave travel times auto-picked from cross-correlation waveforms. The files
133 | list the source time and location for a vibe sweep stack followed by the travel
134 | time to each nodal instrument and two means of pick quality assessment. See
135 | README.txt for details.
136 |
137 | s3://nrel-pds-porotomo/Nodal/nodal_analysis/p_picks:
138 | - size: 1.3-1.4 MB
139 |
140 | #### Sweep Data
141 |
142 | 29.8 second long SAC files of the raw nodal seismometer data starting 3.9
143 | seconds before the initiation of each vibroseis sweep extracted from continuous
144 | 24 hour files. Data files are sorted into folders by sweep number
145 | (see GDR Submission 826).
146 |
147 | s3://nrel-pds-porotomo/Nodal/nodal_sac_sweep:
148 | - size: 58.8 kB
149 |
150 | ## References:
151 |
152 | Users of the PoroTomo data should please cite:
153 | - [Miller, Douglas E., et al. “DAS and DTS at Brady Hot Springs: Observations about Coupling and Coupled Interpretations.” Semantic Scholar, 14 Feb. &#13;
2018](pdfs.semanticscholar.org/048f/419e3c2b4de348a7166b13cab3bc0d56afdc.pdf) 154 | 155 | Additional information regarding: 156 | - [SEG-Y-rev1 file structure](https://seg.org/Portals/0/SEG/News%20and%20Resources/Technical%20Standards/seg_y_rev1.pdf) 157 | - [.h5 file format](https://support.hdfgroup.org/HDF5/doc/H5.format.html) 158 | - [DAS Data](http://dx.doi.org/10.1093/gji/ggy102) 159 | - [PoroTomo Technical Report](https://www.osti.gov/servlets/purl/1499141) 160 | - [DAS and DTS Interpretation](https://pangea.stanford.edu/ERE/pdf/IGAstandard/SGW/2018/Miller.pdf) 161 | - [DTS and DAS Metadata](https://gdr.openei.org/submissions/825) 162 | - [DTS Data](https://gdr.openei.org/submissions/853) 163 | - [Nodal Seismometer Metadata](https://gdr.openei.org/submissions/826) 164 | -------------------------------------------------------------------------------- /SMART-DS/Readme.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/Readme.pdf -------------------------------------------------------------------------------- /SMART-DS/figures/AUS/all_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/AUS/all_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/import_timeseries.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/import_timeseries.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/import_voltvar.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/import_voltvar.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/importing.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/importing.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/networks.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/networks.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/simplified_view.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/simplified_view.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/simplified_view_zoomed.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/simplified_view_zoomed.PNG 
-------------------------------------------------------------------------------- /SMART-DS/figures/CYME/substation.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/substation.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/timeseries_results.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/timeseries_results.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/GIS/layer_examples.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/GIS/layer_examples.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/GIS/missing_layers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/GIS/missing_layers.png -------------------------------------------------------------------------------- /SMART-DS/figures/GSO/all_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/GSO/all_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/GSO/all_labels2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/GSO/all_labels2.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/feeder.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/feeder.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/monitor_current.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/monitor_current.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/monitor_kva.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/monitor_kva.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/profile.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/profile.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/running_dss.PNG: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/running_dss.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/SAF/all_labels.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SAF/all_labels.png -------------------------------------------------------------------------------- /SMART-DS/figures/SFO/downtown_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SFO/downtown_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/SFO/east_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SFO/east_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/SFO/north_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SFO/north_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/SFO/south_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SFO/south_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/analysis/pu_voltages_histogram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/analysis/pu_voltages_histogram.png -------------------------------------------------------------------------------- /SMART-DS/figures/analysis/pu_voltages_percentiles.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/analysis/pu_voltages_percentiles.png -------------------------------------------------------------------------------- /SMART-DS/figures/load_curves/total_load_200.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/load_curves/total_load_200.png -------------------------------------------------------------------------------- /SMART-DS/figures/load_curves/total_load_244.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/load_curves/total_load_244.png -------------------------------------------------------------------------------- /Sup3rCC.md: -------------------------------------------------------------------------------- 1 | # Super-Resolution for 
Renewable Energy Resource Data with Climate Change Impacts (Sup3rCC) 2 | 3 | ## Description 4 | 5 | The Super-Resolution for Renewable Energy Resource Data with Climate Change Impacts (Sup3rCC) data is a collection of 4 km hourly wind, solar, temperature, humidity, and pressure fields for the contiguous United States under various climate change scenarios. 6 | 7 | Sup3rCC is downscaled Global Climate Model (GCM) data. The downscaling process was performed using a generative machine learning approach called sup3r: Super-Resolution for Renewable Energy Resource Data (linked below as "Sup3r GitHub Repo"). The data includes both historical and future weather years, although the historical years represent the historical climate, not the actual historical weather that we experienced. You cannot use Sup3rCC data to study historical weather events, although other sup3r datasets may be intended for this. 8 | 9 | The Sup3rCC data is intended to help researchers study the impact of climate change on energy systems with high levels of wind and solar capacity. Please note that all climate change data is only a representation of the *possible* future climate and contains significant uncertainty. Analysis of multiple climate change scenarios and multiple climate models can help quantify this uncertainty. 10 | 11 | ## Version Log 12 | The Sup3rCC data has versions that coincide with the sup3r software versions. Note that not every sup3r software version will have a corresponding Sup3rCC data release, but every Sup3rCC data release will have a corresponding sup3r software version. This table records versions of Sup3rCC data releases. Sup3rCC generative models may have slightly different versions than the data. The version attribute in the Sup3rCC .h5 files can be inspected to verify the actual version of the data you are using. 13 | 14 | | Version | Date | Notes | 15 | | -------- | -------- | ------- | 16 | | 0.1.0 | 6/27/2023 | Initial Sup3rCC release with two GCMs and one climate scenario. Known issues: few years used for bias correction; simplistic GCM bias correction method; mean bias in the high-resolution output, especially in the wind and solar data; imperfect wind diurnal cycles when compared to WTK; and timing of diurnal peak temperature when compared to observations. | 17 | 18 | ## Directory structure 19 | 20 | The Sup3rCC directory contains downscaled data for multiple projections of future climate change. For example, a file from the initial data release `sup3rcc_conus_ecearth3_ssp585_r1i1p1f1_wind_2015.h5` is downscaled from the climate model EC-Earth3 for climate change scenario SSP5 8.5 and variant label r1i1p1f1. The file contains wind variables for the year 2015. Note that this will represent the climate from 2015, but not the actual weather we experienced. 21 | 22 | Within the S3 bucket there is also a folder `models` providing pre-trained Sup3rCC generative machine learning models. See the Sup3r GitHub Repo below for examples of how to use these models. 23 | 24 | ## Data Format 25 | 26 | The data is provided in Hierarchical Data Format (.h5) separated by year and by variable set. Variables are provided in 2-dimensional spatiotemporal arrays (called “datasets” in h5 files) with dimensions `(time, space)`. The temporal axis is defined by the `time_index` dataset, while the positional axis is defined by the `meta` dataset. Additional details on data format and data access patterns can be found in the [rex docs](https://nrel.github.io/rex/misc/examples.nrel_data.html).
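The `(time, space)` layout described above can be verified with a few lines of `h5py`. This is a minimal sketch, assuming a local copy of the wind file named in the directory section; dataset names other than `time_index` and `meta` should be confirmed by listing the file's keys.

```python
# Minimal sketch: inspect the layout of a Sup3rCC .h5 file (assumes the
# file named in the directory section has been downloaded locally).
import h5py
import pandas as pd

fname = 'sup3rcc_conus_ecearth3_ssp585_r1i1p1f1_wind_2015.h5'
with h5py.File(fname, 'r') as f:
    print(list(f))  # available datasets, including the wind variables
    # time_index is saved as byte-strings and must be decoded
    time_index = pd.to_datetime(f['time_index'][...].astype(str))
    # meta defines the positional (space) axis
    meta = pd.DataFrame(f['meta'][...])
```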
27 | 28 | ## Code Examples 29 | - For code examples, users can reference the [Sup3r GitHub Repository](https://github.com/NREL/sup3r/tree/main), which includes examples for Sup3rCC data access in the [/examples/sup3rcc](https://github.com/NREL/sup3r/tree/main/examples/sup3rcc) directory. 30 | - The [rex docs](https://nrel.github.io/rex/misc/examples.nrel_data.html) provide examples of the easiest ways to access the data remotely or on the NREL HPC. 31 | 32 | ## References 33 | Users of the Sup3rCC data should use the following citation: 34 | 35 | - Buster, Grant, Benton, Brandon, Glaws, Andrew, and King, Ryan. “High-Resolution Meteorology with Climate Change Impacts from Global Climate Model Data Using Generative Machine Learning.” _Nature Energy_, March 14, 2024. https://doi.org/10.1038/s41560-024-01507-9. 36 | - Buster, Grant, Benton, Brandon, Glaws, Andrew, and King, Ryan. Super-Resolution for Renewable Energy Resource Data with Climate Change Impacts (Sup3rCC). United States: N.p., 19 Apr, 2023. Web. doi: 10.25984/1970814. 37 | -------------------------------------------------------------------------------- /Template.md: -------------------------------------------------------------------------------- 1 | # {Dataset Name} 2 | 3 | ## Description 4 | 5 | A brief description of the data including: 6 | - how it was produced 7 | - why it is important/novel 8 | - who might use it and how 9 | 10 | ## Directory structure 11 | 12 | If the dataset is made up of multiple files, a description of how they are/will 13 | be stored in relation to each other. 14 | 15 | ## Data Format 16 | 17 | How the data is stored within each file, including a data dictionary with 18 | - dataset/variable/column names 19 | - units 20 | 21 | ## Code Examples 22 | 23 | Example scripts of how to access the data IN THE CLOUD. A Jupyter notebook or 24 | link to a GitHub repo with examples can be used instead. 25 | 26 | ## References 27 | 28 | Any helpful references or other documentation 29 | 30 | ## Disclaimer and Attribution 31 | 32 | Optional additional attributes/disclaimers 33 | -------------------------------------------------------------------------------- /TrackingtheSun.md: -------------------------------------------------------------------------------- 1 | # LBNL Tracking the Sun 2 | 3 | ## Description 4 | 5 | Berkeley Lab’s Tracking the Sun report series is dedicated to summarizing installed prices and other trends among grid-connected, distributed solar photovoltaic (PV) systems in the United States. The present report, the 11th edition in the series, focuses on systems installed through year-end 2017, with preliminary trends for the first half of 2018. As in years past, the primary emphasis is on describing changes in installed prices over time and variation in pricing across projects based on location, project ownership, system design, and other attributes. New to this year, however, is an expanded discussion of other project characteristics in the large underlying data sample. Future editions will include more such material, beyond the report’s traditional focus on installed pricing. 6 | 7 | The trends described in this report derive primarily from project-level data reported to state agencies and utilities that administer PV incentive programs, solar renewable energy credit (SREC) registration systems, or interconnection processes. In total, data were collected and cleaned for more than 1.3 million individual PV systems, representing 81% of U.S. residential and non-residential PV systems installed through 2017.
The analysis of installed pricing trends is based on a subset of roughly 770,000 systems with available installed price data. 8 | 9 | A technical summary of the dataset is as follows: 10 | 11 | Focuses on projects installed through 2018, with preliminary data for the first half of 2022: 12 | - Describes and analyzes trends related to project characteristics, including system size and design, ownership, customer segmentation, and other attributes 13 | - National median installed prices, both long-term and recent trends, focusing on host-owned systems 14 | - Variability in pricing across projects according to system size, state, installer, module efficiency, inverter technology, residential new construction vs. retrofit, tax-exempt vs. commercial site hosts, and mounting configuration 15 | - Distributed PV, for the purpose of this report, includes residential and non-residential systems that are roof-mounted (of any size) or ground-mounted up to 5 MWAC 16 | 17 | Tracking the Sun relies on project-level data: 18 | - Provided by state agencies and utilities that administer PV incentive programs, renewable energy credit (REC) registration systems, or interconnection processes 19 | - Some of these data already exist in the public domain (e.g., California’s Currently Interconnected Dataset), though LBNL may receive supplementary fields, in some cases covered under non-disclosure agreements 20 | - 67 entities spanning 30 states contributed data to this year’s report (Table A-1 in the Appendix of the report) 21 | 22 | Customer Segments 23 | - Residential: Single-family and, depending on the data provider, may also include multi-family 24 | - Small Non-Residential: Non-residential systems ≤100 kWDC 25 | - Large Non-Residential: Non-residential systems >100 kWDC (and ≤5,000 kWAC if ground-mounted). *Segments are independent of whether systems are connected to the customer- or utility-side of the meter.* 26 | 27 | Units 28 | - Real 2018 dollars 29 | - Direct current (DC) Watts (W), unless otherwise noted 30 | 31 | Installed Price: Up-front $/W price paid by the PV system owner, prior to incentives 32 | 33 | Sample Frames and Data Cleaning 34 | Full sample (used to describe system characteristics; the basis for the public dataset): 35 | 1. Remove systems with missing size or install date 36 | 2. Standardize installer, module, inverter names 37 | 3. Integrate equipment spec sheet data 38 | – Module efficiency and technology type 39 | – Flag microinverters or DC optimizers 40 | 4. Convert dollar and kW values to appropriate units, 41 | and compute other derived fields 42 | 43 | Installed-Price Sample (used in the analysis of installed prices): 44 | 5. Remove systems if: 45 | – Missing installed price data 46 | – Third-party owned (TPO) 47 | – Battery storage included 48 | – Self-installed 49 | 50 | ## Directory Structure 51 | 52 | The Tracking the Sun Dataset is made available in Parquet format on AWS and is partitioned by `state` in AWS Glue and Athena. The schema may change across dataset years on S3.
53 | 54 | - `s3://oedi-data-lake/tracking-the-sun/2018/` 55 | - `s3://oedi-data-lake/tracking-the-sun/2019/` 56 | - `s3://oedi-data-lake/tracking-the-sun/2020/` 57 | - `s3://oedi-data-lake/tracking-the-sun/2021/` 58 | - `s3://oedi-data-lake/tracking-the-sun/2022/` 59 | 60 | ## Python Connection Examples 61 | 62 | ```python 63 | 64 | import pandas as pd 65 | from pyathena import connect 66 | 67 | conn = connect( 68 | s3_staging_dir='s3://<user-bucket>/tracking-the-sun', ##user-defined staging directory 69 | region_name='us-west-2', 70 | work_group='' ##specify workgroup if one exists 71 | ) 72 | 73 | df = pd.read_sql("SELECT * FROM oedi_tracking_the_sun_2019 limit 8;", conn) 74 | ``` 75 | For a Jupyter notebook example that includes partitions and a data dictionary, see our 76 | [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples) 77 | 78 | ## Metadata Information 79 | 80 | The dataset is partitioned by US state. 81 | 82 | Please refer to this repository for examples of metadata and data access: https://github.com/openEDI/open-data-access-tools/tree/master/examples 83 | 84 | ## References 85 | 86 | [https://emp.lbl.gov/sites/default/files/tracking_the_sun_2019_report.pdf](https://emp.lbl.gov/sites/default/files/tracking_the_sun_2019_report.pdf) 87 | 88 | [https://emp.lbl.gov/sites/default/files/tracking_the_sun_2019_slide_deck_summary_0.pdf](https://emp.lbl.gov/sites/default/files/tracking_the_sun_2019_slide_deck_summary_0.pdf) 89 | 90 | [https://emp.lbl.gov/tracking-sun-tool](https://emp.lbl.gov/tracking-sun-tool) 91 | 92 | ## Disclaimer and Attribution 93 | 94 | Copyright (c) 2020, Alliance for Sustainable Energy LLC, All rights reserved. 95 | 96 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 97 | 98 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 99 | 100 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 101 | 102 | * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 103 | 104 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
105 | -------------------------------------------------------------------------------- /UMCM_Hurricanes.md: -------------------------------------------------------------------------------- 1 | # University of Miami Coupled Model (UMCM) for Hurricanes Ike and Sandy 2 | 3 | ## Model 4 | 5 | The University of Miami Coupled Model (UMCM) is a coupled model that integrates 6 | atmospheric, wave, and ocean components to produce wind, wave, and current 7 | data. Atmospheric data is produced using the [Weather Research and Forecasting](https://www.mmm.ucar.edu/weather-research-and-forecasting-model) 8 | model (WRF), wave data is produced using the [University of Miami Wave Model](https://umwm.org/) 9 | (UMWM), and ocean current data is produced using the 10 | [HYbrid Coordinate Ocean Model](https://www.hycom.org/) (HYCOM). 11 | 12 | The model was used to study offshore wind conditions during Hurricane Ike 13 | and Hurricane Sandy. The time resolution for each model run is as follows: 14 | 15 | - Hurricane Ike 16 | - 1 sample/hour from 9/8/2008 12:00:00 UTC to 9/12/2008 6:00:00 UTC 17 | - 1 sample/10 minutes from 9/12/2008 6:00:00 UTC to 9/13/2008 9:00:00 UTC 18 | - Hurricane Sandy 19 | - 1 sample/10 minutes from 10/28/2012 00:10:00 UTC to 10/31/2012 00:00:00 UTC 20 | 21 | The following variables were extracted from the HYCOM model: 22 | - bathymetry 23 | - ocean_mixed_layer_thickness-ilt 24 | - ocean_mixed_layer_thickness-mlt 25 | - sea_water_potential_density at all depths 26 | - sea_water_salinity at all depths 27 | - sea_surface_elevation 28 | - eastward_sea_water_velocity 29 | - northward_sea_water_velocity 30 | - upward_sea_water_velocity 31 | - sea_water_temperature 32 | - depth (m): [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 120, 135, 150, 175, 200, 225, 250, 275, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000] 33 | 34 | The following variables were extracted from the UMWM wave model: 35 | - cd 36 | - cgmxx 37 | - cgmxy 38 | - cgmyy 39 | - dcg 40 | - dcg0 41 | - dcp 42 | - dcp0 43 | - depth (m): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100] 44 | - dwd 45 | - dwl 46 | - dwp 47 | - momx 48 | - momy 49 | - mss 50 | - mwd 51 | - mwl 52 | - mwp 53 | - rhoa 54 | - rhow 55 | - seamask 56 | - swh 57 | - tailatmx 58 | - tailatmy 59 | - tailocnx 60 | - tailocny 61 | - taux_bot 62 | - taux_form 63 | - taux_form_1 64 | - taux_form_2 65 | - taux_form_3 66 | - taux_ocn 67 | - taux_skin 68 | - taux_snl 69 | - tauy_bot 70 | - tauy_form 71 | - tauy_form_1 72 | - tauy_form_2 73 | - tauy_form_3 74 | - tauy_ocn 75 | - tauy_skin 76 | - tauy_snl 77 | - u_stokes at all depths 78 | - uc 79 | - ust 80 | - v_stokes at all depths 81 | - vc 82 | - wdir 83 | - wspd 84 | 85 | ## Directory structure 86 | 87 | The UMCM data is stored in two .h5 files: 88 | - s3://oedi-data-lake/umcm/ 89 | - ike.h5 90 | - sandy.h5 91 | 92 | The UMCM data is also available via HSDS at /nrel/umcm/. 93 | 94 | For examples of setting up and using HSDS, please see our [examples repository](https://github.com/nrel/hsds-examples) 95 | 96 | ## Data Format 97 | 98 | The UMCM data is stored in datasets by variable and depth (when available). 99 | Each dataset is composed of a 3D data "cube" with dimensions (time, latitude, 100 | longitude). The positional values of each dimension are available in the 1D 101 | datasets: 102 | - `time_index` 103 | - `latitude` 104 | - `longitude` 105 | 106 | Additional locational metadata is available in the `meta` table.
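A minimal sketch of reading these data cubes over HSDS is shown below. It assumes `h5pyd`/HSDS are configured as described in the examples repository, and it uses `wspd` (one of the UMWM variables listed above) as the example dataset; confirm exact dataset names by listing the file's keys.

```python
# Minimal sketch: read UMCM Hurricane Ike data over HSDS.
# Assumes h5pyd/HSDS are configured per the examples repository.
import h5pyd

with h5pyd.File('/nrel/umcm/ike.h5', mode='r') as f:
    print(list(f))             # available datasets
    lat = f['latitude'][...]   # 1D positional values
    lon = f['longitude'][...]
    # first time-step of the (time, latitude, longitude) cube
    wspd = f['wspd'][0, :, :]
```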
107 | 108 | ## References 109 | 110 | Users of the UMCM model should use the following citation: 111 | - [Phillipes, Caleb, Veers, Paul, Kim, Eungsoo, Manuel, Lance, Curcic, Milan, and Chen, Shuyi. University of Miami Coupled Model (UMCM) for Hurricanes Ike and Sandy. United States: N.p., 30 Sep, 2015. Web. https://data.openei.org/submissions/574.](https://data.openei.org/submissions/574) 112 | -------------------------------------------------------------------------------- /US_Wave.md: -------------------------------------------------------------------------------- 1 | # High Resolution Ocean Surface Wave Hindcast 2 | 3 | ## Description 4 | 5 | The development of this dataset was funded by the U.S. Department of Energy, 6 | Office of Energy Efficiency & Renewable Energy, Water Power Technologies Office 7 | to improve our understanding of the U.S. wave energy resource and to provide 8 | critical information for wave energy project development and wave energy 9 | converter design. 10 | 11 | This is the highest resolution publicly available long-term wave hindcast 12 | dataset that – when complete – will cover the entire U.S. Exclusive Economic 13 | Zone (EEZ). The data can be used to investigate the historical record of wave 14 | statistics at any U.S. site. As such, the dataset could also be of value to any 15 | entity with marine operations inside the U.S. EEZ. 16 | 17 | A technical summary of the dataset is as follows: 18 | 19 | - 32 Year Wave Hindcast (1979-2010), 3-hour temporal resolution 20 | - Unstructured grid spatial resolution ranges from 200 meters in shallow water to ~10 km in deep water 21 | - Spatial coverage: EEZ offshore of all U.S. territories (see below) 22 | 23 | The following variables are included in the dataset: 24 | 25 | - Mean Wave Direction: Direction normal to the wave crests 26 | - Significant Wave Height: Calculated from the zeroth spectral moment (i.e., H_m0) 27 | - Mean Absolute Period: Calculated as a ratio of spectral moments (m_0/m_1) 28 | - Peak Period: The period associated with the maximum value of the wave energy spectrum 29 | - Mean Zero-Crossing Period: Calculated as a ratio of spectral moments (sqrt(m_0/m_2)) 30 | - Energy Period: Calculated as a ratio of spectral moments (m_-1/m_0) 31 | - Directionality Coefficient: Fraction of total wave energy travelling in the direction of maximum wave power 32 | - Maximum Energy Direction: The direction from which the most wave energy is travelling 33 | - Omni-Directional Wave Power: Total wave energy flux from all directions 34 | - Spectral Width: Spectral width characterizes the relative spreading of energy in the wave spectrum 35 | 36 | The following U.S. regions will be added to this dataset under the given 37 | `domain` names: 38 | 39 | - West Coast United States: `West_Coast` 40 | - East Coast United States: `Atlantic` 41 | - Alaskan Coast: TBD 42 | - Hawaiian Islands: `Hawaii` 43 | - Gulf of Mexico, Puerto Rico, and U.S. Virgin Islands: TBD 44 | - U.S. Pacific Island Territories: TBD 45 | 46 | ## Model 47 | 48 | The multi-scale, unstructured-grid modeling approach using WaveWatch III and 49 | SWAN enabled long-term (decades) high-resolution hindcasts in a large regional 50 | domain. In particular, the dataset was generated from the unstructured-grid 51 | SWAN model output that was driven by a WaveWatch III model with global-regional 52 | nested grids. The unstructured-grid SWAN model simulations were performed with 53 | a spatial resolution as fine as 200 meters in shallow waters.
The dataset has a 54 | 3-hour timestep spanning 32 years from 1979 through 2010. The project team 55 | intends to extend this to 2020 (i.e., 1979-2020), pending DOE support to do so. 56 | 57 | The models were extensively validated not only for the most common wave 58 | parameters, but also for six IEC resource parameters and 2D spectra with high 59 | quality spectral data derived from publicly available buoys. Additional details 60 | on definitions of the variables found in the dataset, the SWAN and WaveWatch 61 | III model configurations, and model validation are available in the technical report 62 | and peer-reviewed publications (Wu et al. 2020, Yang et al. 2020, Yang et al. 63 | 2018). This study was funded by the U.S. Department of Energy, Office of Energy 64 | Efficiency & Renewable Energy, Water Power Technologies Office under Contract 65 | DE-AC05-76RL01830 to Pacific Northwest National Laboratory (PNNL). 66 | 67 | ## Directory structure 68 | 69 | High Resolution Ocean Surface Wave Hindcast data is made available as a series 70 | of 3-hourly .h5 files located on AWS S3 for the domains discussed above: 71 | - `s3://wpto-pds-US_wave/v1.0.0/${domain}` 72 | 73 | Hourly virtual buoy data is also available in hourly .h5 files on AWS S3: 74 | - `s3://wpto-pds-US_wave/v1.0.0/virtual_buoy/${domain}` 75 | 76 | The US wave data is also available via HSDS at `/nrel/US_wave/`. 77 | For examples of setting up and using HSDS, please see our [examples repository](https://github.com/nrel/hsds-examples) 78 | 79 | ## Data Format 80 | 81 | The data is provided in Hierarchical Data Format (.h5) files separated by year. The 82 | variables mentioned above are provided in 2-dimensional time-series arrays with 83 | dimensions (time x location). The temporal axis is defined by the `time_index` 84 | dataset, while the positional axis is defined by the `coordinate` dataset. The 85 | units for the variable data are also provided as an attribute (`units`). The 86 | SWAN and IEC variable names are also provided under the attributes 87 | `SWAN_name` and `IEC_name`, respectively. 88 | 89 | ## Python Examples 90 | 91 | Example scripts to extract wave resource data using Python are provided below: 92 | 93 | The easiest way to access and extract data is the Resource eXtraction tool 94 | [`rex`](https://github.com/nrel/rex) 95 | 96 | To use `rex` with [`HSDS`](https://github.com/NREL/hsds-examples) you will need 97 | to install `h5pyd`: 98 | 99 | ``` 100 | pip install h5pyd 101 | ``` 102 | 103 | Next you'll need to configure HSDS: 104 | 105 | ``` 106 | hsconfigure 107 | ``` 108 | 109 | and enter at the prompt: 110 | 111 | ``` 112 | hs_endpoint = https://developer.nrel.gov/api/hsds 113 | hs_username = 114 | hs_password = 115 | hs_api_key = 3K3JQbjZmWctY0xmIfSYvYgtIcM3CN0cb1Y2w9bf 116 | ``` 117 | 118 | **IMPORTANT: The example API key here is for demonstration and is rate-limited per IP.
To get your own API key, visit https://developer.nrel.gov/signup/** 119 | 120 | You can also add the above contents to a configuration file at `~/.hscfg` 121 | 122 | 123 | ```python 124 | from rex import ResourceX 125 | 126 | wave_file = '/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5' 127 | with ResourceX(wave_file, hsds=True) as f: 128 | meta = f.meta 129 | time_index = f.time_index 130 | swh = f['significant_wave_height'] 131 | ``` 132 | 133 | `rex` also allows easy extraction of the nearest site to a desired (lat, lon) 134 | location: 135 | 136 | ```python 137 | from rex import ResourceX 138 | 139 | wave_file = '/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5' 140 | lat_lon = (34.399408, -119.841181) 141 | with ResourceX(wave_file, hsds=True) as f: 142 | lat_lon_swh = f.get_lat_lon_df('significant_wave_height', lat_lon) 143 | ``` 144 | 145 | or to extract all sites in a given region: 146 | 147 | ```python 148 | from rex import ResourceX 149 | 150 | wave_file = '/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5' 151 | jurisdiction = 'California' 152 | with ResourceX(wave_file, hsds=True) as f: 153 | ca_swh = f.get_region_df('significant_wave_height', jurisdiction, 154 | region_col='jurisdiction') 155 | ``` 156 | 157 | If you would rather access the US Wave data directly using h5pyd: 158 | 159 | ```python 160 | # Extract the average wave height 161 | import h5pyd 162 | import pandas as pd 163 | 164 | # Open .h5 file 165 | with h5pyd.File('/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5', mode='r') as f: 166 | # Extract meta data and convert from records array to DataFrame 167 | meta = pd.DataFrame(f['meta'][...]) 168 | # Significant Wave Height 169 | swh = f['significant_wave_height'] 170 | # Extract scale factor 171 | scale_factor = swh.attrs['scale_factor'] 172 | # Extract, average, and unscale wave height 173 | mean_swh = swh[...].mean(axis=0) / scale_factor 174 | 175 | # Add mean wave height to meta data 176 | meta['Average Wave Height'] = mean_swh 177 | ``` 178 | 179 | ```python 180 | # Extract time-series data for a single site 181 | import h5pyd 182 | import pandas as pd 183 | 184 | # Open .h5 file 185 | with h5pyd.File('/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5', mode='r') as f: 186 | # Extract time_index and convert to datetime 187 | # NOTE: time_index is saved as byte-strings and must be decoded 188 | time_index = pd.to_datetime(f['time_index'][...].astype(str)) 189 | # Initialize DataFrame to store time-series data 190 | time_series = pd.DataFrame(index=time_index) 191 | # Extract wave height, direction, and period 192 | for var in ['significant_wave_height', 'mean_wave_direction', 193 | 'mean_absolute_period']: 194 | # Get dataset 195 | ds = f[var] 196 | # Extract scale factor 197 | scale_factor = ds.attrs['scale_factor'] 198 | # Extract site 100 and add to DataFrame 199 | time_series[var] = ds[:, 100] / scale_factor 200 | ``` 201 | ## References 202 | 203 | Please cite the most relevant publication below when referencing this dataset: 204 | 205 | 1) [Wu, Wei-Cheng, et al. "Development and validation of a high-resolution regional wave hindcast model for US West Coast wave resource characterization." Renewable Energy 152 (2020): 736-753.](https://www.osti.gov/biblio/1599105) 206 | 2) [Yang, Z., G. García-Medina, W. Wu, and T. Wang, 2020. Characteristics and variability of the Nearshore Wave Resource on the U.S. West Coast. Energy.](https://doi.org/10.1016/j.energy.2020.117818) 207 | 3) [Yang, Zhaoqing, et al.
High-Resolution Regional Wave Hindcast for the US West Coast. No. PNNL-28107. Pacific Northwest National Lab. (PNNL), Richland, WA (United States), 2018.](https://doi.org/10.2172/1573061) 208 | 4) [Ahn, S., V.S. Neary, N. Allahdadi, and R. He, Nearshore wave energy resource characterization along the East Coast of the United States, Renewable Energy, 2021, 172](https://doi.org/10.1016/j.renene.2021.03.037) 209 | 5) [Yang, Z. and V.S. Neary, High-resolution hindcasts for U.S. wave energy resource characterization. International Marine Energy Journal, 2020, 3, 65-71](https://doi.org/10.36688/imej.3.65-71) 210 | 6) [Allahdadi, M.N., He, R., and Neary, V.S.: Predicting ocean waves along the US East Coast during energetic winter storms: sensitivity to whitecapping parameterizations, Ocean Sci., 2019, 15, 691-715](https://doi.org/10.5194/os-15-691-2019) 211 | 7) [Allahdadi, M.N., Gunawan, J. Lai, R. He, V.S. Neary, Development and validation of a regional-scale high-resolution unstructured model for wave energy resource characterization along the US East Coast, Renewable Energy, 2019, 136, 500-511](https://doi.org/10.1016/j.renene.2019.01.020) 212 | 213 | ## Disclaimer and Attribution 214 | 215 | The National Renewable Energy Laboratory (“NREL”) is operated for the U.S. 216 | Department of Energy (“DOE”) by the Alliance for Sustainable Energy, LLC 217 | ("Alliance"). Pacific Northwest National Laboratory (PNNL) is managed and 218 | operated by Battelle Memorial Institute ("Battelle") for DOE. As such, the 219 | following rules apply: 220 | 221 | This data arose from work performed under funding provided by the United 222 | States Government. Access to or use of this data ("Data") denotes consent with 223 | the fact that this data is provided "AS IS," “WHERE IS” AND SPECIFICALLY FREE 224 | FROM ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND, INCLUDING BUT NOT LIMITED TO 225 | ANY IMPLIED WARRANTIES SUCH AS MERCHANTABILITY AND/OR FITNESS FOR ANY 226 | PARTICULAR PURPOSE. Furthermore, NEITHER THE UNITED STATES GOVERNMENT NOR ANY 227 | OF ITS ASSOCIATED ENTITIES OR CONTRACTORS INCLUDING BUT NOT LIMITED TO THE 228 | DOE/PNNL/NREL/BATTELLE/ALLIANCE ASSUME ANY LEGAL LIABILITY OR RESPONSIBILITY 229 | FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF THE DATA, OR REPRESENT THAT 230 | ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS. NO ENDORSEMENT OF THE DATA 231 | OR ANY REPRESENTATIONS MADE IN CONNECTION WITH THE DATA IS PROVIDED. IN NO 232 | EVENT SHALL ANY PARTY BE LIABLE FOR ANY DAMAGES, INCLUDING BUT NOT LIMITED TO 233 | SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES ARISING FROM THE PROVISION OF THIS 234 | DATA; TO THE EXTENT PERMITTED BY LAW USER AGREES TO INDEMNIFY 235 | DOE/PNNL/NREL/BATTELLE/ALLIANCE AND ITS SUBSIDIARIES, AFFILIATES, OFFICERS, 236 | AGENTS, AND EMPLOYEES AGAINST ANY CLAIM OR DEMAND RELATED TO USER'S USE OF THE 237 | DATA, INCLUDING ANY REASONABLE ATTORNEYS FEES INCURRED. 238 | 239 | The user is granted the right, without any fee or cost, to use or copy the 240 | Data, provided that this entire notice appears in all copies of the Data. In 241 | the event that user engages in any scientific or technical publication 242 | utilizing this data, the user agrees to credit DOE/PNNL/NREL/BATTELLE/ALLIANCE in 243 | any such publication consistent with respective professional practice.
244 | -------------------------------------------------------------------------------- /WINDToolkit.md: -------------------------------------------------------------------------------- 1 | # Wind Resource Data: Wind Integration National Dataset (WIND) Toolkit 2 | 3 | ## Model 4 | 5 | Wind resource data for North America was produced using the [Weather Research and Forecasting Model (WRF)](https://www.mmm.ucar.edu/weather-research-and-forecasting-model). 6 | The WRF model was initialized with the European Centre for Medium Range Weather 7 | Forecasts Interim Reanalysis (ERA-Interim) data set with an initial grid spacing 8 | of 54 km. Three internal nested domains were used to refine the spatial 9 | resolution to 18, 6, and finally 2 km. The WRF model was run for years 2007 10 | to 2014. While outputs were extracted from WRF at 5-minute time-steps, due to 11 | storage limitations instantaneous hourly time-steps are provided for all 12 | variables, while full 5-minute resolution data is provided for wind speed and wind 13 | direction only. 14 | 15 | The following variables were extracted from the WRF model data: 16 | - Wind Speed at 10, 40, 60, 80, 100, 120, 140, 160, 200 m 17 | - Wind Direction at 10, 40, 60, 80, 100, 120, 140, 160, 200 m 18 | - Temperature at 2, 10, 40, 60, 80, 100, 120, 140, 160, 200 m 19 | - Pressure at 0, 100, 200 m 20 | - Surface Precipitation Rate 21 | - Surface Relative Humidity 22 | - Inverse Monin Obukhov Length 23 | 24 | ## Countries 25 | 26 | ### North America 27 | 28 | Wind resource for North America was produced using three distinct WRF domains, 29 | shown below. The CONUS domain for 2007-2013 was run by 3Tier, while 2014, as well 30 | as all years of the Canada and Mexico domains, was run under NARIS. The data 31 | is provided in three sets of files: 32 | 33 | - CONUS: Extracted exclusively from the CONUS domain 34 | - Canada: Combined data from the Canada and CONUS domains 35 | - Mexico: Combined data from the Mexico and CONUS domains 36 | 37 | ### Asia 38 | 39 | Wind resource was also produced for the following countries and years: 40 | 41 | - Bangladesh: 2014-2017 42 | - Central Asia: 2015 43 | - India: 2014 44 | - Kazakhstan: 2015 45 | - Philippines: 2017 46 | - Vietnam: 2016-2018 47 | 48 | ## Directory structure 49 | 50 | Wind resource data is made available as a series of hourly .h5 files 51 | corresponding to each country and year. Below is an example of the directory 52 | structure for the CONUS domains: 53 | - s3://nrel-pds-wtk/conus -> root directory for the conus domain 54 | - /v1.0.0 -> version 1 of the data corresponding to years 2007-2013, run by 3Tier 55 | - /wtk_conus_${year}.h5 -> Hourly data for all variables for the given year 56 | - /${year}/wind_${hub_height}.h5 -> Five minute wind resource data for the given year and hub height 57 | - /v1.1.0 -> version 1.1 of the data corresponding to 2014, run under NARIS with an updated version of WRF and new Boundary Layer Physics (PBL scheme) 58 | 59 | The WIND Toolkit data is also available via HSDS at /nrel/wtk/${country}. 60 | 61 | For examples of setting up and using HSDS, please see our [examples repository](https://github.com/nrel/hsds-examples) 62 | 63 | ## Data Format 64 | 65 | The data is provided in Hierarchical Data Format (.h5) files separated by year. The 66 | variables mentioned above are provided in 2-dimensional time-series arrays with 67 | dimensions (time x location). The temporal axis is defined by the `time_index` 68 | dataset, while the positional axis is defined by the `meta` dataset.
For 69 | storage efficiency each variable has been scaled and stored as an integer. The 70 | scale factor is provided in the `scale_factor` attribute. The units for the 71 | variable data are also provided as an attribute (`units`). 72 | 73 | ## Python Examples 74 | 75 | Example scripts to extract wind resource data using Python are provided below: 76 | 77 | The easiest way to access and extract data is the Resource eXtraction tool 78 | [`rex`](https://github.com/nrel/rex) 79 | 80 | 81 | ```python 82 | from rex import WindX 83 | 84 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 85 | with WindX(wtk_file, hsds=True) as f: 86 | meta = f.meta 87 | time_index = f.time_index 88 | wspd_100m = f['windspeed_100m'] 89 | ``` 90 | 91 | Note: `WindX` will automatically interpolate to the desired hub-height: 92 | 93 | ```python 94 | from rex import WindX 95 | 96 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 97 | with WindX(wtk_file, hsds=True) as f: 98 | print(f.datasets) # note: 90m is not a valid dataset 99 | wspd_90m = f['windspeed_90m'] 100 | ``` 101 | 102 | `rex` also allows easy extraction of the nearest site to a desired (lat, lon) 103 | location: 104 | 105 | ```python 106 | from rex import WindX 107 | 108 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 109 | nwtc = (39.913561, -105.222422) 110 | with WindX(wtk_file, hsds=True) as f: 111 | nwtc_wspd = f.get_lat_lon_df('windspeed_100m', nwtc) 112 | ``` 113 | 114 | or to extract all sites in a given region: 115 | 116 | ```python 117 | from rex import WindX 118 | 119 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 120 | state = 'Colorado' 121 | with WindX(wtk_file, hsds=True) as f: 122 | co_wspd = f.get_region_df('windspeed_100m', state, region_col='state') 123 | ``` 124 | 125 | Lastly, `rex` can be used to extract all variables needed to run SAM at a given 126 | location: 127 | 128 | ```python 129 | from rex import WindX 130 | 131 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 132 | nwtc = (39.913561, -105.222422) 133 | with WindX(wtk_file, hsds=True) as f: 134 | nwtc_sam_vars = f.get_SAM_df(nwtc) 135 | ``` 136 | 137 | If you would rather access the WIND Toolkit data directly using h5pyd: 138 | 139 | ```python 140 | # Extract the average 100m wind speed 141 | import h5pyd 142 | import pandas as pd 143 | 144 | # Open .h5 file 145 | with h5pyd.File('/nrel/wtk/conus/wtk_conus_2010.h5', mode='r') as f: 146 | # Extract meta data and convert from records array to DataFrame 147 | meta = pd.DataFrame(f['meta'][...]) 148 | # 100m windspeed dataset 149 | wspd = f['windspeed_100m'] 150 | # Extract scale factor 151 | scale_factor = wspd.attrs['scale_factor'] 152 | # Extract, average, and unscale windspeed 153 | mean_wspd_100m = wspd[...].mean(axis=0) / scale_factor 154 | 155 | # Add mean windspeed to meta data 156 | meta['Average 100m Wind Speed'] = mean_wspd_100m 157 | ``` 158 | 159 | ```python 160 | # Extract time-series data for a single site 161 | import h5pyd 162 | import pandas as pd 163 | 164 | # Open .h5 file 165 | with h5pyd.File('/nrel/wtk/conus/wtk_conus_2010.h5', mode='r') as f: 166 | # Extract time_index and convert to datetime 167 | # NOTE: time_index is saved as byte-strings and must be decoded 168 | time_index = pd.to_datetime(f['time_index'][...].astype(str)) 169 | # Initialize DataFrame to store time-series data 170 | time_series = pd.DataFrame(index=time_index) 171 | # Extract 100m wind speed, wind direction, temperature, and pressure 172 | for var in ['windspeed_100m', 'winddirection_100m', 173 | 'temperature_100m', 'pressure_100m']:
174 | # Get dataset 175 | ds = f[var] 176 | # Extract scale factor 177 | scale_factor = ds.attrs['scale_factor'] 178 | # Extract site 100 and add to DataFrame 179 | time_series[var] = ds[:, 100] / scale_factor 180 | ``` 181 | 182 | ## References 183 | 184 | For more information about the WIND Toolkit, please see the [website](https://www.nrel.gov/grid/wind-toolkit.html). 185 | Users of the WIND Toolkit should use the following citations: 186 | - [Draxl, C., B.M. Hodge, A. Clifton, and J. McCaa. 2015. Overview and Meteorological Validation of the Wind Integration National Dataset Toolkit (Technical Report, NREL/TP-5000-61740). Golden, CO: National Renewable Energy Laboratory.](https://www.nrel.gov/docs/fy15osti/61740.pdf) 187 | - [Draxl, C., B.M. Hodge, A. Clifton, and J. McCaa. 2015. "The Wind Integration National Dataset (WIND) Toolkit." Applied Energy 151: 355-366.](https://www.sciencedirect.com/science/article/pii/S0306261915004237?via%3Dihub) 188 | - [Lieberman-Cribbin, W., C. Draxl, and A. Clifton. 2014. Guide to Using the WIND Toolkit Validation Code (Technical Report, NREL/TP-5000-62595). Golden, CO: National Renewable Energy Laboratory.](https://www.nrel.gov/docs/fy15osti/62595.pdf) 189 | - [King, J., A. Clifton, and B.M. Hodge. 2014. Validation of Power Output for the WIND Toolkit (Technical Report, NREL/TP-5D00-61714). Golden, CO: National Renewable Energy Laboratory.](https://www.nrel.gov/docs/fy14osti/61714.pdf) -------------------------------------------------------------------------------- /dGen.md: -------------------------------------------------------------------------------- 1 | # dGen Data: Distributed Generation Market Demand (dGen) Model 2 | 3 | The Distributed Generation Market Demand (dGen) model simulates customer adoption of distributed energy resources (DERs) for residential, commercial, and industrial entities in the United States or other countries through 2050. 4 | 5 | The dGen model can be used for: 6 | - Identifying the sectors, locations, and customers for whom adopting DERs would have a high economic value 7 | - Generating forecasts as an input to distribution hosting capacity analysis, integrated resource planning, and load forecasting 8 | - Understanding the economic or policy conditions in which DER adoption becomes viable 9 | - Illustrating sensitivity to market and policy changes such as retail electricity rate structures, net energy metering, and technology costs. 10 | 11 | For access to technical papers and publications, more information, and to contact the dGen team, please visit [dGen's NREL website](https://www.nrel.gov/analysis/dgen/) 12 | 13 | 14 | 15 | ## Directory format & structure 16 | 17 | There are zipped .sql database files and zipped .pkl agent files. These are described in more detail below. 18 | 19 | 20 | 21 | #### Template PostgreSQL Database: 22 | 23 | - diffusion_load_profiles: This schema contains tables relating to the load profiles, generated by the NREL Buildings team, that are used by agents. These load profiles, in Parquet format, along with their metadata, are included in the data submission. 24 | 25 | - diffusion_resource_solar: This schema contains a table, solar_resource_hourly, which contains the solar capacity factor for a given geographic-azimuth-tilt combination that matches the same geographic-azimuth-tilt combination found in the pre-generated agents pickle file. 26 | 27 | - diffusion_shared: This schema contains tables used for inputs in the input sheet.
Please browse these tables; their names are representative of the data they contain. 28 | 29 | - diffusion_storage: This schema contains a single table related to PySAM storage inputs. 30 | 31 | - diffusion_solar: This schema contains tables with additional data pertaining to modeling solar 32 | constraints, incentives, and costs. 33 | 34 | - diffusion_template: This schema contains tables that are copied to make a new schema upon 35 | completing a dGen model run. Many of these are populated with data from the input sheet, from various joins/functions done within the database, and of course data from the model run. 36 | 37 | 38 | #### Pre-Generated Agents & Load Profiles: 39 | 40 | Every dGen analysis starts with a base agent file that uses statistically-sampled agents meant to be comprehensive and representative of the modeled population. They are comprehensive in the sense they are intended to represent the summation of underlying statistics, e.g. the total retail electricity consumed in the state. They are representative in that agents are sampled to represent heterogeneity of the population, e.g. variance in the cost of electricity. As described in (Sigrin et al. 2018), “during agent creation, each county in United States is seeded with sets of residential, commercial, and industrial agents, each instantiated at population-weighted random locations within the county’s geographic boundaries. Agents are referenced against geographic data sets to establish a load profile, solar resource availability, a feasible utility rate structure, and other techno-economic attributes specific to the agent’s location. Each agent is assigned a weight that is proportional to the number of customers the agent represents in its county. In this context, agents can be understood as statistically representative population clusters and do not represent individual entities.” 41 | 42 | Variable definitions and data types can be found in the data dictionary. 43 | 44 | 45 | 46 | ## Restoring Databases 47 | 48 | Example scripts to restore unzipped database files are provided below. For full documentation of the dGen Model and setting up and using the dGen Model, please visit our [open source repository](https://github.com/NREL/dgen) 49 | 50 | 1. Create the Docker container and PostgreSQL server: 51 | 52 | ``` 53 | $ docker run --name postgis_1 -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -d mdillon/postgis 54 | ``` 55 | 56 | 2. Connect to the PostgreSQL server on the Docker container and create a new database: 57 | 58 | ``` 59 | $ docker container ls 60 | $ docker exec -it postgis_1 psql -U postgres 61 | $ postgres=# CREATE DATABASE dgen_db; 62 | ``` 63 | 64 | 3. After downloading and unzipping the data, run the following in the command line (replacing 'path_to_where_you_saved_data' below with the actual path where you saved your database file): 65 | 66 | ``` 67 | $ cat /path_to_where_you_saved_data/dgen_db.sql | docker exec -i postgis_1 psql -U postgres -d dgen_db 68 | ``` 69 | 70 | Note: make sure Linux commands are enabled in order to properly restore the database. 71 | Also note that the full database can take around an hour to restore. All of the database files will take time to restore, so please be patient and plan accordingly. 72 | 73 | 74 | ## License 75 | 76 | 77 | The open source dGen model is licensed under the BSD 3-Clause License 78 | 79 | Copyright (c) 2020, Alliance for Sustainable Energy, LLC 80 | All rights reserved.
81 | 82 | Redistribution and use in source and binary forms, with or without 83 | modification, are permitted provided that the following conditions are met: 84 | 85 | * Redistributions of source code must retain the above copyright notice, this 86 | list of conditions and the following disclaimer. 87 | 88 | * Redistributions in binary form must reproduce the above copyright notice, 89 | this list of conditions and the following disclaimer in the documentation 90 | and/or other materials provided with the distribution. 91 | 92 | * Neither the name of the copyright holder nor the names of its 93 | contributors may be used to endorse or promote products derived from 94 | this software without specific prior written permission. 95 | 96 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 97 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 98 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 99 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 100 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 101 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 102 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 103 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 104 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 105 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 106 | -------------------------------------------------------------------------------- /pvdaq.md: -------------------------------------------------------------------------------- 1 | # Photovoltaic Field Array Time-Series (PVDAQ) 2 | 3 | ## Description 4 | 5 | The Photovoltaic field array (PVDAQ) data is composed of raw time-series performance data taken through a variety of sensors connected to a PV array. The data is typically taken at 15-minute averaged resolution, but can vary between systems. NREL source data is typically aggregated into the main database every 24 hours. Data is then processed to the NREL PVDAQ data lake on a monthly basis. 6 | 7 | Some datasets have been acquired through previous research agreements with site owners, and with their permission, have now been made public. Those datasets are static and do not show any additional data increments. 8 | 9 | Our researchers utilize the data to monitor the durability of PV systems under a wide variety of conditions. Similar data within NREL archives also provides insights into experimental emerging technology systems. Additionally, the data has proven useful in the development of data quality assurance software and of data analysis and machine learning tools. 10 | 11 | All data for PVDAQ and the DOE Solar Data Prize is covered under the [DOI:10.25984/1846021](https://dx.doi.org/10.25984/1846021) 12 | 13 | ### 2023 Solar Data Prize 14 | 15 | The American-Made Solar Data Bounty Prize was open to U.S.-based PV system owners and entities authorized to share data from PV systems. These owners were invited to submit at least five years of historical time series data at a minimum of 15-minute time resolution for one or two of their systems. Datasets collected through this prize are meant to assist commercial and academic research and development efforts seeking to improve the accuracy of PV system modeling, and thus lower the risk associated with developing and operating those assets.
16 | 17 | The Data Prize entries were submitted in one of two categories: systems <5 MW DC capacity, and those >5 MW DC capacity. The data from the submissions are available to the public for download as part of the PVDAQ Data repository. The following are the system IDs of the winners, in numerical order, not placement by award. 18 | 19 | #### < 5 MW DC system IDs: 20 | 21 | * **2105** - A 237 kW multi-building rooftop deployment with highly variable mount orientations in Hawaii 22 | * **2107** - An 893 kW fixed ground-mount facility in a highly active agricultural area in California 23 | * **9068** - A 4.7 MW single-axis tracked facility in Colorado 24 | 25 | #### > 5 MW DC system IDs: 26 | 27 | * **7333** - A 257 MW single-axis tracker facility in California. This dataset is at a very high time resolution of 10 s for all channels. 28 | * **9069** - A 38.7 MW fixed ground-mount facility in Georgia 29 | 30 | #### Details on the Prize Datasets 31 | 32 | These datasets differ from the regular PVDAQ repository storage architecture (see below), where data is broken down by year, month, and day. In each of the prize repositories, the available metadata, any support files, and the entire dataset as it was submitted and curated are available. Some of the datasets are broken down by sensor channel set type, and in others the data is labeled by sensor channel tag names or bundles. 33 | 34 | **Note:** *Some of the prize datasets are extremely large and can have tens of GBs of data. These could take a long time to download, so please plan accordingly* 35 | 36 | ## Data Dictionary 37 | 38 | The PVDAQ data is partitioned by system_id, year, month, and day. Raw data is reported at 15-minute increments in ISO 8601 date and time. The timestamp is stripped and data is averaged daily. An example file output is included here. 39 | 40 | ## Data Tables 41 | 42 | * pvdaq_inverters - metadata about the inverter hardware on the system 43 | * pvdaq_meters - metadata about the meter hardware on the system 44 | * pvdaq_metrics - metadata about the sensor values captured as part of the PV time-series 45 | * pvdaq_modules - metadata about the module hardware on the system 46 | * pvdaq_mount - mounting configuration of the array or subsets of the array 47 | * pvdaq_other_instruments - metadata about other ancillary equipment fielded on the system 48 | * pvdaq_site - geo location details of a PV array 49 | * pvdaq_system - basic details about a PV array 50 | * pvdaq_pvdata - PV time series data. 51 | 52 | ## Table Schemas 53 | 54 | ### pvdaq_inverters 55 | 56 | * inverter_id (string) - database primary key 57 | * name (string) - alias given to the inverter by the array owner or autogenerated 58 | * manufacturer (string) 59 | * model (string) 60 | * serial_num (string) 61 | * num_strings (string) - how many strings are tied to the inverter 62 | * modules_per_string (string) - how many modules are tied to each string 63 | * type (string) - indicates type of inverter, such as micro, string, central, etc.
63 | * quantity (string) - number of inverters fielded at the array site
64 | * time_interval (string) - whether the data is left (L), center (C), or right (R) aligned within the acquisition interval
65 | * site_id (string) - associated site
66 | * system_id (bigint) - associated system
67 | * comments (string) - any additional details
68 | 
69 | ### pvdaq_meters
70 | 
71 | * meter_id (string) - primary key of the meter
72 | * name (string) - alias given to the meter by the array owner, or autogenerated
73 | * manufacturer (string)
74 | * model (string)
75 | * serial_num (string)
76 | * time_interval (string) - whether the data is left (L), center (C), or right (R) aligned within the acquisition interval
77 | * type (string) - type of meter: production, site, or revenue
78 | * site_id (string) - associated site
79 | * system_id (bigint) - associated system
80 | * comments (string) - any additional details
81 | 
82 | ### pvdaq_metrics
83 | 
84 | * system_id (int) - associated system for the metric
85 | * metric_id (int) - primary key of the metric
86 | * sensor_name (string) - referenced name produced by the instrumentation or tagged by the array owner
87 | * common_name (string) - a general grouping of sensor types (e.g., DC voltage, AC energy, POA irradiance)
88 | * raw_units (string) - raw, unscaled or uncalibrated units of the values produced by the sensor
89 | * units (string) - units of the values produced by the sensor; may be raw_units modified by calc_scale and calc_offset
90 | * calc_scale (double) - scaling for adjusting the sensor values (default 1)
91 | * calc_offset (double) - offset for adjusting the sensor values (default 0)
92 | * calc_details (string) - mathematical equation used to calculate the sensor value, if needed
93 | * aggregation_type (string) - avg, min, max, sample, union, median, or calculated
94 | * source_type (string) - what is generating the sensor value (inverters, meters, or other instruments); can be NULL
95 | * source_id (int) - the associated primary key of the sensor type generating the value; can be NULL
96 | * comments (string) - any additional details
97 | * standard_name (string) - a unique autogenerated name based on either the primary key and sensor_name, or a combination of common_name, sensor_type, and sensor_id
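The default values of calc_scale (1) and calc_offset (0) suggest a linear conversion from raw readings to the units in the `units` field, but this documentation does not spell out the formula. The sketch below assumes the conventional `raw * scale + offset` form; confirm against the metric's calc_details field before relying on it.

```python
# Hypothetical helper, assuming the conventional linear form
#   engineering_value = raw_value * calc_scale + calc_offset.
# Check the metric's calc_details field to confirm the actual equation.
def to_engineering_units(raw_value, calc_scale=1.0, calc_offset=0.0):
    return raw_value * calc_scale + calc_offset

# Example: a raw reading of 512 with calc_scale=0.1 and calc_offset=-2.0
# maps to 49.2 in the units reported by the metric's `units` field.
print(to_engineering_units(512, 0.1, -2.0))  # 49.2
```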
98 | 
99 | ### pvdaq_modules
100 | 
101 | * module_id (string) - the module primary key
102 | * name (string) - alias given to the module by the array owner, or autogenerated
103 | * inverter_id (string) - the associated inverter primary key tied to this module, if known
104 | * manufacturer (string)
105 | * model (string)
106 | * serial_num (string)
107 | * type (string) - module technology: CdTe, crystalline Si, multicrystalline Si, etc.
108 | * quantity (string) - number of modules installed on the system
109 | * reference_module (string) - whether this is a reference module
110 | * start_on (string) - date the module was installed
111 | * end_on (string) - date the module was removed
112 | * site_id (string) - associated site
113 | * system_id (bigint) - associated system
114 | * comments (string) - any additional details
115 | 
116 | 
117 | ### pvdaq_mount
118 | 
119 | * mount_id (bigint) - the primary key for the mount
120 | * name (string) - alias given to the mount by the array owner, or autogenerated
121 | * manufacturer (string)
122 | * model (string)
123 | * azimuth (string) - pointing of the mount as a compass direction in decimal degrees; 0 degrees = north, 90 degrees = east
124 | * tilt (string) - tilt angle of the mount in degrees
125 | * tracking (string) - whether the mount is tracking or fixed
126 | * type (string) - configuration of the mount: ground, roof, canopy, etc.
127 | * site_id (string) - associated site
128 | * system_id (bigint) - associated system
129 | * comments (string) - any additional details
130 | 
131 | ### pvdaq_other_instruments
132 | 
133 | * instrument_id (string) - the primary key of the instrument
134 | * name (string) - alias given to the instrument by the array owner, or autogenerated
135 | * manufacturer (string)
136 | * model (string)
137 | * serial_num (string)
138 | * time_interval (string) - whether the data is left (L), center (C), or right (R) aligned within the acquisition interval
139 | * type (string) - identifies what the instrument is: ref cell, weather station, thermocouple, etc.
140 | * site_id (string) - associated site
141 | * system_id (bigint) - associated system
142 | * comments (string) - any additional details
143 | 
144 | ### pvdaq_site
145 | 
146 | * site_id (string) - primary key of the site
147 | * system_id (bigint) - associated system
148 | * public_name (string) - unique given name of the site
149 | * location (string) - descriptive text name of the site location; could include street-address-type details
150 | * latitude (string) - decimal latitude geolocation
151 | * longitude (string) - decimal longitude geolocation
152 | * elevation (string) - distance in meters above sea level, if known
153 | * av_pressure (string) - average annual atmospheric pressure at the site in psi
154 | * av_temp (string) - average ambient temperature in degrees Celsius at the site
155 | * climate_type (string) - the Köppen-Geiger classifier for the site location
156 | 
157 | ### pvdaq_system
158 | 
159 | * system_id (bigint) - primary key of the system
160 | * site_id (bigint) - associated site representing geolocation details for the system
161 | * public_name (string) - unique name given to the array
162 | * area (string) - covered area of the array in square meters
163 | * power (string) - maximum calculated or nameplate DC power of the array in kW
164 | * started_on (string) - date the system became active
165 | * ended_on (string) - date the system was deactivated
166 | * comments (string) - any additional details
167 | 
168 | ### pvdaq_pvdata
169 | 
170 | * system_id (string) (Partitioned) - associated system for the data
171 | * measured_on (timestamp) - local timestamp as generated by the instrumentation; could include DST
172 | * utc_measured_on (timestamp) - UTC timestamp calculated from the measured_on value; could include DST
173 | * metric_id (int) - associated metric_id for the data
174 | * value (double) - value of the data; join to the pvdaq_metrics record for units and other details
175 | 
176 | Note: not every site or system_id will contain data for each attribute included in the data dictionary.
177 | 
178 | ## Data Format
179 | 
180 | The PVDAQ dataset is made available in Parquet format on AWS and is partitioned by `year`, `month`, and `day` in AWS Glue and Athena. The schema may change across dataset years on S3.
181 | 
182 | Partition keys of the `pvdaq_pvdata` table:
183 | 
184 | * year (string) (Partitioned)
185 | * month (string) (Partitioned)
186 | * day (string) (Partitioned)
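As a concrete illustration of how these partitions are used, the hedged sketch below queries one day of data for one system through PyAthena. The staging bucket, the `oedi.pvdaq_pvdata` table name, and the partition value formats are placeholders/assumptions; substitute the names registered in your own Glue catalog.

```python
import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir='s3://<your-staging-bucket>/<prefix>/',  # user-defined staging directory
    region_name='us-west-2',
)

# Filtering on the partition columns (system_id, year, month, day) lets
# Athena prune partitions instead of scanning the whole table. The table
# name and partition value formats (e.g., zero-padded months) are
# assumptions here; confirm them against your catalog.
query = """
SELECT measured_on, metric_id, value
FROM oedi.pvdaq_pvdata
WHERE system_id = '2105'
  AND year = '2023' AND month = '6' AND day = '15'
ORDER BY measured_on
"""
df = pd.read_sql(query, conn)
print(df.head())
```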
187 | 
188 | ## S3 Paths
189 | 
190 | * s3://oedi-data-lake/pvdaq/inverters/*.parquet
191 | * s3://oedi-data-lake/pvdaq/meters/*.parquet
192 | * s3://oedi-data-lake/pvdaq/metrics/*.parquet
193 | * s3://oedi-data-lake/pvdaq/mount/*.parquet
194 | * s3://oedi-data-lake/pvdaq/other_instruments/*.parquet
195 | * s3://oedi-data-lake/pvdaq/site/*.parquet
196 | * s3://oedi-data-lake/pvdaq/system/*.parquet
197 | * s3://oedi-data-lake/pvdata/system_id=/year=/month=/day=/*.parquet
198 | 
199 | - `s3://oedi-data-lake/pvdaq/`
200 | 
201 | 
202 | ## Bulk Downloads from the OEDI site
203 | 
204 | The PVDAQ Access repository contains a small Python program that can bundle all the daily data from a site and download it onto your local system. If accessing the data for a Solar Data Prize site, some adjustment to the code would be necessary, since all the data sits within a single directory for each site.
205 | 
206 | [PVDAQ Access](https://github.com/NREL/pvdaq_access)
207 | 
208 | 
209 | ## Data Sources
210 | 
211 | [https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html](https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html)
212 | 
213 | ## Model, Methods, and Analysis Tools
214 | 
215 | #### RdTools
216 | RdTools is an open-source library to support reproducible technical analysis of time-series data from photovoltaic energy systems, particularly degradation effects.
217 | [RdTools](https://www.nrel.gov/pv/rdtools.html)
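For orientation, here is a minimal, hedged sketch of a year-on-year degradation calculation with RdTools on a synthetic normalized-energy series. A real analysis would start from PVDAQ production data and apply RdTools' normalization and filtering steps first; see the RdTools documentation for the full workflow.

```python
import numpy as np
import pandas as pd
import rdtools

# Synthetic daily normalized energy with a -0.5 %/yr trend, for
# illustration only; real inputs come from normalized, filtered PVDAQ data.
idx = pd.date_range('2015-01-01', '2020-01-01', freq='D', tz='US/Mountain')
years = np.asarray((idx - idx[0]).days) / 365.25
rng = np.random.default_rng(0)
energy_normalized = pd.Series(1.0 - 0.005 * years, index=idx)
energy_normalized *= rng.normal(1.0, 0.01, len(idx))  # measurement noise

# Year-on-year degradation rate (%/yr) with a bootstrapped confidence interval
rd, rd_ci, calc_info = rdtools.degradation_year_on_year(energy_normalized)
print(f'Degradation rate: {rd:.2f} %/yr (CI: {rd_ci})')
```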
218 | 
219 | #### PV Lib
220 | A toolbox that provides a set of well-documented functions for simulating the performance of photovoltaic energy systems.
221 | [pv_lib-toolbox](https://pvpmc.sandia.gov/applications/pv_lib-toolbox/)
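To give a flavor of what the toolbox enables, the sketch below uses the pvlib-python implementation to estimate clear-sky irradiance at a site. The coordinates are arbitrary illustration values; in practice you would use the latitude/longitude from the pvdaq_site table for the system of interest.

```python
import pandas as pd
from pvlib.location import Location

# Illustrative coordinates only (roughly Golden, CO); substitute values
# from the pvdaq_site table for a real system.
site = Location(latitude=39.74, longitude=-105.17,
                tz='US/Mountain', altitude=1800)

times = pd.date_range('2023-06-15 05:00', '2023-06-15 21:00',
                      freq='15min', tz=site.tz)

# Haurwitz is a simple GHI-only clear-sky model with no external data needs.
clearsky = site.get_clearsky(times, model='haurwitz')
print(clearsky.head())
```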
222 | 
223 | #### PVAnalytics
224 | PVAnalytics is a Python library that supports analytics for PV systems. It provides functions for quality control, filtering, feature labeling, and other tools supporting the analysis of PV system-level data.
225 | [PVAnalytics](https://github.com/pvlib/pvanalytics)
226 | 
227 | 
228 | ### Other Data Sources
229 | 
230 | #### DuraMAT
231 | A multi-institution consortium focused on discovery, development, de-risking, and enabling the commercialization of new materials and designs for PV modules.
232 | [Main Site](https://www.duramat.org/)
233 | 
234 | * [Validation models for PV performance](https://datahub.duramat.org/dataset/data-for-validating-models-for-pv-module-performance/)
235 | * Machine learning training set for validation of [satellite imagery of PV array sites](https://datahub.duramat.org/dataset/satellite-images-training-and-validation-set)
236 | * Machine learning training set for [detection of inverter clipping - real data](https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data)
237 | * Machine learning training set for [detection of inverter clipping - simulated data](https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-simulated-data)
238 | * Machine learning training set for [detection of soiling cleaning events](https://datahub.duramat.org/dataset/automated-pv-systems-cleaning-and-detection)
239 | * Example data of a [soiling signal in time-series data](https://datahub.duramat.org/dataset/pvdaq-time-series-with-soiling-signal)
240 | * Spectral irradiance data sets: [Albuquerque](https://datahub.duramat.org/project/spectral-irradiance-data-and-resources)
241 | 
242 | ### Additional Resources
243 | 
244 | [https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html](https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html)
245 | 
246 | [https://www.nrel.gov/docs/fy17osti/69131.pdf](https://www.nrel.gov/docs/fy17osti/69131.pdf)
267 | 
268 | 
269 | ## Python Connection Examples
270 | 
271 | Athena data connection using PyAthena:
272 | ```python
273 | 
274 | import pandas as pd
275 | from pyathena import connect
276 | 
277 | conn = connect(
278 |     s3_staging_dir='s3://<bucket>/<staging-prefix>/',  # user-defined staging directory
279 |     region_name='us-west-2',
280 |     work_group=''  # specify workgroup if one exists
281 | )
282 | ```
283 | 
284 | Example #1: Querying with a limit:
285 | ```python
286 | df = pd.read_sql("SELECT * FROM oedi.<table_name> LIMIT 8;", conn)
287 | ```
288 | 
289 | For a Jupyter notebook example that includes partitions and the data dictionary, see our
290 | [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples)
291 | 
292 | 
293 | ## Disclaimer and Attribution
294 | 
295 | Copyright (c) 2024, Alliance for Sustainable Energy LLC, All rights reserved.
296 | 
297 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
298 | 
299 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
300 | 
301 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
302 | 
303 | * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
304 | 
305 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
306 | 
--------------------------------------------------------------------------------
/windAIBench.md:
--------------------------------------------------------------------------------
1 | ## Data Description
2 | The floris_data.h5 file is a robust dataset providing 252,500 samples of diverse wind plant layouts operating under a wide range of yawing and atmospheric conditions. This data submission should be considered a benchmark dataset for comparison with new ML approaches. Wind plant layouts were randomly sampled using a specialized Plant Layout Generator (PLayGen). Given a user-specified plant size and average turbine spacing, PLayGen generates a realistic layout reflective of one of four canonical configurations: (i) cluster, (ii) single string, (iii) multiple string, (iv) parallel string. 500 unique wind plant layouts were sampled, with the number of turbines randomly sampled from N_turb ∈ [25, 200] and the turbine spacing randomly sampled from Δ_turb ∈ [3D, 10D], where D denotes the turbine rotor diameter. For each layout, 500 different sets of atmospheric conditions were sampled, with wind speeds and directions drawn uniformly from u ∈ [0, 25] m/s and θ ∈ [0°, 360°], respectively, where u is wind speed and θ is wind direction. Turbulence intensity was sampled uniformly from low (6%), medium (8%), and high (10%). For each atmospheric inflow scenario, the individual turbine yaw angles were randomly sampled from a one-sided truncated Gaussian on the interval [0°, 30°], oriented relative to the wind inflow direction. Given these randomly sampled wind plant layouts, controls, and atmospheric conditions, the power generation outputs were computed using FLORIS, an open-source analytic engineering flow model created by NREL that predicts steady-state hub-height velocity in normal or yawed operating conditions. The IEA onshore reference turbine, which has a 130 m rotor diameter, a 110 m hub height, and a rated power capacity of 3.4 MW, was used as the fixed turbine technology throughout all simulations.
3 | The data generation process resulted in 250,000 individual samples. We supplement this data by selecting a subset of cases (50 atmospheric conditions from each of 50 layouts, for a total of 2,500 cases) for which FLORIS was re-run with wake steering control optimization. In these additional 2,500 cases the turbine yaw angles are optimized to maximize total plant power production. The data generation process was completed between X-Y on the Eagle HPC system located at NREL’s Golden, CO campus. The data was reformatted and preprocessed for OEDI submission in May 2023. The data was generated as part of a broader effort to support the IEA ‘Net Zero by 2050’ roadmap, which calls for an 11x increase in wind energy. Meeting these deployment goals raises many engineering challenges, so the Department of Energy is looking to data-driven modeling to help address them. The role of AI/ML in supporting efforts to address these problems requires novel methods to handle modeling complexities.
However, AI/ML research in wind energy has been performed in a nonsystematic, ad hoc manner, making it difficult to identify promising approaches and target investments appropriately. Future R&D planning requires a reliable framework for comparing different methods. In AI/ML research, benchmark data sets with well-defined problem statements have enabled consistent comparisons of emerging approaches and systematic ablation studies. Thus, this submission is part of a larger goal to define benchmark data sets, problems, and metrics for wind energy research, facilitating a systematic approach to developing and comparing emerging AI/ML technologies that can inform future research investments.
4 | 
5 | ## HDF5 – Hierarchical Data Format
6 | All data is contained within a single HDF5 file, floris_data.h5. HDF5 is a hierarchical data format which functions similarly to a file management system. The .h5 file itself is an object that acts as a container, or group, that can hold a variety of heterogeneous data objects or datasets. The two primary object types within a .h5 file are groups and datasets. Groups function similarly to a directory or folder and can contain objects, known as members, such as datasets or other groups. The root group of a .h5 file is denoted by a forward slash /. Objects within a .h5 file are often described with absolute path names starting from the root group. For example, /Layout000/Scenarios/Scenario000 is the absolute path for the zeroth layout and its corresponding zeroth scenario.
7 | For additional information on the HDF5 file format and API, please see the documentation below:
8 | https://docs.hdfgroup.org/hdf5/develop/_getting_started.html
9 | 
10 | ## Data Structure
11 | The floris_data.h5 file is structured as follows:
12 | |-- root (group, 500 members)
13 |     |-- LayoutXXX (group, 3 members)
14 |         |-- Number of Turbines (dataset)
15 |         |-- Scenarios (group, 500 members)
16 |             |-- ScenarioXXX (group, 6 members)
17 |                 |-- Optimal Yaw (group, 3 members)**
18 |                     |-- Turbine Power (dataset)
19 |                     |-- Turbine Wind Speed (dataset)
20 |                     |-- Yaw Angles (dataset)
21 |                 |-- Turbine Power (dataset)
22 |                 |-- Turbine Wind Speed (dataset)
23 |                 |-- Turbulence Intensity (dataset)
24 |                 |-- Wind Direction (dataset)
25 |                 |-- Wind Speed (dataset)
26 |                 |-- Yaw Angles (dataset)
27 |         |-- Turbines (group, variable number of members)
28 |             |-- TurbineXXX (group, 4 members)
29 |                 |-- Hub Height (dataset)
30 |                 |-- Rotor Diameter (dataset)
31 |                 |-- X Location (dataset)
32 |                 |-- Y Location (dataset)
33 | 
34 | ** The Optimal Yaw group is only present in 2,500 ScenarioXXX groups. These 2,500 groups represent the layouts and scenarios where FLORIS was re-run using wake steering and the yaw angles were optimized for power production. The standard datasets within these ScenarioXXX groups represent the FLORIS simulation with no yaw optimization. For convenience and ease of parsing, an opt_yaw_list.csv file has been provided so users can easily identify the LayoutXXX/ScenarioXXX groups that contain the Optimal Yaw group and its corresponding datasets.
35 | 
36 | In the structure above, “XXX” indicates there are multiple groups. For the LayoutXXX and ScenarioXXX groups these values range from 000 to 499, indicating 500 groups of each. For the TurbineXXX groups these values vary, reflecting a different number of turbines for each LayoutXXX, ranging from 25 to 200 inclusive.
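To make the hierarchy concrete, here is a minimal sketch that walks one layout/scenario using the absolute paths and dataset names documented above. The use of h5py is an assumption of tooling; any HDF5 reader works.

```python
import h5py

# Minimal sketch: read one layout/scenario from floris_data.h5 using the
# group and dataset names documented above.
with h5py.File('floris_data.h5', 'r') as f:
    layout = f['Layout000']
    n_turbines = int(layout['Number of Turbines'][()])

    scenario = layout['Scenarios/Scenario000']
    wind_speed = float(scenario['Wind Speed'][()])           # scalar, m/s
    wind_direction = float(scenario['Wind Direction'][()])   # scalar, degrees
    turbine_power = scenario['Turbine Power'][:]             # shape (n_turbines,), W

    # The Optimal Yaw subgroup exists only for the 2,500 optimized cases;
    # opt_yaw_list.csv identifies which Layout/Scenario groups contain it.
    if 'Optimal Yaw' in scenario:
        optimized_power = scenario['Optimal Yaw/Turbine Power'][:]

    # Per-turbine metadata lives under the layout's Turbines group.
    x_location = layout['Turbines/Turbine000/X Location'][()]  # m

    print(n_turbines, wind_speed, wind_direction, turbine_power.sum())
```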
37 | 
38 | ## Data Dictionary
39 | Name | Type | Shape | Data Type | Units
40 | -- | -- | -- | -- | --
41 | Number of Turbines | Scalar | () | i4, 4-byte integer | N/A
42 | Turbine Power | Vector | (# turbines,) | f4, 4-byte float | W
43 | Turbine Wind Speed | Vector | (# turbines,) | f4, 4-byte float | m/s
44 | Turbulence Intensity | Scalar | () | f4, 4-byte float | N/A
45 | Wind Direction | Scalar | () | f4, 4-byte float | degrees
46 | Wind Speed | Scalar | () | f4, 4-byte float | m/s
47 | Yaw Angles | Vector | (# turbines,) | f4, 4-byte float | degrees
48 | Hub Height | Scalar | () | f4, 4-byte float | m
49 | Rotor Diameter | Scalar | () | f4, 4-byte float | m
50 | X Location | Scalar | () | f4, 4-byte float | m
51 | Y Location | Scalar | () | f4, 4-byte float | m
52 | 
53 | ## Submission Keywords
54 | energy, power, wind, AI, ML, AI/ML, wind plant, benchmark
55 | 
--------------------------------------------------------------------------------