├── ATB.md
├── BUTTER.md
├── BuildingsBench.md
├── ImperialValleyDarkFiber
│   ├── DarkFiber.md
│   └── DarkFiber_Tutorial_Notebook.ipynb
├── NREL_Building_Stock
│   ├── Individual_Building_Data.md
│   └── Query_ComStock_Athena.md
├── NSO.md
├── NSRDB.md
├── PVROOFTOPS.md
├── PVROOFTOPS_PR.md
├── PoroTomo
│   ├── PoroTomo.md
│   ├── PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_SEGY.ipynb
│   ├── PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hdf5.ipynb
│   ├── PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hsds.ipynb
│   └── README.md
├── SMART-DS
│   ├── Readme.md
│   ├── Readme.pdf
│   └── figures
│       ├── AUS
│       │   └── all_labels.PNG
│       ├── CYME
│       │   ├── import_timeseries.PNG
│       │   ├── import_voltvar.PNG
│       │   ├── importing.PNG
│       │   ├── networks.PNG
│       │   ├── simplified_view.PNG
│       │   ├── simplified_view_zoomed.PNG
│       │   ├── substation.PNG
│       │   └── timeseries_results.PNG
│       ├── GIS
│       │   ├── layer_examples.PNG
│       │   └── missing_layers.png
│       ├── GSO
│       │   ├── all_labels.PNG
│       │   └── all_labels2.PNG
│       ├── OpenDSS
│       │   ├── feeder.PNG
│       │   ├── monitor_current.PNG
│       │   ├── monitor_kva.PNG
│       │   ├── profile.PNG
│       │   └── running_dss.PNG
│       ├── SAF
│       │   └── all_labels.png
│       ├── SFO
│       │   ├── downtown_labels.PNG
│       │   ├── east_labels.PNG
│       │   ├── north_labels.PNG
│       │   └── south_labels.PNG
│       ├── analysis
│       │   ├── pu_voltages_histogram.png
│       │   └── pu_voltages_percentiles.png
│       └── load_curves
│           ├── total_load_200.png
│           └── total_load_244.png
├── Sup3rCC.md
├── Template.md
├── TrackingtheSun.md
├── UMCM_Hurricanes.md
├── US_Wave.md
├── WINDToolkit.md
├── dGen.md
├── pvdaq.md
└── windAIBench.md

/ATB.md:
--------------------------------------------------------------------------------

# Annual Technology Baseline (ATB)

## Description

The NREL Annual Technology Baseline (ATB) provides a consistent set of technology cost and performance data for energy analysis. This dataset was developed with funding from the U.S. Department of Energy's Office of Energy Efficiency & Renewable Energy.

To inform electric and transportation sector analysis in the United States, each year NREL provides a robust set of modeling input assumptions for energy technologies and a diverse set of potential electricity generation futures or modeling scenarios (Standard Scenarios).

The ATB is a populated framework to identify technology-specific cost and performance parameters or other investment decision metrics across a range of fuel price conditions, as well as site-specific conditions, for electric generation technologies at present and with projections through 2050.

## Model

The purpose of the ATB is to provide CAPEX, O&M, and capacity factor estimates for the Base Year and future year projections representing three levels of technical innovation (conservative, moderate, and advanced) for use in electric sector models.

The R&D Only cases are intended to reflect fundamental technology changes over time — not short-term market variations in pricing, not changes in interest rates or other project finance elements, and not macroeconomic influences such as commodity price fluctuations. These cases attempt to estimate the potential effects of technology innovation across the renewable electricity generation technologies under comparable levels of probability. This is inherently uncertain.

The Market + Policies Case approximates the costs of electricity generation plants with Independent Power Producer financial terms, covering the energy component of electric system planning and operation.
Important items that are not included in these costs limit the validity of comparisons across technologies. A table on the ATB website summarizes these limitations, identifies other analyses, tools, and data sets that are more complete sources for these items, and suggests applications that are affected by these limitations of the ATB.

See [technical limitations on the ATB website](https://atb.nrel.gov/electricity/2024/technical_limitations) for more detailed information.

## Directory structure

The CSV files summarize, in database-friendly form, the capital expenditures, operations expenditures, and capacity factor, as well as the financial assumptions and the levelized cost of energy, for each technology. They are reformatted from the summary section of the spreadsheet, which documents the underlying calculations and data. The same data is also available in the Apache Parquet format.

The files are stored by type and then by year. The file types are parquet and csv. The files can be accessed in the [Department of Energy's Open Energy Data Initiative (OEDI) in the Registry of Open Data on AWS](https://registry.opendata.aws/oedi-data-lake/), or in the [bucket viewer](https://data.openei.org/s3_viewer?bucket=oedi-data-lake&prefix=ATB%2F).

- `s3://oedi-data-lake/ATB/electricity`

## Vintage

Annual data for 2015 through the previous year can be found at [https://atb.nrel.gov/archive](https://atb.nrel.gov/archive).

The current year data can be found at [https://atb.nrel.gov/data](https://atb.nrel.gov/data).

## Data Format

The most recent annual data is provided in CSV and Apache Parquet format. The data structure is as follows:

Column | Type | Description
-- | -- | --
`atb_year` | bigint | year of ATB publication
`core_metric_key` | string | concatenated unique key
`core_metric_parameter` | string | technology and cost performance parameters
`core_metric_case` | string | financial case (R&D or Market)
`crpyears` | bigint | cost recovery period, years
`technology` | string | technology
`technology_alias` | string | technology alias
`techdetail` | string | technology-specific classifications and sub-groups
`techdetail2` | string | technology-specific classifications and sub-groups
`resourcedetail` | string | resource-specific classifications and sub-groups
`display_name` | string | technology-specific classifications and sub-groups for use in Tableau
`default` | string | default or subgroup technology
`scale` | string | null, utility, commercial, or residential
`maturity` | string | nascent or mature
`scenario` | string | moderate, conservative, or advanced
`core_metric_variable` | string | projected year
`units` | string | units
`value` | double | value

## Python Examples

```python
import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir='s3://<your-bucket>/<staging-prefix>/',  # user-defined staging directory
    region_name='us-west-2',
    work_group='<your-workgroup>'  # specify a workgroup if one exists
)

df = pd.read_sql(
    "SELECT DISTINCT technology, techdetail "
    "FROM oedi_atb.atb_electricity_parquet_2024 "
    "WHERE techdetail <> '*' "
    "ORDER BY technology, techdetail;",
    conn
)
```
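If you prefer to skip Athena, the files can also be read directly from the data lake. The sketch below uses s3fs and pandas with anonymous access; the exact object key under `ATB/electricity` is illustrative, so list the prefix first and substitute a real path.

```python
import s3fs
import pandas as pd

# The OEDI data lake bucket is public, so anonymous access is sufficient.
fs = s3fs.S3FileSystem(anon=True)

# Files are organized by type and then by year -- list the prefix to find
# the exact key for the year you want.
print(fs.ls("oedi-data-lake/ATB/electricity/parquet"))

# Hypothetical key for illustration; substitute one of the paths listed above.
key = "oedi-data-lake/ATB/electricity/parquet/2024/ATBe.parquet"
df = pd.read_parquet(f"s3://{key}", storage_options={"anon": True})
print(df[["technology", "core_metric_parameter", "value"]].head())
```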
OEDI has created a set of tools to facilitate access to open energy data sets, including ATB. Please visit the [open-data-access-tools documentation page](https://openedi.github.io/open-data-access-tools/) for more info. You can find Jupyter notebook examples that show how to use the tools in our [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples).

## References

Please cite the most relevant publication below when referencing this dataset:

1) NREL (National Renewable Energy Laboratory). 2024. "2024 Annual Technology Baseline." Golden, CO: National Renewable Energy Laboratory. [https://atb.nrel.gov/](https://atb.nrel.gov/).

## Errata

Documentation for the ATB errata can be found on the ATB website:

[https://atb.nrel.gov/electricity/2024/errata](https://atb.nrel.gov/electricity/2024/errata)

The data in the S3 bucket is maintained as the most up-to-date version of the parquet and CSV files for the years provided.

## Disclaimer and Attribution

DISCLAIMER AGREEMENT

These detailed electricity generation technology cost and performance data ("Data") are provided by the National Renewable Energy Laboratory ("NREL"), which is operated by the Alliance for Sustainable Energy LLC ("Alliance") for the U.S. Department of Energy (the "DOE").

It is recognized that disclosure of these Data are provided under the following conditions and warnings: (1) these Data have been prepared for reference purposes only; (2) these Data consist of forecasts, estimates, or assumptions made on a best-efforts basis, based upon expectations of current and future conditions at the time they were developed; and (3) these Data were prepared with existing information and are subject to change without notice.

The user understands that DOE/NREL/ALLIANCE are not obligated to provide the user with any support, consulting, training or assistance of any kind with regard to the use of the Data or to provide the user with any updates, revisions or new versions thereof. DOE, NREL, and ALLIANCE do not guarantee or endorse any results generated by use of the Data, and user is entirely responsible for the results and any reliance on the results or the Data in general.

USER AGREES TO INDEMNIFY DOE/NREL/ALLIANCE AND ITS SUBSIDIARIES, AFFILIATES, OFFICERS, AGENTS, AND EMPLOYEES AGAINST ANY CLAIM OR DEMAND, INCLUDING REASONABLE ATTORNEYS' FEES, RELATED TO USER'S USE OF THE DATA. THE DATA ARE PROVIDED BY DOE/NREL/ALLIANCE "AS IS," AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL DOE/NREL/ALLIANCE BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER, INCLUDING BUT NOT LIMITED TO CLAIMS ASSOCIATED WITH THE LOSS OF DATA OR PROFITS, THAT MAY RESULT FROM AN ACTION IN CONTRACT, NEGLIGENCE OR OTHER TORTIOUS CLAIM THAT ARISES OUT OF OR IN CONNECTION WITH THE ACCESS, USE OR PERFORMANCE OF THE DATA.

--------------------------------------------------------------------------------
/BuildingsBench.md:
--------------------------------------------------------------------------------

# BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting

## Description

The BuildingsBench datasets consist of:

- Buildings-900K: A large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
- 7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.

Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. The specific gap it fills is the lack of large-scale and diverse time series datasets of sufficient size for studying pretraining and finetuning with scalable machine learning models. Buildings-900K consists of synthetically generated energy consumption time series. It is derived from the NREL End-Use Load Profiles (EULP) dataset (see the link to this dataset in the references below). However, the EULP was not originally developed for the purpose of STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation." Similar to the EULP, Buildings-900K is a collection of Parquet files, and it follows nearly the same Parquet dataset organization as the EULP. As it only contains a single energy consumption time series per building, it is much smaller (~110 GB).

BuildingsBench also provides an evaluation benchmark that is a collection of various open source residential and commercial real building energy consumption datasets. The evaluation datasets, which are provided alongside Buildings-900K, are collections of CSV files which contain annual energy consumption. The size of the evaluation datasets altogether is less than 1 GB.

## Directory structure

A README file providing details about how the data is stored and describing the organization of the datasets can be found within each data lake version under BuildingsBench.

- `BuildingsBench/`
  - `Buildings-900K/end-use-load-profiles-for-us-building-stock/2021/`: The Buildings-900K pretraining and validation data.
    - `comstock_amy2018_release_1/`
      - `timeseries_individual_buildings/`
        - `by_puma_midwest`
          - `upgrade=0`
            - `puma={puma_id}/*.parquet`
            - `...`
        - `by_puma_northeast`
        - `by_puma_south`
        - `by_puma_west`
      - `weather/`
        - `amy2018/`
          - `{puma_id}_2018.csv`
          - ...
      - `metadata/`
        - `metadata.parquet`
    - `...`: Other datasets
  - `BDG-2/`: Building Data Genome Project 2 BuildingsBench evaluation data *with outliers removed*.
    - `{building_id}={year}.csv`: The .csv files for the BDG-2 dataset.
    - `...`: Other buildings
  - `...`: Other evaluation datasets (Borealis, Electricity, etc.)
  - `buildingsbench_with_outliers`: The BuildingsBench evaluation data *with outliers*.
    - `BDG-2/`: Building Data Genome Project 2 BuildingsBench evaluation data *with outliers*.
      - `{building_id}={year}.csv`: The .csv files for the BDG-2 dataset.
      - `...`: Other buildings
    - `...`: Other evaluation datasets (Borealis, Electricity, etc.)
  - `LICENSES/`: Licenses for each evaluation dataset redistributed in BuildingsBench.
  - `metadata/`: Metadata for the evaluation suite.
    - `benchmark.toml`: Metadata for the benchmark. For each dataset, we specify:
      - `building_type`: `residential` or `commercial`.
      - `latlon`: a list of two floats representing the location of the building(s).
      - `conus_location`: The name of the county or city in the U.S. where the building is located, or a county/city in the U.S. of similar climate to the building's true location.
      - `actual_location`: County/city where the building actually is located. This will be different from `conus_location` when the building is outside of the CONUS. These values are for book-keeping and can be set to dummy values.
      - `url`: The URL where the dataset was obtained from.
    - `building_years.txt`: List of .csv files included in the benchmark. Each line is of the form `{dataset}/{building_id}={year}.csv`.
    - `withheld_pumas.tsv`: List of PUMAs withheld from the training/validation set of Buildings-900K, which we use as synthetic test data.
    - `map_of_pumas_in_census_region*.csv`: Maps PUMA IDs to their geographical centroid (lat/lon).
    - `spatial_tract_lookup_table.csv`: Mapping between census tract identifiers and other geographies.
    - `list_oov.py`: Python script to generate a list of buildings that are OOV for the Buildings-900K tokenizer.
    - `oov.txt`: List of buildings that are OOV for the Buildings-900K tokenizer.
    - `transfer_learning_commercial_buildings.txt`: List of 100 commercial buildings from the benchmark we use for evaluating transfer learning.
    - `transfer_learning_residential_buildings.txt`: List of 100 residential buildings from the benchmark we use for evaluating transfer learning.
    - `transfer_learning_hyperparameter_tuning.txt`: List of 2 held-out buildings (1 commercial, 1 residential) that can be used for hyperparameter tuning.
    - `train*.idx`: Index files for fast dataloading of Buildings-900K. This file uncompressed is 16 GB.
    - `val*.idx`: Index files for fast dataloading of Buildings-900K.
    - `transforms`: Directory for storing data transform info.

## Data Format

### Parquet file format

The pretraining dataset Buildings-900K is stored as a collection of PUMA-level parquet files. Each parquet file in Buildings-900K is stored in a directory named after a unique PUMA ID (`puma={puma_id}/*.parquet`). The first column is the timestamp, in the format `YYYY-MM-DD HH:MM:SS`, and each subsequent column is the energy consumption in kWh for a different building in that PUMA. These columns are named by building id. The parquet files are compressed with snappy.
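As a quick illustration of this layout, the sketch below loads one PUMA's parquet file with pandas. The path, PUMA ID, and file name are placeholders; point them at wherever your copy of the dataset lives.

```python
import pandas as pd

# Placeholder path, PUMA id, and file name -- adjust to your local copy
# or S3 mirror of the dataset.
path = (
    "BuildingsBench/Buildings-900K/end-use-load-profiles-for-us-building-stock/2021/"
    "comstock_amy2018_release_1/timeseries_individual_buildings/by_puma_midwest/"
    "upgrade=0/puma=G17031/part-0.parquet"
)

df = pd.read_parquet(path)                       # snappy compression is handled automatically
df["timestamp"] = pd.to_datetime(df["timestamp"])  # YYYY-MM-DD HH:MM:SS
df = df.set_index("timestamp")

# Every remaining column is one building's load series in kWh, named by building id.
building_ids = df.columns.tolist()
print(len(building_ids), df[building_ids[0]].head())
```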
### CSV file format

Most CSV files in the benchmark are named `building_id=year.csv` and correspond to a single building's energy consumption time series. The first column is the timestamp (the Pandas index), in the format `YYYY-MM-DD HH:MM:SS`, and the second column is the energy consumption in kWh.

Certain datasets have multiple buildings in a single file. In this case, the first column is the timestamp (the Pandas index), and each subsequent column is the energy consumption in kWh for a different building. These columns are named by building id.

## Code Examples

For dataset quick start and other tutorials see the [BuildingsBench Github Tutorials](https://github.com/NREL/BuildingsBench/tree/main/tutorials).

## References

- [NeurIPS paper](https://arxiv.org/abs/2307.00142) providing additional information on the datasets along with the analyses conducted with BuildingsBench.
- [End-Use Load Profiles (EULP)](https://data.openei.org/submissions/4520), from which Buildings-900K is derived.
- Additional information about the parquet file format can be found [here](https://parquet.apache.org/).

Users of the BuildingsBench data should please cite:

- Emami, Patrick, Graf, Peter. BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting. United States: N.p., 31 Dec, 2018. Web. doi: 10.25984/1986147.

--------------------------------------------------------------------------------
/ImperialValleyDarkFiber/DarkFiber.md:
--------------------------------------------------------------------------------

# Imperial Valley Dark Fiber Project Continuous DAS Data

## Description

Whereas permanent seismic networks are sparsely distributed, this dense-array seismic dataset, with a gauge length of 10 m and a sampling rate of 500 Hz, was continuously acquired on a ~28 km long segment of unused fiber-optic cable buried in the ground for telecommunication, using a novel technology called “Distributed Acoustic Sensing” (DAS). The objective is to demonstrate dark fiber DAS as a tool for basin-scale geothermal exploration and monitoring.

The included DAS data were recorded during two days at the beginning of the project. The study area, Imperial Valley in Southern California, is a sedimentary basin characterized by intense seismicity and faulting, high heat flow, and deformation over a broad area in a transtensional tectonic regime; it hosts multiple producing geothermal fields and is believed to host other hidden geothermal resources. In particular, the dark fiber DAS array passes close to the Brawley geothermal field, which was a hidden geothermal resource prior to its discovery. Therefore, the DAS dataset provides new, exciting research opportunities in the studies of seismology, tectonics, seismic exploration, and basin-scale geothermal exploration and monitoring.

This dataset is continuous array seismic data acquired using distributed acoustic sensing (DAS) and can be used for all applications and methods in seismology that can use DAS data. This dataset can be used for analysis of ambient seismic noise and earthquakes in the Imperial Valley. This data can provide insights into sources of ambient seismic noise, subsurface seismic velocity structure, short-term spatiotemporal variations in the subsurface, source properties, detection and monitoring of earthquakes, and seismic-wave propagation in the Imperial Valley and similar sedimentary basins.

## Directory structure

This dataset contains continuous raw DAS data acquired over a period of two days (November 12-13, 2020) at the beginning of the Imperial Valley Dark Fiber Project. It consists of 2,880 files of approximately 400 MB each, each labeled with the start-time naming scheme DF__UTC_YYYYMMDD_HHMMSS.SSS.h5.

## Data Format

The DAS data files are 1-minute segments of strain rate in HDF5 format. The technical specifications of the DAS data acquisition are: 6912 channels, 4 m channel spacing, 10 m gauge length, and a 500 Hz sample rate. The data in each HDF5 file is a 2D array of dimensions 6912 channels x 30000 time samples, with int16 datatype, under the dataset name “Acoustic”. The number of attributes or headers is 83. The timestamp is in the header “GPSTimeStamp”.
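To make the format concrete, here is a minimal h5py sketch for reading one file. The file name is hypothetical (it follows the naming scheme above), and whether the `GPSTimeStamp` header sits on the file or on the `Acoustic` dataset is an assumption the code checks for rather than relies on.

```python
import h5py

# Hypothetical file name following the DF__UTC_YYYYMMDD_HHMMSS.SSS.h5 scheme.
fname = "DF__UTC_20201112_000000.000.h5"

with h5py.File(fname, "r") as f:
    das = f["Acoustic"]                      # int16, 6912 channels x 30000 samples
    print(das.shape, das.dtype)

    # Headers are stored as HDF5 attributes; check the dataset first,
    # then fall back to file-level attributes.
    attrs = dict(das.attrs) or dict(f.attrs)
    print(attrs.get("GPSTimeStamp"))

    # Slice one minute of strain-rate data for a few channels.
    chunk = das[1000:1010, :]
```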
## Code Examples

For a tutorial on accessing and using this data, please see the following link:

- https://github.com/openEDI/documentation/blob/main/ImperialValleyDarkFiber/DarkFiber_Tutorial_Notebook.ipynb

## References

For additional information about the objective of this data, users can reference the following article:

- Ajo-Franklin, J., et al. The Imperial Valley Dark Fiber Project: Toward Seismic Studies Using DAS and Telecom Infrastructure for Geothermal Applications. United States. https://doi.org/10.1785/0220220072

Additional information about the HDF5 file format can be found [here](https://support.hdfgroup.org/HDF5/doc/H5.format.html).

Users of the Dark Fiber DAS data should use the following citation:

- Ajo-Franklin, Jonathan, Dobson, Patrick, and Rodriguez Tribaldos, Veronica. Imperial Valley Dark Fiber Project Continuous DAS Data. United States: N.p., 10 Nov, 2020. Web. https://gdr.openei.org/submissions/1499.

--------------------------------------------------------------------------------
/NREL_Building_Stock/Individual_Building_Data.md:
--------------------------------------------------------------------------------

# Downloading individual building data files from ComStock results

For many use cases, the goal is to aggregate the timeseries energy data from many buildings in a given region, building type, etc. The methods for doing that are documented in [TODO link to query document]. Those methods rely on AWS Athena because of the sheer size of the data to be aggregated.

Other use cases may require access to the individual building timeseries data. This document describes how that data is stored and how to access it.

## Requirements

An AWS account is necessary to follow this tutorial.

## Data location

The ComStock dataset is stored in a publicly-accessible Amazon S3 bucket. To access this bucket:

1. Log in to AWS
2. Visit the [ComStock data bucket](https://s3.console.aws.amazon.com/s3/buckets/nrel-pds-building-stock?region=us-west-2&tab=objects)

This should take you to an interface where you can browse through the directories and look through the data.
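Alternatively, you can browse the bucket from code. Below is a small sketch using anonymous access with Boto3; the prefix matches the data organization described in the next section.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# The bucket is public, so unsigned (anonymous) requests are sufficient.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

resp = s3.list_objects_v2(
    Bucket="nrel-pds-building-stock",
    Prefix="comstock/athena/2020/comstock_v1/",
    Delimiter="/",
)
for prefix in resp.get("CommonPrefixes", []):
    print(prefix["Prefix"])
```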
## Data organization

The tree below includes the full `comstock/athena/...` prefix used by the example URIs later in this document:

```
nrel-pds-building-stock
├── comstock
├──── athena                    # root directory
├────── 2020                    # year the dataset was published
├──────── comstock_v1           # name of the dataset
├────────── metadata            # building characteristics and annual energy data
│            ├── fast1_metadata.parquet    # read all files to get all buildings
│            ├── fast2_metadata.parquet
│            └── ...
├────────── climate_zone        # timeseries data, partitioned by climate zone
│            ├── upgrade=0
│            ├───── climate_zone=1A
│            ├──────── 100022-0.parquet    # buildingID-upgradeID
│            ├──────── 100052-0.parquet
│            ├──────── 10006-0.parquet
│            └──────── ...
├────────── state               # same timeseries data, partitioned by state
│            ├── upgrade=0
│            ├───── state=01
│            ├──────── 100022-0.parquet
│            ├──────── 100052-0.parquet
│            ├──────── 10006-0.parquet
└            └──────── ...
```

## Finding specific buildings

In order to find specific buildings, first the metadata files should be downloaded and parsed. The list of buildings can be filtered by various characteristics, and a list of building IDs can be generated. Either the state or climate_zone should be determined for each building ID.

Once the list of building IDs and corresponding state or climate_zone is ready, the individual files can be retrieved from either the `/state` or `/climate_zone` directories. The timeseries profiles in these directories are identical; they are duplicated for query optimization. The full URI to a file will look like:

```
s3://nrel-pds-building-stock/comstock/athena/2020/comstock_v1/state/upgrade=0/state=01/100094-0.parquet
```

## Programmatic download of files

The files can be downloaded programmatically using a Python library such as [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html), or by using a similar library in another programming language. A sketch of the full workflow is shown below.
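The hedged sketch below strings the steps together with Boto3 and pandas. The `fast1_metadata.parquet` name comes from the tree above, the filter columns follow the Athena table schema in the next document, and the assumption that the metadata `state` column holds FIPS codes (as in the Athena query examples) is ours.

```python
import boto3
import pandas as pd
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))  # public bucket
bucket = "nrel-pds-building-stock"
root = "comstock/athena/2020/comstock_v1"

# 1. Grab one metadata file (read all fastN files to cover every building).
s3.download_file(bucket, f"{root}/metadata/fast1_metadata.parquet",
                 "fast1_metadata.parquet")
meta = pd.read_parquet("fast1_metadata.parquet")

# 2. Filter to buildings of interest -- here, baseline Medium Offices in
#    Wisconsin (assumes `state` holds FIPS codes, per the Athena examples).
subset = meta[(meta["in.building_type"] == "MediumOffice")
              & (meta["state"] == "55")
              & (meta["upgrade"] == 0)]

# 3. Build each key from the state partition and download the timeseries.
for bldg_id in subset["bldg_id"].head(5):   # first 5 buildings only
    key = f"{root}/state/upgrade=0/state=55/{bldg_id}-0.parquet"
    s3.download_file(bucket, key, f"{bldg_id}-0.parquet")
```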
--------------------------------------------------------------------------------
/NREL_Building_Stock/Query_ComStock_Athena.md:
--------------------------------------------------------------------------------

# Running queries on ComStock using AWS Athena

ComStock data can be accessed and downloaded using [AWS Athena](https://aws.amazon.com/athena/). Here we will show how to query the data for multiple building types, US regions, and upgrades.

## Requirements

An AWS account is necessary to run the code in this tutorial.

## Setting up AWS Athena with ComStock data for querying

First of all, we need to log into AWS and go to the [AWS Athena Console](https://console.aws.amazon.com/athena/home).

NOTE: To run these queries, go to the `Query Editor` tab of the Athena interface, copy and paste the query into the `New query` box, and click the `Run query` button.

### Setting up the database

As a first step, we will create a database in which the data we want to query will live - if you already have a database you want to create the tables in, or are fine with the tables being created in a default database, skip this step.

```sql
CREATE DATABASE comstock
```

### Creating the metadata table

Next we need to create the data tables that will be used to query the data. We will create two tables, one with building characteristics and annual energy data, and another with the time series energy data.

First we will create the metadata table, which will contain building characteristics and annual energy data. This query should take less than 1 minute to run.

Run the following SQL in the Athena `Query Editor` tab:

```sql
CREATE EXTERNAL TABLE `comstock_v1_metadata`(
  `applicability` boolean,
  `bldg_id` bigint,
  `climate_zone` string,
  `in.aspect_ratio` double,
  `in.building_type` string,
  `in.climate_zone` string,
  `in.code_when_built` string,
  `in.cooling_fuel` string,
  `in.current_envelope_code` string,
  `in.current_exterior_lighting_code` string,
  `in.current_hvac_code` string,
  `in.current_interior_equipment_code` string,
  `in.current_interior_lighting_code` string,
  `in.floor_height` double,
  `in.heating_fuel` string,
  `in.hvac_delivery_type` string,
  `in.hvac_system_type` string,
  `in.number_of_stories` double,
  `in.rotation` double,
  `in.sqft` double,
  `in.water_systems_fuel` string,
  `in.weather_station` string,
  `in.weekday_opening_time` string,
  `in.weekday_operating_hours` string,
  `in.weekend_opening_time` string,
  `in.weekend_operating_hours` string,
  `out.electricity.cooling.energy_consumption` double,
  `out.electricity.cooling.energy_consumption_intensity` double,
  `out.electricity.cooling.energy_savings` double,
  `out.electricity.cooling.energy_savings_intensity` double,
  `out.electricity.exterior_lighting.energy_consumption` double,
  `out.electricity.exterior_lighting.energy_consumption_intensity` double,
  `out.electricity.exterior_lighting.energy_savings` double,
  `out.electricity.exterior_lighting.energy_savings_intensity` double,
  `out.electricity.fans.energy_consumption` double,
  `out.electricity.fans.energy_consumption_intensity` double,
  `out.electricity.fans.energy_savings` double,
  `out.electricity.fans.energy_savings_intensity` double,
  `out.electricity.heat_recovery.energy_consumption` double,
  `out.electricity.heat_recovery.energy_consumption_intensity` double,
  `out.electricity.heat_recovery.energy_savings` double,
  `out.electricity.heat_recovery.energy_savings_intensity` double,
  `out.electricity.heat_rejection.energy_consumption` double,
  `out.electricity.heat_rejection.energy_consumption_intensity` double,
  `out.electricity.heat_rejection.energy_savings` double,
  `out.electricity.heat_rejection.energy_savings_intensity` double,
  `out.electricity.heating.energy_consumption` double,
  `out.electricity.heating.energy_consumption_intensity` double,
  `out.electricity.heating.energy_savings` double,
  `out.electricity.heating.energy_savings_intensity` double,
  `out.electricity.humidification.energy_consumption` double,
  `out.electricity.humidification.energy_consumption_intensity` double,
  `out.electricity.humidification.energy_savings` double,
  `out.electricity.humidification.energy_savings_intensity` double,
  `out.electricity.interior_equipment.energy_consumption` double,
  `out.electricity.interior_equipment.energy_consumption_intensity` double,
  `out.electricity.interior_equipment.energy_savings` double,
  `out.electricity.interior_equipment.energy_savings_intensity` double,
  `out.electricity.interior_lighting.energy_consumption` double,
  `out.electricity.interior_lighting.energy_consumption_intensity` double,
  `out.electricity.interior_lighting.energy_savings` double,
  `out.electricity.interior_lighting.energy_savings_intensity` double,
  `out.electricity.peak_demand.energy_consumption` double,
  `out.electricity.peak_demand.energy_consumption_intensity` double,
  `out.electricity.peak_demand.energy_savings` double,
  `out.electricity.peak_demand.energy_savings_intensity` double,
  `out.electricity.pumps.energy_consumption` double,
  `out.electricity.pumps.energy_consumption_intensity` double,
  `out.electricity.pumps.energy_savings` double,
  `out.electricity.pumps.energy_savings_intensity` double,
  `out.electricity.refrigeration.energy_consumption` double,
  `out.electricity.refrigeration.energy_consumption_intensity` double,
  `out.electricity.refrigeration.energy_savings` double,
  `out.electricity.refrigeration.energy_savings_intensity` double,
  `out.electricity.total.energy_consumption` double,
  `out.electricity.total.energy_consumption_intensity` double,
  `out.electricity.total.energy_savings` double,
  `out.electricity.total.energy_savings_intensity` double,
  `out.electricity.water_systems.energy_consumption` double,
  `out.electricity.water_systems.energy_consumption_intensity` double,
  `out.electricity.water_systems.energy_savings` double,
  `out.electricity.water_systems.energy_savings_intensity` double,
  `out.natural_gas.cooling.energy_consumption` double,
  `out.natural_gas.cooling.energy_consumption_intensity` double,
  `out.natural_gas.cooling.energy_savings` double,
  `out.natural_gas.cooling.energy_savings_intensity` double,
  `out.natural_gas.heating.energy_consumption` double,
  `out.natural_gas.heating.energy_consumption_intensity` double,
  `out.natural_gas.heating.energy_savings` double,
  `out.natural_gas.heating.energy_savings_intensity` double,
  `out.natural_gas.interior_equipment.energy_consumption` double,
  `out.natural_gas.interior_equipment.energy_consumption_intensity` double,
  `out.natural_gas.interior_equipment.energy_savings` double,
  `out.natural_gas.interior_equipment.energy_savings_intensity` double,
  `out.natural_gas.total.energy_consumption` double,
  `out.natural_gas.total.energy_consumption_intensity` double,
  `out.natural_gas.total.energy_savings` double,
  `out.natural_gas.total.energy_savings_intensity` double,
  `out.natural_gas.water_systems.energy_consumption` double,
  `out.natural_gas.water_systems.energy_consumption_intensity` double,
  `out.natural_gas.water_systems.energy_savings` double,
  `out.natural_gas.water_systems.energy_savings_intensity` double,
  `out.other_fuel.heating.energy_consumption` double,
  `out.other_fuel.heating.energy_consumption_intensity` double,
  `out.other_fuel.heating.energy_savings` double,
  `out.other_fuel.heating.energy_savings_intensity` double,
  `out.other_fuel.interior_equipment.energy_consumption` double,
  `out.other_fuel.interior_equipment.energy_consumption_intensity` double,
  `out.other_fuel.interior_equipment.energy_savings` double,
  `out.other_fuel.interior_equipment.energy_savings_intensity` double,
  `out.other_fuel.total.energy_consumption` double,
  `out.other_fuel.total.energy_consumption_intensity` double,
  `out.other_fuel.total.energy_savings` double,
  `out.other_fuel.total.energy_savings_intensity` double,
  `out.other_fuel.water_systems.energy_consumption` double,
  `out.other_fuel.water_systems.energy_consumption_intensity` double,
  `out.other_fuel.water_systems.energy_savings` double,
  `out.other_fuel.water_systems.energy_savings_intensity` double,
  `out.site_energy.total.energy_consumption` double,
  `out.site_energy.total.energy_consumption_intensity` double,
  `out.site_energy.total.energy_savings` double,
  `out.site_energy.total.energy_savings_intensity` double,
  `state` string,
  `upgrade` bigint,
  `weight` double,
  `metadata_index` bigint,
  `in.applicable` boolean,
  `__index_level_0__` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'parquet.column.index.access'='true')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://nrel-pds-building-stock/comstock/athena/2020/comstock_v1/metadata/'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0',
  'CrawlerSchemaSerializerVersion'='1.0',
  'UPDATED_BY_CRAWLER'='vizstock_oedi_comstock_v1_metadata',
  'averageRecordSize'='170',
  'classification'='parquet',
  'compressionType'='none',
  'objectCount'='10',
  'recordCount'='30560553',
  'sizeKey'='5293679830',
  'typeOfData'='file')
```

The response from Athena when the command has successfully run should look like this:

![CreateMetadata](https://github.com/NREL/ComStock/blob/cbianchi/documentation/documentation/Screen%20Shot%202021-04-15%20at%208.44.36%20PM.png?raw=true)

### Creating the time series table

Next we will create the timeseries table. This query can take anywhere from 1 minute to approximately 14 hours to run, so we advise starting this before bed! The wide range depends on whether or not it has been run recently by someone else, in which case AWS appears to use a cache rather than redoing things.
```sql
CREATE EXTERNAL TABLE `comstock_v1_state`(
  `timestamp` timestamp,
  `bldg_id` bigint,
  `out.electricity.cooling.energy_consumption` double,
  `out.electricity.cooling.energy_consumption_intensity` double,
  `out.electricity.cooling.energy_savings` double,
  `out.electricity.cooling.energy_savings_intensity` double,
  `out.electricity.exterior_lighting.energy_consumption` double,
  `out.electricity.exterior_lighting.energy_consumption_intensity` double,
  `out.electricity.exterior_lighting.energy_savings` double,
  `out.electricity.exterior_lighting.energy_savings_intensity` double,
  `out.electricity.fans.energy_consumption` double,
  `out.electricity.fans.energy_consumption_intensity` double,
  `out.electricity.fans.energy_savings` double,
  `out.electricity.fans.energy_savings_intensity` double,
  `out.electricity.heat_recovery.energy_consumption` double,
  `out.electricity.heat_recovery.energy_consumption_intensity` double,
  `out.electricity.heat_recovery.energy_savings` double,
  `out.electricity.heat_recovery.energy_savings_intensity` double,
  `out.electricity.heat_rejection.energy_consumption` double,
  `out.electricity.heat_rejection.energy_consumption_intensity` double,
  `out.electricity.heat_rejection.energy_savings` double,
  `out.electricity.heat_rejection.energy_savings_intensity` double,
  `out.electricity.heating.energy_consumption` double,
  `out.electricity.heating.energy_consumption_intensity` double,
  `out.electricity.heating.energy_savings` double,
  `out.electricity.heating.energy_savings_intensity` double,
  `out.electricity.humidification.energy_consumption` double,
  `out.electricity.humidification.energy_consumption_intensity` double,
  `out.electricity.humidification.energy_savings` double,
  `out.electricity.humidification.energy_savings_intensity` double,
  `out.electricity.interior_equipment.energy_consumption` double,
  `out.electricity.interior_equipment.energy_consumption_intensity` double,
  `out.electricity.interior_equipment.energy_savings` double,
  `out.electricity.interior_equipment.energy_savings_intensity` double,
  `out.electricity.interior_lighting.energy_consumption` double,
  `out.electricity.interior_lighting.energy_consumption_intensity` double,
  `out.electricity.interior_lighting.energy_savings` double,
  `out.electricity.interior_lighting.energy_savings_intensity` double,
  `out.electricity.peak_demand.energy_consumption` double,
  `out.electricity.peak_demand.energy_consumption_intensity` double,
  `out.electricity.peak_demand.energy_savings` double,
  `out.electricity.peak_demand.energy_savings_intensity` double,
  `out.electricity.pumps.energy_consumption` double,
  `out.electricity.pumps.energy_consumption_intensity` double,
  `out.electricity.pumps.energy_savings` double,
  `out.electricity.pumps.energy_savings_intensity` double,
  `out.electricity.refrigeration.energy_consumption` double,
  `out.electricity.refrigeration.energy_consumption_intensity` double,
  `out.electricity.refrigeration.energy_savings` double,
  `out.electricity.refrigeration.energy_savings_intensity` double,
  `out.electricity.total.energy_consumption` double,
  `out.electricity.total.energy_consumption_intensity` double,
  `out.electricity.total.energy_savings` double,
  `out.electricity.total.energy_savings_intensity` double,
  `out.electricity.water_systems.energy_consumption` double,
  `out.electricity.water_systems.energy_consumption_intensity` double,
  `out.electricity.water_systems.energy_savings` double,
  `out.electricity.water_systems.energy_savings_intensity` double,
  `out.natural_gas.cooling.energy_consumption` double,
  `out.natural_gas.cooling.energy_consumption_intensity` double,
  `out.natural_gas.cooling.energy_savings` double,
  `out.natural_gas.cooling.energy_savings_intensity` double,
  `out.natural_gas.heating.energy_consumption` double,
  `out.natural_gas.heating.energy_consumption_intensity` double,
  `out.natural_gas.heating.energy_savings` double,
  `out.natural_gas.heating.energy_savings_intensity` double,
  `out.natural_gas.interior_equipment.energy_consumption` double,
  `out.natural_gas.interior_equipment.energy_consumption_intensity` double,
  `out.natural_gas.interior_equipment.energy_savings` double,
  `out.natural_gas.interior_equipment.energy_savings_intensity` double,
  `out.natural_gas.total.energy_consumption` double,
  `out.natural_gas.total.energy_consumption_intensity` double,
  `out.natural_gas.total.energy_savings` double,
  `out.natural_gas.total.energy_savings_intensity` double,
  `out.natural_gas.water_systems.energy_consumption` double,
  `out.natural_gas.water_systems.energy_consumption_intensity` double,
  `out.natural_gas.water_systems.energy_savings` double,
  `out.natural_gas.water_systems.energy_savings_intensity` double,
  `out.other_fuel.heating.energy_consumption` double,
  `out.other_fuel.heating.energy_consumption_intensity` double,
  `out.other_fuel.heating.energy_savings` double,
  `out.other_fuel.heating.energy_savings_intensity` double,
  `out.other_fuel.interior_equipment.energy_consumption` double,
  `out.other_fuel.interior_equipment.energy_consumption_intensity` double,
  `out.other_fuel.interior_equipment.energy_savings` double,
  `out.other_fuel.interior_equipment.energy_savings_intensity` double,
  `out.other_fuel.total.energy_consumption` double,
  `out.other_fuel.total.energy_consumption_intensity` double,
  `out.other_fuel.total.energy_savings` double,
  `out.other_fuel.total.energy_savings_intensity` double,
  `out.other_fuel.water_systems.energy_consumption` double,
  `out.other_fuel.water_systems.energy_consumption_intensity` double,
  `out.other_fuel.water_systems.energy_savings` double,
  `out.other_fuel.water_systems.energy_savings_intensity` double,
  `out.site_energy.total.energy_consumption` double,
  `out.site_energy.total.energy_consumption_intensity` double,
  `out.site_energy.total.energy_savings` double,
  `out.site_energy.total.energy_savings_intensity` double)
PARTITIONED BY (
  `upgrade` bigint,
  `state` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'parquet.column.index.access'='true')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://nrel-pds-building-stock/comstock/athena/2020/comstock_v1/state/'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0',
  'CrawlerSchemaSerializerVersion'='1.0',
  'UPDATED_BY_CRAWLER'='vizstock_oedi_comstock_v1_state',
  'averageRecordSize'='11',
  'classification'='parquet',
  'compressionType'='none',
  'objectCount'='30499408',
  'recordCount'='1068697644480',
  'sizeKey'='107871145911082',
  'typeOfData'='file')
```

You will likely be logged out of AWS while waiting for this to complete. To check whether the query has completed, click on the **History** tab in the Athena GUI and look for the query that begins `CREATE EXTERNAL TABLE comstock_v1_state(`; when the query status updates from running to completed, you are ready to proceed to the next step.

![CreateTable](https://github.com/NREL/ComStock/blob/cbianchi/documentation/documentation/Screen%20Shot%202021-04-15%20at%2011.03.03%20AM.png?raw=true)

### Establishing partitions on the time series table

We use partitions to make our queries against the time series table quicker and more cost efficient. By splitting up the table by `state` and `upgrade`, average query time decreases by ~48X and cost by even more. This works by having Athena mark which folders in S3 contain data relevant to each of the possible values for `state` or `upgrade`, and then only access those folders when the relevant value is requested. Guaranteeing that the partitions are all correctly instantiated requires one last query, which should take between one and four hours to complete:

```sql
MSCK REPAIR TABLE comstock_v1_state
```

Once again, you may need to use the **History** tab in the Athena GUI to check when this command completes.

## Running queries against ComStock

Now that the tables have been created, we can query the data through Athena.

### Core example query

Let's start with an example of a query that uses the metadata table and time series table to retrieve hourly data from a subset of the building simulations. In this example we'll get all available end-uses for a Medium Office in Wisconsin for the baseline case (no upgrade has been applied).
```sql
SELECT
  sum("comstock_v1_state"."out.electricity.cooling.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.cooling.energy_consumption",
  sum("comstock_v1_state"."out.electricity.exterior_lighting.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.exterior_lighting.energy_consumption",
  sum("comstock_v1_state"."out.electricity.fans.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.fans.energy_consumption",
  sum("comstock_v1_state"."out.electricity.heat_recovery.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.heat_recovery.energy_consumption",
  sum("comstock_v1_state"."out.electricity.heat_rejection.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.heat_rejection.energy_consumption",
  sum("comstock_v1_state"."out.electricity.heating.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.heating.energy_consumption",
  sum("comstock_v1_state"."out.electricity.humidification.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.humidification.energy_consumption",
  sum("comstock_v1_state"."out.electricity.interior_equipment.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.interior_equipment.energy_consumption",
  sum("comstock_v1_state"."out.electricity.interior_lighting.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.interior_lighting.energy_consumption",
  sum("comstock_v1_state"."out.electricity.peak_demand.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.peak_demand.energy_consumption",
  sum("comstock_v1_state"."out.electricity.pumps.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.pumps.energy_consumption",
  sum("comstock_v1_state"."out.electricity.refrigeration.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.refrigeration.energy_consumption",
  sum("comstock_v1_state"."out.electricity.total.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.total.energy_consumption",
  sum("comstock_v1_state"."out.electricity.water_systems.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.electricity.water_systems.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.cooling.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.cooling.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.heating.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.heating.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.interior_equipment.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.interior_equipment.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.total.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.total.energy_consumption",
  sum("comstock_v1_state"."out.natural_gas.water_systems.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.natural_gas.water_systems.energy_consumption",
  sum("comstock_v1_state"."out.site_energy.total.energy_consumption" * "comstock_v1_metadata"."weight") AS "out.site_energy.total.energy_consumption",
"comstock_v1_metadata"."weight" 431 | ) AS "out.site_energy.total.energy_consumption", 432 | "comstock_v1_metadata"."state" AS "location", 433 | "comstock_v1_state"."upgrade" AS "comstock_v1_state_upgrade", 434 | month( 435 | "comstock_v1_state"."timestamp" 436 | ) AS "month", 437 | day( 438 | "comstock_v1_state"."timestamp" 439 | ) AS "day", 440 | hour( 441 | "comstock_v1_state"."timestamp" 442 | ) AS "hour" 443 | FROM 444 | "comstock_v1_state", 445 | "comstock_v1_metadata" 446 | WHERE 447 | "comstock_v1_state"."bldg_id" = "comstock_v1_metadata"."bldg_id" 448 | AND "comstock_v1_state"."state" IN ('55') 449 | AND "comstock_v1_state"."upgrade" = 0 450 | AND "comstock_v1_metadata"."upgrade" = 0 451 | AND "comstock_v1_metadata"."in.building_type" = 'MediumOffice' 452 | GROUP BY 453 | "comstock_v1_metadata"."state", 454 | "comstock_v1_state"."upgrade", 455 | month( 456 | "comstock_v1_state"."timestamp" 457 | ), 458 | day( 459 | "comstock_v1_state"."timestamp" 460 | ), 461 | hour( 462 | "comstock_v1_state"."timestamp" 463 | ) 464 | ORDER BY 465 | "comstock_v1_metadata"."state", 466 | "comstock_v1_state"."upgrade", 467 | month( 468 | "comstock_v1_state"."timestamp" 469 | ), 470 | day( 471 | "comstock_v1_state"."timestamp" 472 | ), 473 | hour( 474 | "comstock_v1_state"."timestamp" 475 | ) 476 | 477 | ``` 478 | 479 | ![QuerySimple](https://github.com/NREL/ComStock/blob/cbianchi/documentation/documentation/Screen%20Shot%202021-04-15%20at%208.55.34%20PM.png?raw=true) 480 | 481 | To download the returned data as a CSV file click the file icon highlighted in 482 | red in the screenshot above. 483 | 484 | Now let's demonstrate so ways of changing up the query: 485 | 486 | ### How to query data for other building types? 487 | 488 | All we have to do for this is change the following line in the `WHERE` clause 489 | in the example query: 490 | 491 | ```sql 492 | AND "comstock_v1_metadata"."in.building_type" = 'MediumOffice' 493 | ``` 494 | 495 | Instead of 'MediumOffice' we can use one any of the following building types: 496 | 497 | 'MediumOffice', 'LargeOffice', 'SecondarySchool', 'Hospital','Outpatient' 498 | 499 | ### How to query other states? 500 | 501 | All we have to do for this is change the following line in the `WHERE` clause 502 | in the example query: 503 | 504 | ```sql 505 | AND "comstock_v1_state"."state" IN ('55') 506 | ``` 507 | 508 | Instead of '55' (Wisconsin) we can use the FIPS code for another state, as 509 | listed here: 510 | https://www.nrcs.usda.gov/wps/portal/nrcs/detail/?cid=nrcs143_013696 511 | 512 | ### How to query an upgrade? 
Now let's demonstrate some ways of changing up the query:

### How to query data for other building types?

All we have to do for this is change the following line in the `WHERE` clause in the example query:

```sql
AND "comstock_v1_metadata"."in.building_type" = 'MediumOffice'
```

Instead of 'MediumOffice' we can use any of the following building types:

'MediumOffice', 'LargeOffice', 'SecondarySchool', 'Hospital', 'Outpatient'

### How to query other states?

All we have to do for this is change the following line in the `WHERE` clause in the example query:

```sql
AND "comstock_v1_state"."state" IN ('55')
```

Instead of '55' (Wisconsin) we can use the FIPS code for another state, as listed here: https://www.nrcs.usda.gov/wps/portal/nrcs/detail/?cid=nrcs143_013696

### How to query an upgrade?

All we have to do for this is change the following lines in the `WHERE` clause in the example query:

```sql
AND "comstock_v1_state"."upgrade" = 0
AND "comstock_v1_metadata"."upgrade" = 0
```

Instead of '0' we can use one of the following options:

```
0: Baseline
1: Upgrade Roof Insulation (R-19)
2: Upgrade Roof Insulation (R-30)
3: Upgrade Wall Insulation (R-13)
4: Upgrade Wall Insulation (R-30)
7: Add Window Film
8: Add Cool Roof
9: Add EIFS Wall Insulation
10: Add Electrochromic Windows (BleachedGlass)
11: Add Electrochromic Windows (TintedGlass)
12: Kitchen Exhaust Fan DCV
13: Upgrade Boiler (AFUE-81)
14: Upgrade Boiler (AFUE-83)
15: Upgrade Boiler (AFUE-94)
17: Upgrade Chiller (efficient)
18: Add Demand Control Ventilation
19: Add Economizer
20: Upgrade Furnace (AFUE-81)
21: Upgrade Furnace (AFUE-92)
22: Upgrade Furnace (AFUE-98)
23: Add Heat Recovery
24: Upgrade Motors
25: Add PTAC Controls
26: Upgrade Packaged Terminal Heat Pump (Code)
27: Upgrade Packaged Terminal Heat Pump (Efficient)
28: Upgrade Packaged Terminal Heat Pump (Highly Efficient)
29: Upgrade Packaged Terminal Air Conditioner (Code)
30: Upgrade Packaged Terminal Air Conditioner (Efficient)
31: Upgrade Packaged Terminal Air Conditioner (Highly Efficient)
32: Add VFD To Pumps
33: Upgrade RTU Air Source Heat Pump (IEER-13.3)
34: Upgrade RTU Air Source Heat Pump (IEER-15.0)
35: Upgrade RTU Air Source Heat Pump (IEER-16.5)
36: Upgrade RTU DX Air Conditioner (IEER-14.0)
37: Upgrade RTU DX Air Conditioner (IEER-15.5)
38: Upgrade RTU DX Air Conditioner (IEER-17.0)
39: Upgrade Split System DX Air Conditioner (SEER 14)
40: Upgrade Split System DX Air Conditioner (SEER 16)
41: Upgrade Split System DX Air Conditioner (SEER 18)
42: Upgrade Split System DX Air Conditioner (SEER 20)
45: Add Advanced Hybrid RTUs
46: Upgrade Air Filters
47: Add Brushless DC Compressor Motors
48: Reset Chilled Water Supply Temperature
49: Close Outdoor Air Dampers During Unoccupied Hours
50: Add Cold Climate Heat Pumps
51: Upgrade Duct Routing
52: Add Exhaust Fan Interlock
53: Reset Hot Water Temperature
55: Add Predictive Thermostats
56: Reset Supply Air
57: Add Thermoelastic Heat Pumps
58: Add Variable Speed Cooling Tower
61: Adjust Thermostat Setpoints
62: Upgrade Compact Lights
63: Add Lighting Occupancy Controls
64: Add Daylighting Controls
65: Upgrade High Bay Lights
66: Upgrade Linear Lights
67: Upgrade Outdoor Lights
68: Upgrade Specialty Lights
69: PC Power Management (Screen Saver)
70: PC Power Management (Desktop)
71: PC Power Management (Display)
72: PC Virtualization (Laptop)
73: PC Virtualization (Thin Client)
74: Advanced Power Strips (Power Strips)
75: Advanced Power Strips (Controllable Power Outlets)
76: Upgrade SWH Electric Storage Water Heater (Eff=0.88)
77: Upgrade SWH Electric Storage Water Heater (Eff=0.93)
78: Upgrade to Instantaneous Gas Water Heater
79: Upgrade SWH Gas Storage Water Heater (Eff=0.67)
80: Upgrade SWH Gas Storage Water Heater (Eff=0.70)
81: Upgrade SWH Gas Storage Water Heater (Eff=0.82)
82: Upgrade to Heat Pump Water Heater
83: Add Floating Heat Pressure Control
84: Add Refrigerated Walk-In Doorway Protection (Strip Curtain)
85: Add Refrigerated Walk-In Doorway Protection (Automatic Door Closer)
86: Add Refrigerated Walk-In Doorway Protection (Automatic Door Closer and Strip Curtain)
87: Upgrade Refrigerated Walk-In Motor (PSC)
88: Upgrade Refrigerated Walk-In Motor (ECM)
```

### How to change time resolution?

1. For changing the resolution to 15 minutes, all we have to do is modify the lines just before the `FROM` clause in the example query above as follows:

```sql
  month("comstock_v1_state"."timestamp") AS "month",
  day("comstock_v1_state"."timestamp") AS "day",
  hour("comstock_v1_state"."timestamp") AS "hour",
  minute("comstock_v1_state"."timestamp") AS "minute"
FROM
```

And then in the `GROUP BY` clause:

```sql
GROUP BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp"),
  day("comstock_v1_state"."timestamp"),
  hour("comstock_v1_state"."timestamp"),
  minute("comstock_v1_state"."timestamp")
```

And finally in the `ORDER BY` clause:

```sql
ORDER BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp"),
  day("comstock_v1_state"."timestamp"),
  hour("comstock_v1_state"."timestamp"),
  minute("comstock_v1_state"."timestamp")
```

2. For changing the resolution to 1 day, all we have to do is modify the lines just before the `FROM` clause in the example query above as follows:

```sql
  month("comstock_v1_state"."timestamp") AS "month",
  day("comstock_v1_state"."timestamp") AS "day"
FROM
```

And then in the `GROUP BY` clause:

```sql
GROUP BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp"),
  day("comstock_v1_state"."timestamp")
```

And finally in the `ORDER BY` clause:

```sql
ORDER BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp"),
  day("comstock_v1_state"."timestamp")
```

3. For changing the resolution to 1 month, all we have to do is modify the lines just before the `FROM` clause in the example query above as follows:

```sql
  month("comstock_v1_state"."timestamp") AS "month"
FROM
```

And then in the `GROUP BY` clause:

```sql
GROUP BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp")
```

And finally in the `ORDER BY` clause:

```sql
ORDER BY
  "comstock_v1_metadata"."state",
  "comstock_v1_state"."upgrade",
  month("comstock_v1_state"."timestamp")
```

### How to average data rather than sums?
737 |
738 | All we have to do for this is change the following, everywhere it appears
739 | in the `SELECT` clause:
740 |
741 | ```sql
742 | sum(
743 | ```
744 |
745 | It has to be replaced with:
746 | ```sql
747 | avg(
748 | ```
749 |
750 | ### How to preview the table to see all available columns?
751 |
752 | All we have to do for this is replace the whole example query above with
753 | the following:
754 |
755 | ```sql
756 | SELECT *
757 | FROM
758 | "comstock_v1_state",
759 | "comstock_v1_metadata"
760 | WHERE
761 | "comstock_v1_state"."bldg_id" = "comstock_v1_metadata"."bldg_id"
762 | AND "comstock_v1_state"."state" IN ('55')
763 | AND "comstock_v1_state"."upgrade" = 0
764 | AND "comstock_v1_metadata"."upgrade" = 0
765 | AND "comstock_v1_metadata"."in.building_type" = 'MediumOffice'
766 | limit 10
767 | ```
768 |
-------------------------------------------------------------------------------- /NSO.md: --------------------------------------------------------------------------------
1 | # High-Resolution Wind and Structural Loads Data measured on Parabolic Trough Solar Collectors at Nevada Solar One (NSO)
2 |
3 | ## Description
4 |
5 |
9 |
10 | This data set characterizes the complex wind conditions and resulting structural loads on full-scale, operational parabolic trough collectors.
11 | Over two years, NREL conducted comprehensive field measurements of the atmospheric turbulent wind conditions and the resulting structural wind loads on the parabolic troughs at the Nevada Solar One plant. The measurement set-up included meteorological masts and structural load sensors on four trough rows.
12 | Additionally, we commissioned a lidar scanning the horizontal plane over the trough field.
13 |
14 | Wind loading is a main contributor to the structural design costs of Concentrating Solar Power (CSP) collectors, such as heliostats and parabolic troughs. These structures must resist the mechanical forces generated by turbulent wind. At the same time, the reflector surfaces must exhibit the necessary rigidity to maintain their optimal optical performance in windy conditions.
15 | Studying wind-driven loads at a full-scale, fully operational CSP plant will provide insights into the wind impact on solar collector fields that are currently beyond the capabilities of wind tunnel tests or state-of-the-art simulations.
16 |
17 | By providing this first-of-its-kind data set to the CSP community, we aim to enhance the community's understanding of the wind loading experienced by CSP collector structures.
18 | The data set might be used, for example, to verify simulations or for comparisons to wind tunnel tests.
19 |
20 | ## Directory structure
21 |
23 |
24 |
25 | The directory structure is:
26 |
27 | NSO/&#13;
28 |   {}/           data set type: inflow_mast_1min, inflow_mast_20Hz, wake_masts_1min, wake_masts_20Hz, loads_1min, loads_20Hz, or lidar
29 |     year={}/       year
30 |      month={}/     month
31 |       day={}/       day
32 | 33 | 34 | The structure of the filenames is: 35 | 36 | Type_YYYY-MM-DD_00h_to_YYYY-MM-DD_00h.parquet
37 | YYYY-MM-DD: date of the daily file that contains data from 00:00 UTC to 24:00 UTC (each lidar file contains a single scan, which is shorter than a day, so the lidar file name also includes the hour and minute)&#13;
38 | Type: Inflow_mast_20Hz, Inflow_mast_1min, Wake_masts_20Hz, Wake_masts_1min, Loads_20Hz, Loads_1min, or Lidar
39 | 40 | 41 | Examples: 42 | 43 | NSO/inflow_mast_20Hz/year=2022/month=02/day=03/Inflow_Mast_20Hz_2022-02-03_00h_to_2022-02-03_23h.parquet
44 | NSO/lidar/year=2023/month=04/day=02/Lidar_2023-04-02_19-00-17_to_2023-04-02_19-00-17.parquet
45 |
46 |
47 | ## Data Format
48 |
49 |
52 |
53 | Data are stored in the Parquet file format. The variables and units in each dataset are listed in [1].
54 |
55 | ## Code Examples
56 |
57 |
58 |
59 | Example Python scripts to read the data are provided at the OEDI data lake.
60 |
61 | ## References
62 |
63 | [1] Egerer, U., Dana, S., Jager, D. et al. Wind and structural loads data measured on parabolic trough solar collectors at an operational power plant. Sci Data 11, 98 (2024). https://doi.org/10.1038/s41597-023-02896-4
64 |
-------------------------------------------------------------------------------- /NSRDB.md: --------------------------------------------------------------------------------
1 | # Solar Resource Data: National Solar Radiation Database (NSRDB)
2 |
3 | ## NSRDB
4 |
5 | The National Solar Radiation Database (NSRDB) is a serially complete collection
6 | of meteorological and solar irradiance data sets for the United States and a
7 | growing list of international locations for 1998-2017. The NSRDB provides
8 | foundational information to support U.S. Department of Energy programs,
9 | research, and the general public.
10 |
11 | The NSRDB provides time-series data at 30-minute resolution of the solar
12 | resource, averaged over surface cells of 0.038 degrees in both latitude and
13 | longitude, or nominally 4 km in size. The solar radiation values represent the
14 | resource available to solar energy systems. The data was created using cloud
15 | properties which are generated using the AVHRR Pathfinder Atmospheres-Extended
16 | (PATMOS-x) algorithms developed by the University of Wisconsin. The Fast
17 | All-sky Radiation Model for Solar applications (FARMS), in conjunction with the
18 | cloud properties and with aerosol optical depth (AOD) and precipitable water
19 | vapor (PWV) from ancillary sources, is used to estimate solar irradiance (GHI,
20 | DNI, and DHI). The Global Horizontal Irradiance (GHI) is computed for clear
21 | skies using the REST2 model. For cloud scenes identified by the cloud mask,
22 | FARMS is used to compute GHI. The Direct Normal Irradiance (DNI) for cloud
23 | scenes is then computed using the DISC model. The PATMOS-x model uses
24 | half-hourly radiance images in visible and infrared channels from the GOES
25 | series of geostationary weather satellites. Ancillary variables needed to run
26 | REST2 and FARMS (e.g., aerosol optical depth, precipitable water vapor, and
27 | albedo) are derived from the Modern Era-Retrospective Analysis (MERRA-2)
28 | dataset. Temperature and wind speed data are also derived from MERRA-2 and
29 | provided for use in SAM to compute PV generation. &#13;
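For orientation, the three irradiance components are tied together by the standard closure relation GHI = DNI x cos(θz) + DHI, where θz is the solar zenith angle. The snippet below is a minimal consistency-check sketch, not part of the NSRDB tooling; it assumes `ghi`, `dni`, `dhi`, and `zenith_deg` are NumPy arrays already extracted from an NSRDB file (e.g., with the `rex` examples later in this section):

```python
import numpy as np

def closure_residual(ghi, dni, dhi, zenith_deg):
    """Residual of the closure equation GHI = DNI * cos(zenith) + DHI.

    Inputs are arrays in physical units (W/m2 for the irradiances,
    degrees for the solar zenith angle); a residual near zero means
    the three components are mutually consistent.
    """
    cos_z = np.cos(np.radians(zenith_deg))
    return ghi - (dni * cos_z + dhi)
```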
30 |
31 | The following variables are provided by the NSRDB:
32 | - Irradiance:
33 |     - Global Horizontal (ghi)
34 |     - Direct Normal (dni)
35 |     - Diffuse (dhi)
36 | - Clear-sky Irradiance
37 | - Cloud Type
38 | - Dew Point
39 | - Temperature
40 | - Surface Albedo
41 | - Pressure
42 | - Relative Humidity
43 | - Solar Zenith Angle
44 | - Precipitable Water
45 | - Wind Direction
46 | - Wind Speed
47 | - Fill Flag
48 | - Angstrom wavelength exponent (alpha)
49 | - Aerosol optical depth (aod)
50 | - Aerosol asymmetry parameter (asymmetry)
51 | - Cloud optical depth (cld_opd_dcomp)
52 | - Cloud effective radius (cld_ref_dcomp)
53 | - cloud_press_acha
54 | - Reduced ozone vertical pathlength (ozone)
55 | - Aerosol single-scatter albedo (ssa)
56 |
57 |
58 | ## Directory structure
59 |
60 | Solar resource data is made available as a series of .h5 files corresponding to
61 | each year and can be found at s3://nrel-pds-nsrdb/v3/nsrdb_${year}.h5
62 |
63 | The NSRDB data is also available via HSDS at /nrel/nsrdb/nsrdb_${year}.h5
64 |
65 | For examples on setting up and using HSDS please see our [examples repository](https://github.com/nrel/hsds-examples)
66 |
67 | ## Data Format
68 |
69 | The data is provided in HDF5 (.h5) files separated by year. The
70 | variables mentioned above are provided in two-dimensional time-series arrays with
71 | dimensions (time x location). The temporal axis is defined by the `time_index`
72 | dataset, while the positional axis is defined by the `meta` dataset. For
73 | storage efficiency each variable has been scaled and stored as an integer. The
74 | scale factor is provided in the `psm_scale_factor` attribute. The units for
75 | the variable data are also provided as an attribute (`psm_units`).
76 |
77 | ## Python Examples
78 |
79 | Example scripts to extract solar resource data using Python are provided below:
80 |
81 | The easiest way to access and extract data is with the Resource eXtraction tool
82 | [`rex`](https://github.com/nrel/rex):
83 |
84 | ```python
85 | from rex import NSRDBX
86 |
87 | nsrdb_file = '/nrel/nsrdb/nsrdb_2010.h5'
88 | with NSRDBX(nsrdb_file, hsds=True) as f:
89 |     meta = f.meta
90 |     time_index = f.time_index
91 |     dni = f['dni']
92 | ```
93 |
94 | `rex` also allows easy extraction of the nearest site to a desired (lat, lon)
95 | location:
96 |
97 | ```python
98 | from rex import NSRDBX
99 |
100 | nsrdb_file = '/nrel/nsrdb/nsrdb_2010.h5'
101 | nrel = (39.741931, -105.169891)
102 | with NSRDBX(nsrdb_file, hsds=True) as f:
103 |     nrel_dni = f.get_lat_lon_df('dni', nrel)
104 | ```
105 |
106 | or to extract all sites in a given region:
107 |
108 | ```python
109 | from rex import NSRDBX
110 |
111 | nsrdb_file = '/nrel/nsrdb/nsrdb_2010.h5'
112 | state = 'Colorado'
113 | with NSRDBX(nsrdb_file, hsds=True) as f:
114 |     co_dni = f.get_region_df('dni', state, region_col='state')
115 | ```
116 |
117 | Lastly, `rex` can be used to extract all variables needed to run SAM at a given
118 | location:
119 |
120 | ```python
121 | from rex import NSRDBX
122 |
123 | nsrdb_file = '/nrel/nsrdb/nsrdb_2010.h5'
124 | nrel = (39.741931, -105.169891)
125 | with NSRDBX(nsrdb_file, hsds=True) as f:
126 |     nrel_sam_vars = f.get_SAM_df(nrel)
127 | ```
128 |
129 | If you would rather access the NSRDB data directly using h5pyd:
130 |
131 | ```python
132 | # Extract the average direct normal irradiance (dni)
133 | import h5pyd
134 | import pandas as pd
135 |
136 | # Open .h5 file
137 | with h5pyd.File('/nrel/nsrdb/nsrdb_2010.h5', mode='r') as f:
138 |     # Extract meta data and &#13;
convert from records array to DataFrame
139 |     meta = pd.DataFrame(f['meta'][...])
140 |     # dni dataset
141 |     dni = f['dni']
142 |     # Extract scale factor
143 |     scale_factor = dni.attrs['psm_scale_factor']
144 |     # Extract, average, and un-scale dni
145 |     mean_dni = dni[...].mean(axis=0) / scale_factor
146 |
147 | # Add mean DNI to meta data
148 | meta['Average DNI'] = mean_dni
149 | ```
150 |
151 | ```python
152 | # Extract time-series data for a single site
153 | import h5pyd
154 | import pandas as pd
155 |
156 | # Open .h5 file
157 | with h5pyd.File('/nrel/nsrdb/nsrdb_2010.h5', mode='r') as f:
158 |     # Extract time_index and convert to datetime
159 |     # NOTE: time_index is saved as byte-strings and must be decoded
160 |     time_index = pd.to_datetime(f['time_index'][...].astype(str))
161 |     # Initialize DataFrame to store time-series data
162 |     time_series = pd.DataFrame(index=time_index)
163 |     # Extract variables needed to compute generation from SAM:
164 |     for var in ['dni', 'dhi', 'air_temperature', 'wind_speed']:
165 |         # Get dataset
166 |         ds = f[var]
167 |         # Extract scale factor
168 |         scale_factor = ds.attrs['psm_scale_factor']
169 |         # Extract site 100 and add to DataFrame
170 |         time_series[var] = ds[:, 100] / scale_factor
171 | ```
172 |
173 | ## References
174 |
175 | For more information about the NSRDB please see the
176 | [website](https://nsrdb.nrel.gov/).
177 | Users of the NSRDB should please cite:
178 | - [Sengupta, M., Y. Xie, A. Lopez, A. Habte, G. Maclaurin, and J. Shelby. 2018. "The National Solar Radiation Data Base (NSRDB)." Renewable and Sustainable Energy Reviews 89 (June): 51-60.](https://www.sciencedirect.com/science/article/pii/S136403211830087X?via%3Dihub)
-------------------------------------------------------------------------------- /PVROOFTOPS.md: --------------------------------------------------------------------------------
1 | # PV Rooftops
2 |
3 | ## Description
4 |
5 | The National Renewable Energy Laboratory's (NREL) PV Rooftop Database (PVRDB) is a lidar-derived, geospatially-resolved dataset of suitable roof surfaces and their PV technical potential for 128 metropolitan regions in the United States. The source lidar data and building footprints were obtained by the U.S. Department of Homeland Security's Homeland Security Infrastructure Program for 2006-2014. Using GIS methods, NREL identified suitable roof surfaces based on their size, orientation, and shading parameters (Gagnon et al. 2016). Standard 2015 technical potential was then estimated for each plane using NREL's System Advisor Model.
6 |
7 | The PVRDB is downloadable by city and year of lidar collection. Five geospatial layers are available for each city and year: 1) the raster extent of the lidar collection, 2) buildings identified from the lidar data, 3) suitable developable planes for each building, 4) aspect values of the developable planes, and 5) the technical potential estimates of the developable planes.
8 |
9 | ## Data Format
10 |
11 | The PV Rooftops dataset is provided in Parquet format partitioned by city. &#13;
There are 4 core datasets stored in S3 partitioned by region(city)-year for downloads of single cities or to allow city-specific queries or queries across the dataset using Glue/Athena:
12 |
13 | /aspects
14 | field | data_type | description
15 | -- | -- | --
16 | `gid` | bigint |  
17 | `city` | string | city of source lidar dataset
18 | `state` | string | state of source lidar dataset
19 | `year` | bigint | year of source lidar dataset
20 | `bldg_fid` | bigint | building id
21 | `aspect` | bigint | aspect value
22 | `the_geom_96703` | string | projected geometry ([US Contiguous Albers Equal Area Conic - SRID 6703](https://spatialreference.org/ref/sr-org/6703/))
23 | `the_geom_4326` | string | geometry ([WGS 1984 - SRID 4326](https://spatialreference.org/ref/epsg/4326/))
24 | `region_id` | bigint |  
25 |
26 |
27 | /buildings
28 |
29 | field | data_type | description
30 | -- | -- | --
31 | `gid` | bigint |  
32 | `bldg_fid` | bigint | the building fid
33 | `the_geom_96703` | string | projected geometry ([US Contiguous Albers Equal Area Conic - SRID 6703](https://spatialreference.org/ref/sr-org/6703/))
34 | `the_geom_4326` | string | geometry ([WGS 1984 - SRID 4326](https://spatialreference.org/ref/epsg/4326/))
35 | `city` | string | the city of the source lidar data
36 | `state` | string | the state of the source lidar data
37 | `year` | bigint | the year of the source lidar data
38 | `region_id` | bigint |  
39 |
40 |
41 | /developable_planes
42 |
43 | field | data_type | description
44 | -- | -- | --
45 | `bldg_fid` | bigint | building ID associated with the developable plane
46 | `footprint_m2` | double | developable plane footprint area (m2)
47 | `slope` | bigint | slope value
48 | `flatarea_m2` | double | flat area of the developable plane (m2)
49 | `slopeconversion` | double | the slope conversion factor used to convert the flat area into the sloped area
50 | `slopearea_m2` | double | sloped area of the developable plane (m2)
51 | `zip` | string | zipcode
52 | `zip_perc` | double |  
53 | `aspect` | bigint | the aspect value of the developable plane
54 | `gid` | bigint | unique developable plane ID
55 | `city` | string | the city of the source lidar data
56 | `state` | string | the state of the source lidar data
57 | `year` | bigint | the year of the source lidar data
58 | `region_id` | bigint |  
59 | `the_geom_96703` | string | projected geometry ([US Contiguous Albers Equal Area Conic - SRID 6703](https://spatialreference.org/ref/sr-org/6703/))
60 | `the_geom_4326` | string | geometry ([WGS 1984 - SRID 4326](https://spatialreference.org/ref/epsg/4326/))
61 |
62 |
63 | /rasd
64 |
65 | field | data_type | description
66 | -- | -- | --
67 | `gid` | bigint | the unique geographic ID of the raster domain
68 | `the_geom_96703` | string | projected geometry ([US Contiguous Albers Equal Area Conic - SRID 6703](https://spatialreference.org/ref/sr-org/6703/))
69 | `the_geom_4326` | string | geometry ([WGS 1984 - SRID 4326](https://spatialreference.org/ref/epsg/4326/))
70 | `city` | string | the city of the source lidar data
71 | `state` | string | the state of the source lidar data
72 | `year` | bigint | the year of the source lidar data
73 | `region_id` | bigint |  
74 | `serial_id` | bigint |  
75 | `__index_level_0__` | bigint |  
76 |
77 |
78 | Within each core dataset there are partitions by city_state_year(YY) that can be queried directly via Athena or PrestoDB with relatively quick response times, or downloaded as a Parquet format data file. &#13;
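As a hedged sketch of such a partition-aware query (the table name matches the Python example later in this document, and the column names come from the `developable_planes` dictionary above; the exact values stored in the `city`, `state`, and `year` columns are assumptions here and should be confirmed first, e.g., with a `SELECT DISTINCT city, state, year` query):

```python
import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir='s3:///',  ##user defined staging directory
    region_name='us-west-2',
)

# Hypothetical example: summarize developable-plane area by aspect for a
# single city/state/year partition (filter values assumed for illustration).
query = """
    SELECT aspect, count(*) AS n_planes, sum(flatarea_m2) AS total_flat_m2
    FROM oedi.pv_rooftops_developable_planes
    WHERE city = 'dover' AND state = 'de' AND year = 2009
    GROUP BY aspect
    ORDER BY aspect
"""
df = pd.read_sql(query, conn)
```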
79 | 80 | Aspects Lookup: 81 | ``` 82 | 1 337.5 - 22.5 north 83 | 2 22.5 - 67.5 northeast 84 | 3 67.5 - 112.5 east 85 | 4 112.5 - 157.5 southeast 86 | 5 157.5 - 202.5 south 87 | 6 202.5 - 247.5 southwest 88 | 7 247.5 - 292.5 west 89 | 8 292.5 - 337.5 northwest 90 | 0 flat flat 91 | ``` 92 | 93 | Regions Lookup: 94 | ``` 95 | 1 Albany NY 2006-01-01 96 | 2 Albany NY 2013-01-01 97 | 3 Albuquerque NM 2006-01-01 98 | 4 Albuquerque NM 2012-01-01 99 | 5 Allentown PA 2006-01-01 100 | 6 Amarillo TX 2008-01-01 101 | 7 Anaheim CA 2010-01-01 102 | 8 Arnold MO 2006-01-01 103 | 9 Atlanta GA 2008-01-01 104 | 10 Atlanta GA 2013-01-01 105 | 11 Augusta GA 2010-01-01 106 | 12 Augusta ME 2008-01-01 107 | 13 Austin TX 2006-01-01 108 | 14 Austin TX 2012-01-01 109 | 15 Bakersfield CA 2010-01-01 110 | 16 Baltimore MD 2008-01-01 111 | 17 Baltimore MD 2013-01-01 112 | 18 Baton Rouge LA 2006-01-01 113 | 19 Baton Rouge LA 2012-01-01 114 | 20 Birmingham AL 2008-01-01 115 | 21 Bismarck ND 2008-01-01 116 | 22 Boise ID 2007-01-01 117 | 23 Boise ID 2013-01-01 118 | 24 Boulder CO 2014-01-01 119 | 25 Bridgeport CT 2006-01-01 120 | 26 Bridgeport CT 2013-01-01 121 | 27 Buffalo NY 2008-01-01 122 | 28 Carson City NV 2009-01-01 123 | 29 Charleston SC 2010-01-01 124 | 30 Charleston WV 2009-01-01 125 | 31 Charlotte NC 2006-01-01 126 | 32 Charlotte NC 2012-01-01 127 | 33 Cheyenne WY 2008-01-01 128 | 34 Chicago IL 2008-01-01 129 | 35 Chicago IL 2012-01-01 130 | 36 Cincinnati OH 2010-01-01 131 | 37 Cleveland OH 2012-01-01 132 | 38 Colorado Springs CO 2006-01-01 133 | 39 Colorado Springs CO 2013-01-01 134 | 40 Columbia SC 2009-01-01 135 | 41 Columbus GA 2009-01-01 136 | 42 Columbus OH 2006-01-01 137 | 43 Columbus OH 2012-01-01 138 | 44 Concord NH 2009-01-01 139 | 45 Corpus Christi TX 2012-01-01 140 | 46 Dayton OH 2006-01-01 141 | 47 Dayton OH 2012-01-01 142 | 48 Denver CO 2012-01-01 143 | 49 Des Moines IA 2010-01-01 144 | 50 Detroit MI 2012-01-01 145 | 51 Dover DE 2009-01-01 146 | 52 El Paso TX 2007-01-01 147 | 53 Flint MI 2009-01-01 148 | 54 Fort Wayne IN 2008-01-01 149 | 55 Frankfort KY 2012-01-01 150 | 56 Fresno CA 2006-01-01 151 | 57 Fresno CA 2013-01-01 152 | 58 Ft Belvoir DC 2012-01-01 153 | 59 Grand Rapids MI 2013-01-01 154 | 60 Greensboro NC 2009-01-01 155 | 61 Harrisburg PA 2009-01-01 156 | 62 Hartford CT 2006-01-01 157 | 63 Hartford CT 2013-01-01 158 | 64 Helena MT 2007-01-01 159 | 65 Helena MT 2013-01-01 160 | 66 Houston TX 2010-01-01 161 | 67 Huntsville AL 2009-01-01 162 | 68 Indianapolis IN 2006-01-01 163 | 69 Indianapolis IN 2012-01-01 164 | 70 Jackson MS 2007-01-01 165 | 71 Jacksonville FL 2010-01-01 166 | 72 Jefferson City MO 2008-01-01 167 | 73 Kansas City MO 2010-01-01 168 | 74 Kansas City MO 2012-01-01 169 | 75 LaGuardia JFK NY 2012-01-01 170 | 76 Lancaster PA 2010-01-01 171 | 77 Lansing MI 2007-01-01 172 | 78 Lansing MI 2013-01-01 173 | 79 Las Vegas NV 2009-01-01 174 | 80 Lexington KY 2012-01-01 175 | 81 Lincoln NE 2008-01-01 176 | 82 Little Rock AR 2008-01-01 177 | 83 Los Angeles CA 2007-01-01 178 | 84 Louisville KY 2006-01-01 179 | 85 Louisville KY 2012-01-01 180 | 86 Lubbock TX 2008-01-01 181 | 87 Madison WI 2010-01-01 182 | 88 Manhattan NY 2007-01-01 183 | 89 McAllen TX 2008-01-01 184 | 90 Miami FL 2009-01-01 185 | 91 Milwaukee WI 2007-01-01 186 | 92 Milwaukee WI 2013-01-01 187 | 93 Minneapolis MN 2007-01-01 188 | 94 Minneapolis MN 2012-01-01 189 | 95 Mission Viejo CA 2013-01-01 190 | 96 Mobile AL 2010-01-01 191 | 97 Modesto CA 2010-01-01 192 | 98 Montgomery AL 2007-01-01 193 | 99 Montpelier VT 2009-01-01 194 | 
100 Newark NJ 2007-01-01
195 | 101 New Haven CT 2007-01-01
196 | 102 New Haven CT 2013-01-01
197 | 103 New Orleans LA 2008-01-01
198 | 104 New Orleans LA 2012-01-01
199 | 105 New York NY 2005-01-01
200 | 106 New York NY 2013-01-01
201 | 107 Norfolk VA 2007-01-01
202 | 108 Oklahoma City OK 2007-01-01
203 | 109 Oklahoma City OK 2013-01-01
204 | 110 Olympia WA 2010-01-01
205 | 111 Omaha NE 2007-01-01
206 | 112 Omaha NE 2013-01-01
207 | 113 Orlando FL 2009-01-01
208 | 114 Oxnard CA 2010-01-01
209 | 115 Palm Bay FL 2010-01-01
210 | 116 Pensacola FL 2009-01-01
211 | 117 Philadelphia PA 2007-01-01
212 | 118 Pierre SD 2008-01-01
213 | 119 Pittsburgh PA 2004-01-01
214 | 120 Pittsburgh PA 2012-01-01
215 | 121 Portland OR 2012-01-01
216 | 122 Poughkeepsie NY 2012-01-01
217 | 123 Providence RI 2004-01-01
218 | 124 Providence RI 2012-01-01
219 | 125 Raleigh-Durham NC 2010-01-01
220 | 126 Reno NV 2007-01-01
221 | 127 Richmond VA 2008-01-01
222 | 128 Richmond VA 2013-01-01
223 | 129 Rochester NY 2008-01-01
224 | 130 Rochester NY 2014-01-01
225 | 131 Sacramento CA 2012-01-01
226 | 132 Salem OR 2008-01-01
227 | 133 Salt Lake City UT 2012-01-01
228 | 134 San Antonio TX 2008-01-01
229 | 135 San Antonio TX 2013-01-01
230 | 137 San Diego CA 2008-01-01
231 | 138 San Diego CA 2013-01-01
232 | 139 San Francisco CA 2013-01-01
233 | 140 Santa Fe NM 2009-01-01
234 | 141 Sarasota FL 2009-01-01
235 | 142 Scranton PA 2008-01-01
236 | 143 Seattle WA 2011-01-01
237 | 144 Shreveport LA 2008-01-01
238 | 145 Spokane WA 2008-01-01
239 | 146 Springfield IL 2009-01-01
240 | 147 Springfield MA 2007-01-01
241 | 148 Springfield MA 2013-01-01
242 | 149 St Louis MO 2008-01-01
243 | 150 St Louis MO 2013-01-01
244 | 151 Stockton CA 2010-01-01
245 | 152 Syracuse NY 2008-01-01
246 | 153 Tallahassee FL 2009-01-01
247 | 154 Tampa FL 2008-01-01
248 | 155 Toledo OH 2006-01-01
249 | 156 Toledo OH 2012-01-01
250 | 157 Topeka KS 2008-01-01
251 | 158 Trenton NJ 2008-01-01
252 | 159 Tucson AZ 2007-01-01
253 | 160 Tulsa OK 2008-01-01
254 | 161 Washington DC 2009-01-01
255 | 162 Washington DC 2012-01-01
256 | 163 Wichita KS 2012-01-01
257 | 164 Winston-Salem NC 2009-01-01
258 | 165 Worcester MA 2009-01-01
259 | 166 Youngstown OH 2008-01-01
260 | 167 Andrews AFB DC 2012-01-01
261 | 136 San Bernardino-Riverside CA 2012-01-01
262 | 168 Tampa FL 2013-01-01
263 | ```
264 |
265 | ## Model
266 |
267 | Coming Soon: Details on the PV Suitability Model.
268 |
269 | ## Directory structure
270 |
271 | The PV Rooftops dataset is made available in Parquet format on AWS in S3. The four main datasets are stored in individual folders, and each partition is stored in an individual subfolder within each directory.
272 |
273 | - `s3://oedi_pv_rooftops/`
274 |
275 | Main datasets
276 | /aspects
277 | /buildings
278 | /developable_planes
279 | /rasd
280 |
281 | Partitions
282 | /city_state_year i.e. &#13;
(/dover_de_09) 283 | 284 | 285 | ## Python Examples 286 | 287 | ```python 288 | 289 | import pandas as pd 290 | from pyathena import connect 291 | 292 | conn = connect( 293 | s3_staging_dir='s3:///', ##user defined staging directory 294 | region_name='us-west-2', 295 | work_group='' ##specify workgroup if exists 296 | ) 297 | 298 | df = pd.read_sql("SELECT * FROM oedi.pv_rooftops_developable_planes limit 8;",conn) 299 | ``` 300 | 301 | For jupyter notebook example see our notebook which includes partitions and data dictionary: [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples) 302 | 303 | 304 | ## References 305 | 306 | Main References: 307 | 1. [Rooftop Solar Photovoltaic Technical Potential in the United States: A Detailed Assessment](https://www.nrel.gov/docs/fy16osti/65298.pdf) 308 | 309 | 2. [Using GIS-based methods and lidar data to estimate rooftop solar technical potential in US cities](https://iopscience.iop.org/article/10.1088/1748-9326/aa7225/pdf) 310 | 311 | 3. [Estimating rooftop solar technical potential across the US using a combination of GIS-based methods, lidar data, and statistical modeling](https://iopscience.iop.org/article/10.1088/1748-9326/aaa554/pdf) 312 | 313 | 4. [Rooftop Photovoltaic Technical Potential in the United States](https://data.nrel.gov/submissions/121) 314 | 315 | 5. [U.S. PV-Suitable Rooftop Resources](https://data.nrel.gov/submissions/47) 316 | 317 | Related Reference: 318 | 319 | 1. [Rooftop Solar Technical Potential for Low-to-Moderate Income Households in the United States](https://www.nrel.gov/docs/fy18osti/70901.pdf) 320 | 321 | 2. [Rooftop Energy Potential of Low Income Communities in America REPLICA](https://data.nrel.gov/submissions/81) 322 | 323 | 3. [Puerto Rico Solar-for-All: LMI PV Rooftop Technical Potential and Solar Savings Potential](https://data.nrel.gov/submissions/144) 324 | 325 | 326 | ## Disclaimer and Attribution 327 | 328 | Copyright (c) 2020, Alliance for Sustainable Energy LLC, All rights reserved. 329 | 330 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 331 | 332 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 333 | 334 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 335 | 336 | * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 337 | 338 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
339 |
-------------------------------------------------------------------------------- /PVROOFTOPS_PR.md: --------------------------------------------------------------------------------
1 | # PV Rooftop Database - Puerto Rico (PVRDB-PR)
2 |
3 |
4 | ## Description
5 |
6 | The National Renewable Energy Laboratory's (NREL) PV Rooftop Database for Puerto Rico (PVRDB-PR) is a lidar-derived, geospatially-resolved dataset of suitable roof surfaces and their PV technical potential for virtually all buildings in Puerto Rico. The source lidar data were obtained from 2017 3cm NASA G-LiHT and 2015-2016 <0.35m USGS 3DEP data. The building footprints were obtained from Open Street Map Buildings and were collected in 2018. Using GIS methods, NREL identified suitable roof surfaces based on their size, orientation, and shading parameters ([Gagnon et al. 2016](https://www.nrel.gov/docs/fy16osti/65298.pdf)). Standard 2020 technical potential was then estimated for each plane using the NREL [System Advisor Model's (SAM) PVWatts v5 model](https://developer.nrel.gov/docs/solar/pvwatts/v5/) and solar irradiance from the NREL [National Solar Radiation Database PSM3 for 2017](https://nsrdb.nrel.gov/).
7 |
8 | The PVRDB-PR is downloadable by county for Puerto Rico. The data consists of PV suitable roof surfaces (aka "developable planes") for 96% of buildings in Puerto Rico. The developable planes contain orientation, area, spatial geometries, and technical potential details of each developable plane suitable for rooftop solar in Puerto Rico.
9 |
10 | The PV Rooftops dataset is provided in Parquet format partitioned by county name. The dataset can be queried directly via Athena or PrestoDB with relatively quick response time, or downloaded as a Parquet format data file from S3 (`s3://oedi-data-lake/pv-rooftop-pr/developable-planes`).
11 |
12 |
13 | ## Data Dictionary
14 |
15 | Column | Type | Description
16 | -- | -- | --
17 | `devp_gid` | bigint | The developable plane geographic ID (**unique/primary-key**).
18 | `bldg_plane_id` | integer | The plane ID unique to a given building (`bldg_fid`).
19 | `bldg_fid` | integer | The building feature ID (**foreign key**). The `bldg_fid` can be tagged to the HOTOSM building `fid` column to obtain the building feature and geometry.
20 | `bg_geoid` | text | The census block group GEOID. The `bg_geoid` is 12 characters long. The first 2 numbers represent the State FIPS, the next 3 are County FIPS, the next 6 are the Tract FIPS and the last 1 is the Block Group FIPS.
21 | `azimuth` | numeric | Plane azimuth angle (degrees). 0 values represent flat roofs.
22 | `tilt` | numeric | The panel sloped tilt (degrees). For flat roofs (azimuth=0), we assume a 15-degree panel tilt. For tilted roofs, we use the tilt of the roof plane.
23 | `flat_m2` | double precision | Flat "bird's-eye" area of the developable plane surface (meters-squared).
24 | `slope_degrees` | numeric | Slope of the developable roof plane in degrees.
25 | `slope_m2` | double precision | Slope area of the developable plane surface (meters-squared).
26 | `pitchmultiplier` | numeric | This is the multiplier value used to calculate the slope area from the flat roof area.
27 | `pct_shading` | numeric | The average diffuse shading value (%) from 4 representative days (summer and winter solstices, autumnal and vernal equinoxes).
28 | `area_derate_factor` | numeric | This is the area derate factor used to identify the PV module area from the total developable plane area. &#13;
Options include 0.70 for flat roofs and 0.98 for tilted roofs.
29 | `dc_ac_ratio` | numeric | The DC to AC ratio. We used the PVWatts default value of 1.2.
30 | `array_type` | integer | The PVWatts array type. We use Fixed-Tilt Rooftop PV (value=1) for all simulations.
31 | `losses` | numeric | Total system losses (%). We used the PVWatts default of 14.08% for all simulations.
32 | `module_type` | integer | The PV module type used in the PVWatts estimation. We used the "Standard" module type (value=0) for all simulations.
33 | `inverter_efficiency` | integer | Inverter efficiency at rated power (%). We used the SAM PVWatts default of 96% for all simulations.
34 | `annual_kwh` | double precision | Annual generation (kWh) potential of each array.
35 | `the_geom_4326` | USER-DEFINED | Geometry in WGS84 (SRID=4326)
36 | `the_geom_32620` | USER-DEFINED | Projected Geometry (SRID=32620)
37 | `county` | text | Census county name
38 |
39 | ### Azimuth Coded Values
40 |
41 | The azimuth values in PVRDB-PR are coded values which represent the median value of a range of azimuth degrees. The table below decodes the azimuth values with the value range and the ordinal direction.
42 |
43 | Azimuth | Azimuth Value Range | Ordinal Direction
44 | -- | -- | --
45 | 0 | Flat | Flat
46 | 45 | 22.5-67.5 | NE
47 | 90 | 67.5-112.5 | E
48 | 135 | 112.5-157.5 | SE
49 | 180 | 157.5-202.5 | S
50 | 225 | 202.5-247.5 | SW
51 | 270 | 247.5-292.5 | W
52 | 315 | 292.5-337.5 | NW
53 | 360 | 337.5-360; 0-22.5 | N
54 |
55 | ## Methods and Assumptions
56 |
57 | PVRDB-PR uses GIS methods similar to those of [PVRDB](https://registry.opendata.aws/nrel-oedi-pv-rooftops/), as described in [Gagnon et al. (2016)](https://www.nrel.gov/docs/fy16osti/65298.pdf). Unlike previous data versions, PVRDB-PR technical potential estimates are based on the updated assumptions of NREL's PV Rooftop Model v2.0, such as:
58 | - Power Density is assumed to be 182 W/m2 (compared to the 160 W/m2 assumption in the 2016 U.S. study).
59 | - North-facing planes are not excluded for PR; however, they were excluded from the 2016 U.S. study. In PR, 13% of the rooftop PV technical potential is on N-, NE-, or NW-facing planes.
60 | - The minimum size requirement for a developable plane is set to 1.62 m2, which is the average size required for one 250-watt solar panel. A building is considered suitable if it meets the other criteria and it has at least one plane large enough for a solar panel. In the 2016 U.S. study, a suitable building needed a minimum of 10 m2. If we applied the same >= 10m2 assumption to Puerto Rico, generation would be reduced by ~4 TWh for the total building potential (all residential buildings).
61 | - The shading assumption in this PR assessment was updated to apply percent shading directly at the developable plane level into the System Advisor Model (SAM) when calculating generation potential. For the U.S. assessment, % shading was used to screen potential planes, but it was not used directly at the plane level when processed in SAM to get the generation; instead, the SAM default of 3% was applied. This new approach is more accurate than previous estimates, but it results in a lower kWh/kW estimate for Puerto Rico compared to the U.S.
62 |
63 | ### Data Sources
64 |
65 | 1. [NASA G-LiHT](https://gliht.gsfc.nasa.gov/index.php?section=49): 3cm LiDAR data collected in spring of 2017. This data only has partial PR island coverage.
66 | 2. &#13;
[USGS 3DEP LiDAR](https://registry.opendata.aws/usgs-lidar/): <0.35 m nominal resolution LiDAR data collected in 2015 as part of the 2016 Commonwealth of Puerto Rico Project Lidar survey (UUID: `{C2C7A2AF-8228-4C10-8756-BA971DD63953}`). This data was used to fill in LiDAR coverage after G-LiHT data.
67 | 3. [OpenStreetMap HOT export of building footprints](https://data.humdata.org/dataset/hotosm_pri_buildings): building footprint data used to identify buildings from the LiDAR data. HOTOSM export on 10/01/2018.
68 |
69 | ### Assumptions for Building Suitability
70 |
71 | Requirement | Description
72 | -- | --
73 | Shading | Measured shading for four seasons and required an average of 80% unshaded surface
74 | Azimuth | All possible azimuths
75 | Tilt | Average surface tilt <= 60 degrees
76 | Minimum Area | >= 1.62 m2 (area required for a single solar panel)
77 |
78 | ### Assumptions for PV Performance Simulations
79 |
80 | PV System Characteristics | Value for Flat Roofs | Value for Tilted Roofs
81 | -- | -- | --
82 | Tilt | 15 degrees | Tilt of plane
83 | Ratio of module area to suitable roof area | 0.7 | 0.98
84 | Azimuth | 180 degrees (south facing) | Midpoint of azimuth class
85 | Module Power Density | 183 W/m2
86 | Total system losses | Varies (SAM defaults + individual surface % shading)
87 | Inverter efficiency | 96%
88 | DC-to-AC ratio | 1.2
89 |
90 | ## Python Connection Examples
91 |
92 | Athena data connection using PyAthena:
93 | ```python
94 |
95 | import pandas as pd
96 | from pyathena import connect
97 |
98 | conn = connect(
99 |     s3_staging_dir='s3:///tracking-the-sun', ##user defined staging directory
100 |     region_name='us-west-2',
101 |     work_group='' ##specify workgroup if exists
102 | )
103 | ```
104 |
105 | Example #1: Querying with a limit:
106 | ```python
107 | df = pd.read_sql("SELECT * FROM oedi.pvrdb_pr_developable_planes limit 8;", conn)
108 | ```
109 |
110 | Example #2: Querying for a specific county name:
111 | ```python
112 | df = pd.read_sql("SELECT * FROM oedi.pvrdb_pr_developable_planes WHERE county = 'San Juan' limit 8;", conn)
113 | ```
114 |
115 | Example #3: Querying for a specific county FIPS (FIPS=127):
116 | ```python
117 | df = pd.read_sql("SELECT * FROM oedi.pvrdb_pr_developable_planes WHERE bg_geoid LIKE '72127%' limit 8;", conn)
118 | ```
119 |
120 | For jupyter notebook example see our notebook which includes partitions and data dictionary:
121 | [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples)
122 |
123 | ## Related Links:
124 |
125 | 1. [Puerto Rico Solar-for-All: LMI PV Rooftop Technical Potential and Solar Savings Potential](https://data.nrel.gov/submissions/144)
126 | 2. [Solar-For-All Interactive Web Map](https://maps.nrel.gov/solar-for-all/)
127 | 3. [PVRDB](https://registry.opendata.aws/nrel-oedi-pv-rooftops/)
128 | 4. [Rooftop Solar Photovoltaic Technical Potential in the United States: A Detailed Assessment](https://www.nrel.gov/docs/fy16osti/65298.pdf)
129 | 5. [Using GIS-based methods and lidar data to estimate rooftop solar technical potential in US cities](https://iopscience.iop.org/article/10.1088/1748-9326/aa7225/pdf)
130 | 6. [Estimating rooftop solar technical potential across the US using a combination of GIS-based methods, lidar data, and statistical modeling](https://iopscience.iop.org/article/10.1088/1748-9326/aaa554/pdf)
131 | 7. [Rooftop Photovoltaic Technical Potential in the United States](https://data.nrel.gov/submissions/121)
132 | 8. [U.S. &#13;
PV-Suitable Rooftop Resources](https://data.nrel.gov/submissions/47)
133 | 9. [Rooftop Solar Technical Potential for Low-to-Moderate Income Households in the United States](https://www.nrel.gov/docs/fy18osti/70901.pdf)
134 | 10. [Rooftop Energy Potential of Low Income Communities in America REPLICA](https://data.nrel.gov/submissions/81)
135 | 11. [PVWattsV5 Documentation](https://pvwatts.nrel.gov/downloads/pvwattsv5.pdf)
136 |
137 |
138 | ## Disclaimer and Attribution
139 |
140 | Copyright (c) 2020, Alliance for Sustainable Energy LLC, All rights reserved.
141 |
142 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
143 |
144 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
145 |
146 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
147 |
148 | * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
149 |
150 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-------------------------------------------------------------------------------- /PoroTomo/PoroTomo.md: --------------------------------------------------------------------------------
1 | # PoroTomo
2 |
3 | The data were collected during March 2016 at the PoroTomo Natural Laboratory
4 | in Brady's Hot Springs, Nevada.
5 |
6 | Silixa’s iDAS (TM) was used for DAS data acquisition with 1.021 m channel
7 | spacing and a gauge length of 10 m. Data in this archive are in .sgy and .h5
8 | files with raw units (radians of optical phase change per time sample)
9 | (Miller et al.).
10 |
11 | The files in this dataset use SEG-Y-rev1 (see below for documentation). The
12 | data are also available in .h5 or HDF5 file format.
13 |
14 | Horizontal DAS (DASH) data collection began 3/8/2016, paused, and then started
15 | again on 3/11/2016 and ended 3/26/2016 using zigzag trenched fiber optic
16 | cables. Vertical DAS (DASV) data collection began 3/17/2016 and ended 3/28/2016
17 | using a fiber optic cable through the first 363 m of a vertical well.
18 |
19 | The resampled DASH data are Matlab files with data for the surface DAS (DASH)
20 | array deployed at the PoroTomo natural laboratory in March 2016. Each file
21 | contains 30 seconds of data for 8721 channels. These files have been
22 | resampled in time from the original data and have a sample rate of 100
23 | samples/second.
24 |
25 | The nodal seismometer data consists of continuous and windowed
26 | (to vibroseis sweep) SAC files. &#13;
Nodal data collection began between 3/6/2016
27 | and 3/11/2016 depending on the station, and ended between 3/26/2016 and
28 | 3/28/2016 also depending on the station. Station names, locations, start times,
29 | stop times, and orientations can be found in the nodal seismometer metadata
30 | linked below.
31 |
32 | ## DAS
33 |
34 | ### Directory structure
35 |
36 | The PoroTomo DAS data is available on AWS S3: s3://nrel-pds-porotomo/DAS/.
37 | The data is available in three formats:
38 |
39 | #### SEG-Y Format:
40 |
41 | Files are stored in daily directories labeled using the format YYYYMMDD.
42 | Individual file names include the date and time appended at the end in the
43 | format YYMMDDHHMMSS. Each file represents 30 s of data.
44 |
45 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASH:
46 | - size: ~1-2 GB each
47 | - shape: 8721 traces x 30000 samples/trace
48 |
49 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASH/Resampled:
50 | - format: MATLAB files created through resampling of SEG-Y files
51 | - size: ~0.19 MB
52 | - shape of 'data' object: 30000 npts x 8721 nch
53 |
54 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASV:
55 | - size: ~0.01-0.02 GB each
56 | - shape: 384 traces x 30000 samples/trace
57 |
58 | For examples on accessing the SEG-Y files please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_SEGY.ipynb)
59 |
60 | #### HDF5 (.h5) Format:
61 |
62 | Files are stored in daily directories labeled using the format YYYYMMDD.
63 | Individual file names include the date and time appended at the end in the
64 | format YYMMDDHHMMSS. Each file represents 30 s of data, with the exception of
65 | the .h5 files available via HSDS which represent 24 hrs of data.
66 |
67 | s3://nrel-pds-porotomo/DAS/H5/DASH:
68 | - size: ~1 GB
69 | - shape of 'das' variable: 8721 traces x 30000 samples/trace
70 |
71 | s3://nrel-pds-porotomo/DAS/H5/DASV:
72 | - size: ~0.04 GB
73 | - shape of 'das' variable: 384 traces x 30000 samples/trace
74 |
75 | For examples on accessing the HDF5 files please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hdf5.ipynb)
76 |
77 | #### HSDS Format: HDF5 from the cloud
78 |
79 | Files are stored in daily .h5 files.
80 |
81 | DASH:
82 | - Source .h5: s3://nrel-pds-porotomo/H5/DASH
83 | - HSDS: /nrel/porotomo/DASH
84 |
85 | DASV:
86 | - Source .h5: s3://nrel-pds-porotomo/H5/DASV
87 | - HSDS: /nrel/porotomo/DASV
88 |
89 | For examples on setting up and using HSDS please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hsds.ipynb)
90 |
91 | ### Data Format
92 |
93 | The following datasets are available in HDF5 and HSDS files:
94 |
95 | - channel: channel number (along cable)
96 | - crs: coordinate reference system
97 | - das: 2D array with das data (shape: t x channel)
98 | - t: time in µs with respect to start of survey
99 | - trace: enumerated integers over length of the trace
100 | - x: x position of channel
101 | - y: y position of channel
102 | - z: z position of channel
103 |
104 | ## Nodal Seismometer Data
105 |
106 | ### Directory Structure
107 |
108 | The PoroTomo Nodal Seismometer data is available on AWS S3:
109 | s3://nrel-pds-porotomo/Nodal/.
110 | The following data and metadata are available:
111 |
112 | #### Continuous Data
113 |
114 | SAC files of the continuous raw data from the nodal seismometers. &#13;
Data files are
115 | sorted into folders by seismometer station number. Note: no data recovered
116 | from stations 73 and 82.
117 |
118 | s3://nrel-pds-porotomo/Nodal/nodal_sac:
119 | - size: 45-173 MB
120 |
121 |
122 | #### Field Notes and Metadata
123 |
124 | PDF scans of field notes and metadata for nodal seismometers including
125 | instrument installation and recovery.
126 |
127 | s3://nrel-pds-porotomo/Nodal/nodal_metadata:
128 | - size: 1.3-15.5 MB
129 |
130 | #### P-Picks
131 |
132 | P-wave travel times auto-picked from cross-correlation waveforms. The files
133 | list the source time and location for a vibe sweep stack followed by the travel
134 | time to each nodal instrument and two means of pick quality assessment. See
135 | README.txt for details.
136 |
137 | s3://nrel-pds-porotomo/Nodal/nodal_analysis/p_picks:
138 | - size: 1.3-1.4 MB
139 |
140 | #### Sweep Data
141 |
142 | 29.8 second long SAC files of the raw nodal seismometer data starting 3.9
143 | seconds before the initiation of each vibroseis sweep extracted from continuous
144 | 24 hour files. Data files are sorted into folders by sweep number
145 | (see GDR Submission 826).
146 |
147 | s3://nrel-pds-porotomo/Nodal/nodal_sac_sweep:
148 | - size: 58.8 kB
149 |
150 | ## References:
151 |
152 | Users of the PoroTomo data should please cite:
153 | - [Miller, Douglas E., et al. “DAS and DTS at Brady Hot Springs: Observations about Coupling and Coupled Interpretations.” Semantic Scholar, 14 Feb. 2018](pdfs.semanticscholar.org/048f/419e3c2b4de348a7166b13cab3bc0d56afdc.pdf)
154 |
155 | Additional information regarding:
156 | - [SEG-Y-rev1 file structure](https://seg.org/Portals/0/SEG/News%20and%20Resources/Technical%20Standards/seg_y_rev1.pdf)
157 | - [.h5 file format](https://support.hdfgroup.org/HDF5/doc/H5.format.html)
158 | - [DAS Data](http://dx.doi.org/10.1093/gji/ggy102)
159 | - [PoroTomo Technical Report](https://www.osti.gov/servlets/purl/1499141)
160 | - [DAS and DTS Interpretation](https://pangea.stanford.edu/ERE/pdf/IGAstandard/SGW/2018/Miller.pdf)
161 | - [DTS and DAS Metadata](https://gdr.openei.org/submissions/825)
162 | - [DTS Data](https://gdr.openei.org/submissions/853)
163 | - [Nodal Seismometer Metadata](https://gdr.openei.org/submissions/826)
164 |
-------------------------------------------------------------------------------- /PoroTomo/README.md: --------------------------------------------------------------------------------
1 | # PoroTomo
2 |
3 | The data were collected during March 2016 at the PoroTomo Natural Laboratory
4 | in Brady's Hot Springs, Nevada.
5 |
6 | Silixa’s iDAS (TM) was used for DAS data acquisition with 1.021 m channel
7 | spacing and a gauge length of 10 m. Data in this archive are in .sgy and .h5
8 | files with raw units (radians of optical phase change per time sample)
9 | (Miller et al.).
10 |
11 | The files in this dataset use SEG-Y-rev1 (see below for documentation). The
12 | data are also available in .h5 or HDF5 file format.
13 |
14 | Horizontal DAS (DASH) data collection began 3/8/2016, paused, and then started
15 | again on 3/11/2016 and ended 3/26/2016 using zigzag trenched fiber optic
16 | cables. Vertical DAS (DASV) data collection began 3/17/2016 and ended 3/28/2016
17 | using a fiber optic cable through the first 363 m of a vertical well.
18 |
19 | The resampled DASH data are Matlab files with data for the surface DAS (DASH)
20 | array deployed at the PoroTomo natural laboratory in March 2016. Each file
21 | contains 30 seconds of data for 8721 channels. &#13;
These files have been
22 | resampled in time from the original data and have a sample rate of 100
23 | samples/second.
24 |
25 | The nodal seismometer data consists of continuous and windowed
26 | (to vibroseis sweep) SAC files. Nodal data collection began between 3/6/2016
27 | and 3/11/2016 depending on the station, and ended between 3/26/2016 and
28 | 3/28/2016 also depending on the station. Station names, locations, start times,
29 | stop times, and orientations can be found in the nodal seismometer metadata
30 | linked below.
31 |
32 | ## DAS
33 |
34 | ### Directory structure
35 |
36 | The PoroTomo DAS data is available on AWS S3: s3://nrel-pds-porotomo/DAS/.
37 | The data is available in three formats:
38 |
39 | #### SEG-Y Format:
40 |
41 | Files are stored in daily directories labeled using the format YYYYMMDD.
42 | Individual file names include the date and time appended at the end in the
43 | format YYMMDDHHMMSS. Each file represents 30 s of data.
44 |
45 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASH:
46 | - size: ~1-2 GB each
47 | - shape: 8721 traces x 30000 samples/trace
48 |
49 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASH/Resampled:
50 | - format: MATLAB files created through resampling of SEG-Y files
51 | - size: ~0.19 MB
52 | - shape of 'data' object: 30000 npts x 8721 nch
53 |
54 | s3://nrel-pds-porotomo/DAS/SEG-Y/DASV:
55 | - size: ~0.01-0.02 GB each
56 | - shape: 384 traces x 30000 samples/trace
57 |
58 | For examples on accessing the SEG-Y files please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_SEGY.ipynb)
59 |
60 | #### HDF5 (.h5) Format:
61 |
62 | Files are stored in daily directories labeled using the format YYYYMMDD.
63 | Individual file names include the date and time appended at the end in the
64 | format YYMMDDHHMMSS. Each file represents 30 s of data, with the exception of
65 | the .h5 files available via HSDS which represent 24 hrs of data. &#13;
66 |
67 | s3://nrel-pds-porotomo/DAS/H5/DASH:
68 | - size: ~1 GB
69 | - shape of 'das' variable: 8721 traces x 30000 samples/trace
70 |
71 | s3://nrel-pds-porotomo/DAS/H5/DASV:
72 | - size: ~0.04 GB
73 | - shape of 'das' variable: 384 traces x 30000 samples/trace
74 |
75 | For examples on accessing the HDF5 files please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hdf5.ipynb)
76 |
77 | #### HSDS Format: HDF5 from the cloud
78 |
79 | Files are stored in daily .h5 files.
80 |
81 | DASH:
82 | - Source .h5: s3://nrel-pds-porotomo/H5/DASH
83 | - HSDS: /nrel/porotomo/DASH
84 |
85 | DASV:
86 | - Source .h5: s3://nrel-pds-porotomo/H5/DASV
87 | - HSDS: /nrel/porotomo/DASV
88 |
89 | For examples on setting up and using HSDS please see our [example notebook](https://github.com/openEDI/documentation/blob/master/PoroTomo/PoroTomo_Distributed_Acoustic_Sensing_(DAS)_Data_hsds.ipynb)
90 |
91 | ### Data Format
92 |
93 | The following datasets are available in HDF5 and HSDS files:
94 |
95 | - channel: channel number (along cable)
96 | - crs: coordinate reference system
97 | - das: 2D array with das data (shape: t x channel)
98 | - t: time in µs with respect to start of survey
99 | - trace: enumerated integers over length of the trace
100 | - x: x position of channel
101 | - y: y position of channel
102 | - z: z position of channel
103 |
104 | ## Nodal Seismometer Data
105 |
106 | ### Directory Structure
107 |
108 | The PoroTomo Nodal Seismometer data is available on AWS S3:
109 | s3://nrel-pds-porotomo/Nodal/.
110 | The following data and metadata are available:
111 |
112 | #### Continuous Data
113 |
114 | SAC files of the continuous raw data from the nodal seismometers. Data files are
115 | sorted into folders by seismometer station number. Note: no data recovered
116 | from stations 73 and 82.
117 |
118 | s3://nrel-pds-porotomo/Nodal/nodal_sac:
119 | - size: 45-173 MB
120 |
121 |
122 | #### Field Notes and Metadata
123 |
124 | PDF scans of field notes and metadata for nodal seismometers including
125 | instrument installation and recovery.
126 |
127 | s3://nrel-pds-porotomo/Nodal/nodal_metadata:
128 | - size: 1.3-15.5 MB
129 |
130 | #### P-Picks
131 |
132 | P-wave travel times auto-picked from cross-correlation waveforms. The files
133 | list the source time and location for a vibe sweep stack followed by the travel
134 | time to each nodal instrument and two means of pick quality assessment. See
135 | README.txt for details.
136 |
137 | s3://nrel-pds-porotomo/Nodal/nodal_analysis/p_picks:
138 | - size: 1.3-1.4 MB
139 |
140 | #### Sweep Data
141 |
142 | 29.8 second long SAC files of the raw nodal seismometer data starting 3.9
143 | seconds before the initiation of each vibroseis sweep extracted from continuous
144 | 24 hour files. Data files are sorted into folders by sweep number
145 | (see GDR Submission 826).
146 |
147 | s3://nrel-pds-porotomo/Nodal/nodal_sac_sweep:
148 | - size: 58.8 kB
149 |
150 | ## References:
151 |
152 | Users of the PoroTomo data should please cite:
153 | - [Miller, Douglas E., et al. “DAS and DTS at Brady Hot Springs: Observations about Coupling and Coupled Interpretations.” Semantic Scholar, 14 Feb. &#13;
2018](pdfs.semanticscholar.org/048f/419e3c2b4de348a7166b13cab3bc0d56afdc.pdf) 154 | 155 | Additional information regarding: 156 | - [SEG-Y-rev1 file structure](https://seg.org/Portals/0/SEG/News%20and%20Resources/Technical%20Standards/seg_y_rev1.pdf) 157 | - [.h5 file format](https://support.hdfgroup.org/HDF5/doc/H5.format.html) 158 | - [DAS Data](http://dx.doi.org/10.1093/gji/ggy102) 159 | - [PoroTomo Technical Report](https://www.osti.gov/servlets/purl/1499141) 160 | - [DAS and DTS Interpretation](https://pangea.stanford.edu/ERE/pdf/IGAstandard/SGW/2018/Miller.pdf) 161 | - [DTS and DAS Metadata](https://gdr.openei.org/submissions/825) 162 | - [DTS Data](https://gdr.openei.org/submissions/853) 163 | - [Nodal Seismometer Metadata](https://gdr.openei.org/submissions/826) 164 | -------------------------------------------------------------------------------- /SMART-DS/Readme.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/Readme.pdf -------------------------------------------------------------------------------- /SMART-DS/figures/AUS/all_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/AUS/all_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/import_timeseries.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/import_timeseries.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/import_voltvar.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/import_voltvar.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/importing.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/importing.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/networks.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/networks.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/simplified_view.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/simplified_view.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/simplified_view_zoomed.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/simplified_view_zoomed.PNG 
-------------------------------------------------------------------------------- /SMART-DS/figures/CYME/substation.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/substation.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/CYME/timeseries_results.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/CYME/timeseries_results.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/GIS/layer_examples.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/GIS/layer_examples.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/GIS/missing_layers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/GIS/missing_layers.png -------------------------------------------------------------------------------- /SMART-DS/figures/GSO/all_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/GSO/all_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/GSO/all_labels2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/GSO/all_labels2.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/feeder.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/feeder.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/monitor_current.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/monitor_current.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/monitor_kva.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/monitor_kva.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/profile.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/profile.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/OpenDSS/running_dss.PNG: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/OpenDSS/running_dss.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/SAF/all_labels.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SAF/all_labels.png -------------------------------------------------------------------------------- /SMART-DS/figures/SFO/downtown_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SFO/downtown_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/SFO/east_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SFO/east_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/SFO/north_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SFO/north_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/SFO/south_labels.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/SFO/south_labels.PNG -------------------------------------------------------------------------------- /SMART-DS/figures/analysis/pu_voltages_histogram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/analysis/pu_voltages_histogram.png -------------------------------------------------------------------------------- /SMART-DS/figures/analysis/pu_voltages_percentiles.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/analysis/pu_voltages_percentiles.png -------------------------------------------------------------------------------- /SMART-DS/figures/load_curves/total_load_200.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/load_curves/total_load_200.png -------------------------------------------------------------------------------- /SMART-DS/figures/load_curves/total_load_244.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/openEDI/documentation/726265c6b3e2a696ac5cacf724ad0ca83bbd2e35/SMART-DS/figures/load_curves/total_load_244.png -------------------------------------------------------------------------------- /Sup3rCC.md: -------------------------------------------------------------------------------- 1 | # Super-Resolution for 
Renewable Energy Resource Data with Climate Change Impacts (Sup3rCC) 2 | 3 | ## Description 4 | 5 | The Super-Resolution for Renewable Energy Resource Data with Climate Change Impacts (Sup3rCC) data is a collection of 4 km hourly wind, solar, temperature, humidity, and pressure fields for the contiguous United States under various climate change scenarios. 6 | 7 | Sup3rCC is downscaled Global Climate Model (GCM) data. The downscaling process was performed using a generative machine learning approach called sup3r: Super-Resolution for Renewable Energy Resource Data (linked below as "Sup3r GitHub Repo"). The data includes both historical and future weather years, although the historical years represent the historical climate, not the actual historical weather that we experienced. You cannot use Sup3rCC data to study historical weather events, although other sup3r datasets may be intended for this. 8 | 9 | The Sup3rCC data is intended to help researchers study the impact of climate change on energy systems with high levels of wind and solar capacity. Please note that all climate change data is only a representation of the *possible* future climate and contains significant uncertainty. Analysis of multiple climate change scenarios and multiple climate models can help quantify this uncertainty. 10 | 11 | ## Version Log 12 | The Sup3rCC data has versions that coincide with the sup3r software versions. Note that not every sup3r software version will have a corresponding Sup3rCC data release, but every Sup3rCC data release will have a corresponding sup3r software version. This table records versions of Sup3rCC data releases. Sup3rCC generative models may have slightly different versions than the data. The version attribute in the Sup3rCC .h5 files can be inspected to verify the actual version of the data you are using. 13 | 14 | | Version | Date | Notes | 15 | | -------- | -------- | ------- | 16 | | 0.1.0 | 6/27/2023 | Initial Sup3rCC release with two GCMs and one climate scenario. Known issues: few years used for bias correction; simplistic GCM bias correction method; mean bias in the high-resolution output, especially in the wind and solar data; imperfect wind diurnal cycles when compared to WTK; and timing of diurnal peak temperature when compared to observations. | 17 | 18 | ## Directory structure 19 | 20 | The Sup3rCC directory contains downscaled data for multiple projections of future climate change. For example, a file from the initial data release `sup3rcc_conus_ecearth3_ssp585_r1i1p1f1_wind_2015.h5` is downscaled from the climate model EC-Earth3 for climate change scenario SSP5 8.5 and variant label r1i1p1f1. The file contains wind variables for the year 2015. Note that this will represent the climate from 2015, but not the actual weather we experienced. 21 | 22 | Within the S3 bucket there is also a folder `models` providing pre-trained Sup3rCC generative machine learning models. See the Sup3r GitHub Repo below for examples of how to use these models. 23 | 24 | ## Data Format 25 | 26 | The data is provided in Hierarchical Data Format (.h5) separated by year and by variable set. Variables are provided in 2-dimensional spatiotemporal arrays (called “datasets” in h5 files) with dimensions `(time, space)`. The temporal axis is defined by the `time_index` dataset, while the positional axis is defined by the `meta` dataset. Additional details on data format and data access patterns can be found in the [rex docs](https://nrel.github.io/rex/misc/examples.nrel_data.html).
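The `(time, space)` layout described above can be verified with a few lines of `h5py`. This is a minimal sketch, assuming a local copy of the wind file named in the directory section; dataset names other than `time_index` and `meta` should be confirmed by listing the file's keys.

```python
# Minimal sketch: inspect the layout of a Sup3rCC .h5 file (assumes the
# file named in the directory section has been downloaded locally).
import h5py
import pandas as pd

fname = 'sup3rcc_conus_ecearth3_ssp585_r1i1p1f1_wind_2015.h5'
with h5py.File(fname, 'r') as f:
    print(list(f))  # available datasets, including the wind variables
    # time_index is saved as byte-strings and must be decoded
    time_index = pd.to_datetime(f['time_index'][...].astype(str))
    # meta defines the positional (space) axis
    meta = pd.DataFrame(f['meta'][...])
```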
27 | 28 | ## Code Examples 29 | - For code examples, users can reference the [Sup3r GitHub Repository](https://github.com/NREL/sup3r/tree/main), which includes examples for Sup3rCC data access in the [/examples/sup3rcc](https://github.com/NREL/sup3r/tree/main/examples/sup3rcc) directory. 30 | - The [rex docs](https://nrel.github.io/rex/misc/examples.nrel_data.html) provide examples of the easiest ways to access the data remotely or on the NREL HPC. 31 | 32 | ## References 33 | Users of the Sup3rCC data should use the following citation: 34 | 35 | - Buster, Grant, Benton, Brandon, Glaws, Andrew, and King, Ryan. “High-Resolution Meteorology with Climate Change Impacts from Global Climate Model Data Using Generative Machine Learning.” _Nature Energy_, March 14, 2024. https://doi.org/10.1038/s41560-024-01507-9. 36 | - Buster, Grant, Benton, Brandon, Glaws, Andrew, and King, Ryan. Super-Resolution for Renewable Energy Resource Data with Climate Change Impacts (Sup3rCC). United States: N.p., 19 Apr, 2023. Web. doi: 10.25984/1970814. 37 | -------------------------------------------------------------------------------- /Template.md: -------------------------------------------------------------------------------- 1 | # {Dataset Name} 2 | 3 | ## Description 4 | 5 | A brief description of the data including: 6 | - how it was produced 7 | - why it is important/novel 8 | - who might use it and how 9 | 10 | ## Directory structure 11 | 12 | If the dataset is made up of multiple files, a description of how they are/will 13 | be stored in relation to each other. 14 | 15 | ## Data Format 16 | 17 | How the data is stored within each file, including a data dictionary with 18 | - dataset/variable/column names 19 | - units 20 | 21 | ## Code Examples 22 | 23 | Example scripts of how to access the data IN THE CLOUD. A Jupyter notebook or 24 | link to a GitHub repo with examples can be used instead. 25 | 26 | ## References 27 | 28 | Any helpful references or other documentation 29 | 30 | ## Disclaimer and Attribution 31 | 32 | Optional additional attributes/disclaimers 33 | -------------------------------------------------------------------------------- /TrackingtheSun.md: -------------------------------------------------------------------------------- 1 | # LBNL Tracking the Sun 2 | 3 | ## Description 4 | 5 | Berkeley Lab’s Tracking the Sun report series is dedicated to summarizing installed prices and other trends among grid-connected, distributed solar photovoltaic (PV) systems in the United States. The present report, the 11th edition in the series, focuses on systems installed through year-end 2017, with preliminary trends for the first half of 2018. As in years past, the primary emphasis is on describing changes in installed prices over time and variation in pricing across projects based on location, project ownership, system design, and other attributes. New to this year, however, is an expanded discussion of other project characteristics in the large underlying data sample. Future editions will include more such material, beyond the report’s traditional focus on installed pricing. 6 | 7 | The trends described in this report derive primarily from project-level data reported to state agencies and utilities that administer PV incentive programs, solar renewable energy credit (SREC) registration systems, or interconnection processes. In total, data were collected and cleaned for more than 1.3 million individual PV systems, representing 81% of U.S. residential and non-residential PV systems installed through 2017.
The analysis of installed pricing trends is based on a subset of roughly 770,000 systems with available installed price data. 8 | 9 | A technical summary of the dataset is as follows: 10 | 11 | Focuses on projects installed through 2018, with preliminary data for the first half of 2022: 12 | - Describes and analyzes trends related to project characteristics, including system size and design, ownership, customer segmentation, and other attributes 13 | - National median installed prices, both long-term and recent trends, focusing on host-owned systems 14 | - Variability in pricing across projects according to system size, state, installer, module efficiency, inverter technology, residential new construction vs. retrofit, tax-exempt vs. commercial site hosts, and mounting configuration 15 | - Distributed PV, for the purpose of this report, includes residential and non-residential systems that are roof-mounted (of any size) or ground-mounted up to 5 MWAC 16 | 17 | Tracking the Sun relies on project-level data: 18 | - Provided by state agencies and utilities that administer PV incentive programs, renewable energy credit (REC) registration systems, or interconnection processes 19 | - Some of these data already exist in the public domain (e.g., California’s Currently Interconnected Dataset), though LBNL may receive supplementary fields, in some cases covered under non-disclosure agreements 20 | - 67 entities spanning 30 states contributed data to this year’s report (Table A-1 in the Appendix of the report) 21 | 22 | Customer Segments 23 | - Residential: Single-family and, depending on the data provider, may also include multi-family 24 | - Small Non-Residential: Non-residential systems ≤100 kWDC 25 | - Large Non-Residential: Non-residential systems >100 kWDC (and ≤5,000 kWAC if ground-mounted). *Segments are independent of whether systems are connected to the customer- or utility-side of the meter.* 26 | 27 | Units 28 | - Real 2018 dollars 29 | - Direct current (DC) Watts (W), unless otherwise noted 30 | 31 | Installed Price: Up-front $/W price paid by the PV system owner, prior to incentives 32 | 33 | Sample Frames and Data Cleaning 34 | Full sample (used to describe system characteristics; the basis for the public dataset): 35 | 1. Remove systems with missing size or install date 36 | 2. Standardize installer, module, inverter names 37 | 3. Integrate equipment spec sheet data 38 | – Module efficiency and technology type 39 | – Flag microinverters or DC optimizers 40 | 4. Convert dollar and kW values to appropriate units, 41 | and compute other derived fields 42 | 43 | Installed-Price Sample (used in the analysis of installed prices): 44 | 5. Remove systems if: 45 | – Missing installed price data 46 | – Third-party owned (TPO) 47 | – Battery storage included 48 | – Self-installed 49 | 50 | ## Directory Structure 51 | 52 | The Tracking the Sun Dataset is made available in Parquet format on AWS and is partitioned by `state` in AWS Glue and Athena. The schema may change across dataset years on S3.
53 | 54 | - `s3://oedi-data-lake/tracking-the-sun/2018/` 55 | - `s3://oedi-data-lake/tracking-the-sun/2019/` 56 | - `s3://oedi-data-lake/tracking-the-sun/2020/` 57 | - `s3://oedi-data-lake/tracking-the-sun/2021/` 58 | - `s3://oedi-data-lake/tracking-the-sun/2022/` 59 | 60 | ## Python Connection Examples 61 | 62 | ```python 63 | 64 | import pandas as pd 65 | from pyathena import connect 66 | 67 | conn = connect( 68 | s3_staging_dir='s3://<user-bucket>/tracking-the-sun', ##user-defined staging directory 69 | region_name='us-west-2', 70 | work_group='' ##specify workgroup if one exists 71 | ) 72 | 73 | df = pd.read_sql("SELECT * FROM oedi_tracking_the_sun_2019 limit 8;", conn) 74 | ``` 75 | For a Jupyter notebook example that includes partitions and a data dictionary, see our 76 | [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples) 77 | 78 | ## Metadata Information 79 | 80 | The dataset is partitioned by US state. 81 | 82 | Please refer to this repository for examples of metadata and data access: https://github.com/openEDI/open-data-access-tools/tree/master/examples 83 | 84 | ## References 85 | 86 | [https://emp.lbl.gov/sites/default/files/tracking_the_sun_2019_report.pdf](https://emp.lbl.gov/sites/default/files/tracking_the_sun_2019_report.pdf) 87 | 88 | [https://emp.lbl.gov/sites/default/files/tracking_the_sun_2019_slide_deck_summary_0.pdf](https://emp.lbl.gov/sites/default/files/tracking_the_sun_2019_slide_deck_summary_0.pdf) 89 | 90 | [https://emp.lbl.gov/tracking-sun-tool](https://emp.lbl.gov/tracking-sun-tool) 91 | 92 | ## Disclaimer and Attribution 93 | 94 | Copyright (c) 2020, Alliance for Sustainable Energy LLC, All rights reserved. 95 | 96 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 97 | 98 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 99 | 100 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 101 | 102 | * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 103 | 104 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
105 | -------------------------------------------------------------------------------- /UMCM_Hurricanes.md: -------------------------------------------------------------------------------- 1 | # University of Miami Coupled Model (UMCM) for Hurricanes Ike and Sandy 2 | 3 | ## Model 4 | 5 | The University of Miami Coupled Model (UMCM) is a coupled model that integrates 6 | atmospheric, wave, and ocean components to produce wind, wave, and current 7 | data. Atmospheric data is produced using the [Weather Research and Forecasting](https://www.mmm.ucar.edu/weather-research-and-forecasting-model) 8 | model (WRF), wave data is produced using the [University of Miami Wave Model](https://umwm.org/) 9 | (UMWM), and ocean current data is produced using the 10 | [HYbrid Coordinate Ocean Model](https://www.hycom.org/) (HYCOM). 11 | 12 | The model was used to study offshore wind conditions during Hurricane Ike 13 | and Hurricane Sandy. The time resolution for each model run is as follows: 14 | 15 | - Hurricane Ike 16 | - 1 sample/hour from 9/8/2008 12:00:00 UTC to 9/12/2008 6:00:00 UTC 17 | - 1 sample/10 minutes from 9/12/2008 6:00:00 UTC to 9/13/2008 9:00:00 UTC 18 | - Hurricane Sandy 19 | - 1 sample/10 minutes from 10/28/2012 00:10:00 UTC to 10/31/2012 00:00:00 UTC 20 | 21 | The following variables were extracted from the HYCOM model: 22 | - bathymetry 23 | - ocean_mixed_layer_thickness-ilt 24 | - ocean_mixed_layer_thickness-mlt 25 | - sea_water_potential_density at all depths 26 | - sea_water_salinity at all depths 27 | - sea_surface_elevation 28 | - eastward_sea_water_velocity 29 | - northward_sea_water_velocity 30 | - upward_sea_water_velocity 31 | - sea_water_temperature 32 | - depth (m): [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 120, 135, 150, 175, 200, 225, 250, 275, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000] 33 | 34 | The following variables were extracted from the UMWM wave model: 35 | - cd 36 | - cgmxx 37 | - cgmxy 38 | - cgmyy 39 | - dcg 40 | - dcg0 41 | - dcp 42 | - dcp0 43 | - depth (m): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100] 44 | - dwd 45 | - dwl 46 | - dwp 47 | - momx 48 | - momy 49 | - mss 50 | - mwd 51 | - mwl 52 | - mwp 53 | - rhoa 54 | - rhow 55 | - seamask 56 | - swh 57 | - tailatmx 58 | - tailatmy 59 | - tailocnx 60 | - tailocny 61 | - taux_bot 62 | - taux_form 63 | - taux_form_1 64 | - taux_form_2 65 | - taux_form_3 66 | - taux_ocn 67 | - taux_skin 68 | - taux_snl 69 | - tauy_bot 70 | - tauy_form 71 | - tauy_form_1 72 | - tauy_form_2 73 | - tauy_form_3 74 | - tauy_ocn 75 | - tauy_skin 76 | - tauy_snl 77 | - u_stokes at all depths 78 | - uc 79 | - ust 80 | - v_stokes at all depths 81 | - vc 82 | - wdir 83 | - wspd 84 | 85 | ## Directory structure 86 | 87 | The UMCM data is stored in two .h5 files: 88 | - s3://oedi-data-lake/umcm/ 89 | - ike.h5 90 | - sandy.h5 91 | 92 | The UMCM data is also available via HSDS at /nrel/umcm/. 93 | 94 | For examples of setting up and using HSDS, please see our [examples repository](https://github.com/nrel/hsds-examples) 95 | 96 | ## Data Format 97 | 98 | The UMCM data is stored in datasets by variable and depth (when available). 99 | Each dataset is composed of a 3D data "cube" with dimensions (time, latitude, 100 | longitude). The positional values of each dimension are available in the 1D 101 | datasets: 102 | - `time_index` 103 | - `latitude` 104 | - `longitude` 105 | 106 | Additional locational metadata is available in the `meta` table.
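A minimal sketch of reading these data cubes over HSDS is shown below. It assumes `h5pyd`/HSDS are configured as described in the examples repository, and it uses `wspd` (one of the UMWM variables listed above) as the example dataset; confirm exact dataset names by listing the file's keys.

```python
# Minimal sketch: read UMCM Hurricane Ike data over HSDS.
# Assumes h5pyd/HSDS are configured per the examples repository.
import h5pyd

with h5pyd.File('/nrel/umcm/ike.h5', mode='r') as f:
    print(list(f))             # available datasets
    lat = f['latitude'][...]   # 1D positional values
    lon = f['longitude'][...]
    # first time-step of the (time, latitude, longitude) cube
    wspd = f['wspd'][0, :, :]
```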
107 | 108 | ## References 109 | 110 | Users of the UMCM model should use the following citation: 111 | - [Phillipes, Caleb, Veers, Paul, Kim, Eungsoo, Manuel, Lance, Curcic, Milan, and Chen, Shuyi. University of Miami Coupled Model (UMCM) for Hurricanes Ike and Sandy. United States: N.p., 30 Sep, 2015. Web. https://data.openei.org/submissions/574.](https://data.openei.org/submissions/574) 112 | -------------------------------------------------------------------------------- /US_Wave.md: -------------------------------------------------------------------------------- 1 | # High Resolution Ocean Surface Wave Hindcast 2 | 3 | ## Description 4 | 5 | The development of this dataset was funded by the U.S. Department of Energy, 6 | Office of Energy Efficiency & Renewable Energy, Water Power Technologies Office 7 | to improve our understanding of the U.S. wave energy resource and to provide 8 | critical information for wave energy project development and wave energy 9 | converter design. 10 | 11 | This is the highest resolution publicly available long-term wave hindcast 12 | dataset that – when complete – will cover the entire U.S. Exclusive Economic 13 | Zone (EEZ). The data can be used to investigate the historical record of wave 14 | statistics at any U.S. site. As such, the dataset could also be of value to any 15 | entity with marine operations inside the U.S. EEZ. 16 | 17 | A technical summary of the dataset is as follows: 18 | 19 | - 32 Year Wave Hindcast (1979-2010), 3-hour temporal resolution 20 | - Unstructured grid spatial resolution ranges from 200 meters in shallow water to ~10 km in deep water 21 | - Spatial coverage: EEZ offshore of all U.S. territories (see below) 22 | 23 | The following variables are included in the dataset: 24 | 25 | - Mean Wave Direction: Direction normal to the wave crests 26 | - Significant Wave Height: Calculated from the zeroth spectral moment (i.e., H_m0) 27 | - Mean Absolute Period: Calculated as a ratio of spectral moments (m_0/m_1) 28 | - Peak Period: The period associated with the maximum value of the wave energy spectrum 29 | - Mean Zero-Crossing Period: Calculated as a ratio of spectral moments (sqrt(m_0/m_2)) 30 | - Energy Period: Calculated as a ratio of spectral moments (m_-1/m_0) 31 | - Directionality Coefficient: Fraction of total wave energy travelling in the direction of maximum wave power 32 | - Maximum Energy Direction: The direction from which the most wave energy is travelling 33 | - Omni-Directional Wave Power: Total wave energy flux from all directions 34 | - Spectral Width: Spectral width characterizes the relative spreading of energy in the wave spectrum 35 | 36 | The following U.S. regions will be added to this dataset under the given 37 | `domain` names: 38 | 39 | - West Coast United States: `West_Coast` 40 | - East Coast United States: `Atlantic` 41 | - Alaskan Coast: TBD 42 | - Hawaiian Islands: `Hawaii` 43 | - Gulf of Mexico, Puerto Rico, and U.S. Virgin Islands: TBD 44 | - U.S. Pacific Island Territories: TBD 45 | 46 | ## Model 47 | 48 | The multi-scale, unstructured-grid modeling approach using WaveWatch III and 49 | SWAN enabled long-term (decades) high-resolution hindcasts in a large regional 50 | domain. In particular, the dataset was generated from the unstructured-grid 51 | SWAN model output that was driven by a WaveWatch III model with global-regional 52 | nested grids. The unstructured-grid SWAN model simulations were performed with 53 | a spatial resolution as fine as 200 meters in shallow waters.
The dataset has a 54 | 3-hour timestep spanning 32 years from 1979 through 2010. The project team 55 | intends to extend this to 2020 (i.e., 1979-2020), pending DOE support to do so. 56 | 57 | The models were extensively validated not only for the most common wave 58 | parameters, but also for six IEC resource parameters and 2D spectra with high 59 | quality spectral data derived from publicly available buoys. Additional details 60 | on definitions of the variables found in the dataset, the SWAN and WaveWatch 61 | III model configurations, and model validation are available in the technical report 62 | and peer-reviewed publications (Wu et al. 2020, Yang et al. 2020, Yang et al. 63 | 2018). This study was funded by the U.S. Department of Energy, Office of Energy 64 | Efficiency & Renewable Energy, Water Power Technologies Office under Contract 65 | DE-AC05-76RL01830 to Pacific Northwest National Laboratory (PNNL). 66 | 67 | ## Directory structure 68 | 69 | High Resolution Ocean Surface Wave Hindcast data is made available as a series 70 | of 3-hourly .h5 files located on AWS S3 for the domains discussed above: 71 | - `s3://wpto-pds-US_wave/v1.0.0/${domain}` 72 | 73 | Hourly virtual buoy data is also available in hourly .h5 files on AWS S3: 74 | - `s3://wpto-pds-US_wave/v1.0.0/virtual_buoy/${domain}` 75 | 76 | The US wave data is also available via HSDS at `/nrel/US_wave/`. 77 | For examples of setting up and using HSDS, please see our [examples repository](https://github.com/nrel/hsds-examples) 78 | 79 | ## Data Format 80 | 81 | The data is provided in Hierarchical Data Format (.h5) files separated by year. The 82 | variables mentioned above are provided in 2-dimensional time-series arrays with 83 | dimensions (time x location). The temporal axis is defined by the `time_index` 84 | dataset, while the positional axis is defined by the `coordinate` dataset. The 85 | units for the variable data are also provided as an attribute (`units`). The 86 | SWAN and IEC variable names are also provided under the attributes 87 | `SWAN_name` and `IEC_name`, respectively. 88 | 89 | ## Python Examples 90 | 91 | Example scripts to extract wave resource data using Python are provided below: 92 | 93 | The easiest way to access and extract data is the Resource eXtraction tool 94 | [`rex`](https://github.com/nrel/rex) 95 | 96 | To use `rex` with [`HSDS`](https://github.com/NREL/hsds-examples) you will need 97 | to install `h5pyd`: 98 | 99 | ``` 100 | pip install h5pyd 101 | ``` 102 | 103 | Next you'll need to configure HSDS: 104 | 105 | ``` 106 | hsconfigure 107 | ``` 108 | 109 | and enter at the prompt: 110 | 111 | ``` 112 | hs_endpoint = https://developer.nrel.gov/api/hsds 113 | hs_username = 114 | hs_password = 115 | hs_api_key = 3K3JQbjZmWctY0xmIfSYvYgtIcM3CN0cb1Y2w9bf 116 | ``` 117 | 118 | **IMPORTANT: The example API key here is for demonstration and is rate-limited per IP.
To get your own API key, visit https://developer.nrel.gov/signup/** 119 | 120 | You can also add the above contents to a configuration file at `~/.hscfg` 121 | 122 | 123 | ```python 124 | from rex import ResourceX 125 | 126 | wave_file = '/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5' 127 | with ResourceX(wave_file, hsds=True) as f: 128 | meta = f.meta 129 | time_index = f.time_index 130 | swh = f['significant_wave_height'] 131 | ``` 132 | 133 | `rex` also allows easy extraction of the nearest site to a desired (lat, lon) 134 | location: 135 | 136 | ```python 137 | from rex import ResourceX 138 | 139 | wave_file = '/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5' 140 | lat_lon = (34.399408, -119.841181) 141 | with ResourceX(wave_file, hsds=True) as f: 142 | lat_lon_swh = f.get_lat_lon_df('significant_wave_height', lat_lon) 143 | ``` 144 | 145 | or to extract all sites in a given region: 146 | 147 | ```python 148 | from rex import ResourceX 149 | 150 | wave_file = '/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5' 151 | jurisdiction = 'California' 152 | with ResourceX(wave_file, hsds=True) as f: 153 | ca_swh = f.get_region_df('significant_wave_height', jurisdiction, 154 | region_col='jurisdiction') 155 | ``` 156 | 157 | If you would rather access the US Wave data directly using h5pyd: 158 | 159 | ```python 160 | # Extract the average wave height 161 | import h5pyd 162 | import pandas as pd 163 | 164 | # Open .h5 file 165 | with h5pyd.File('/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5', mode='r') as f: 166 | # Extract meta data and convert from records array to DataFrame 167 | meta = pd.DataFrame(f['meta'][...]) 168 | # Significant Wave Height 169 | swh = f['significant_wave_height'] 170 | # Extract scale factor 171 | scale_factor = swh.attrs['scale_factor'] 172 | # Extract, average, and unscale wave height 173 | mean_swh = swh[...].mean(axis=0) / scale_factor 174 | 175 | # Add mean wave height to meta data 176 | meta['Average Wave Height'] = mean_swh 177 | ``` 178 | 179 | ```python 180 | # Extract time-series data for a single site 181 | import h5pyd 182 | import pandas as pd 183 | 184 | # Open .h5 file 185 | with h5pyd.File('/nrel/US_wave/West_Coast/West_Coast_wave_2010.h5', mode='r') as f: 186 | # Extract time_index and convert to datetime 187 | # NOTE: time_index is saved as byte-strings and must be decoded 188 | time_index = pd.to_datetime(f['time_index'][...].astype(str)) 189 | # Initialize DataFrame to store time-series data 190 | time_series = pd.DataFrame(index=time_index) 191 | # Extract wave height, direction, and period 192 | for var in ['significant_wave_height', 'mean_wave_direction', 193 | 'mean_absolute_period']: 194 | # Get dataset 195 | ds = f[var] 196 | # Extract scale factor 197 | scale_factor = ds.attrs['scale_factor'] 198 | # Extract site 100 and add to DataFrame 199 | time_series[var] = ds[:, 100] / scale_factor 200 | ``` 201 | ## References 202 | 203 | Please cite the most relevant publication below when referencing this dataset: 204 | 205 | 1) [Wu, Wei-Cheng, et al. "Development and validation of a high-resolution regional wave hindcast model for US West Coast wave resource characterization." Renewable Energy 152 (2020): 736-753.](https://www.osti.gov/biblio/1599105) 206 | 2) [Yang, Z., G. García-Medina, W. Wu, and T. Wang, 2020. Characteristics and variability of the Nearshore Wave Resource on the U.S. West Coast. Energy.](https://doi.org/10.1016/j.energy.2020.117818) 207 | 3) [Yang, Zhaoqing, et al.
High-Resolution Regional Wave Hindcast for the US West Coast. No. PNNL-28107. Pacific Northwest National Lab. (PNNL), Richland, WA (United States), 2018.](https://doi.org/10.2172/1573061) 208 | 4) [Ahn, S., V.S. Neary, N. Allahdadi, and R. He, Nearshore wave energy resource characterization along the East Coast of the United States, Renewable Energy, 2021, 172](https://doi.org/10.1016/j.renene.2021.03.037) 209 | 5) [Yang, Z. and V.S. Neary, High-resolution hindcasts for U.S. wave energy resource characterization. International Marine Energy Journal, 2020, 3, 65-71](https://doi.org/10.36688/imej.3.65-71) 210 | 6) [Allahdadi, M.N., He, R., and Neary, V.S.: Predicting ocean waves along the US East Coast during energetic winter storms: sensitivity to whitecapping parameterizations, Ocean Sci., 2019, 15, 691-715](https://doi.org/10.5194/os-15-691-2019) 211 | 7) [Allahdadi, M.N., Gunawan, J. Lai, R. He, V.S. Neary, Development and validation of a regional-scale high-resolution unstructured model for wave energy resource characterization along the US East Coast, Renewable Energy, 2019, 136, 500-511](https://doi.org/10.1016/j.renene.2019.01.020) 212 | 213 | ## Disclaimer and Attribution 214 | 215 | The National Renewable Energy Laboratory (“NREL”) is operated for the U.S. 216 | Department of Energy (“DOE”) by the Alliance for Sustainable Energy, LLC 217 | ("Alliance"). Pacific Northwest National Laboratory (PNNL) is managed and 218 | operated by Battelle Memorial Institute ("Battelle") for DOE. As such, the 219 | following rules apply: 220 | 221 | This data arose from work performed under funding provided by the United 222 | States Government. Access to or use of this data ("Data") denotes consent with 223 | the fact that this data is provided "AS IS," “WHERE IS” AND SPECIFICALLY FREE 224 | FROM ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND, INCLUDING BUT NOT LIMITED TO 225 | ANY IMPLIED WARRANTIES SUCH AS MERCHANTABILITY AND/OR FITNESS FOR ANY 226 | PARTICULAR PURPOSE. Furthermore, NEITHER THE UNITED STATES GOVERNMENT NOR ANY 227 | OF ITS ASSOCIATED ENTITIES OR CONTRACTORS INCLUDING BUT NOT LIMITED TO THE 228 | DOE/PNNL/NREL/BATTELLE/ALLIANCE ASSUME ANY LEGAL LIABILITY OR RESPONSIBILITY 229 | FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF THE DATA, OR REPRESENT THAT 230 | ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS. NO ENDORSEMENT OF THE DATA 231 | OR ANY REPRESENTATIONS MADE IN CONNECTION WITH THE DATA IS PROVIDED. IN NO 232 | EVENT SHALL ANY PARTY BE LIABLE FOR ANY DAMAGES, INCLUDING BUT NOT LIMITED TO 233 | SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES ARISING FROM THE PROVISION OF THIS 234 | DATA; TO THE EXTENT PERMITTED BY LAW USER AGREES TO INDEMNIFY 235 | DOE/PNNL/NREL/BATTELLE/ALLIANCE AND ITS SUBSIDIARIES, AFFILIATES, OFFICERS, 236 | AGENTS, AND EMPLOYEES AGAINST ANY CLAIM OR DEMAND RELATED TO USER'S USE OF THE 237 | DATA, INCLUDING ANY REASONABLE ATTORNEYS FEES INCURRED. 238 | 239 | The user is granted the right, without any fee or cost, to use or copy the 240 | Data, provided that this entire notice appears in all copies of the Data. In 241 | the event that user engages in any scientific or technical publication 242 | utilizing this data, the user agrees to credit DOE/PNNL/NREL/BATTELLE/ALLIANCE in 243 | any such publication consistent with respective professional practice.
244 | -------------------------------------------------------------------------------- /WINDToolkit.md: -------------------------------------------------------------------------------- 1 | # Wind Resource Data: Wind Integration National Dataset (WIND) Toolkit 2 | 3 | ## Model 4 | 5 | Wind resource data for North America was produced using the [Weather Research and Forecasting Model (WRF)](https://www.mmm.ucar.edu/weather-research-and-forecasting-model). 6 | The WRF model was initialized with the European Centre for Medium Range Weather 7 | Forecasts Interim Reanalysis (ERA-Interim) data set with an initial grid spacing 8 | of 54 km. Three internal nested domains were used to refine the spatial 9 | resolution to 18, 6, and finally 2 km. The WRF model was run for years 2007 10 | to 2014. While outputs were extracted from WRF at 5-minute time-steps, due to 11 | storage limitations instantaneous hourly time-steps are provided for all 12 | variables, while full 5-minute resolution data is provided for wind speed and wind 13 | direction only. 14 | 15 | The following variables were extracted from the WRF model data: 16 | - Wind Speed at 10, 40, 60, 80, 100, 120, 140, 160, 200 m 17 | - Wind Direction at 10, 40, 60, 80, 100, 120, 140, 160, 200 m 18 | - Temperature at 2, 10, 40, 60, 80, 100, 120, 140, 160, 200 m 19 | - Pressure at 0, 100, 200 m 20 | - Surface Precipitation Rate 21 | - Surface Relative Humidity 22 | - Inverse Monin Obukhov Length 23 | 24 | ## Countries 25 | 26 | ### North America 27 | 28 | Wind resource for North America was produced using three distinct WRF domains, 29 | shown below. The CONUS domain for 2007-2013 was run by 3Tier, while 2014, as well 30 | as all years of the Canada and Mexico domains, was run under NARIS. The data 31 | is provided in three sets of files: 32 | 33 | - CONUS: Extracted exclusively from the CONUS domain 34 | - Canada: Combined data from the Canada and CONUS domains 35 | - Mexico: Combined data from the Mexico and CONUS domains 36 | 37 | ### Asia 38 | 39 | Wind resource was also produced for the following countries and years: 40 | 41 | - Bangladesh: 2014-2017 42 | - Central Asia: 2015 43 | - India: 2014 44 | - Kazakhstan: 2015 45 | - Philippines: 2017 46 | - Vietnam: 2016-2018 47 | 48 | ## Directory structure 49 | 50 | Wind resource data is made available as a series of hourly .h5 files 51 | corresponding to each country and year. Below is an example of the directory 52 | structure for the CONUS domains: 53 | - s3://nrel-pds-wtk/conus -> root directory for the conus domain 54 | - /v1.0.0 -> version 1 of the data corresponding to years 2007-2013, run by 3Tier 55 | - /wtk_conus_${year}.h5 -> Hourly data for all variables for the given year 56 | - /${year}/wind_${hub_height}.h5 -> Five minute wind resource data for the given year and hub height 57 | - /v1.1.0 -> version 1.1 of the data corresponding to 2014, run under NARIS with an updated version of WRF and new Boundary Layer Physics (PBL scheme) 58 | 59 | The WIND Toolkit data is also available via HSDS at /nrel/wtk/${country}. 60 | 61 | For examples of setting up and using HSDS, please see our [examples repository](https://github.com/nrel/hsds-examples) 62 | 63 | ## Data Format 64 | 65 | The data is provided in Hierarchical Data Format (.h5) files separated by year. The 66 | variables mentioned above are provided in 2-dimensional time-series arrays with 67 | dimensions (time x location). The temporal axis is defined by the `time_index` 68 | dataset, while the positional axis is defined by the `meta` dataset.
For 69 | storage efficiency each variable has been scaled and stored as an integer. The 70 | scale factor is provided in the `scale_factor` attribute. The units for the 71 | variable data are also provided as an attribute (`units`). 72 | 73 | ## Python Examples 74 | 75 | Example scripts to extract wind resource data using Python are provided below: 76 | 77 | The easiest way to access and extract data is the Resource eXtraction tool 78 | [`rex`](https://github.com/nrel/rex) 79 | 80 | 81 | ```python 82 | from rex import WindX 83 | 84 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 85 | with WindX(wtk_file, hsds=True) as f: 86 | meta = f.meta 87 | time_index = f.time_index 88 | wspd_100m = f['windspeed_100m'] 89 | ``` 90 | 91 | Note: `WindX` will automatically interpolate to the desired hub-height: 92 | 93 | ```python 94 | from rex import WindX 95 | 96 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 97 | with WindX(wtk_file, hsds=True) as f: 98 | print(f.datasets) # note: 90m is not a valid dataset 99 | wspd_90m = f['windspeed_90m'] 100 | ``` 101 | 102 | `rex` also allows easy extraction of the nearest site to a desired (lat, lon) 103 | location: 104 | 105 | ```python 106 | from rex import WindX 107 | 108 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 109 | nwtc = (39.913561, -105.222422) 110 | with WindX(wtk_file, hsds=True) as f: 111 | nwtc_wspd = f.get_lat_lon_df('windspeed_100m', nwtc) 112 | ``` 113 | 114 | or to extract all sites in a given region: 115 | 116 | ```python 117 | from rex import WindX 118 | 119 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 120 | state = 'Colorado' 121 | with WindX(wtk_file, hsds=True) as f: 122 | co_wspd = f.get_region_df('windspeed_100m', state, region_col='state') 123 | ``` 124 | 125 | Lastly, `rex` can be used to extract all variables needed to run SAM at a given 126 | location: 127 | 128 | ```python 129 | from rex import WindX 130 | 131 | wtk_file = '/nrel/wtk/conus/wtk_conus_2010.h5' 132 | nwtc = (39.913561, -105.222422) 133 | with WindX(wtk_file, hsds=True) as f: 134 | nwtc_sam_vars = f.get_SAM_df(nwtc) 135 | ``` 136 | 137 | If you would rather access the WIND Toolkit data directly using h5pyd: 138 | 139 | ```python 140 | # Extract the average 100m wind speed 141 | import h5pyd 142 | import pandas as pd 143 | 144 | # Open .h5 file 145 | with h5pyd.File('/nrel/wtk/conus/wtk_conus_2010.h5', mode='r') as f: 146 | # Extract meta data and convert from records array to DataFrame 147 | meta = pd.DataFrame(f['meta'][...]) 148 | # 100m windspeed dataset 149 | wspd = f['windspeed_100m'] 150 | # Extract scale factor 151 | scale_factor = wspd.attrs['scale_factor'] 152 | # Extract, average, and unscale windspeed 153 | mean_wspd_100m = wspd[...].mean(axis=0) / scale_factor 154 | 155 | # Add mean windspeed to meta data 156 | meta['Average 100m Wind Speed'] = mean_wspd_100m 157 | ``` 158 | 159 | ```python 160 | # Extract time-series data for a single site 161 | import h5pyd 162 | import pandas as pd 163 | 164 | # Open .h5 file 165 | with h5pyd.File('/nrel/wtk/conus/wtk_conus_2010.h5', mode='r') as f: 166 | # Extract time_index and convert to datetime 167 | # NOTE: time_index is saved as byte-strings and must be decoded 168 | time_index = pd.to_datetime(f['time_index'][...].astype(str)) 169 | # Initialize DataFrame to store time-series data 170 | time_series = pd.DataFrame(index=time_index) 171 | # Extract 100m wind speed, wind direction, temperature, and pressure 172 | for var in ['windspeed_100m', 'winddirection_100m', 173 | 'temperature_100m', 'pressure_100m']:
174 | # Get dataset 175 | ds = f[var] 176 | # Extract scale factor 177 | scale_factor = ds.attrs['scale_factor'] 178 | # Extract site 100 and add to DataFrame 179 | time_series[var] = ds[:, 100] / scale_factor 180 | ``` 181 | 182 | ## References 183 | 184 | For more information about the WIND Toolkit, please see the [website](https://www.nrel.gov/grid/wind-toolkit.html). 185 | Users of the WIND Toolkit should use the following citations: 186 | - [Draxl, C., B.M. Hodge, A. Clifton, and J. McCaa. 2015. Overview and Meteorological Validation of the Wind Integration National Dataset Toolkit (Technical Report, NREL/TP-5000-61740). Golden, CO: National Renewable Energy Laboratory.](https://www.nrel.gov/docs/fy15osti/61740.pdf) 187 | - [Draxl, C., B.M. Hodge, A. Clifton, and J. McCaa. 2015. "The Wind Integration National Dataset (WIND) Toolkit." Applied Energy 151: 355-366.](https://www.sciencedirect.com/science/article/pii/S0306261915004237?via%3Dihub) 188 | - [Lieberman-Cribbin, W., C. Draxl, and A. Clifton. 2014. Guide to Using the WIND Toolkit Validation Code (Technical Report, NREL/TP-5000-62595). Golden, CO: National Renewable Energy Laboratory.](https://www.nrel.gov/docs/fy15osti/62595.pdf) 189 | - [King, J., A. Clifton, and B.M. Hodge. 2014. Validation of Power Output for the WIND Toolkit (Technical Report, NREL/TP-5D00-61714). Golden, CO: National Renewable Energy Laboratory.](https://www.nrel.gov/docs/fy14osti/61714.pdf) -------------------------------------------------------------------------------- /dGen.md: -------------------------------------------------------------------------------- 1 | # dGen Data: Distributed Generation Market Demand (dGen) Model 2 | 3 | The Distributed Generation Market Demand (dGen) model simulates customer adoption of distributed energy resources (DERs) for residential, commercial, and industrial entities in the United States or other countries through 2050. 4 | 5 | The dGen model can be used for: 6 | - Identifying the sectors, locations, and customers for whom adopting DERs would have a high economic value 7 | - Generating forecasts as an input to distribution hosting capacity analysis, integrated resource planning, and load forecasting 8 | - Understanding the economic or policy conditions in which DER adoption becomes viable 9 | - Illustrating sensitivity to market and policy changes such as retail electricity rate structures, net energy metering, and technology costs. 10 | 11 | For access to technical papers and publications, more information, and to contact the dGen team, please visit [dGen's NREL website](https://www.nrel.gov/analysis/dgen/) 12 | 13 | 14 | 15 | ## Directory format & structure 16 | 17 | There are zipped .sql database files and zipped .pkl agent files. These are described in more detail below. 18 | 19 | 20 | 21 | #### Template PostgreSQL Database: 22 | 23 | - diffusion_load_profiles: This schema contains tables relating to the load profiles, generated by the NREL Buildings team, that are used by agents. These load profiles, in Parquet format, along with their metadata, are included in the data submission. 24 | 25 | - diffusion_resource_solar: This schema contains a table, solar_resource_hourly, which contains the solar capacity factor for a given geographic-azimuth-tilt combination that matches the same geographic-azimuth-tilt combination found in the pre-generated agents pickle file. 26 | 27 | - diffusion_shared: This schema contains tables used for inputs in the input sheet.
Please browse these tables; their names are representative of the data they contain. 28 | 29 | - diffusion_storage: This schema contains a single table related to PySAM storage inputs. 30 | 31 | - diffusion_solar: This schema contains tables with additional data pertaining to modeling solar 32 | constraints, incentives, and costs. 33 | 34 | - diffusion_template: This schema contains tables that are copied to make a new schema upon 35 | completing a dGen model run. Many of these are populated with data from the input sheet, from various joins/functions done within the database, and of course data from the model run. 36 | 37 | 38 | #### Pre-Generated Agents & Load Profiles: 39 | 40 | Every dGen analysis starts with a base agent file that uses statistically-sampled agents meant to be comprehensive and representative of the modeled population. They are comprehensive in the sense they are intended to represent the summation of underlying statistics, e.g. the total retail electricity consumed in the state. They are representative in that agents are sampled to represent heterogeneity of the population, e.g. variance in the cost of electricity. As described in (Sigrin et al. 2018), “during agent creation, each county in United States is seeded with sets of residential, commercial, and industrial agents, each instantiated at population-weighted random locations within the county’s geographic boundaries. Agents are referenced against geographic data sets to establish a load profile, solar resource availability, a feasible utility rate structure, and other techno-economic attributes specific to the agent’s location. Each agent is assigned a weight that is proportional to the number of customers the agent represents in its county. In this context, agents can be understood as statistically representative population clusters and do not represent individual entities.” 41 | 42 | Variable definitions and data types can be found in the data dictionary. 43 | 44 | 45 | 46 | ## Restoring Databases 47 | 48 | Example scripts to restore unzipped database files are provided below. For full documentation of the dGen Model and setting up and using the dGen Model, please visit our [open source repository](https://github.com/NREL/dgen) 49 | 50 | 1. Create the Docker container and PostgreSQL server: 51 | 52 | ``` 53 | $ docker run --name postgis_1 -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -d mdillon/postgis 54 | ``` 55 | 56 | 2. Connect to the PostgreSQL server on the Docker container and create a new database: 57 | 58 | ``` 59 | $ docker container ls 60 | $ docker exec -it postgis_1 psql -U postgres 61 | $ postgres=# CREATE DATABASE dgen_db; 62 | ``` 63 | 64 | 3. After downloading and unzipping the data, run the following in the command line (replacing 'path_to_where_you_saved_data' below with the actual path where you saved your database file): 65 | 66 | ``` 67 | $ cat /path_to_where_you_saved_data/dgen_db.sql | docker exec -i postgis_1 psql -U postgres -d dgen_db 68 | ``` 69 | 70 | Note: make sure Linux commands are enabled in order to properly restore the database. 71 | Also note that the full database can take around an hour to restore. All of the database files will take time to restore, so please be patient and plan accordingly. 72 | 73 | 74 | ## License 75 | 76 | 77 | The open source dGen model is licensed under the BSD 3-Clause License 78 | 79 | Copyright (c) 2020, Alliance for Sustainable Energy, LLC 80 | All rights reserved.
81 | 82 | Redistribution and use in source and binary forms, with or without 83 | modification, are permitted provided that the following conditions are met: 84 | 85 | * Redistributions of source code must retain the above copyright notice, this 86 | list of conditions and the following disclaimer. 87 | 88 | * Redistributions in binary form must reproduce the above copyright notice, 89 | this list of conditions and the following disclaimer in the documentation 90 | and/or other materials provided with the distribution. 91 | 92 | * Neither the name of the copyright holder nor the names of its 93 | contributors may be used to endorse or promote products derived from 94 | this software without specific prior written permission. 95 | 96 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 97 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 98 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 99 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 100 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 101 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 102 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 103 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 104 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 105 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 106 | -------------------------------------------------------------------------------- /pvdaq.md: -------------------------------------------------------------------------------- 1 | # Photovoltaic Field Array Time-Series (PVDAQ) 2 | 3 | ## Description 4 | 5 | The Photovoltaic field array (PVDAQ) data is composed of raw time-series performance data taken through a variety of sensors connected to a PV array. The data is typically taken at 15-minute averaged resolution, but can vary between systems. NREL source data is typically aggregated into the main database every 24 hours. Data is then processed to the NREL PVDAQ data lake on a monthly basis. 6 | 7 | Some datasets have been acquired through previous research agreements with site owners, and with their permission, have now been made public. Those datasets are static and do not show any additional data increments. 8 | 9 | Our researchers utilize the data to monitor the durability of PV systems under a wide variety of conditions. Similar data within NREL archives also provides insights into experimental emerging technology systems. Additionally, the data has proven useful in the development of data quality assurance software and of data analysis and machine learning tools. 10 | 11 | All data for PVDAQ and the DOE Solar Data Prize is covered under the [DOI:10.25984/1846021](https://dx.doi.org/10.25984/1846021) 12 | 13 | ### 2023 Solar Data Prize 14 | 15 | The American-Made Solar Data Bounty Prize was open to U.S.-based PV system owners and entities authorized to share data from PV systems. These owners were invited to submit at least five years of historical time series data at a minimum of 15-minute time resolution for one or two of their systems. Datasets collected through this prize are meant to assist commercial and academic research and development efforts seeking to improve the accuracy of PV system modeling, and thus lower the risk associated with developing and operating those assets.
16 | 17 | The Data Prize entries were submitted in one of two categories: systems <5 MW DC capacity, and those >5 MW DC capacity. The data from the submissions are available to the public for download as part of the PVDAQ Data repository. The following are the system IDs of the winners, in numerical order, not placement by award. 18 | 19 | #### < 5 MW DC system IDs: 20 | 21 | * **2105** - A 237 kW multi-building rooftop deployment with highly variable mount orientations in Hawaii 22 | * **2107** - An 893 kW fixed ground-mount facility in a highly active agricultural area in California 23 | * **9068** - A 4.7 MW single-axis tracked facility in Colorado 24 | 25 | #### > 5 MW DC system IDs: 26 | 27 | * **7333** - A 257 MW single-axis tracker facility in California. This dataset is at a very high time resolution of 10 s for all channels. 28 | * **9069** - A 38.7 MW fixed ground-mount facility in Georgia 29 | 30 | #### Details on the Prize Datasets 31 | 32 | These datasets differ from the regular PVDAQ repository storage architecture (see below), where data is broken down by year, month, and day. In each of the prize repositories, the available metadata, any support files, and the entire dataset as it was submitted and curated are available. Some of the datasets are broken down by sensor channel set type, and in others the data is labeled by sensor channel tag names or bundles. 33 | 34 | **Note:** *Some of the prize datasets are extremely large and can have tens of GBs of data. These could take a long time to download, so please plan accordingly* 35 | 36 | ## Data Dictionary 37 | 38 | The PVDAQ data is partitioned by system_id, year, month, and day. Raw data is reported at 15-minute increments in ISO 8601 date and time. The timestamp is stripped and data is averaged daily. An example file output is included here. 39 | 40 | ## Data Tables 41 | 42 | * pvdaq_inverters - metadata about the inverter hardware on the system 43 | * pvdaq_meters - metadata about the meter hardware on the system 44 | * pvdaq_metrics - metadata about the sensor values captured as part of the PV time-series 45 | * pvdaq_modules - metadata about the module hardware on the system 46 | * pvdaq_mount - mounting configuration of the array or subsets of the array 47 | * pvdaq_other_instruments - metadata about other ancillary equipment fielded on the system 48 | * pvdaq_site - geo location details of a PV array 49 | * pvdaq_system - basic details about a PV array 50 | * pvdaq_pvdata - PV time series data. 51 | 52 | ## Table Schemas 53 | 54 | ### pvdaq_inverters 55 | 56 | * inverter_id (string) - database primary key 57 | * name (string) - alias given to the inverter by the array owner or autogenerated 58 | * manufacturer (string) 59 | * model (string) 60 | * serial_num (string) 61 | * num_strings (string) - how many strings are tied to the inverter 62 | * modules_per_string (string) - how many modules are tied to each string 63 | * type (string) - indicates type of inverter, such as micro, string, central, etc.
63 | * quantity (string) - number of inverters fielded at the array site
64 | * time_interval (string) - whether the data is left (L), center (C), or right (R) aligned within the acquisition interval
65 | * site_id (string) - associated site
66 | * system_id (bigint) - associated system
67 | * comments (string) - any additional details
68 | 
69 | ### pvdaq_meters
70 | 
71 | * meter_id (string) - primary key of the meter
72 | * name (string) - alias given to the meter by the array owner, or autogenerated
73 | * manufacturer (string)
74 | * model (string)
75 | * serial_num (string)
76 | * time_interval (string) - whether the data is left (L), center (C), or right (R) aligned within the acquisition interval
77 | * type (string) - type of meter: production, site, or revenue
78 | * site_id (string) - associated site
79 | * system_id (bigint) - associated system
80 | * comments (string) - any additional details
81 | 
82 | ### pvdaq_metrics
83 | 
84 | * system_id (int) - associated system for the metric
85 | * metric_id (int) - primary key of the metric
86 | * sensor_name (string) - referenced name produced by the instrumentation or tagged by the array owner
87 | * common_name (string) - a general grouping of sensor types (e.g., DC voltage, AC energy, POA irradiance)
88 | * raw_units (string) - raw, unscaled or uncalibrated units of the values produced by the sensor
89 | * units (string) - units of the values produced by the sensor; may be raw_units modified by calc_scale and calc_offset
90 | * calc_scale (double) - scaling for adjusting the sensor values (default 1)
91 | * calc_offset (double) - offset for adjusting the sensor values (default 0)
92 | * calc_details (string) - mathematical equation used to calculate the sensor value, if needed
93 | * aggregation_type (string) - avg, min, max, sample, union, median, or calculated
94 | * source_type (string) - what is generating the sensor value (inverters, meters, or other instruments); can be NULL
95 | * source_id (int) - the associated primary key of the sensor type generating the value; can be NULL
96 | * comments (string) - any additional details
97 | * standard_name (string) - a unique autogenerated name based on either the primary key and sensor_name, or a combination of common_name, sensor_type, and sensor_id
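The default values of calc_scale (1) and calc_offset (0) suggest a linear conversion from raw readings to the units in the `units` field, but this documentation does not spell out the formula. The sketch below assumes the conventional `raw * scale + offset` form; confirm against the metric's calc_details field before relying on it.

```python
# Hypothetical helper, assuming the conventional linear form
#   engineering_value = raw_value * calc_scale + calc_offset.
# Check the metric's calc_details field to confirm the actual equation.
def to_engineering_units(raw_value, calc_scale=1.0, calc_offset=0.0):
    return raw_value * calc_scale + calc_offset

# Example: a raw reading of 512 with calc_scale=0.1 and calc_offset=-2.0
# maps to 49.2 in the units reported by the metric's `units` field.
print(to_engineering_units(512, 0.1, -2.0))  # 49.2
```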
98 | 
99 | ### pvdaq_modules
100 | 
101 | * module_id (string) - the module primary key
102 | * name (string) - alias given to the module by the array owner, or autogenerated
103 | * inverter_id (string) - the associated inverter primary key tied to this module, if known
104 | * manufacturer (string)
105 | * model (string)
106 | * serial_num (string)
107 | * type (string) - module technology: CdTe, crystalline Si, multicrystalline Si, etc.
108 | * quantity (string) - number of modules installed on the system
109 | * reference_module (string) - whether this is a reference module
110 | * start_on (string) - date the module was installed
111 | * end_on (string) - date the module was removed
112 | * site_id (string) - associated site
113 | * system_id (bigint) - associated system
114 | * comments (string) - any additional details
115 | 
116 | 
117 | ### pvdaq_mount
118 | 
119 | * mount_id (bigint) - the primary key for the mount
120 | * name (string) - alias given to the mount by the array owner, or autogenerated
121 | * manufacturer (string)
122 | * model (string)
123 | * azimuth (string) - pointing of the mount as a compass direction in decimal degrees; 0 degrees = north, 90 degrees = east
124 | * tilt (string) - tilt angle of the mount in degrees
125 | * tracking (string) - whether the mount is tracking or fixed
126 | * type (string) - configuration of the mount: ground, roof, canopy, etc.
127 | * site_id (string) - associated site
128 | * system_id (bigint) - associated system
129 | * comments (string) - any additional details
130 | 
131 | ### pvdaq_other_instruments
132 | 
133 | * instrument_id (string) - the primary key of the instrument
134 | * name (string) - alias given to the instrument by the array owner, or autogenerated
135 | * manufacturer (string)
136 | * model (string)
137 | * serial_num (string)
138 | * time_interval (string) - whether the data is left (L), center (C), or right (R) aligned within the acquisition interval
139 | * type (string) - identifies what the instrument is: ref cell, weather station, thermocouple, etc.
140 | * site_id (string) - associated site
141 | * system_id (bigint) - associated system
142 | * comments (string) - any additional details
143 | 
144 | ### pvdaq_site
145 | 
146 | * site_id (string) - primary key of the site
147 | * system_id (bigint) - associated system
148 | * public_name (string) - unique given name of the site
149 | * location (string) - descriptive text name of the site location; could include street-address-type details
150 | * latitude (string) - decimal latitude geolocation
151 | * longitude (string) - decimal longitude geolocation
152 | * elevation (string) - distance in meters above sea level, if known
153 | * av_pressure (string) - average annual atmospheric pressure at the site in psi
154 | * av_temp (string) - average ambient temperature in degrees Celsius at the site
155 | * climate_type (string) - the Köppen-Geiger classifier for the site location
156 | 
157 | ### pvdaq_system
158 | 
159 | * system_id (bigint) - primary key of the system
160 | * site_id (bigint) - associated site representing geolocation details for the system
161 | * public_name (string) - unique name given to the array
162 | * area (string) - covered area of the array in square meters
163 | * power (string) - maximum calculated or nameplate DC power of the array in kW
164 | * started_on (string) - date the system became active
165 | * ended_on (string) - date the system was deactivated
166 | * comments (string) - any additional details
167 | 
168 | ### pvdaq_pvdata
169 | 
170 | * system_id (string) (Partitioned) - associated system for the data
171 | * measured_on (timestamp) - local timestamp as generated by the instrumentation; could include DST
172 | * utc_measured_on (timestamp) - UTC timestamp calculated from the measured_on value; could include DST
173 | * metric_id (int) - associated metric_id for the data
174 | * value (double) - value of the data; join to the pvdaq_metrics record for units and other details
175 | 
176 | Note: not every site or system_id will contain data for each attribute included in the data dictionary.
177 | 
178 | ## Data Format
179 | 
180 | The PVDAQ dataset is made available in Parquet format on AWS and is partitioned by `year`, `month`, and `day` in AWS Glue and Athena. The schema may change across dataset years on S3.
181 | 
182 | Partition keys of the `pvdaq_pvdata` table:
183 | 
184 | * year (string) (Partitioned)
185 | * month (string) (Partitioned)
186 | * day (string) (Partitioned)
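As a concrete illustration of how these partitions are used, the hedged sketch below queries one day of data for one system through PyAthena. The staging bucket, the `oedi.pvdaq_pvdata` table name, and the partition value formats are placeholders/assumptions; substitute the names registered in your own Glue catalog.

```python
import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir='s3://<your-staging-bucket>/<prefix>/',  # user-defined staging directory
    region_name='us-west-2',
)

# Filtering on the partition columns (system_id, year, month, day) lets
# Athena prune partitions instead of scanning the whole table. The table
# name and partition value formats (e.g., zero-padded months) are
# assumptions here; confirm them against your catalog.
query = """
SELECT measured_on, metric_id, value
FROM oedi.pvdaq_pvdata
WHERE system_id = '2105'
  AND year = '2023' AND month = '6' AND day = '15'
ORDER BY measured_on
"""
df = pd.read_sql(query, conn)
print(df.head())
```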
187 | 
188 | ## S3 Paths
189 | 
190 | * s3://oedi-data-lake/pvdaq/inverters/*.parquet
191 | * s3://oedi-data-lake/pvdaq/meters/*.parquet
192 | * s3://oedi-data-lake/pvdaq/metrics/*.parquet
193 | * s3://oedi-data-lake/pvdaq/mount/*.parquet
194 | * s3://oedi-data-lake/pvdaq/other_instruments/*.parquet
195 | * s3://oedi-data-lake/pvdaq/site/*.parquet
196 | * s3://oedi-data-lake/pvdaq/system/*.parquet
197 | * s3://oedi-data-lake/pvdata/system_id=/year=/month=/day=/*.parquet
198 | 
199 | - `s3://oedi-data-lake/pvdaq/`
200 | 
201 | 
202 | ## Bulk Downloads from the OEDI site
203 | 
204 | The PVDAQ Access repository contains a small Python program that can bundle all the daily data from a site and download it onto your local system. If accessing the data for a Solar Data Prize site, some adjustment to the code would be necessary, since all the data sits within a single directory for each site.
205 | 
206 | [PVDAQ Access](https://github.com/NREL/pvdaq_access)
207 | 
208 | 
209 | ## Data Sources
210 | 
211 | [https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html](https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html)
212 | 
213 | ## Model, Methods, and Analysis Tools
214 | 
215 | #### RdTools
216 | RdTools is an open-source library to support reproducible technical analysis of time-series data from photovoltaic energy systems, particularly degradation effects.
217 | [RdTools](https://www.nrel.gov/pv/rdtools.html)
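For orientation, here is a minimal, hedged sketch of a year-on-year degradation calculation with RdTools on a synthetic normalized-energy series. A real analysis would start from PVDAQ production data and apply RdTools' normalization and filtering steps first; see the RdTools documentation for the full workflow.

```python
import numpy as np
import pandas as pd
import rdtools

# Synthetic daily normalized energy with a -0.5 %/yr trend, for
# illustration only; real inputs come from normalized, filtered PVDAQ data.
idx = pd.date_range('2015-01-01', '2020-01-01', freq='D', tz='US/Mountain')
years = np.asarray((idx - idx[0]).days) / 365.25
rng = np.random.default_rng(0)
energy_normalized = pd.Series(1.0 - 0.005 * years, index=idx)
energy_normalized *= rng.normal(1.0, 0.01, len(idx))  # measurement noise

# Year-on-year degradation rate (%/yr) with a bootstrapped confidence interval
rd, rd_ci, calc_info = rdtools.degradation_year_on_year(energy_normalized)
print(f'Degradation rate: {rd:.2f} %/yr (CI: {rd_ci})')
```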
218 | 
219 | #### PV Lib
220 | A toolbox that provides a set of well-documented functions for simulating the performance of photovoltaic energy systems.
221 | [pv_lib-toolbox](https://pvpmc.sandia.gov/applications/pv_lib-toolbox/)
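To give a flavor of what the toolbox enables, the sketch below uses the pvlib-python implementation to estimate clear-sky irradiance at a site. The coordinates are arbitrary illustration values; in practice you would use the latitude/longitude from the pvdaq_site table for the system of interest.

```python
import pandas as pd
from pvlib.location import Location

# Illustrative coordinates only (roughly Golden, CO); substitute values
# from the pvdaq_site table for a real system.
site = Location(latitude=39.74, longitude=-105.17,
                tz='US/Mountain', altitude=1800)

times = pd.date_range('2023-06-15 05:00', '2023-06-15 21:00',
                      freq='15min', tz=site.tz)

# Haurwitz is a simple GHI-only clear-sky model with no external data needs.
clearsky = site.get_clearsky(times, model='haurwitz')
print(clearsky.head())
```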
222 | 
223 | #### PVAnalytics
224 | PVAnalytics is a Python library that supports analytics for PV systems. It provides functions for quality control, filtering, feature labeling, and other tools supporting the analysis of PV system-level data.
225 | [PVAnalytics](https://github.com/pvlib/pvanalytics)
226 | 
227 | 
228 | ### Other Data Sources
229 | 
230 | #### DuraMAT
231 | A multi-institution consortium focused on discovery, development, de-risking, and enabling the commercialization of new materials and designs for PV modules.
232 | [Main Site](https://www.duramat.org/)
233 | 
234 | * [Validation models for PV performance](https://datahub.duramat.org/dataset/data-for-validating-models-for-pv-module-performance/)
235 | * Machine learning training set for validation of [satellite imagery of PV array sites](https://datahub.duramat.org/dataset/satellite-images-training-and-validation-set)
236 | * Machine learning training set for [detection of inverter clipping - real data](https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data)
237 | * Machine learning training set for [detection of inverter clipping - simulated data](https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-simulated-data)
238 | * Machine learning training set for [detection of soiling cleaning events](https://datahub.duramat.org/dataset/automated-pv-systems-cleaning-and-detection)
239 | * Example data of a [soiling signal in time-series data](https://datahub.duramat.org/dataset/pvdaq-time-series-with-soiling-signal)
240 | * Spectral irradiance data sets: [Albuquerque](https://datahub.duramat.org/project/spectral-irradiance-data-and-resources)
241 | 
242 | ### Additional Resources
243 | 
244 | [https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html](https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html)
245 | 
246 | [https://www.nrel.gov/docs/fy17osti/69131.pdf](https://www.nrel.gov/docs/fy17osti/69131.pdf)
267 | 
268 | 
269 | ## Python Connection Examples
270 | 
271 | Athena data connection using PyAthena:
272 | ```python
273 | 
274 | import pandas as pd
275 | from pyathena import connect
276 | 
277 | conn = connect(
278 |     s3_staging_dir='s3://<bucket>/<staging-prefix>/',  # user-defined staging directory
279 |     region_name='us-west-2',
280 |     work_group=''  # specify workgroup if one exists
281 | )
282 | ```
283 | 
284 | Example #1: Querying with a limit:
285 | ```python
286 | df = pd.read_sql("SELECT * FROM oedi.<table_name> LIMIT 8;", conn)
287 | ```
288 | 
289 | For a Jupyter notebook example that includes partitions and the data dictionary, see our
290 | [examples repository](https://github.com/openEDI/open-data-access-tools/tree/integration/examples)
291 | 
292 | 
293 | ## Disclaimer and Attribution
294 | 
295 | Copyright (c) 2024, Alliance for Sustainable Energy LLC, All rights reserved.
296 | 
297 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
298 | 
299 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
300 | 
301 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
302 | 
303 | * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
304 | 
305 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
306 | 
--------------------------------------------------------------------------------
/windAIBench.md:
--------------------------------------------------------------------------------
1 | ## Data Description
2 | The floris_data.h5 file is a robust dataset providing 252,500 samples of diverse wind plant layouts operating under a wide range of yawing and atmospheric conditions. This data submission should be considered a benchmark dataset for comparison with new ML approaches. Wind plant layouts were randomly sampled using a specialized Plant Layout Generator (PLayGen). Given a user-specified plant size and average turbine spacing, PLayGen generates a realistic layout reflective of one of four canonical configurations: (i) cluster, (ii) single string, (iii) multiple string, (iv) parallel string. 500 unique wind plant layouts were sampled, with the number of turbines randomly sampled from N_turb ∈ [25, 200] and the turbine spacing randomly sampled from Δ_turb ∈ [3D, 10D], where D denotes the turbine rotor diameter. For each layout, 500 different sets of atmospheric conditions were sampled, with wind speeds and directions drawn uniformly from u ∈ [0, 25] m/s and θ ∈ [0°, 360°], respectively, where u is wind speed and θ is wind direction. Turbulence intensity was sampled uniformly from low (6%), medium (8%), and high (10%). For each atmospheric inflow scenario, the individual turbine yaw angles were randomly sampled from a one-sided truncated Gaussian on the interval [0°, 30°], oriented relative to the wind inflow direction. Given these randomly sampled wind plant layouts, controls, and atmospheric conditions, the power generation outputs were computed using FLORIS, an open-source analytic engineering flow model created by NREL that predicts steady-state hub-height velocity in normal or yawed operating conditions. The IEA onshore reference turbine, which has a 130 m rotor diameter, a 110 m hub height, and a rated power capacity of 3.4 MW, was used as the fixed turbine technology throughout all simulations.
3 | The data generation process resulted in 250,000 individual samples. We supplement this data by selecting a subset of cases (50 atmospheric conditions from each of 50 layouts, for a total of 2,500 cases) for which FLORIS was re-run with wake steering control optimization. In these additional 2,500 cases the turbine yaw angles are optimized to maximize total plant power production. The data generation process was completed between X-Y on the Eagle HPC system located at NREL’s Golden, CO campus. The data was reformatted and preprocessed for OEDI submission in May 2023. The data was generated as part of a broader effort to support the IEA ‘Net Zero by 2050’ roadmap, which calls for an 11x increase in wind energy. Meeting these deployment goals raises many engineering challenges, so the Department of Energy is looking to data-driven modeling to help address them. The role of AI/ML in supporting efforts to address these problems requires novel methods to handle modeling complexities.
However, AI/ML research in wind energy has been performed in a nonsystematic, ad hoc manner, making it difficult to identify promising approaches and target investments appropriately. Future R&D planning requires a reliable framework for comparing different methods. In AI/ML research, benchmark data sets with well-defined problem statements have enabled consistent comparisons of emerging approaches and systematic ablation studies. Thus, this submission is part of a larger goal to define benchmark data sets, problems, and metrics for wind energy research, facilitating a systematic approach to developing and comparing emerging AI/ML technologies that can inform future research investments.
4 | 
5 | ## HDF5 – Hierarchical Data Format
6 | All data is contained within a single HDF5 file, floris_data.h5. HDF5 is a hierarchical data format which functions similarly to a file management system. The .h5 file itself is an object that acts as a container, or group, that can hold a variety of heterogeneous data objects or datasets. The two primary object types within a .h5 file are groups and datasets. Groups function similarly to a directory or folder and can contain objects, known as members, such as datasets or other groups. The root group of a .h5 file is denoted by a forward slash /. Objects within a .h5 file are often described with absolute path names starting from the root group. For example, /Layout000/Scenarios/Scenario000 is the absolute path for the zeroth layout and its corresponding zeroth scenario.
7 | For additional information on the HDF5 file format and API, please see the documentation below:
8 | https://docs.hdfgroup.org/hdf5/develop/_getting_started.html
9 | 
10 | ## Data Structure
11 | The floris_data.h5 file is structured as follows:
12 | |-- root (group, 500 members)
13 |     |-- LayoutXXX (group, 3 members)
14 |         |-- Number of Turbines (dataset)
15 |         |-- Scenarios (group, 500 members)
16 |             |-- ScenarioXXX (group, 6 members)
17 |                 |-- Optimal Yaw (group, 3 members)**
18 |                     |-- Turbine Power (dataset)
19 |                     |-- Turbine Wind Speed (dataset)
20 |                     |-- Yaw Angles (dataset)
21 |                 |-- Turbine Power (dataset)
22 |                 |-- Turbine Wind Speed (dataset)
23 |                 |-- Turbulence Intensity (dataset)
24 |                 |-- Wind Direction (dataset)
25 |                 |-- Wind Speed (dataset)
26 |                 |-- Yaw Angles (dataset)
27 |         |-- Turbines (group, variable number of members)
28 |             |-- TurbineXXX (group, 4 members)
29 |                 |-- Hub Height (dataset)
30 |                 |-- Rotor Diameter (dataset)
31 |                 |-- X Location (dataset)
32 |                 |-- Y Location (dataset)
33 | 
34 | ** The Optimal Yaw group is only present in 2,500 ScenarioXXX groups. These 2,500 groups represent the layouts and scenarios where FLORIS was re-run using wake steering and the yaw angles were optimized for power production. The standard datasets within these ScenarioXXX groups represent the FLORIS simulation with no yaw optimization. For convenience and ease of parsing, an opt_yaw_list.csv file has been provided so users can easily identify the LayoutXXX/ScenarioXXX groups that contain the Optimal Yaw group and its corresponding datasets.
35 | 
36 | In the structure above, “XXX” indicates there are multiple groups. For the LayoutXXX and ScenarioXXX groups these values range from 000 to 499, indicating 500 groups of each. For the TurbineXXX groups these values vary, reflecting a different number of turbines for each LayoutXXX, ranging from 25 to 200 inclusive.
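To make the hierarchy concrete, here is a minimal sketch that walks one layout/scenario using the absolute paths and dataset names documented above. The use of h5py is an assumption of tooling; any HDF5 reader works.

```python
import h5py

# Minimal sketch: read one layout/scenario from floris_data.h5 using the
# group and dataset names documented above.
with h5py.File('floris_data.h5', 'r') as f:
    layout = f['Layout000']
    n_turbines = int(layout['Number of Turbines'][()])

    scenario = layout['Scenarios/Scenario000']
    wind_speed = float(scenario['Wind Speed'][()])           # scalar, m/s
    wind_direction = float(scenario['Wind Direction'][()])   # scalar, degrees
    turbine_power = scenario['Turbine Power'][:]             # shape (n_turbines,), W

    # The Optimal Yaw subgroup exists only for the 2,500 optimized cases;
    # opt_yaw_list.csv identifies which Layout/Scenario groups contain it.
    if 'Optimal Yaw' in scenario:
        optimized_power = scenario['Optimal Yaw/Turbine Power'][:]

    # Per-turbine metadata lives under the layout's Turbines group.
    x_location = layout['Turbines/Turbine000/X Location'][()]  # m

    print(n_turbines, wind_speed, wind_direction, turbine_power.sum())
```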
37 | 
38 | ## Data Dictionary
39 | Name | Type | Shape | Data Type | Units
40 | -- | -- | -- | -- | --
41 | Number of Turbines | Scalar | () | i4, 4-byte integer | N/A
42 | Turbine Power | Vector | (# turbines,) | f4, 4-byte float | W
43 | Turbine Wind Speed | Vector | (# turbines,) | f4, 4-byte float | m/s
44 | Turbulence Intensity | Scalar | () | f4, 4-byte float | N/A
45 | Wind Direction | Scalar | () | f4, 4-byte float | degrees
46 | Wind Speed | Scalar | () | f4, 4-byte float | m/s
47 | Yaw Angles | Vector | (# turbines,) | f4, 4-byte float | degrees
48 | Hub Height | Scalar | () | f4, 4-byte float | m
49 | Rotor Diameter | Scalar | () | f4, 4-byte float | m
50 | X Location | Scalar | () | f4, 4-byte float | m
51 | Y Location | Scalar | () | f4, 4-byte float | m
52 | 
53 | ## Submission Keywords
54 | energy, power, wind, AI, ML, AI/ML, wind plant, benchmark
55 | 
--------------------------------------------------------------------------------