├── LICENSE
├── README.md
├── README_template.md
├── archive
│   ├── cit_043_bus_rapid_transit
│   │   └── README.md
│   ├── ene_004_renewable_energy_share_of_total_energy_consumption
│   │   ├── README.md
│   │   └── ene_004_renewable_energy_share_of_total_energy_consumption_processing.py
│   ├── ene_021a_renewable_energy_consumption
│   │   ├── README.md
│   │   └── ene_021a_renewable_energy_consumption_processing.py
│   ├── ene_029a_energy_intensity
│   │   ├── README.md
│   │   └── ene_029a_energy_intensity_processing.py
│   ├── ene_033_energy_consumption
│   │   ├── README.md
│   │   └── ene_033_energy_consumption_processing.py
│   ├── ene_034_electricity_consumption
│   │   ├── README.md
│   │   └── ene_034_electricity_consumption_processing.py
│   └── ene_035_rw0_electricity_installed_capacity
│       ├── README.md
│       └── ene_035_rw0_electricity_installed_capacity_processing.py
├── bio_004a_coral_reef_locations
│   ├── README.md
│   └── bio_004a_coral_reef_locations_processing.py
├── bio_021a_terrestrial_ecoregions
│   ├── README.md
│   └── bio_021a_terrestrial_ecoregions_processing.py
├── bio_041_rw1_ocean_health_index
│   ├── README.md
│   └── bio_041_rw1_ocean_health_index_processing.py
├── cit_022_rw1_road_traffic_death_rates
│   ├── README.md
│   └── cit_022_rw1_road_traffic_death_rates_processing.py
├── cit_029_rw1_municipal_waste
│   ├── README.md
│   └── cit_029_rw1_municipal_waste_processing.py
├── cit_031_rw1_air_quality_PM25_concentration
│   ├── README.md
│   └── cit_031_rw1_air_quality_PM25_concentration_processing.py
├── cit_033a_urban_builtup_area
│   ├── README.md
│   └── cit_033a_urban_built_up_area_processing.py
├── cit_043_rw0_bus_rapid_transit
│   ├── README.md
│   └── cit_043_rw0_bus_rapid_transit.py
├── cit_045_infrastructure_investment_outlook
│   └── README.md
├── cli_008_greenhouse_gas_emissions_country_sector
│   ├── README.md
│   └── cli_008_greenhouse_gas_emissions_country_sector_processing.py
├── cli_023_standard_precipitation_index
│   └── README.md
├── cli_029_rw2_vulnerability_to_climate_change
│   ├── README.md
│   └── cli_029_rw2_vulnerability_to_climate_change_processing.py
├── cli_030_rw1_aridity
│   ├── README.md
│   └── cli_030_rw1_aridity_processing.py
├── cli_049_rw1_dash_pik_historical_emissions
│   ├── README.md
│   └── cli_049_rw1_dash_pik_historical_emissions_processing.py
├── cli_050-059_066-075_nexgddp_and_loca
│   ├── README.md
│   └── cli_050-059_066-075_nexgddp_and_loca_processing.py
├── cli_064_social_cost_carbon
│   ├── README.md
│   └── cli_064_social_cost_carbon_processing.py
├── cli_079_rw0_universal_thermal_climate_index
│   ├── README.md
│   └── cli_079_rw0_universal_thermal_climate_index_processing.py
├── com_002_airports
│   ├── README.md
│   └── com_002_airports_processing.py
├── com_007_rw1_fdi_regulatory_restrictiveness_index
│   ├── README.md
│   └── com_007_rw1_fdi_regulatory_restrictiveness_index_processing.py
├── com_011_rw1_maritime_boundaries
│   ├── README.md
│   └── com_011_rw1_maritime_boundaries_processing.py
├── com_015_rw1_recycling_rates
│   ├── README.md
│   └── com_015_rw1_recycling_rates_processing.py
├── com_017_rw2_major_ports
│   ├── README.md
│   └── com_017_rw2_major_ports_processing.py
├── com_028_rw1_effect_of_ag_prices_on_commodity_prices
│   ├── README.md
│   └── com_028_rw1_effect_of_ag_prices_on_commodity_prices_processing.py
├── com_030a_rw1_fishing_activity
│   └── README.md
├── com_030c_rw1_trawling_activity
│   └── README.md
├── com_039_rw0_agricultural_trade_statistics
│   ├── README.md
│   └── com_039_rw0_agricultural_trade_statistics_processing.py
├── dis_016_rw1_active_fault_lines
│   ├── README.md
│   └── dis_016_rw1_active_fault_lines_processing.py
├── dis_017_storm_events_us
│   ├── README.md
│   └── dis_017_storm_events_us_processing.py
├── ene_001a_reservoirs_and_dams
│   └── README.md
├── ene_009_renewable_generation_annually
│   ├── Makefile
│   ├── README.md
│   ├── ene_009_renewable_generation_annually_processing.py
│   └── requirements.txt
├── ene_010_renewable_capacity_annually
│   ├── Makefile
│   ├── README.md
│   ├── ene_010_renewable_capacity_annually_processing.py
│   └── requirements.txt
├── ene_017_rw1_energy_facility_emissions
│   ├── README.md
│   └── ene_017_rw1_energy_facility_emissions_processing.py
├── ene_031a_solar_irradiance
│   └── README.md
├── foo_005_rw1_crop_area_production
│   ├── README.md
│   └── foo_005_rw1_crop_area_production_processing.py
├── foo_005_rw2_crop_area_production
│   ├── README.md
│   └── foo_005_rw2_crop_area_production_processing.py
├── foo_015_rw2_global_hunger_index
│   ├── README.md
│   └── foo_015_rw2_global_hunger_index_processing.py
├── foo_041_rw1_non_co2_agricultural_emissions
│   ├── README.md
│   └── foo_041_rw1_non_co2_agricultural_emissions_processing.py
├── foo_054_rw1_soil_carbon_stocks
│   └── README.md
├── foo_060_rw0_food_system_emissions
│   ├── README.md
│   └── foo_060_rw0_food_system_emissions_processing.py
├── foo_061_rw0_blue_food_supply
│   ├── README.md
│   └── foo_061_rw0_blue_food_supply_processing.py
├── foo_062_rw0_fishery_production
│   ├── README.md
│   └── foo_062_rw0_fishery_production_processing.py
├── foo_066_rw0_food_product_shares
│   ├── README.md
│   └── foo_066_rw0_food_product_shares.py
├── foo_067_rw0_crop_suitability_class
│   ├── README.md
│   ├── create_stats_table_for_crt_widget.py
│   ├── foo_067_rw0_crop_suitability_class_processing.py
│   └── rice_ensemble_processing.py
├── foo_068_rw0_agro_ecological_zones
│   ├── README.md
│   └── foo_068_rw0_agro_ecological_zones_processing.py
├── foo_069_rw0_relative_change_crop_yield
│   ├── README.md
│   └── foo_069_rw0_relative_change_crop_yield_processing.py
├── for_001_rw2_tree_cover
│   └── README.md
├── for_005a_mangrove
│   ├── README.md
│   └── for_005a_mangrove_processing.py
├── for_005b_rw0_mangrove_extent_change
│   ├── README.md
│   └── for_005b_rw0_mangrove_extent_change_processing.py
├── for_007_rw2_tree_cover_gain
│   └── README.md
├── for_008_rw2_tree_cover_loss
│   └── README.md
├── for_014_rw1_internationally_important_wetlands
│   ├── README.md
│   └── for_014_rw1_internationally_important_wetlands_processing.py
├── for_018_rw1_bonn_challenge_restoration_commitment
│   ├── README.md
│   └── for_018_rw1_bonn_challenge_restoration_commitment_preprocessing.py
├── for_021_rw1_certified_forest
│   ├── README.md
│   └── for_021_rw1_certified_forest_processing.py
├── for_029_peatlands
│   └── README.md
├── for_031_rw0_forest_landscape_integrity_index
│   ├── README.md
│   └── for_031_rw0_forest_landscape_integrity_index_processing.py
├── ocn_001_gebco_bathymetry
│   ├── README.md
│   └── ocn_001_gebco_bathymetry_processing.py
├── ocn_003_projected_sea_level_rise
│   ├── README.md
│   └── ocn_003_projected_sea_level_rise.py
├── ocn_005_historical_cyclone_intensity
│   ├── README.md
│   └── ocn_005_historical_cyclone_intensity_processing.py
├── ocn_006_projected_ocean_acidification
│   ├── README.md
│   └── ocn_006_projected_ocean_acidification.py
├── ocn_006alt_projected_ocean_acidification_coral_reefs
│   ├── README.md
│   └── ocn_006alt_projected_ocean_acidification_coral_reefs.py
├── ocn_007_coral_bleaching_monitoring
│   ├── README.md
│   └── ocn_007_coral_bleaching_monitoring_processing.py
├── ocn_008_historical_coral_bleaching_stress_frequency
│   ├── README.md
│   └── ocn_008_historical_coral_bleaching_stress_frequency_processing.py
├── ocn_009_sea_surface_temperature_variability
│   ├── README.md
│   └── ocn_009_sea_surface_temperature_variability_processing.py
├── ocn_010_projected_coral_bleaching
│   ├── README.md
│   └── ocn_010_projected_coral_bleaching_processing.py
├── ocn_012_coral_reef_tourism_value
│   ├── README.md
│   └── ocn_012_coral_reef_tourism_value.py
├── ocn_013_coral_reef_fisheries_relative_catch
│   ├── README.md
│   └── ocn_013_coral_reef_fisheries_relative_catch.py
├── ocn_014_index_of_coastal_protection_by_coral_reefs
│   ├── README.md
│   └── ocn_014_index_of_coastal_protection_by_coral_reefs.py
├── ocn_016_rw0_ocean_plastics
│   ├── README.md
│   └── ocn_016_rw0_ocean_plastics_processing.py
├── ocn_017_coral_reef_connectivity
│   └── README.md
├── pull_request_template.md
├── req_017_thailand_flooding
│   └── README.md
├── soc_002_rw1_gender_development_index
│   ├── README.md
│   └── soc_002_rw1_gender_development_index_processing.py
├── soc_004_rw1_human_development_index
│   ├── README.md
│   └── soc_004_rw1_human_development_index_processing.py
├── soc_005_rw1_political_rights_civil_liberties_index
│   ├── README.md
│   └── soc_005_rw1_political_rights_civil_liberties_index_processing.py
├── soc_006_rw1_multidimensional_poverty_index
│   ├── README.md
│   └── soc_006_rw1_multidimensional_poverty_index_processing.py
├── soc_021_rw1_environmental_performance_index
│   ├── README.md
│   └── soc_021_rw1_environmental_index_processing.py
├── soc_023_rw1_fragile_states_index
│   ├── README.md
│   └── soc_023_rw1_fragile_states_index_processing.py
├── soc_025a_gender_inequality_index
│   ├── README.md
│   └── soc_025a_gender_inequality_index_processing.py
├── soc_026_rw0_global_gender_gap
│   ├── README.md
│   └── soc_026_rw0_global_gender_gap.py
├── soc_037_rw1_malaria_extent
│   ├── README.md
│   └── soc_037_rw1_malaria_extent_processing.py
├── soc_039_rw1_out_of_school_rate
│   ├── README.md
│   └── soc_039_rw1_out_of_school_rate_processing.py
├── soc_043_rw0_refugees_internally_displaced_persons
│   ├── README.MD
│   └── soc_043_rw0_refugees_internally_displaced_persons_processing.py
├── soc_045_rw1_women_political_representation
│   ├── README.md
│   └── soc_045_rw1_women_political_representation_processing.py
├── soc_048_rw0_organized_violence_events
│   ├── README.md
│   └── soc_048_rw0_organized_violence_events_processing.py
├── soc_049_rw0_water_conflict_map
│   ├── README.md
│   └── soc_049_rw0_water_conflict_map_processing.py
├── soc_067_rw1_climate_risk_index
│   ├── README.md
│   └── soc_067_rw1_climate_risk_index_processing.py
├── soc_068b_rw2_global_land_cover
│   ├── README.md
│   └── soc_068b_rw2_global_land_cover_processing.py
├── soc_075_male_female_population_densities
│   └── README.md
├── soc_085_rw1_elevation
│   ├── README.md
│   └── soc_085_rw1_elevation_processing.py
├── soc_086_subnational_hdi
│   ├── README.md
│   └── soc_086_subnational_hdi_processing.py
├── soc_091_global_peace_index
│   ├── README.md
│   └── soc_091_global_peace_index_processing.py
├── soc_092_positive_peace_index
│   ├── README.md
│   └── soc_092_positive_peace_index_processing.py
├── soc_093_global_terrorism_index
│   ├── README.md
│   └── soc_093_global_terrorism_index_processing.py
├── soc_104_rw0_global_land_cover
│   └── README.md
├── soc_107_rw0_population
│   └── README.md
├── soc_108_rw0_anthropogenic_biomes
│   └── README.md
├── utils
│   ├── util_carto.py
│   ├── util_cloud.py
│   └── util_files.py
├── wat_008_rw3_annual_surface_water_coverage
│   └── README.md
├── wat_026_rw1_wastewater_treatment_plants
│   ├── README.md
│   └── wat_026_rw1_wastewater_treatment_plants_processing.py
├── wat_036_rw1_water_stress_country_ranking
│   ├── README.md
│   └── wat_036_rw1_water_stress_country_ranking_processing.py
├── wat_039_rw0_wetlands
│   ├── README.md
│   └── wat_039_rw0_wetlands_processing.py
├── wat_064_cost_of_sustainable_water_management
│   ├── README.md
│   └── wat_064_cost_of_sustainable_water_management_processing.py
├── wat_065_rw0_hydropoli_tension_and_institu_vulnerability
│   ├── README.md
│   └── wat_065_rw0_hydropoli_tension_and_institu_vulnerability_processing.py
├── wat_066_rw0_conflict_forecast
│   └── README.md
├── wat_067_rw0_aqueduct_riverine_flood_hazard
│   ├── README.md
│   └── wat_067_rw0_aqueduct_riverine_flood_hazard_processing.py
├── wat_068_rw0_watersheds
│   ├── README.md
│   └── wat_068_rw0_watersheds_processing.py
├── wat_069_rw0_saltmarshes
│   ├── README.md
│   └── wat_069_rw0_saltmarshes_processing.py
└── wat_070_rw0_soil_erosion
    ├── README.md
    └── wat_070_rw0_soil_erosion_processing.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 Resource Watch
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Resource Watch Dataset Pre-processing Github
2 | #### Purpose
3 | This Github repository was created to document the pre-processing done to any dataset displayed on [Resource Watch](https://resourcewatch.org/).
4 |
5 | #### File Structure
6 | The processing done to each dataset should be stored in a single folder, named with the WRI ID and public title used on Resource Watch. This folder should **always** include a README.md file that describes the processing that was done. A template for this file can be found in this repository with the name README_template.md. If a script (preferably Python) was used to process the dataset, that code should also be included as a separate file. The general structure can be summarized as follows:
7 |
8 | ```
9 | Repository
10 | |
11 | |- Dataset 1 folder = {wri_id}_{public_title}
12 | | |-{wri_id}_{public_title}_processing.py # optional, script used to process the dataset
13 | | |-README.md # file describing the processing
14 | | +-...
15 | |
16 | |-Dataset 2 folder
17 | | +-...
18 | |
19 | +-...
20 | ```
21 |
22 | #### Contents of README.md
23 | If the pre-processing was done in Excel, any functions used should be clearly described. If it was done in Carto, any SQL statements should be included as code snippets. For datasets that were processed in Google Earth Engine (GEE), a link to the GEE script should be included, AND the code should be included in the README.md file as a code snippet for users who do not have access to Google Earth Engine.
24 |
25 | If the pre-processing was done using a script that has been uploaded to Github, the readme should still be included and describe the general steps that were taken - which datasets were used, how were they modified, etc.
26 |
27 | #### Contents of script, if included
28 | If a script was used to process the dataset, the code should be uploaded to this Github. This code should be thoroughly commented so that readers unfamiliar with the coding language can still follow the process.
29 |
30 | All code should be written using open-source tools and programming languages. Tools and modules that require a subscription should be avoided (e.g., ArcGIS).
31 |
32 | An example of a basic raster pre-processing script can be found [here](https://github.com/resource-watch/data-pre-processing/blob/wat_070/wat_070_rw0_soil_erosion/wat_070_rw0_soil_erosion_processing.py). More information on the fields that can be included in the manifest to upload a raster to Google Earth Engine can be found [here](https://developers.google.com/earth-engine/guides/image_manifest).
33 |
34 | An example of a basic vector pre-processing script can be found [here](https://github.com/resource-watch/data-pre-processing/tree/wat_070/com_017_rw2_major_ports).
35 |
36 |
--------------------------------------------------------------------------------
/README_template.md:
--------------------------------------------------------------------------------
1 | ## {Resource Watch Public Title} Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the {dataset formal name}]({learn more link}) for [display on Resource Watch]({link to dataset's metadata page on Resource Watch}).
3 |
4 | {Describe how the original data came from the source.}
5 |
6 | Below, we describe the steps used to {describe how you changed the data, e.g., "combine shapefiles for each continent into one global table on Carto"}.
7 |
8 | 1. {Describe what you did in numbered steps if it is useful.}
9 | ```
10 | Include any SQL or GEE code you used in a code snippet.
11 | ```
12 |
13 | Please see the [Python script]({link to Python script on Github}) for more details on this processing.
14 |
15 | You can view the processed {Resource Watch public title} dataset [on Resource Watch]({link to dataset's metadata page on Resource Watch}).
16 |
17 | You can also download the original dataset [directly through Resource Watch]({s3 link if available}), or [from the source website]({download from source link}).
18 |
19 | ###### Note: This dataset processing was done by [{name}]({link to WRI bio page}), and QC'd by [{name}]({link to WRI bio page}).
20 |
--------------------------------------------------------------------------------
/archive/cit_043_bus_rapid_transit/README.md:
--------------------------------------------------------------------------------
1 | ## Global Bus Rapid Transit Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Bus Rapid Transit](https://brtdata.org/indicators/systems/year_system_commenced) for [display on Resource Watch](https://resourcewatch.org/data/explore/Cities-with-Bus-Rapid-Transit).
3 |
4 | The Global Bus Rapid Transit (BRT) dataset includes the name of the BRT system, the year it commenced, the location of the BRT (city and region), and the source. Each of these values is provided for BRT systems launched between 1986 and 2019.
5 |
6 | While this dataset can be viewed on the source website, it is not directly downloadable there. In order to display the BRT data on Resource Watch, the dataset was copied from the source website and joined with the city centroid coordinates from [Natural Earth's Populated Places dataset](https://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-populated-places/) for mapping purposes.
7 |
8 | Below, we describe the actions taken to upload the dataset and join it to city coordinates:
9 | 1. Copy and paste the data table from the source website into Excel. Upload this Excel spreadsheet to Carto as a table named "cit_043_bus_rapid_transit."
10 | 2. Join the Global Bus Rapid Transit data (“cit_043_bus_rapid_transit”) with the Populated Places dataset ("city_centroid"), which had previously been uploaded to the Resource Watch Carto account. These tables should be joined on the “city” column in each dataset, using the following SQL statement:
11 | ```
12 | SELECT city_centroid.city, cit_043_cities_with_bus_rapid_transit.city, city_centroid.the_geom,
13 | cit_043_cities_with_bus_rapid_transit.source, cit_043_cities_with_bus_rapid_transit.value,
14 | cit_043_cities_with_bus_rapid_transit.country
15 |
16 | FROM "wri-rw".city_centroid
17 |
18 | INNER JOIN cit_043_cities_with_bus_rapid_transit ON city_centroid.city = cit_043_cities_with_bus_rapid_transit.city
19 | ```
20 | You can view the processed Global Bus Rapid Transit (BRT) dataset [on Resource Watch](https://resourcewatch.org/data/explore/Cities-with-Bus-Rapid-Transit).
21 |
22 | You can also access the original dataset [on the source website](https://brtdata.org/indicators/systems/year_system_commenced).
23 |
24 | ###### Note: This dataset processing was done by [Ken Wakabayashi](https://www.wri.org/profile/ken-wakabayashi), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
25 |
--------------------------------------------------------------------------------
/archive/ene_004_renewable_energy_share_of_total_energy_consumption/README.md:
--------------------------------------------------------------------------------
1 | ## Renewable Energy Consumption Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Sustainable Energy for All (SE4ALL) dataset](https://datacatalog.worldbank.org/dataset/sustainable-energy-all) for [display on Resource Watch](https://resourcewatch.org/data/explore/bced4001-425a-4fad-8c22-8214d9340ea4).
3 |
4 | This dataset was provided by the source as a csv. The subset of renewable energy consumption data was selected, and the table was converted from wide to long form, using Python.
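The wide-to-long reshape mentioned above can be sketched with pandas; the file name, indicator label, and column names below are illustrative assumptions rather than the exact values used in the script:
```
import pandas as pd

# read the SE4ALL csv provided by the source (path is illustrative)
df = pd.read_csv('SE4ALL_data.csv')

# keep only the renewable energy consumption indicator (label is an assumption)
df = df[df['Indicator Name'] == 'Renewable energy consumption (% of total final energy consumption)']

# convert from wide form (one column per year) to long form (one row per country-year)
df_long = df.melt(id_vars=['Country Name', 'Country Code', 'Indicator Name'],
                  var_name='year', value_name='value')

# drop rows with no data and save the reshaped table
df_long = df_long.dropna(subset=['value'])
df_long.to_csv('ene_004_renewable_energy_share_edit.csv', index=False)
```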
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ene_004_renewable_energy_share_of_total_energy_consumption/ene_004_renewable_energy_share_of_total_energy_consumption_processing.py) for more details on this processing.
7 |
8 | You can view the processed dataset [on Resource Watch](https://resourcewatch.org/data/explore/bced4001-425a-4fad-8c22-8214d9340ea4).
9 |
10 | You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/ene_004_renewable_energy_share_of_total_energy_consumption.zip), or [from the source website](https://datacatalog.worldbank.org/dataset/sustainable-energy-all).
11 |
12 | ###### Note: This dataset processing was done by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
--------------------------------------------------------------------------------
/archive/ene_021a_renewable_energy_consumption/README.md:
--------------------------------------------------------------------------------
1 | ## Renewable Energy Consumption Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Sustainable Energy for All (SE4ALL) dataset](https://datacatalog.worldbank.org/dataset/sustainable-energy-all) for [display on Resource Watch](https://resourcewatch.org/data/explore/ene021a-Renewable-Energy-Consumption).
3 |
4 | This dataset was provided by the source as a csv. The subset of renewable energy consumption data was selected, and the table was converted from wide to long form, using Python.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ene_021a_renewable_energy_consumption/ene_021a_renewable_energy_consumption_processing.py) for more details on this processing.
7 |
8 | You can view the processed dataset [on Resource Watch](https://resourcewatch.org/data/explore/ene021a-Renewable-Energy-Consumption).
9 |
10 | You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/ene_021a_renewable_energy_consumption.zip), or [from the source website](https://datacatalog.worldbank.org/dataset/sustainable-energy-all).
11 |
12 | ###### Note: This dataset processing was done by [Tina Huang](https://www.wri.org/profile/tina-huang), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
--------------------------------------------------------------------------------
/archive/ene_029a_energy_intensity/README.md:
--------------------------------------------------------------------------------
1 | ## Energy Intensity Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Sustainable Energy for All (SE4ALL) dataset](https://datacatalog.worldbank.org/dataset/sustainable-energy-all) for [display on Resource Watch](https://resourcewatch.org/data/explore/ene029a-Energy-Intensity).
3 |
4 | This dataset was provided by the source as a csv. The subset of energy intensity data was selected, and the table was converted from wide to long form, using Python.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ene_029_energy_intensity/ene_029a_energy_intensity_processing.py) for more details on this processing.
7 |
8 | You can view the processed dataset [on Resource Watch](https://resourcewatch.org/data/explore/ene029a-Energy-Intensity).
9 |
10 | You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/ene_029a_energy_intensity.zip), or [from the source website](https://datacatalog.worldbank.org/dataset/sustainable-energy-all).
11 |
12 | ###### Note: This dataset processing was done by [Tina Huang](https://www.wri.org/profile/tina-huang), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
--------------------------------------------------------------------------------
/archive/ene_033_energy_consumption/README.md:
--------------------------------------------------------------------------------
1 | ## Energy Consumption Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [U.S. Energy Information Administration (EIA) International Energy Statistics Total Energy Consumption dataset](https://www.eia.gov/beta/international/data/browser/#/?pa=000000001&c=ruvvvvvfvtvnvv1urvvvvfvvvvvvfvvvou20evvvvvvvvvnvvuvs&ct=0&ug=4&vs=INTL.44-2-AFG-QBTU.A&cy=2017&vo=0&v=H&start=1980&end=2017) for [display on Resource Watch](https://resourcewatch.org/data/explore/67cf410f-4cdf-4437-aa09-187e5fa590ae).
3 |
4 | This dataset was provided by the source as a csv, which you can download using the link above. The csv table was converted from wide to long form, using Python.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ene_033_energy_consumption/ene_033_energy_consumption_processing.py) for more details on this processing.
7 |
8 | You can view the processed dataset [on Resource Watch](https://resourcewatch.org/data/explore/67cf410f-4cdf-4437-aa09-187e5fa590ae).
9 |
10 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/ene_033_energy_consumption.zip), or [from the source website](https://www.eia.gov/beta/international/data/browser/#/?pa=000000001&c=ruvvvvvfvtvnvv1urvvvvfvvvvvvfvvvou20evvvvvvvvvnvvuvs&ct=0&ug=4&vs=INTL.44-2-AFG-QBTU.A&cy=2017&vo=0&v=H&start=1980&end=2017).
11 |
12 | This script has been archived since we are managing all the EIA datasets on Resource Watch together using [upload_eia_data](https://github.com/resource-watch/nrt-scripts/tree/master/upload_eia_data).
13 |
14 | ###### Note: This dataset processing was done by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
--------------------------------------------------------------------------------
/archive/ene_034_electricity_consumption/README.md:
--------------------------------------------------------------------------------
1 | ## Electricity Consumption Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [U.S. Energy Information Administration (EIA) International Energy Statistics Electricity Net Consumption dataset](https://www.eia.gov/beta/international/data/browser/#/?pa=0000002&c=ruvvvvvfvtvnvv1vrvvvvfvvvvvvfvvvou20evvvvvvvvvvvvuvs&ct=0&tl_id=2-A&vs=INTL.2-2-AFG-BKWH.A&vo=0&v=H&start=1980&end=2016) for [display on Resource Watch](https://resourcewatch.org/data/explore/eef10736-8d8b-4ac9-a715-ef0653a83196).
3 |
4 | This dataset was provided by the source as a csv, which you can download using the link above. The csv table was converted from wide to long form, using Python. A new column 'electricity_consumption_ktoe' has also been created in Python to show the electricity consumption in kilotonnes of oil equivalent (ktoe).
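A minimal sketch of that unit conversion, assuming the source values are reported in billion kilowatt-hours and using 1 ktoe ≈ 11.63 GWh (the column names here are illustrative):
```
import pandas as pd

# illustrative table of electricity consumption in billion kWh
df = pd.DataFrame({'country': ['Afghanistan'], 'electricity_consumption_billionkwh': [5.2]})

# 1 billion kWh = 1,000 GWh and 1 ktoe = 11.63 GWh, so multiply by 1000 / 11.63
df['electricity_consumption_ktoe'] = df['electricity_consumption_billionkwh'] * 1000 / 11.63
```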
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ene_034_electricity_consumption/ene_034_electricity_consumption_processing.py) for more details on this processing.
7 |
8 | You can view the processed dataset [on Resource Watch](https://resourcewatch.org/data/explore/eef10736-8d8b-4ac9-a715-ef0653a83196).
9 |
10 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/ene_034_electricity_consumption.zip), or [from the source website](https://www.eia.gov/international/data/world/electricity/electricity-consumption?pd=2&p=0000002&u=0&f=A&v=mapbubble&a=-&i=none&vo=value&&t=C&g=00000000000000000000000000000000000000000000000001&l=249-ruvvvvvfvtvnvv1vrvvvvfvvvvvvfvvvou20evvvvvvvvvvvvvvs&s=315532800000&e=1514764800000
11 | ).
12 |
13 | This script has been archived since we are managing all the EIA datasets on Resource Watch together using [upload_eia_data](https://github.com/resource-watch/nrt-scripts/tree/master/upload_eia_data).
14 |
15 | ###### Note: This dataset processing was done by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
16 |
--------------------------------------------------------------------------------
/archive/ene_035_rw0_electricity_installed_capacity/README.md:
--------------------------------------------------------------------------------
1 | ## Electricity Installed Capacity Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Electricity Installed Capacity](https://www.eia.gov/international/data/world/electricity/electricity-capacity) for [display on Resource Watch](http://resourcewatch.org/data/explore/683aa637-aa4f-46ab-8260-4441de896131).
3 |
4 | The data was provided by the source through its API in a json format.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly for upload to Carto. We read in the data as a pandas dataframe, deleted rows without data, and removed rows for regions that consist of multiple geographies. A new column 'datetime' was created to store the time period of the data as the first date of the year.
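A rough sketch of that reformatting, assuming the API response has already been parsed into a list of records (the keys and region names below are illustrative):
```
import pandas as pd

# 'records' stands in for the parsed JSON returned by the EIA API
records = [
    {'name': 'Afghanistan', 'year': 2019, 'capacity_gw': 0.6},
    {'name': 'World', 'year': 2019, 'capacity_gw': None},
]
df = pd.DataFrame(records)

# delete rows without data
df = df.dropna(subset=['capacity_gw'])

# remove aggregate regions that consist of multiple geographies
multi_geography_regions = ['World', 'Africa', 'Asia & Oceania', 'Europe']
df = df[~df['name'].isin(multi_geography_regions)]

# store the time period of the data as the first date of the year
df['datetime'] = pd.to_datetime(df['year'].astype(str), format='%Y')
```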
7 |
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ene_035_rw0_electricity_installed_capacity/ene_035_rw0_electricity_installed_capacity_processing.py) for more details on this processing.
9 |
10 | You can view the processed Electricity Installed Capacity dataset [on Resource Watch](http://resourcewatch.org/data/explore/683aa637-aa4f-46ab-8260-4441de896131).
11 |
12 | You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/ene_035_rw0_electricity_installed_capacity.zip), or [from the source website](https://www.eia.gov/international/data/world/electricity/electricity-capacity).
13 |
14 | This script has been archived since we are managing all the EIA datasets on Resource Watch together using [upload_eia_data](https://github.com/resource-watch/nrt-scripts/tree/master/upload_eia_data).
15 |
16 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
17 |
--------------------------------------------------------------------------------
/bio_004a_coral_reef_locations/README.md:
--------------------------------------------------------------------------------
1 | ## Coral Reef Locations Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Distribution of Coral Reefs (2018)](http://data.unep-wcmc.org/datasets/1) for [display on Resource Watch](https://resourcewatch.org/data/explore/1d23838e-40da-4cf3-b61c-56258d3a5c56).
3 |
4 | The source provided this dataset as two shapefiles - one of which contains polygon data, and the other contains point data.
5 |
6 | Below, we describe the steps used to reformat the shapefile:
7 | 1. Read in the polygon shapefile as a geopandas data frame.
8 | 2. Change the data type of the columns 'PROTECT', 'PROTECT_FE', and 'METADATA_I' to integer.
9 | 3. Convert the geometries of the data from shapely objects to geojsons.
10 | 4. Create a new column from the index of the dataframe to use as a unique id column (cartodb_id) in Carto.
11 |
12 | Next, a mask layer was created so that it could be overlaid on top of other datasets to highlight where coral reefs are located. In order to create this, a 10 km buffer was generated around each coral reef polygon, which was then exported as a shapefile from Google Earth Engine; a sketch of that step is shown below.
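A minimal Earth Engine Python sketch of the buffering step; the asset path, variable names, and export parameters are illustrative assumptions rather than the exact code used:
```
import ee
ee.Initialize()

# coral reef polygons previously ingested as an Earth Engine asset (path is an assumption)
reefs = ee.FeatureCollection('projects/resource-watch-gee/bio_004a_coral_reef_locations')

# buffer each polygon by 10 km (10,000 m) to create the mask layer
buffered = reefs.map(lambda feature: feature.buffer(10000))

# export the buffered features as a shapefile
task = ee.batch.Export.table.toDrive(collection=buffered,
                                     description='bio_004a_coral_reef_mask',
                                     fileFormat='SHP')
task.start()
```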
13 |
14 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/bio_004a_coral_reef_locations/bio_004a_coral_reef_locations_processing.py) for more details on this processing.
15 |
16 | You can view the processed Coral Reef Locations dataset [on Resource Watch](https://resourcewatch.org/data/explore/1d23838e-40da-4cf3-b61c-56258d3a5c56).
17 |
18 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/bio_004a_coral_reef_locations.zip), or [from the source website](http://data.unep-wcmc.org/datasets/1).
19 |
20 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
21 |
--------------------------------------------------------------------------------
/bio_021a_terrestrial_ecoregions/README.md:
--------------------------------------------------------------------------------
1 | ## Terrestrial Ecoregions Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Terrestrial Ecoregions](http://maps.tnc.org/files/metadata/TerrEcos.xml) for [display on Resource Watch](https://resourcewatch.org/data/explore/d9034fa9-8db0-4d52-b018-46fae37d3136).
3 |
4 | The source provided the data as a shapefile.
5 |
6 | Below, we describe the steps used to reformat the shapefile to upload it to Carto.
7 |
8 | 1. Read in the polygon shapefile as a geopandas dataframe.
9 | 2. Project the data so its coordinate system is WGS84.
10 | 3. Convert the geometries of the data from shapely objects to geojsons.
11 | 4. Create a new column from the index of the dataframe to use as a unique id column (cartodb_id) in Carto.
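A short geopandas sketch of the steps above (the file path and output column name are assumptions):
```
import json
import geopandas as gpd
import pandas as pd
from shapely.geometry import mapping

# read in the polygon shapefile (path is illustrative)
gdf = gpd.read_file('terrestrial_ecoregions.shp')

# reproject the data to WGS84 (EPSG:4326)
gdf = gdf.to_crs(epsg=4326)

# convert to a plain dataframe, storing geometries as geojson strings
df = pd.DataFrame(gdf.drop(columns='geometry'))
df['the_geom'] = gdf.geometry.apply(lambda geom: json.dumps(mapping(geom)))

# create a unique id column from the index to use as cartodb_id in Carto
df['cartodb_id'] = df.index

df.to_csv('bio_021a_terrestrial_ecoregions_edit.csv', index=False)
```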
12 |
13 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/bio_021a_terrestrial_ecoregions/bio_021a_terrestrial_ecoregions_processing.py) for more details on this processing.
14 |
15 | You can view the processed Terrestrial Ecoregions dataset [on Resource Watch](https://resourcewatch.org/data/explore/d9034fa9-8db0-4d52-b018-46fae37d3136).
16 |
17 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/bio_021a_terrestrial_ecoregions.zip), or [from the source website](https://geospatial.tnc.org/datasets/7b7fb9d945544d41b3e7a91494c42930_0).
18 |
19 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
20 |
--------------------------------------------------------------------------------
/bio_041_rw1_ocean_health_index/README.md:
--------------------------------------------------------------------------------
1 | ## Ocean Health Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Ocean Health Index](http://www.oceanhealthindex.org/) for [display on Resource Watch](https://resourcewatch.org/data/explore/7b52fb1a-a52d-44b9-bfef-45c3f2610c58).
3 |
4 | The source provided the data in a csv file. This data file was not modified from the original version for display on Resource Watch.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/bio_041_rw1_ocean_health_index/bio_041_rw1_ocean_health_index_processing.py) for more details on this processing.
7 |
8 | You can view the processed Ocean Health Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/7b52fb1a-a52d-44b9-bfef-45c3f2610c58).
9 |
10 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/bio_041_rw1_ocean_health_index.zip), or [from the source website](http://ohi-science.org/ohi-global/download).
11 |
12 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
13 |
--------------------------------------------------------------------------------
/bio_041_rw1_ocean_health_index/bio_041_rw1_ocean_health_index_processing.py:
--------------------------------------------------------------------------------
1 | import io
2 | import requests
3 | import pandas as pd
4 | import os
5 | import sys
6 | utils_path = os.path.join(os.path.abspath(os.getenv('PROCESSING_DIR')),'utils')
7 | if utils_path not in sys.path:
8 |     sys.path.append(utils_path)
9 | import util_files
10 | import util_cloud
11 | import util_carto
12 | from zipfile import ZipFile
13 | import logging
14 |
15 | # Set up logging
16 | # Get the top-level logger object
17 | logger = logging.getLogger()
18 | for handler in logger.handlers: logger.removeHandler(handler)
19 | logger.setLevel(logging.INFO)
20 | # make it print to the console.
21 | console = logging.StreamHandler()
22 | logger.addHandler(console)
23 | logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
24 |
25 |
26 | # name of table on Carto where you want to upload data
27 | # this should be a table name that is not currently in use
28 | dataset_name = 'bio_041_rw1_ocean_health_index' #check
29 |
30 | logger.info('Executing script for dataset: ' + dataset_name)
31 | # create a new sub-directory within your specified dir called 'data'
32 | # within this directory, create files to store raw and processed data
33 | data_dir = util_files.prep_dirs(dataset_name)
34 |
35 | '''
36 | Download data and save to your data directory
37 | '''
38 | logger.info('Downloading raw data')
39 | # insert the url used to download the data from the source website
40 | url = 'https://raw.githubusercontent.com/OHI-Science/ohi-global/published/yearly_results/global2019/OHI_final_formatted_scores_2019-11-15.csv' #check
41 |
42 | # read in data to pandas dataframe
43 | r = requests.get(url)
44 | df = pd.read_csv(io.BytesIO(r.content))
45 |
46 | # save unprocessed source data to put on S3 (below)
47 | raw_data_file = os.path.join(data_dir, os.path.basename(url))
48 | df.to_csv(raw_data_file, header = False, index = False)
49 |
50 | '''
51 | Process data
52 | '''
53 | #save processed dataset to csv
54 | processed_data_file = os.path.join(data_dir, dataset_name+'_edit.csv')
55 | df.to_csv(processed_data_file, index=False)
56 |
57 | '''
58 | Upload processed data to Carto
59 | '''
60 | logger.info('Uploading processed data to Carto.')
61 | util_carto.upload_to_carto(processed_data_file, 'LINK')
62 |
63 | '''
64 | Upload original data and processed data to Amazon S3 storage
65 | '''
66 | # initialize AWS variables
67 | aws_bucket = 'wri-public-data'
68 | s3_prefix = 'resourcewatch/'
69 |
70 | logger.info('Uploading original data to S3.')
71 | # Upload raw data file to S3
72 |
73 | # Copy the raw data into a zipped file to upload to S3
74 | raw_data_dir = os.path.join(data_dir, dataset_name+'.zip')
75 | with ZipFile(raw_data_dir,'w') as zip:
76 |     zip.write(raw_data_file, os.path.basename(raw_data_file))
77 | # Upload raw data file to S3
78 | uploaded = util_cloud.aws_upload(raw_data_dir, aws_bucket, s3_prefix+os.path.basename(raw_data_dir))
79 |
80 | logger.info('Uploading processed data to S3.')
81 | # Copy the processed data into a zipped file to upload to S3
82 | processed_data_dir = os.path.join(data_dir, dataset_name+'_edit.zip')
83 | with ZipFile(processed_data_dir,'w') as zip:
84 |     zip.write(processed_data_file, os.path.basename(processed_data_file))
85 | # Upload processed data file to S3
86 | uploaded = util_cloud.aws_upload(processed_data_dir, aws_bucket, s3_prefix+os.path.basename(processed_data_dir))
87 |
--------------------------------------------------------------------------------
/cit_022_rw1_road_traffic_death_rates/README.md:
--------------------------------------------------------------------------------
1 | ## Road Traffic Death Rates Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Road Traffic Deaths dataset](http://apps.who.int/gho/data/node.wrapper.imr?x-id=198) for [display on Resource Watch](https://resourcewatch.org/data/explore/3b6f853a-622d-4fff-827c-901b5b4352b0).
3 |
4 | The source provided the dataset as a csv file.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 | 1. Read in the data as a pandas dataframe and rename all the columns.
8 | 2. Split the 'estimated_number_of_road_traffic_deaths_data' column and store the lower and upper bounds of the estimates in two new columns.
9 | 3. Add a column for datetime with January 1, 2016 for every row.
10 | 4. Reorder the columns.
11 | 5. Remove all the spaces within numbers in the 'estimated_number_of_road_traffic_deaths_data' column.
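A sketch of steps 2, 3, and 5 above, assuming the raw WHO estimates look like '1 551 [1 295-1 807]' (the exact layout of the source column is an assumption):
```
import pandas as pd

col = 'estimated_number_of_road_traffic_deaths_data'
# illustrative example of the raw format
df = pd.DataFrame({col: ['1 551 [1 295-1 807]']})

# store the lower and upper bounds of the estimate in two new columns
bounds = df[col].str.extract(r'\[(.*)-(.*)\]')
df['deaths_estimate_low'] = bounds[0].str.replace(' ', '')
df['deaths_estimate_high'] = bounds[1].str.replace(' ', '')

# keep only the central estimate in the original column, removing spaces within the number
df[col] = df[col].str.replace(r'\s*\[.*\]', '', regex=True).str.replace(' ', '')

# add a datetime column with January 1, 2016 for every row
df['datetime'] = pd.to_datetime('2016-01-01')
```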
12 |
13 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cit_022_rw1_road_traffic_death_rates/cit_022_rw1_road_traffic_death_rates_processing.py) for more details on this processing.
14 |
15 | You can view the processed Road Traffic Death Rates dataset [on Resource Watch](https://resourcewatch.org/data/explore/3b6f853a-622d-4fff-827c-901b5b4352b0).
16 |
17 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/cit_022_rw1_road_traffic_death_rates.zip), or [from the source website](http://apps.who.int/gho/data/node.main.A997?lang=en).
18 |
19 | ###### Note: This dataset processing was done by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
20 |
--------------------------------------------------------------------------------
/cit_029_rw1_municipal_waste/README.md:
--------------------------------------------------------------------------------
1 | ## Municipal Waste Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Municipal Waste Generated per Capita](https://stats.oecd.org/Index.aspx?DataSetCode=MUNW) for [display on Resource Watch](https://resourcewatch.org/data/explore/23f41bb2-2312-41ab-aaf2-ef584f80b31a).
3 |
4 | The source provided the data as a csv file.
5 |
6 | Below, we describe the steps used to reformat the table to upload it to Carto.
7 |
8 | 1. Read in the csv file as a pandas dataframe.
9 | 2. Subset the dataframe based on the 'Variable' column to obtain municipal waste generated per capita for each country.
10 | 3. Remove the 'YEA' column, which is identical to the 'Year' column.
11 | 4. Convert the column names to lowercase and replace the spaces with underscores.
12 | 5. Convert the years in the 'year' column to datetime objects and store them in a new column 'datetime'.
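A minimal pandas sketch of the steps above (the file path and the exact variable label are assumptions):
```
import pandas as pd

# read the OECD csv (path is illustrative)
df = pd.read_csv('MUNW_data.csv')

# keep only the municipal waste generated per capita variable (label is an assumption)
df = df[df['Variable'] == 'Municipal waste generated per capita']

# drop the 'YEA' column, which duplicates 'Year'
df = df.drop(columns=['YEA'])

# lowercase the column names and replace spaces with underscores
df.columns = [c.lower().replace(' ', '_') for c in df.columns]

# store the year as a datetime object in a new column
df['datetime'] = pd.to_datetime(df['year'].astype(str), format='%Y')
```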
13 |
14 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cit_029_rw1_municipal_waste/cit_029_rw1_municipal_waste_processing.py) for more details on this processing.
15 |
16 | You can view the processed Municipal Waste dataset [on Resource Watch](https://resourcewatch.org/data/explore/23f41bb2-2312-41ab-aaf2-ef584f80b31a).
17 |
18 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/cit_029_rw1_municipal_waste.zip), or [from the source website](https://stats.oecd.org/Index.aspx?DataSetCode=MUNW).
19 |
20 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
21 |
--------------------------------------------------------------------------------
/cit_031_rw1_air_quality_PM25_concentration/README.md:
--------------------------------------------------------------------------------
1 | ## Air Quality: Surface Fine Particulate Matter (PM2.5) Concentrations Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Annual averaged 0.01° × 0.01° w GWR adjustment PM2.5](http://fizz.phys.dal.ca/~atmos/martin/?page_id=140) for [display on Resource Watch](https://resourcewatch.org/data/explore/31289042-1e02-4b43-9c7f-ef23a98fa3c7).
3 |
4 | Each year of data is provided by the source as a NetCDF file. In order to display this data on Resource Watch, the PM25 variable in each NetCDF was converted to a GeoTIFF, and the latitude and longitude fields had to be switched because they were transposed in the original dataset.
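A rough sketch of that conversion using xarray and rioxarray; the file, variable, and dimension names shown here are assumptions and may differ from those in the source NetCDF files:
```
import xarray as xr
import rioxarray  # registers the .rio accessor on xarray objects

# open one year of data (file and variable names are illustrative)
ds = xr.open_dataset('PM25_annual_2016.nc')
pm25 = ds['PM25']

# the latitude and longitude coordinates are transposed in the source files,
# so swap the dimensions before georeferencing
pm25 = pm25.transpose('lat', 'lon')

# declare the spatial dimensions, set the CRS, and write out a GeoTIFF
pm25 = pm25.rio.set_spatial_dims(x_dim='lon', y_dim='lat')
pm25 = pm25.rio.write_crs('EPSG:4326')
pm25.rio.to_raster('cit_031_pm25_2016.tif')
```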
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cit_031_rw1_air_quality_PM25_concentration/cit_031_rw1_air_quality_PM25_concentration_processing.py) for more details on this processing.
7 |
8 | You can view the processed Air Quality: Surface Fine Particulate Matter (PM2.5) Concentrations dataset [on Resource Watch](https://resourcewatch.org/data/explore/31289042-1e02-4b43-9c7f-ef23a98fa3c7).
9 |
10 | You can also download the original dataset [from the source website](http://fizz.phys.dal.ca/~atmos/martin/?page_id=140).
11 |
12 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
13 |
--------------------------------------------------------------------------------
/cit_033a_urban_builtup_area/cit_033a_urban_built_up_area_processing.py:
--------------------------------------------------------------------------------
1 | import os
2 | import subprocess
3 | import glob
4 | from shutil import copy
5 |
6 | # name of folder to store data for upload in
7 | dataset_name = 'cit_033a/'
8 |
9 | # Directory of individual tile tiff files on local machine
10 | DATA_DIR = os.getenv('PROCESSING_DIR') + dataset_name + 'GHS_BUILT_LDSMT_GLOBE_R2018A_3857_30_V2_0/V2-0/30x150000/'
11 |
12 | # Single folder to hold all individual files. Done for parallel upload to Google Cloud Bucket
13 | DEST_DIR = os.getenv('PROCESSING_DIR') + dataset_name + 'GHS_BUILT_LDSMT_GLOBE_R2018A_3857_30_V2_0/V2-0/temp/'
14 |
15 | # Directory for Google Bucket where the individual tiff files will be stored before transferring to Google Earth Engine
16 | GS_BUCKET = 'gs://{}/temp/'.format(os.getenv('GEE_STAGING_BUCKET'))
17 |
18 | # Move to data directory
19 | os.chdir(DATA_DIR)
20 | # set number of assets to upload at a single time
21 | NUM_ASSETS_AT_ONCE = 50
22 |
23 | PAUSE_FOR_OVERLOAD = True
24 |
25 | #################################### Transfer files from Local to Bucket ######################################
26 |
27 | # Get the list of all individual tif files
28 | files = glob.glob(DATA_DIR + '/**/*.tif', recursive = True)
29 | #Create empty array for task id's
30 | task_ids = ['']*len(files)
31 |
32 | # Loop through all files in DEST_DIR
33 | for i,filey in enumerate(files):
34 |
35 |     # Rename all files to include the index at the end to avoid overwriting duplicate names
36 |     filename = filey.split('.tif')[0]+'_{}.tif'.format(i)
37 |     os.rename(filey,filename)
38 |
39 |     # copy the renamed file to a single directory to use parallel upload
40 |     copy(filename, DEST_DIR)
41 |
42 |
43 | # Transfer all files to the Google Cloud Bucket
44 | cmd = ['gsutil','-m','cp','-r',DEST_DIR,GS_BUCKET]
45 | subprocess.call(cmd, shell=True)
46 |
47 | #################################### Transfer files from Bucket to GEE ######################################
48 |
49 | # Google Earth Engine asset to store individual tiff file
50 | EE_COLLECTION = 'projects/resource-watch-gee/cit_033a_urban_built_up_area_mosaic'
51 |
52 |
53 | def upload_asset(full_file_path, DATA_DIR=DATA_DIR, EE_COLLECTION=EE_COLLECTION, GS_BUCKET=GS_BUCKET):
54 |     '''
55 |     Function to upload geotiffs as images
56 |     '''
57 |
58 |     filename = os.path.basename(full_file_path)
59 |
60 |     # Get asset id
61 |     asset_id = EE_COLLECTION+'/'+filename.split('.')[0]
62 |
63 |     # Upload GeoTIFF from google storage bucket to earth engine
64 |     cmd = ['earthengine','upload','image','--asset_id='+asset_id,'--force',GS_BUCKET+'/'+filename]
65 |
66 |     shell_output = subprocess.check_output(cmd, shell=True)
67 |     shell_output = shell_output.decode("utf-8")
68 |     print(shell_output)
69 |
70 |     # Get task id
71 |     task_id = ''
72 |     if 'Started upload task with ID' in shell_output:
73 |         task_id = shell_output.split(': ')[1]
74 |         task_id = task_id.strip()
75 |     else:
76 |         print('Something went wrong!')
77 |         task_id='ERROR'
78 |     return task_id
79 |
80 | # Loop through each individual tile to upload them to Google Earth Engine
81 | for i,filey in enumerate(files):
82 |     print(i)
83 |
84 |     if i >=0 and i <= 5000: # repeat this process for len(files)
85 |
86 |         task_id = upload_asset(filey)
87 |
88 |         if PAUSE_FOR_OVERLOAD:
89 |             if (i% NUM_ASSETS_AT_ONCE == 0) and (i>0):
90 |                 # Wait for all tasks to finish
91 |                 cmd = ['earthengine','task','wait','all']
92 |                 subprocess.call(cmd, shell=True)
93 |
--------------------------------------------------------------------------------
/cit_043_rw0_bus_rapid_transit/README.md:
--------------------------------------------------------------------------------
1 | ## Cities with Bus Rapid Transit Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Cities with Bus Rapid Transit](https://brtdata.org/indicators/systems/year_system_commenced) for [display on Resource Watch](https://bit.ly/3FGqorD).
3 |
4 | The Global Bus Rapid Transit (BRT) dataset includes the name of the BRT system, the year it commenced, the location of the BRT (city and region), and the source. Each of these values is provided for BRT systems launched between 1968 and 2020.
5 |
6 | While this dataset can be viewed on the source website, it is not directly downloadable there. In order to display the BRT data on Resource Watch, the dataset was scraped from the source website and joined with the city centroid coordinates from [Natural Earth's Populated Places dataset](http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-populated-places/) for mapping purposes. Part of the scraping process was adapted from [this article by Kiprono Elijah Koech](https://towardsdatascience.com/web-scraping-scraping-table-data-1665b6b2271c).
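The table scraping can be sketched with pandas.read_html (which requires lxml or html5lib); which table on the page holds the indicator is an assumption:
```
import pandas as pd
import requests

# fetch the indicator page and parse the html tables into dataframes
url = 'https://brtdata.org/indicators/systems/year_system_commenced'
html = requests.get(url).text
tables = pd.read_html(html)  # returns a list of all tables found on the page
df = tables[0]               # assumption: the indicator table is the first one on the page
```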
7 |
8 | Below, we describe the main actions performed to process the scraped html table:
9 | 1. Import the data as a pandas dataframe.
10 | 2. Drop the 'year' column and rename the 'value' column as 'year'.
11 | 3. Convert years in the new 'year' column to datetime objects and store them in a new column 'datetime'.
12 | 4. Change to lowercase the column headers and remove special characters from them.
13 | 5. Join the Global Bus Rapid Transit data (“cit_043_rw0_bus_rapid_transit”) with the Populated Places dataset ("ne_10m_populated_places_simple"), which had previously been uploaded to the Resource Watch Carto account. These tables should be joined on the “ne_10m_populated_places_simple.name” and "cit_043_rw0_bus_rapid_transit_edit.city" columns from the respective table, using the following SQL statement:
14 |
15 | ```
16 | SELECT ne_10m_populated_places_simple.name, cit_043_rw0_bus_rapid_transit_edit.city,
17 | ST_Transform(ne_10m_populated_places_simple.the_geom, 3857) AS the_geom_webmercator, cit_043_rw0_bus_rapid_transit_edit.source,
18 | cit_043_rw0_bus_rapid_transit_edit.year, cit_043_rw0_bus_rapid_transit_edit.country
19 |
20 | FROM ne_10m_populated_places_simple
21 |
22 | INNER JOIN cit_043_rw0_bus_rapid_transit_edit ON ne_10m_populated_places_simple.name = cit_043_rw0_bus_rapid_transit_edit.city
23 | ```
24 |
25 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cit_043_rw0_bus_rapid_transit/cit_043_rw0_bus_rapid_transit.py) for more details on this processing.
26 |
27 | You can view the processed dataset for [display on Resource Watch](https://bit.ly/3FGqorD).
28 |
29 | You can also download the data [from the source website](https://brtdata.org/indicators/systems/year_system_commenced).
30 |
31 | ###### Note: This dataset processing and QC was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes).
32 |
33 |
--------------------------------------------------------------------------------
/cit_045_infrastructure_investment_outlook/README.md:
--------------------------------------------------------------------------------
1 | ## Infrastructure Investment Outlook Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Infrastructure Investment Outlook dataset](https://outlook.gihub.org/) for [display on Resource Watch](https://resourcewatch.org/data/explore/Infrastructure-Investment-Outlook).
3 |
4 | The Global Infrastructure Investment Outlook data can be downloaded by country on the source website. The downloaded data includes the expected investment in infrastructure for that country (current trends), as well as the investment in infrastructure that would be needed to match the performance of other countries within the same income group (investment needs). Each of these values is provided for each year between 2007 and 2040.
5 |
6 | Because we wanted to display the data on Resource Watch for all countries, a complete dataset that included all countries was requested from the data provider, rather than processing the data individually for each country.
7 |
8 | Below, we describe the calculations that were done to the global dataset in Microsoft Excel to produce the data shown on Resource Watch.
9 |
10 | The global dataset that was provided included two spreadsheets: one for investment needs and one for current trends. Within each sheet, each row represents a particular country, and each column represents a single year.
11 |
12 | The Pivot Tables feature in Excel was used to combine the two sheets into one by creating a new column called “investment_path” that indicated whether the data was a "current trend" or an "investment need."
13 |
14 | The infrastructure investment gap for each year was calculated from this data by finding the difference between the "current trend" value and the "investment need" value for each country and each year.
15 |
16 | The Pivot Tables feature was used again to create a new column called “year” so that all years of data could be stored in a single column called "value," rather than storing each year's data in a unique column. The “value” column was also divided by 1,000,000 to show the value in millions of USD, which was stored in a new “value_millions_” column.
17 |
18 | The total infrastructure investment gap for a country, which is shown on Resource Watch, is the sum of the infrastructure investment gap for all years between 2016 and 2040. This selection of years was used to be consistent with how the source calculated the total infrastructure investment gap displayed on their website.
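Although the processing was done in Excel, the same calculation can be illustrated in pandas; the workbook, sheet, and column names below are assumptions:
```
import pandas as pd

# each sheet has one row per country and one column per year (names are illustrative)
needs = pd.read_excel('investment_outlook.xlsx', sheet_name='investment_needs')
trends = pd.read_excel('investment_outlook.xlsx', sheet_name='current_trends')

# reshape both sheets from wide to long and label the investment path
needs_long = needs.melt(id_vars='country', var_name='year', value_name='value')
needs_long['investment_path'] = 'investment need'
trends_long = trends.melt(id_vars='country', var_name='year', value_name='value')
trends_long['investment_path'] = 'current trend'
df = pd.concat([needs_long, trends_long])

# express the values in millions of USD
df['value_millions'] = df['value'] / 1_000_000

# investment gap per country and year, then the 2016-2040 total shown on Resource Watch
wide = df.pivot_table(index=['country', 'year'], columns='investment_path', values='value').reset_index()
wide['year'] = wide['year'].astype(int)
wide['gap'] = wide['investment need'] - wide['current trend']
total_gap = wide[wide['year'].between(2016, 2040)].groupby('country')['gap'].sum()
```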
19 |
20 |
21 | You can view the processed infrastructure investment outlook dataset [on Resource Watch](https://resourcewatch.org/data/explore/Infrastructure-Investment-Outlook).
22 |
23 | You can also download original dataset [from the source website](https://outlook.gihub.org/).
24 |
25 | ###### Note: This dataset processing was done by [Ken Wakabayashi](https://www.wri.org/profile/ken-wakabayashi), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
26 |
--------------------------------------------------------------------------------
/cli_008_greenhouse_gas_emissions_country_sector/README.md:
--------------------------------------------------------------------------------
1 | ## Greenhouse Gas Emissions by Country and Economic Sector Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Greenhouse Gas Emissions by Country and Economic Sector](https://www.climatewatchdata.org/data-explorer) for [display on Resource Watch](https://bit.ly/39sQ4ds).
3 |
4 | The source provided this dataset as a csv file accessed through its [data explorer](https://www.climatewatchdata.org/data-explorer). The following options were selected from the dropdown menu:
5 | 1. Data source: CAIT
6 | 2. Countries and regions: All selected
7 | 3. Sectors: All selected
8 | 4. Gases: All GHG
9 | 5. Start year: 1990
10 | 6. End year: 2018
11 |
12 | Below, we describe the main actions performed to process the csv file:
13 | 1. Import the data as a pandas dataframe.
14 | 2. Convert the dataframe from wide form to long form, in which one column indicates the year and the other columns indicate the emission values for the year.
15 | 3. Convert the table from long to wide form, in which the emission values of each sector are stored in individual columns.
16 | 4. Convert years in the 'year' column to datetime objects and store them in a new column 'datetime'.
17 | 5. Rename column headers to be more descriptive and to remove special characters so that it can be uploaded to Carto without losing information.
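A minimal pandas sketch of the steps above (the file path and column names are assumptions):
```
import pandas as pd

# read the Climate Watch csv export (path is illustrative)
df = pd.read_csv('CW_CAIT_ghg_emissions.csv')

# wide to long: one row per country-sector-year
df_long = df.melt(id_vars=['Country/Region', 'Sector'], var_name='year', value_name='emissions')

# long to wide: one column of emission values per sector
df_wide = df_long.pivot_table(index=['Country/Region', 'year'],
                              columns='Sector', values='emissions').reset_index()

# store the year as a datetime object in a new column
df_wide['datetime'] = pd.to_datetime(df_wide['year'].astype(str), format='%Y')

# make the column headers Carto-friendly: lowercase, underscores, no special characters
df_wide.columns = [str(c).lower().replace(' ', '_').replace('/', '_') for c in df_wide.columns]
```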
18 |
19 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cli_008_greenhouse_gas_emissions_country_sector/cli_008_greenhouse_gas_emissions_country_sector_processing.py) for more details on this processing.
20 |
21 | You can view the processed dataset for [display on Resource Watch](https://bit.ly/39sQ4ds).
22 |
23 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/cli_008_greenhouse_gas_emissions_country_sector.zip), or [from the source website](https://www.climatewatchdata.org/data-explorer).
24 |
25 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
26 |
--------------------------------------------------------------------------------
/cli_029_rw2_vulnerability_to_climate_change/README.md:
--------------------------------------------------------------------------------
1 | ## Vulnerability to Climate Change Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the ND-GAIN Country Index dataset](https://gain.nd.edu/our-work/country-index/) for [display on Resource Watch](https://resourcewatch.org/data/explore/cli029a-Vulnerability-to-Climate-Change-Index).
3 |
4 | This dataset comes in a zipped file containing data for the ND-GAIN Index, as well as several other scores and sub-indices, from 1995-2019.
5 |
6 | On Resource Watch, we show the ND-GAIN Index, and also include information on each country's Readiness, Vulnerability, and Food Vulnerability Scores.
7 |
8 | In order to upload the selected data to Resource Watch, the following steps were followed:
9 |
10 | 1. Download and unzip source data.
11 | 2. The source provides the data with each row representing a country and each year of data stored in a different column. This is considered to be "wide" form. We converted each of those files from wide form to long form, in which there is one column that indicates the year and one column that indicates the value for that year.
12 | 3. Merge the four tables for the ND-GAIN Index, Readiness Score, Vulnerability Score, and Food Vulnerability Score into one table that has a column to store the value for each.
13 | 4. Rename the column 'Name' to 'country' to match the column name in the previous Carto table.
14 | 5. Convert the column names to lowercase to meet the column name requirements of Carto.
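
A hedged sketch of steps 2-4 in pandas is shown below; the file names and id columns are assumptions based on the description above.

```python
from functools import reduce
import pandas as pd

# one csv per indicator (file names are assumptions)
indicators = {'gain': 'gain.csv', 'readiness': 'readiness.csv',
              'vulnerability': 'vulnerability.csv', 'food_vulnerability': 'food.csv'}

long_tables = []
for name, path in indicators.items():
    df = pd.read_csv(path)
    # wide to long: one column for the year and one for this indicator's value
    long_df = df.melt(id_vars=['ISO3', 'Name'], var_name='year', value_name=name)
    long_tables.append(long_df)

# merge the four indicator tables into one, with a value column for each
merged = reduce(lambda left, right: pd.merge(left, right, on=['ISO3', 'Name', 'year']),
                long_tables)

# rename 'Name' to 'country' and lowercase the column names for Carto
merged = merged.rename(columns={'Name': 'country'})
merged.columns = [col.lower() for col in merged.columns]
```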
15 |
16 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cli_029_rw2_vulnerability_to_climate_change/cli_029_rw2_vulnerability_to_climate_change_processing.py) for more details on this processing.
17 |
18 | You can view the processed vulnerability to climate change dataset [on Resource Watch](https://resourcewatch.org/data/explore/cli029a-Vulnerability-to-Climate-Change-Index).
19 |
20 | ###### Note: This dataset processing was done by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder), updates made by [Alex Sweeney](https://github.com/alxswny) and [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Chris Rowe](https://www.wri.org/profile/chris-rowe).
21 |
--------------------------------------------------------------------------------
/cli_030_rw1_aridity/README.md:
--------------------------------------------------------------------------------
1 | ## Aridity Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Aridity - Annual dataset](http://www.cgiar-csi.org/data/global-aridity-and-pet-database) for [display on Resource Watch](https://resourcewatch.org/data/explore/43d9dac0-88be-4db1-b71e-482756220817).
3 |
4 | The source provided the data as a GeoTIFF file within a zipped folder. The GeoTIFF inside the zipped folder was not modified from the original version for display on Resource Watch.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cli_030_rw1_aridity/cli_030_rw1_aridity_processing.py) for more details on this processing.
7 |
8 | You can view the processed Aridity dataset [on Resource Watch](https://resourcewatch.org/data/explore/43d9dac0-88be-4db1-b71e-482756220817).
9 |
10 | You can also download the original dataset [from the source website](https://cgiarcsi.community/2019/01/24/global-aridity-index-and-potential-evapotranspiration-climate-database-v2/).
11 |
12 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
13 |
--------------------------------------------------------------------------------
/cli_049_rw1_dash_pik_historical_emissions/README.md:
--------------------------------------------------------------------------------
1 | ## PIK Historical Emissions Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the PRIMAP-hist national historical emissions time series
3 | (1850-2017)](https://dataservices.gfz-potsdam.de/pik/showshort.php?id=escidoc:4736895) for [display on Resource Watch](https://resourcewatch.org/embed/widget/1d736449-18cb-4757-8c6a-d8a175d906f0).
4 |
5 | The source provides the data as a csv file.
6 |
7 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
8 |
9 | 1. Read in the data as a pandas dataframe.
10 | 2. Subset the dataframe to obtain the aggregated emissions for all countries.
11 | 3. Subset the dataframe to obtain emissions of CH4, CO2, F Gases, and N2O.
12 | 4. Subset the dataframe to obtain GHG emissions that are primarily reported by countries.
13 | 5. Convert the dataframe from wide to long format so there will be one column indicating the year and another column indicating the emissions values.
14 | 6. Rename the 'variable' and 'value' columns created by the previous step to be 'year' and 'yr_data'.
15 | 7. Convert the emission values of CH4 and N2O to be in the unit of GgCO2eq using the global warming potentials from the IPCC Fourth Assessment Report.
16 | 8. Convert the emission values to be in MtCO2eq.
17 | 9. Change the values of 'unit' column from 'GgCO2eq' to 'MtCO2eq'.
18 | 10. Create a dictionary for all the major sectors and their corresponding codes in the dataframe based on the documentation at ftp://datapub.gfz-potsdam.de/download/10.5880.PIK.2019.018/PRIMAP-hist_v2.1_data-description.pdf.
19 | 11. Subset the dataframe to obtain the GHG emissions of the major sectors.
20 | 12. Convert the codes in the 'category' column to the sectors they represent.
21 | 13. Create a 'datetime' column to store the years as datetime objects and drop the 'year' column.
22 | 14. Sum the emissions of all types of GHG for each sector in each year.
23 | 15. Create a column 'source' to indicate the data source is PIK.
24 | 16. Create a 'gwp' column to indicate that the global warming potential used in the calculation is from the IPCC Fourth Assessment Report.
25 | 17. Create a 'gas' column to indicate that the emissions values are the sum of all GHG emissions.
26 | 18. Rename the 'category' column to 'sector'.
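
The unit conversions in steps 7-9 can be illustrated with a small pandas example. The column names and the AR4 global warming potentials used below (CH4 = 25, N2O = 298) are assumptions based on the description above.

```python
import pandas as pd

# an illustrative long-form table after step 6 (column names are assumptions)
df = pd.DataFrame({
    'entity': ['CO2', 'CH4', 'N2O'],
    'unit': ['GgCO2eq', 'Gg', 'Gg'],
    'yr_data': [350000.0, 1200.0, 40.0],
})

# AR4 global warming potentials; gases already reported in CO2eq keep a factor of 1
ar4_gwp = {'CO2': 1, 'CH4': 25, 'N2O': 298}

# step 7: convert CH4 and N2O to GgCO2eq
df['yr_data'] = df['yr_data'] * df['entity'].map(ar4_gwp).fillna(1)

# steps 8-9: convert from GgCO2eq to MtCO2eq (1 Mt = 1,000 Gg) and update the unit
df['yr_data'] = df['yr_data'] / 1000
df['unit'] = 'MtCO2eq'
```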
27 |
28 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cli_049_rw1_dash_pik_historical_emissions/cli_049_rw1_dash_pik_historical_emissions_processing.py) for more details on this processing.
29 |
30 | You can view the processed PIK Historical Emissions dataset [on Resource Watch](https://resourcewatch.org/embed/widget/1d736449-18cb-4757-8c6a-d8a175d906f0).
31 |
32 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/cli_049_rw1_dash_pik_historical_emissions.zip), or [from the source website](https://dataservices.gfz-potsdam.de/pik/showshort.php?id=escidoc:4736895).
33 |
34 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu) and [Daniel Gonzalez](mailto:Daniel.Gonzalez@wri.org), and QC'd by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid).
35 |
--------------------------------------------------------------------------------
/cli_050-059_066-075_nexgddp_and_loca/README.md:
--------------------------------------------------------------------------------
1 | ## NEX-GDDP and LOCA Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [NEX_GDDP and LOCA datasets](https://doi.org/10.46830/writn.19.00117) for display on Resource Watch as the following datasets:
3 |
4 | - [Projected Change in Annual Average Temperature](https://resourcewatch.org/data/explore/4ca6826c-718d-457d-b4e2-e9277d7ed62c).
5 | - [Projected Change in Annual Average Minimum Temperature](https://resourcewatch.org/data/explore/3d8e2e82-b33a-4898-90e5-6e4a1d007b82).
6 | - [Projected Change in Annual Average Maximum Temperature](https://resourcewatch.org/data/explore/c4b12251-2d61-458f-a2c0-096c37901ade).
7 | - [Projected Change in Heating Degree Days](https://resourcewatch.org/data/explore/1d2f4eae-10e1-4ea6-980c-501b34106de2).
8 | - [Projected Change in Cooling Degree Days](https://resourcewatch.org/data/explore/6fa000b5-8d91-46c8-8502-07427bc5eafc).
9 | - [Projected Change in Frost Free Season](https://resourcewatch.org/data/explore/d3f512c0-fbb0-47d5-9ac9-a452574c8e58).
10 | - [Projected Change in Extreme Heat Days](https://resourcewatch.org/data/explore/3941bbba-181b-434a-84c3-fcdfa5234735).
11 | - [Projected Change in Cumulative Precipitation](https://resourcewatch.org/data/explore/faf79d2c-5e54-4591-9d70-4bd1029c18e6).
12 | - [Projected Change in Dry Spells](https://resourcewatch.org/data/explore/d0f46576-411d-48aa-8df8-d89a3792cdce).
13 | - [Projected Change in Extreme Precipitation Days](https://resourcewatch.org/data/explore/66d28bbc-1e6e-4156-9ba2-875ecab665af).
14 |
15 | This dataset is provided by the source as a series of tif files, which were uploaded to Google Earth Engine.
16 |
17 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cli_050-059_066-075_nexgddp_and_loca/cli_050-059_066-075_nexgddp_and_loca_processing.py) for more details on this processing.
18 |
19 | You can view the processed dataset for display on Resource Watch at the links above.
20 |
21 | You can also download the original datasets for [NEX-GDDP](https://wri-public-data.s3.amazonaws.com/resourcewatch/raster/nexgddp.zip) and [LOCA](https://wri-public-data.s3.amazonaws.com/resourcewatch/raster/loca.zip).
22 |
23 | ###### Note: This dataset processing was done by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
--------------------------------------------------------------------------------
/cli_064_social_cost_carbon/README.md:
--------------------------------------------------------------------------------
1 | ## Country-Level Social Cost of Carbon Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Country-level Social Cost of Carbon dataset](https://www.nature.com/articles/s41558-018-0282-y.epdf?author_access_token=XLBRLEGdT_Kv0n8_OnvpedRgN0jAjWel9jnR3ZoTv0Ms70oz073vBeHQkQJXsJbey6vjdAHHSPxkHEN8nflPeQI6U86-MxWO1T1uUiSvN2A-srp5G9s7YwGWt6-cuKn2e83mvZEpXG3r-J0nv0gYuA%3D%3D) for [display on Resource Watch]().
3 |
4 | This dataset was provided by the source as a csv file in a GitHub repository. The data is stored in the csv file named "cscc_db_v2.csv" in the [Github repository](https://github.com/country-level-scc/cscc-database-2018) for the 2018 Country-level Social Cost of Carbon (CSCC) Database.
5 |
6 | This table was read into Python as a dataframe. The data was trimmed, the column for country code was renamed, and the table was converted from wide to long form so that the final table contains a single column of CSCC percentiles and a single column of CSCC scores.
7 |
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cli_064_social_cost_carbon/cli_064_social_cost_carbon_processing.py) for more details on this processing.
9 |
10 | You can view the processed Country-Level Social Cost of Carbon dataset [on Resource Watch]().
11 |
12 | You can also download the original dataset [directly through Resource Watch](), or [from the source website](https://github.com/country-level-scc/cscc-database-2018/blob/master/cscc_db_v2.csv).
13 |
14 | ###### Note: This dataset processing was done by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
15 |
--------------------------------------------------------------------------------
/cli_079_rw0_universal_thermal_climate_index/README.md:
--------------------------------------------------------------------------------
1 | ## Universal Thermal Climate Index (UTCI) Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the ERA5-HEAT (Human thErmAl comforT) Universal Thermal Climate Index](https://cds.climate.copernicus.eu/cdsapp#!/dataset/derived-utci-historical?tab=overview).
3 |
4 | Data was downloaded via the Copernicus Climate Data Store (CDS) API.
5 |
6 | Below, we describe the steps used to reformat the raster so that it is formatted correctly for upload to Google Earth Engine.
7 |
8 | 1. Download the netCDF files via the CDS API
9 | 2. Remove dots in filenames and replace with underscores
10 | 3. Convert the Universal Thermal Climate Index (utci variable in netCDF file) from Kelvin to Celsius
11 | 4. Calculate daily averages for UTCI
12 | 5. Save the daily averages as GeoTIFFs
13 | 6. Calculate a monthly average from the daily average GeoTIFFs
14 | 7. Save the monthly averages as GeoTIFFs
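
A rough sketch of steps 3-5 using xarray and rioxarray is shown below; the input file name and CRS are assumptions, and the 'utci' variable name follows the description above.

```python
import xarray as xr
import rioxarray  # noqa: F401  (registers the .rio accessor on xarray objects)

# open one of the downloaded netCDF files (file name is an assumption)
ds = xr.open_dataset('ECMWF_utci_20200101_v1_1_con.nc')

# step 3: convert UTCI from Kelvin to Celsius
utci_celsius = ds['utci'] - 273.15

# step 4: average the sub-daily time steps into daily means
daily = utci_celsius.resample(time='1D').mean()

# step 5: save each daily average as a GeoTIFF (assuming a WGS84 grid)
for day in daily.time.values:
    arr = daily.sel(time=day)
    arr.rio.write_crs('EPSG:4326', inplace=True)
    arr.rio.to_raster(f'utci_daily_{str(day)[:10]}.tif')
```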
15 |
16 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/cli_079_rw0_universal_thermal_climate_index/cli_079_rw0_universal_thermal_climate_index_processing.py) for more details on this processing.
17 |
18 | You can also download the original dataset [from the source website](https://cds.climate.copernicus.eu/cdsapp#!/dataset/derived-utci-historical?tab=form).
19 |
20 | ###### Note: This dataset processing was done by [Alex Sweeney](https://github.com/alxswny), and QC'd by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou).
21 |
--------------------------------------------------------------------------------
/com_002_airports/README.md:
--------------------------------------------------------------------------------
1 | ## Airports Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Airports dataset](https://openflights.org/data.html) for [display on Resource Watch](https://resourcewatch.org/data/explore/c111725c-e1c5-467b-a367-742db1c70893).
3 |
4 | The source provided the data in a dat format.
5 |
6 | Below, we describe the steps used to reformat the table before we uploaded it to Carto.
7 |
8 | 1. Read in the data as a pandas data frame and remove the extra column containing indexes.
9 | 2. Reorder the index and give the columns the correct header.
10 | 3. Replace '\N' and 'NaN' in the data frame with None.
11 | 4. Change the data types of the 'latitude', 'longitude', and 'daylight_savings_time' columns to float.
12 | 5. Change the data type of the 'altitude' column to integer.
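
A short pandas sketch of these steps follows; the column list is abridged and illustrative rather than the full OpenFlights schema, and the type conversions mirror the description above.

```python
import numpy as np
import pandas as pd

# read the .dat file (comma-separated, no header) and drop the extra index column
df = pd.read_csv('airports.dat', header=None)
df = df.drop(columns=[0]).reset_index(drop=True)

# give the columns the correct headers (abridged and assumed here)
df.columns = ['name', 'city', 'country', 'iata', 'icao', 'latitude', 'longitude',
              'altitude', 'timezone', 'daylight_savings_time', 'tz', 'type', 'source']

# replace '\N' and NaN with None
df = df.replace({'\\N': None, np.nan: None})

# change data types as described above
for col in ['latitude', 'longitude', 'daylight_savings_time']:
    df[col] = df[col].astype(float)
df['altitude'] = df['altitude'].astype(int)
```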
13 |
14 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/com_002_airports/com_002_airports_processing.py) for more details on this processing.
15 |
16 | You can view the processed Airports dataset [on Resource Watch](https://resourcewatch.org/data/explore/c111725c-e1c5-467b-a367-742db1c70893).
17 |
18 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/com_002_airports.zip), or [from the source website](https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat).
19 |
20 | ###### Note: This dataset processing was done by [Matthew Iceland](https://github.com/miceland2) and [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
21 |
--------------------------------------------------------------------------------
/com_007_rw1_fdi_regulatory_restrictiveness_index/README.md:
--------------------------------------------------------------------------------
1 | ## FDI Regulatory Restrictiveness Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the OECD FDI Regulatory Restrictiveness Index](http://www.oecd.org/investment/fdiindex.htm) for [display on Resource Watch](https://resourcewatch.org/data/explore/10b47089-6457-48b4-a955-60f4f964e0f2).
3 |
4 | The source provided the data in a csv format.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Read the data as a pandas dataframe.
9 | 2. Remove the column 'TIME' since it is a duplicate of the column 'Year'.
10 | 3. Remove the columns 'Flag Codes' and 'Flags' since they don't contain any values.
11 | 4. Remove the columns 'Type of restriction', 'SERIES', 'Series', and 'RESTYPE' since they each contain only one unique value.
12 | 5. Remove the column 'SECTOR' since it contains the same information as the column 'Sector / Industry'.
13 | 6. Convert the format of the data frame from long to wide so each sector will have its own column.
14 | 7. Convert the years in the 'Year' column to datetime objects and store them in a new column 'datetime'.
15 | 8. Replace the spaces within column names with underscores, remove the symbols, and convert the letters to lowercase to match Carto column name requirements.
16 |
17 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/com_007_rw1_fdi_regulatory_restrictiveness_index/com_007_rw1_fdi_regulatory_restrictiveness_index_processing.py) for more details on this processing.
18 |
19 | You can view the processed FDI Regulatory Restrictiveness Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/10b47089-6457-48b4-a955-60f4f964e0f2).
20 |
21 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/com_007_rw1_fdi_regulatory_restrictiveness_index.zip), or [from the source website](http://stats.oecd.org/Index.aspx?datasetcode=FDIINDEX#).
22 |
23 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
24 |
--------------------------------------------------------------------------------
/com_011_rw1_maritime_boundaries/README.md:
--------------------------------------------------------------------------------
1 | ## Maritime Boundaries Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Maritime Boundaries version 11](https://www.marineregions.org/sources.php#marbound) for [display on Resource Watch](https://resourcewatch.org/data/explore/6af67024-b917-4944-851a-152b566ff1a8).
3 |
4 | The data source provided the dataset as three polygon shapefiles.
5 | 1. Maritime Boundaries Geodatabase: Exclusive Economic Zones (200NM), version 11
6 | 2. 12 nautical miles zones: Territorial Seas (12NM), version 3
7 | 3. 24 nautical miles zones: Contiguous Zones (24NM), version 3
8 |
9 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
10 |
11 | 1. Import the three polygon shapefiles as geopandas dataframes.
12 | 2. Stack the three geopandas dataframes on top of each other.
13 | 3. Project the data so its coordinate system is WGS84.
14 | 4. Create a new column from the index of the dataframe to use as a unique id column (cartodb_id) in Carto.
15 | 5. Reorder columns by their column names.
16 | 6. Convert the column names to lowercase to match Carto column name requirements.
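
A hedged geopandas sketch of these steps is below; the shapefile paths are placeholders.

```python
import geopandas as gpd
import pandas as pd

# read the three polygon shapefiles (paths are placeholders)
paths = ['eez_v11.shp', 'eez_12nm_v3.shp', 'eez_24nm_v3.shp']
gdfs = [gpd.read_file(path) for path in paths]

# stack the three layers on top of each other and reproject to WGS84
gdf = gpd.GeoDataFrame(pd.concat(gdfs, ignore_index=True))
gdf = gdf.to_crs('EPSG:4326')

# create a unique id column for Carto from the index
gdf['cartodb_id'] = gdf.index

# reorder columns by name and lowercase them for Carto
gdf = gdf[sorted(gdf.columns)]
gdf.columns = [col.lower() for col in gdf.columns]
```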
17 |
18 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/com_011_rw1_maritime_boundaries/com_011_rw1_maritime_boundaries_processing.py) for more details on this processing.
19 |
20 | You can view the processed Maritime Boundaries dataset [on Resource Watch](https://resourcewatch.org/data/explore/6af67024-b917-4944-851a-152b566ff1a8).
21 |
22 | You can also download the original dataset [from the source website](https://www.marineregions.org/downloads.php).
23 |
24 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
25 |
--------------------------------------------------------------------------------
/com_015_rw1_recycling_rates/README.md:
--------------------------------------------------------------------------------
1 | ## Recycled Waste Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Recovered Waste: Recycled dataset](https://stats.oecd.org/Index.aspx?DataSetCode=MUNW) for [display on Resource Watch](https://resourcewatch.org/data/explore/46e7870a-5590-42c7-bf5b-56c7def7399b).
3 |
4 | The data source provided the dataset as one csv file.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Import the data as a pandas dataframe.
9 | 2. Subset the dataframe based on the 'Variable' column to obtain recycled waste for each country.
10 | 3. Remove column 'VAR' since it contains the same information as the column 'Variable'.
11 | 4. Remove column 'YEA' since it contains the same information as the column 'Year'.
12 | 5. Remove column 'Unit Code' since it contains the same information as the column 'Unit'.
13 | 6. Remove column 'Flag Codes' since it contains the same information as the column 'Flags'.
14 | 7. Remove column 'Reference Period Code' and 'Reference Period' since they are all NaNs.
15 | 8. Replace NaN in the table with None.
16 | 9. Convert years in the 'year' column to datetime objects and store them in a new column 'datetime'.
17 | 10. Replace whitespaces in column names with underscores and convert the column names to lowercase to match Carto column name requirements.
18 |
19 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/com_015_rw1_recycling_rates/com_015_rw1_recycling_rates_processing.py) for more details on this processing.
20 |
21 | You can view the processed Recycled Waste dataset [on Resource Watch](https://resourcewatch.org/data/explore/46e7870a-5590-42c7-bf5b-56c7def7399b).
22 |
23 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/com_015_rw1_recycling_rates.zip), or [from the source website](https://stats.oecd.org/Index.aspx?DataSetCode=MUNW).
24 |
25 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
26 |
--------------------------------------------------------------------------------
/com_017_rw2_major_ports/README.md:
--------------------------------------------------------------------------------
1 | ## Major Ports Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the World Port Index](https://msi.nga.mil/Publications/WPI) for [display on Resource Watch](https://resourcewatch.org/data/explore/28d1f505-571c-4a52-8215-48ea02aa4928).
3 |
4 | The data source provided the dataset as one shapefile and one csv file. The csv file was used and its column names were converted to lowercase before we uploaded it to Carto.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/com_017_rw2_major_ports/com_017_rw2_major_ports_processing.py) for more details on this processing.
7 |
8 | You can view the processed Major Ports dataset [on Resource Watch](https://resourcewatch.org/data/explore/28d1f505-571c-4a52-8215-48ea02aa4928).
9 |
10 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/com_017_rw2_major_ports.zip), or [from the source website](https://msi.nga.mil/Publications/WPI).
11 |
12 | ###### Note: This dataset processing was done by [Matthew Iceland](https://github.com/miceland2) and [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
13 |
--------------------------------------------------------------------------------
/com_028_rw1_effect_of_ag_prices_on_commodity_prices/README.md:
--------------------------------------------------------------------------------
1 | ## Effect of Agricultural Policies on Commodity Prices Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Nominal Rate of Protection](http://www.ag-incentives.org/indicator/nominal-rate-protection) for [display on Resource Watch](https://resourcewatch.org/data/explore/641c0a35-f2e5-4198-8ed9-576ea7e9685a).
3 |
4 | The data source provided the dataset as one csv file.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Import the data as a pandas dataframe.
9 | 2. Convert years in the 'year' column to datetime objects and store them in a new column 'datetime'.
10 | 3. Subset the dataframe to retain data that are aggregates of all products at country level.
11 | 4. Remove the 'notes' column since it only contains indexes instead of actual data.
12 | 5. Remove the 'productcode' column since it contains the same information as the column 'productname'.
13 | 6. Remove the 'source' column since it contains the same information as the column 'sourceversion'.
14 | 7. Convert the column names to lowercase to match Carto column name requirements.
15 |
16 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/com_028_rw1_effect_of_ag_prices_on_commodity_prices/com_028_rw1_effect_of_ag_prices_on_commodity_prices_processing.py) for more details on this processing.
17 |
18 | You can view the processed Effect of Agricultural Policies on Commodity Prices dataset [on Resource Watch](https://resourcewatch.org/data/explore/641c0a35-f2e5-4198-8ed9-576ea7e9685a).
19 |
20 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/com_028_rw1_effect_of_ag_prices_on_commodity_prices.zip), or [from the source website](http://www.ag-incentives.org/indicator/nominal-rate-protection).
21 |
22 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
23 |
--------------------------------------------------------------------------------
/com_030a_rw1_fishing_activity/README.md:
--------------------------------------------------------------------------------
1 | ## Fishing Activity Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Global Fishing Activity](https://globalfishingwatch.org/?utm_source=wri_map&utm_medium=api_integration&utm_campaign=ocean_watch) for [display on Resource Watch](https://resourcewatch.org/data/explore/11f16cb9-def0-4bd5-a60e-50c542b837e3).
3 |
4 | The data was retrieved using a Global Fishing Watch (GFW) API. To get tiled PNGs of fishing effort for each year (from 2012 to 2020), nine `GET` requests were sent to the GFW API. The request filters data for the desired `date-range` and `geartype`. For the Fishing Activity dataset, we requested all gear types except `dredge_fishing` and `trawlers`. The API response includes a URL template for retrieving individual image "tiles" according to the location and zoom level, which Resource Watch uses to display the map of fishing activity (of certain types).
5 |
6 | #### Example: Request for fishing activity from 2012-2013
7 | ```
8 | https://gateway.api.globalfishingwatch.org/v2/4wings/generate-png?interval=10days&datasets[0]=public-global-fishing-effort:latest&color=%23f20089&date-range=2012-01-01T00:00:00.000Z,2013-01-01T00:00:00.000Z&filters[0]=geartype in ('tuna_purse_seines','driftnets','trollers','set_longlines','purse_seines','pots_and_traps','other_fishing','set_gillnets','fixed_gear','fishing','seiners','other_purse_seines','other_seines','squid_jigger','pole_and_line','drifting_longlines')
9 | ```
10 | ###### The request must contain an authorization header with a token as its value; otherwise, the request will return an authorization error. GFW has provided WRI Ocean Watch with a token for our use.
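
A minimal sketch of sending this request from Python follows. The token value is a placeholder, and the `Bearer` scheme and the structure of the JSON response are assumptions rather than confirmed details of the GFW API.

```python
import requests

TOKEN = 'YOUR_GFW_API_TOKEN'  # placeholder; tokens are issued by Global Fishing Watch

url = 'https://gateway.api.globalfishingwatch.org/v2/4wings/generate-png'
params = {
    'interval': '10days',
    'datasets[0]': 'public-global-fishing-effort:latest',
    'color': '#f20089',
    'date-range': '2012-01-01T00:00:00.000Z,2013-01-01T00:00:00.000Z',
    'filters[0]': "geartype in ('tuna_purse_seines','driftnets','trollers','set_longlines',"
                  "'purse_seines','pots_and_traps','other_fishing','set_gillnets','fixed_gear',"
                  "'fishing','seiners','other_purse_seines','other_seines','squid_jigger',"
                  "'pole_and_line','drifting_longlines')",
}

# the authorization header must carry the token, or the API returns an authorization error
response = requests.get(url, params=params, headers={'Authorization': f'Bearer {TOKEN}'})
response.raise_for_status()

# the response is assumed to include a URL template for the individual image tiles
tile_url_template = response.json().get('url')
```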
11 |
12 | You can view the processed Fishing Activity dataset [on Resource Watch](https://resourcewatch.org/data/explore/11f16cb9-def0-4bd5-a60e-50c542b837e3).
13 |
14 | You can also download the original dataset [from the source website](https://globalfishingwatch.org/data-download/datasets/public-fishing-effort).
15 |
16 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms), and QC'd by [Weiqi Zhou](https://www.wri.org/profile/Weiqi-Zhou).
--------------------------------------------------------------------------------
/com_030c_rw1_trawling_activity/README.md:
--------------------------------------------------------------------------------
1 | ## Trawling Activity Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Global Fishing Activity](https://globalfishingwatch.org/?utm_source=wri_map&utm_medium=api_integration&utm_campaign=ocean_watch) for [display on Resource Watch](https://resourcewatch.org/data/explore/6ccfb266-20cd-4838-82b0-5309987a62f0).
3 |
4 | The data was retrieved using a Global Fishing Watch (GFW) API. To get tiled PNGs of fishing effort for each year (from 2012 to 2020), nine `GET` requests were sent to the GFW API. The request filters data for the desired `date-range` and `geartype`. For the Trawling Activity dataset, we requested data for `dredge_fishing` and `trawlers`. The API response includes a URL template for retrieving individual image "tiles" according to the location and zoom level, which Resource Watch uses to display the map of fishing activity (of certain types).
5 |
6 | #### Example: Request for trawling activity from 2012-2013
7 | ```
8 | https://gateway.api.globalfishingwatch.org/v2/4wings/generate-png?interval=10days&datasets[0]=public-global-fishing-effort:latest&color=%23ff3f34&date-range=2012-01-01T00:00:00.000Z,2013-01-01T00:00:00.000Z&filters[0]=geartype in ('dredge_fishing','trawlers')
9 | ```
10 | ###### The request must contain an authorization header with a token as its value; otherwise, the request will return an authorization error. GFW has provided WRI Ocean Watch with a token for our use.
11 |
12 | You can view the processed Trawling Activity dataset [on Resource Watch](https://resourcewatch.org/data/explore/6ccfb266-20cd-4838-82b0-5309987a62f0).
13 |
14 | You can also download the original dataset [from the source website](https://globalfishingwatch.org/data-download/datasets/public-fishing-effort).
15 |
16 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms), and QC'd by [Weiqi Zhou](https://www.wri.org/profile/Weiqi-Zhou).
--------------------------------------------------------------------------------
/com_039_rw0_agricultural_trade_statistics/README.md:
--------------------------------------------------------------------------------
1 | ## Agriculture Trade Statistics Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Agricultural Trade Statistics](https://www.fao.org/faostat/en/#home) for [display on Resource Watch]({link to dataset's metadata page on Resource Watch}).
3 |
4 | The data source provided the dataset as one CSV file.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly for upload to Carto.
7 |
8 | 1. Import the data as a pandas dataframe.
9 | 2. Convert column headers to lower case.
10 | 3. Rename 'area' column to 'country'.
11 | 4. Remove parentheses from column headers and replace whitespaces with underscores.
12 | 5. Convert years in the 'year' column to datetime objects and store them in a new column 'datetime'.
13 | 6. Remove rows without values.
14 | 7. Replace NaN in table with None.
15 |
16 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/com_039_rw0_agricultural_trade_statistics/com_039_rw0_agricultural_trade_statistics_processing.py) for more details on this processing.
17 |
18 | You can view the processed Agricultural Trade Statistics dataset [on Resource Watch]({link to dataset's metadata page on Resource Watch}).
19 |
20 | You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/com_039_rw0_agricultural_trade_statistics.zip), or [from the source website](https://www.fao.org/faostat/en/#data/TCL).
21 |
22 | ###### Note: This dataset processing was done by [Alex Sweeney](https://github.com/alxswny) and [Chris Rowe](https://www.wri.org/profile/chris-rowe), and QC'd by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou).
23 |
--------------------------------------------------------------------------------
/dis_016_rw1_active_fault_lines/README.md:
--------------------------------------------------------------------------------
1 | ## Active Fault Lines Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Earthquake Model Global Active Fault](https://github.com/GEMScienceTools/gem-global-active-faults/tree/2019.0) for [display on Resource Watch](https://resourcewatch.org/data/explore/c86b1409-7ddb-4ec2-b2fd-bf035db325b6).
3 |
4 | The data source provided the dataset as one shapefile.
5 |
6 | Below, we describe the steps used to reformat the shapefile so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Import the line shapefile as a geopandas dataframe.
9 | 2. Project the data so its coordinate system is WGS84.
10 | 3. Create a new column from the index of the dataframe to use as a unique id column (cartodb_id) in Carto.
11 | 4. Reorder columns by their column names.
12 |
13 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/dis_016_rw1_active_fault_lines/dis_016_rw1_active_fault_lines_processing.py) for more details on this processing.
14 |
15 | You can view the processed Active Fault Lines dataset [on Resource Watch](https://resourcewatch.org/data/explore/c86b1409-7ddb-4ec2-b2fd-bf035db325b6).
16 |
17 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/dis_016_rw1_active_fault_lines.zip), or [from the source website](https://zenodo.org/record/3376300).
18 |
19 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
20 |
--------------------------------------------------------------------------------
/dis_017_storm_events_us/README.md:
--------------------------------------------------------------------------------
1 | ## Storm Events in the US Data Pre-processing
2 | This file describes the data pre-processing that was done to [the Storm Events in the US dataset](https://www.ncdc.noaa.gov/stormevents/ftp.jsp) for display on Resource Watch as the following datasets:
3 | - [Tornadoes in the U.S.](https://resourcewatch.org/embed/widget/8a0f738e-4fb9-4a4c-8b9a-6363f619cdd5).
4 | - [Hail in the U.S.](https://resourcewatch.org/embed/widget/355f550b-ea6d-418d-b89d-5fbd58f3ba1b)
5 |
6 | The source provided this dataset as a set of annual csv files that were accessed via [ftp](ftp://ftp.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/). For each year, three csvs were available: the "details" csv (files starting with "StormEvents_details"), the "fatalities" csv (files starting with "StormEvents_fatalities"), and the "locations" csv (files starting with "StormEvents_locations"). The "details" and "locations" csvs for all of the years available were used.
7 |
8 | Below, we describe the steps used to append and merge the csv files:
9 | 1. The "locations" and "details" csv files for each year available were downloaded using the FTP library.
10 | 2. The annual files were appended so that there were two tables containing all of the years of data available: one table for the "details" and one for the "locations". Whitespaces in the appended locations table were deleted.
11 | 3. The "details" and the "locations" tables were merged based on the column "event_id".
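
A hedged sketch of this download/append/merge flow is below; the FTP path comes from the link above, while the file-name matching and the 'event_id' column name follow the description here and may differ in case from the source files.

```python
import ftplib
import io
import pandas as pd

ftp = ftplib.FTP('ftp.ncdc.noaa.gov')
ftp.login()
ftp.cwd('pub/data/swdi/stormevents/csvfiles')

details, locations = [], []
for filename in ftp.nlst():
    if filename.startswith(('StormEvents_details', 'StormEvents_locations')):
        buffer = io.BytesIO()
        ftp.retrbinary(f'RETR {filename}', buffer.write)
        buffer.seek(0)
        df = pd.read_csv(buffer, compression='gzip')
        (details if 'details' in filename else locations).append(df)

# append the annual files into one "details" table and one "locations" table
details_df = pd.concat(details, ignore_index=True)
locations_df = pd.concat(locations, ignore_index=True)

# strip whitespace from text columns in the appended locations table
text_cols = locations_df.select_dtypes(include='object').columns
locations_df[text_cols] = locations_df[text_cols].apply(lambda col: col.str.strip())

# merge the two tables on the event id column
merged = details_df.merge(locations_df, on='event_id', how='left')
```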
12 |
13 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/dis_017_storm_events_us/dis_017_storm_events_us_processing.py) for more details on this processing.
14 |
15 | You can view the processed datasets on Resource Watch:
16 | - [Tornadoes in the U.S.](https://resourcewatch.org/embed/widget/8a0f738e-4fb9-4a4c-8b9a-6363f619cdd5).
17 | - [Hail in the U.S.](https://resourcewatch.org/embed/widget/355f550b-ea6d-418d-b89d-5fbd58f3ba1b)
18 |
19 | You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/dis_017_storm_events_us.zip), or [from the source website](https://www.ncdc.noaa.gov/stormevents/ftp.jsp).
20 |
21 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
22 |
--------------------------------------------------------------------------------
/ene_001a_reservoirs_and_dams/README.md:
--------------------------------------------------------------------------------
1 | ## Reservoirs and Dams Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the GRanD, v1.3 dataset](http://globaldamwatch.org/grand/) for [display on Resource Watch](https://resourcewatch.org/data/explore/ene001a-Global-Reservoir-and-Dam-GRanD-v13).
3 |
4 | This dataset comes in a zipped file containing two shapefiles - one containing the point locations of dams and the other containing polygons of reservoir areas.
5 |
6 | Below, we describe the steps used to combine these two datasets into one data table on Carto.
7 |
8 | 1. Copy files associated with the reservoirs shapefile into a folder named grand_reservoirs_v1_3 and zip this folder. Copy files associated with the dams shapefile into a folder named grand_dams_v1_3 and zip this folder. Upload each of these zipped shapefiles as a new table in Carto.
9 | 2. For each table in Carto, add a text column called "type" - for example, the reservoirs table was modified with the following SQL statement:
10 | ```
11 | ALTER TABLE "wri-rw".grand_reservoirs_v1_3
12 |
13 | ADD type text
14 | ```
15 | 3. Fill the "type" column with the type of data in the file (dam or reservoir). For the reservoirs table, the SQL statement would be:
16 |
17 | ```
18 | UPDATE "wri-rw".grand_reservoirs_v1_3
19 |
20 | SET type='reservoir'
21 | ```
22 | 4. Change the name of one table to fit standardized Resource Watch naming conventions. In this case, the grand_dams_v1_3 table was renamed as ene_001a_grand_dams_and_reservoirs_v1_3.
23 | 5. Combine the reservoirs and dams tables into one with the following SQL statement:
24 | ```
25 | INSERT INTO ene_001a_grand_dams_and_reservoirs_v1_3
26 |
27 | (the_geom, grand_id, res_name, dam_name, alt_name, river, alt_river, main_basin, sub_basin, near_city,
28 | alt_city, admin_unit, sec_admin, country, sec_cntry, year, alt_year, rem_year, dam_hgt_m, alt_hgt_m,
29 | dam_len_m, alt_len_m, area_skm, area_poly, area_rep, area_max, area_min, cap_mcm, cap_max, cap_rep,
30 | cap_min, depth_m, dis_avg_ls, dor_pc, elev_masl, catch_skm, catch_rep, data_info, use_irri, use_elec,
31 | use_supp, use_fcon, use_recr, use_navi, use_fish, use_pcon, use_live, use_othr, main_use, lake_ctrl, multi_dams,
32 | timeline, comments, url, quality, editor, long_dd, lat_dd, poly_src, type)
33 |
34 | SELECT the_geom, grand_id, res_name, dam_name, alt_name, river, alt_river, main_basin, sub_basin, near_city,
35 | alt_city, admin_unit, sec_admin, country, sec_cntry, year, alt_year, rem_year, dam_hgt_m, alt_hgt_m, dam_len_m,
36 | alt_len_m, area_skm, area_poly, area_rep, area_max, area_min, cap_mcm, cap_max, cap_rep, cap_min, depth_m,
37 | dis_avg_ls, dor_pc, elev_masl, catch_skm, catch_rep, data_info, use_irri, use_elec, use_supp, use_fcon, use_recr,
38 | use_navi, use_fish, use_pcon, use_live, use_othr, main_use, lake_ctrl, multi_dams, timeline, comments, url,
39 | quality, editor, long_dd, lat_dd, poly_src, type
40 |
41 | FROM grand_reservoirs_v1_3
42 | ```
43 | You can view the processed reservoirs and dams dataset [on Resource Watch](https://resourcewatch.org/data/explore/ene001a-Global-Reservoir-and-Dam-GRanD-v13).
44 |
45 | You can also download original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/ene_001a_reservoirs_and_dams.zip), or [from the source website](https://ln.sync.com/dl/bd47eb6b0/anhxaikr-62pmrgtq-k44xf84f-pyz4atkm/view/default/447819520013).
46 |
47 | ###### Note: This dataset processing was done by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
48 |
--------------------------------------------------------------------------------
/ene_009_renewable_generation_annually/Makefile:
--------------------------------------------------------------------------------
1 | .PHONY: default
2 | default: generation capacity
3 |
4 | .PHONY: capacity
5 | capacity:
6 | python ene_XXX_renewable_capacity_annually.py
7 |
8 | .PHONY: generation
9 | generation:
10 | python ene_XXX_renewable_generation_annually.py
11 |
12 | .PHONY: clean
13 | clean:
14 | rm *.zip
15 | rm ene_XXX_renewable_capacity_annually_edit.csv
16 | rm ene_XXX_renewable_generation_annually_edit.csv
17 |
--------------------------------------------------------------------------------
/ene_009_renewable_generation_annually/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas
2 | carto
3 | boto3
4 |
--------------------------------------------------------------------------------
/ene_010_renewable_capacity_annually/Makefile:
--------------------------------------------------------------------------------
1 | .PHONY: default
2 | default: generation capacity
3 |
4 | .PHONY: capacity
5 | capacity:
6 | python ene_XXX_renewable_capacity_annually.py
7 |
8 | .PHONY: generation
9 | generation:
10 | python ene_XXX_renewable_generation_annually.py
11 |
12 | .PHONY: clean
13 | clean:
14 | rm *.zip
15 | rm ene_XXX_renewable_capacity_annually_edit.csv
16 | rm ene_XXX_renewable_generation_annually_edit.csv
17 |
--------------------------------------------------------------------------------
/ene_010_renewable_capacity_annually/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas
2 | carto
3 | boto3
4 |
--------------------------------------------------------------------------------
/ene_017_rw1_energy_facility_emissions/README.md:
--------------------------------------------------------------------------------
1 | ## Energy Facility Emissions Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Energy Facility Emissions (US)](https://ghgdata.epa.gov/ghgp/main.do#) for [display on Resource Watch](https://bit.ly/3kNAD3i).
3 |
4 | The source provided this dataset as an xls file accessed through its [website](https://ghgdata.epa.gov/ghgp/main.do#). To access the file, select the "Export Data" option and then select "All Reporting Years".
5 |
6 | Below, we describe the main actions performed to process the xls file:
7 |
8 | 1. Convert the years in the 'Year' column to datetime objects and store them in a new column 'datetime'.
9 | 2. Rename column headers to remove special characters and spaces so that the table can be uploaded to Carto without losing information.
10 |
11 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ene_017_rw1_energy_facility_emissions/ene_017_rw1_energy_facility_emissions_processing.py) for more details on this processing.
12 |
13 | You can view the processed Energy Facility Emissions (U.S.) dataset [on Resource Watch](https://bit.ly/3kNAD3i).
14 |
15 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/ene_017_rw1_energy_facility_emissions.zip), or [from the source website](https://ghgdata.epa.gov/ghgp/main.do#).
16 |
17 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
18 |
--------------------------------------------------------------------------------
/foo_005_rw1_crop_area_production/README.md:
--------------------------------------------------------------------------------
1 | ## Crop Land Area and Production Dataset Pre-processing
2 |
3 | This file describes the data pre-processing that was done to [the Crop Land Area and Production](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PRFF8V&version=4.0) for [display on Resource Watch](https://resourcewatch.org/data/explore/54af072c-7bb5-4bb1-af84-ea7ba0b4fc22).
4 |
5 | The data was provided by the source as three zip files containing multiple TIFF files. The files show the harvest area, production area, and yield for several crops, disaggregated by specific technologies. Only the files for maize, soybean, wheat, rice, coffee, and cotton crops, processed at the complete crop level (combining all technologies for a total), were used.
6 |
7 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_005_rw1_crop_area_production/foo_005_rw1_crop_area_production_processing.py) for more details on this processing.
8 |
9 | You can view the processed Crop Land Area and Production dataset [on Resource Watch](https://resourcewatch.org/data/explore/54af072c-7bb5-4bb1-af84-ea7ba0b4fc22).
10 |
11 | You can also download the original dataset [from the source website](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PRFF8V&version=4.0).
12 |
13 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), updates made by [Alex Sweeney](https://github.com/alxswny), and QC'd by [Chris Rowe](https://www.wri.org/profile/chris-rowe) and [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou).
14 |
--------------------------------------------------------------------------------
/foo_005_rw2_crop_area_production/README.md:
--------------------------------------------------------------------------------
1 | ## Crop Land Area and Production Dataset Pre-processing
2 |
3 | This file describes the data pre-processing that was done to [the Crop Land Area and Production](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SWPENT) for [display on Resource Watch](To be specified). This is a new dataset which contains updates to the existing dataset (foo_005_rw1_crop_area_production). The script differs from the previous one in two main ways: the zip files are pre-downloaded, so they will not be downloaded by this script, and the filenames within the zip files differ, along with the dataset name.
4 |
5 | The data was provided by the source as three zip files containing multiple TIFF files. The files show the harvest area, production area, and yield for several crops, disaggregated by specific technologies. Only the files for maize, soybean, wheat, rice, coffee, and cotton crops, processed at the complete crop level (combining all technologies for a total), were used.
6 |
7 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_005_rw2_crop_area_production/foo_005_rw2_crop_area_production_processing.py) for more details on this processing.
8 |
9 | You can view the processed Crop Land Area and Production dataset [on Resource Watch](To be specified).
10 |
11 | You can also download the original dataset [from the source website](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SWPENT).
12 |
13 | ###### Note: This dataset processing was done by (To be specified).
14 |
--------------------------------------------------------------------------------
/foo_015_rw2_global_hunger_index/README.md:
--------------------------------------------------------------------------------
1 | ## Global Hunger Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [2021 Global Hunger Index dataset](https://www.globalhungerindex.org/download/all.html) for [display on Resource Watch](https://resourcewatch.org/data/explore/c37b0b21-3d11-46ea-ba5d-4be4c0caa8ae).
3 |
4 | This dataset was provided by the source as an Excel file. The data shown on Resource Watch can be found in TABLE 1.1 Global Hunger Index Scores By 2021 GHI Rank.
5 |
6 | This table was read into Python as a dataframe. The data was cleaned, values listed as '<5' were replaced with 5, and the table was converted from wide to long form.
7 |
8 | Countries with incomplete data but significant cause for concern were not included in the source's data table, but they were noted [by the source](https://www.globalhungerindex.org/designations.html). A new column was added to the table to store a flag for "Incomplete data, significant concern," and rows were added to the table for each of these countries noted by the source.
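
A brief pandas sketch of this cleaning is shown below; the sheet name, column names, and the placeholder country names are assumptions, not the exact contents of the source Excel file.

```python
import pandas as pd

# read TABLE 1.1 from the source Excel workbook (sheet name and header row are assumptions)
df = pd.read_excel('ghi_2021.xlsx', sheet_name='Table 1.1', header=2)

# replace values listed as '<5' with 5
df = df.replace('<5', 5)

# convert from wide to long form: one row per country and GHI reference year
long_df = df.melt(id_vars=['Country'], var_name='year', value_name='ghi_score')

# flag countries with incomplete data but significant cause for concern,
# and add a row for each such country noted by the source (placeholder names here)
long_df['incomplete_data_significant_concern'] = False
concern_countries = ['Country A', 'Country B']
extra_rows = pd.DataFrame({'Country': concern_countries, 'year': '2021', 'ghi_score': None,
                           'incomplete_data_significant_concern': True})
long_df = pd.concat([long_df, extra_rows], ignore_index=True)
```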
9 |
10 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_015_rw2_global_hunger_index/foo_015_rw2_global_hunger_index_processing.py) for more details on this processing.
11 |
12 | You can view the processed Global Hunger Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/c37b0b21-3d11-46ea-ba5d-4be4c0caa8ae).
13 |
14 | You can also download original dataset [directly through Resource Watch](http://wri-projects.s3.amazonaws.com/resourcewatch/foo_015_rw2_global_hunger_index_edit.zip), or [from the source website](https://www.globalhungerindex.org/download/all.html).
15 |
16 | ###### Note: This dataset processing was done by [Tina Huang](https://www.wri.org/profile/tina-huang), and QC'd by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou).
17 |
--------------------------------------------------------------------------------
/foo_041_rw1_non_co2_agricultural_emissions/README.md:
--------------------------------------------------------------------------------
1 | ## Non-CO₂ Agricultural Emissions Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Non-CO₂ Agricultural Emissions](http://www.fao.org/faostat/en/#data/GT) for [display on Resource Watch](https://bit.ly/3aUxqvK).
3 |
4 | The source provided this dataset as a CSV file within a zip folder.
5 |
6 | Below we describe the main steps taken to process the data so that it is formatted correctly to be uploaded to Carto.
7 |
8 | 1. Read in the data as a pandas dataframe and subset the dataframe to select the total emissions of the agriculture sector.
9 | 2. Subset the dataframe to obtain emission values in carbon dioxide equivalent.
10 | 3. Subset the dataframe to only include country level information, not regions or continents.
11 | 4. Convert the emission values from gigagrams to gigatonnes and store them in a new column 'value_gigatonnes'.
12 | 5. Rename the 'Value' column to 'value_gigagrams'.
13 | 6. Convert the years in the 'Year' column to datetime objects and store them in a new column 'datetime'.
14 | 7. Convert column names to lowercase and replace spaces with underscores to match Carto column name requirements.
15 |
16 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_041_rw1_non_co2_agricultural_emissions/foo_041_rw1_non_co2_agricultural_emissions_processing.py) for more details on this processing.
17 |
18 | You can view the processed dataset for [display on Resource Watch](https://bit.ly/3aUxqvK).
19 |
20 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/foo_041_rw1_non_co2_agricultural_emissions.zip), or [from the source website](http://www.fao.org/faostat/en/#data/GT).
21 |
22 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
23 |
--------------------------------------------------------------------------------
/foo_054_rw1_soil_carbon_stocks/README.md:
--------------------------------------------------------------------------------
1 | ## Soil Carbon Stocks Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the SoilGrids250m 2.0 - Soil organic carbon stock](http://isric.org/explore/soilgrids) for [display on Resource Watch](https://resourcewatch.org/data/explore/c5a62289-bdc8-4821-83f0-6f05e3d36bdc).
3 |
4 | The source provided the data as a Google Earth Engine asset. The asset was not modified from the original version for display on Resource Watch.
5 |
6 | You can view the processed Soil Carbon Stocks dataset [on Resource Watch](https://resourcewatch.org/data/explore/c5a62289-bdc8-4821-83f0-6f05e3d36bdc).
7 |
8 | You can also download the original dataset [from the source website](https://files.isric.org/soilgrids/latest/data/soc/).
9 |
10 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
11 |
--------------------------------------------------------------------------------
/foo_060_rw0_food_system_emissions/README.md:
--------------------------------------------------------------------------------
1 | ## Food System Emissions Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Food System Emissions](https://edgar.jrc.ec.europa.eu/edgar_food#wtsau) for [display on Resource Watch](https://bit.ly/3yEqprW).
3 |
4 | The source provided this dataset as an excel file, containing data from 1990-2015. Below, we describe the main actions performed to process the data before uploading it to Carto.
5 |
6 | 1. Convert the dataframe from wide form to long form, in which one column indicates the year.
7 | 2. Convert the years in the 'Year' column to datetime objects and store them in a new column 'datetime'.
8 | 3. Convert the column headers to lowercase to match Carto column name requirements.
9 | 4. Replace NaN in the table with None.
10 | 5. Rename the column "name" to "country_name" to avoid duplicate column names when performing a spatial join with administrative boundaries in the Resource Watch back office.
11 |
12 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/tree/master/foo_060_rw0_food_system_emissions/foo_060_rw0_food_system_emissions_processing.py) for more details on this processing.
13 |
14 | You can view the processed dataset for [display on Resource Watch](https://bit.ly/3yEqprW).
15 |
16 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/foo_060_rw0_food_system_emissions.zip), or [from the source website](https://edgar.jrc.ec.europa.eu/edgar_food#wtsau).
17 |
18 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
19 |
--------------------------------------------------------------------------------
/foo_061_rw0_blue_food_supply/README.md:
--------------------------------------------------------------------------------
1 | # Food from the Sea (Blue Food Supply) Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [New](http://www.fao.org/faostat/en/#data/FBS) and [Historic](http://www.fao.org/faostat/en/#data/FBSH) Food Balance Sheets for displaying the Food from the Sea dataset [on Resource Watch](https://resourcewatch.org/data/explore/24ad32a0-b25f-44ff-9bc0-2650ea29e0b4).
3 |
4 | The source provided two datasets, one for years collected with historic methodology and one for years collected with new methodology, as two CSV files within two zipped folders.
5 |
6 | Below we describe the main steps taken to process the data and combine the two files to upload the data to Carto.
7 |
8 | 1. Read in the data as pandas dataframes and subset the dataframes to only include values for food supply and protein supply.
9 | 2. Subset the dataframes to only include ocean-sourced food products and grand totals.
10 | 3. Subset the dataframes to only include country and global level information, not regions or continents.
11 | 4. Create a column called "Type" to distinguish between ocean-sourced total and grand total values.
12 | 5. Join the dataframes for the historic and new data, and sort by country, year, and type.
13 | 6. Rename the 'Year code' column to 'Year'.
14 | 7. Convert the years in the 'Year' column to datetime objects and store them in a new column 'datetime'.
15 | 8. Convert the value column to float.
16 | 9. Convert column names to lowercase and replace spaces with underscores to match Carto column name requirements.
17 |
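The combination logic can be sketched roughly as follows with pandas; the file, element, item, and column names used here are simplified stand-ins for the FAOSTAT fields, and the exact filters live in the processing script:

```
import pandas as pd

# placeholder file names for the CSVs inside the two downloaded zips
historic = pd.read_csv('FoodBalanceSheetsHistoric.csv', encoding='latin-1')
new = pd.read_csv('FoodBalanceSheets.csv', encoding='latin-1')

def prep(df):
    # 1. keep only food supply and protein supply values
    df = df[df['Element'].isin(['Food supply', 'Protein supply quantity'])]
    # 2, 4. keep ocean-sourced items and grand totals, and label them in 'Type'
    df = df[df['Item'].isin(['Fish, Seafood', 'Grand Total'])].copy()
    df['Type'] = df['Item'].map({'Fish, Seafood': 'Ocean-sourced',
                                 'Grand Total': 'Grand total'})
    return df

# 5. join the historic and new data and sort by country, year, and type
combined = pd.concat([prep(historic), prep(new)], ignore_index=True)
combined = combined.sort_values(['Area', 'Year code', 'Type'])

# 6-8. rename the year column, add a datetime column, and cast values to float
combined = combined.rename(columns={'Year code': 'Year'})
combined['datetime'] = pd.to_datetime(combined['Year'].astype(str), format='%Y')
combined['Value'] = combined['Value'].astype(float)

# 9. Carto-safe headers
combined.columns = [c.lower().replace(' ', '_') for c in combined.columns]
```
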
18 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_061_rw0_marine_food_supply/foo_061_rw0_marine_food_supply_processing.py) for more details on this processing.
19 |
20 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/24ad32a0-b25f-44ff-9bc0-2650ea29e0b4).
21 |
22 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/foo_061_rw0_marine_food_supply.zip), or [from the source website](http://www.fao.org/faostat/en/#data/FBS).
23 |
24 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
25 |
--------------------------------------------------------------------------------
/foo_062_rw0_fishery_production/README.md:
--------------------------------------------------------------------------------
1 | # Fishery Production Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Fishery Production](http://www.fao.org/fishery/statistics/global-production/en) for [display on Resource Watch](https://resourcewatch.org/data/explore/ac9c2f07-9f23-4a33-9958-e02c571ec009).
3 |
4 | The source provided the data as four CSV files within zipped folders: one containing data for global (total) production as the quantity of production (tonnes), one containing data for capture production as the quantity of production (tonnes), and two containing data for aquaculture as the quantity of production (tonnes) and the value of production (1000 USD).
5 |
6 | Below we describe the main steps taken to process the data so that it is formatted correctly to be uploaded to Carto.
7 |
8 | 1. Unzip each folder and read in the dataset and the country code list as a pandas dataframe.
9 | 2. Rename the 'UN_Code' column in the country code list to 'COUNTRY.UN_CODE' so it matches the column header in the dataset.
10 | 3. Merge the country code list to the dataset so each row in the dataset is matched with an ISO code and its full name.
11 | 4. Add a column to reflect the type of production measured by the value column for the dataset (e.g., GlobalProduction, Aquaculture, or Capture) and the variable measured (quantity or value).
12 | 5. Convert the data type of the 'VALUE' column to float.
13 | 6. Concatenate the three dataframes for Global Production, Aquaculture, and Capture.
14 | 7. Rename the 'PERIOD' column to 'year'.
15 | 8. Pivot the dataframe from long to wide form to sum the values for each type of production in a given year for each country.
16 | 9. Convert all column names to lowercase to match Carto column name requirements.
17 |
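A condensed sketch of this workflow for one of the files is shown below, assuming illustrative file and column names (the country-list fields in particular are assumptions):

```
import pandas as pd

# placeholder names for an unzipped production CSV and the country code list
capture = pd.read_csv('Capture_Quantity.csv')
countries = pd.read_csv('CL_FI_COUNTRY_GROUPS.csv')

# 2-3. align the UN code column names and merge on them so each row gets an
# ISO code and full country name
countries = countries.rename(columns={'UN_Code': 'COUNTRY.UN_CODE'})
capture = capture.merge(countries[['COUNTRY.UN_CODE', 'ISO3_Code', 'Name_En']],
                        on='COUNTRY.UN_CODE', how='left')

# 4-5. label the type of production/variable measured and cast values to float
capture['type'] = 'capture_quantity'
capture['VALUE'] = capture['VALUE'].astype(float)

# 6-8. after concatenating the Global Production, Aquaculture, and Capture
# frames, rename 'PERIOD' and pivot long -> wide, summing per country and year
df = capture.rename(columns={'PERIOD': 'year'})
wide = df.pivot_table(index=['ISO3_Code', 'Name_En', 'year'], columns='type',
                      values='VALUE', aggfunc='sum').reset_index()

# 9. lowercase headers for Carto
wide.columns = [str(c).lower() for c in wide.columns]
```
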
18 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_062_rw0_fishery_production/foo_062_rw0_fishery_production_processing.py) for more details on this processing.
19 |
20 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/ac9c2f07-9f23-4a33-9958-e02c571ec009).
21 |
22 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/foo_062_rw0_fishery_production.zip), or [from the source website](http://www.fao.org/fishery/statistics/global-production/en).
23 |
24 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
25 |
--------------------------------------------------------------------------------
/foo_066_rw0_food_product_shares/README.md:
--------------------------------------------------------------------------------
1 | ## Global Food Product Import/Export Shares Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the Global Food Product Import Shares and Global Food Product Export Shares datasets for [display on Resource Watch](https://resourcewatch.org/).
3 |
4 | The source provided this dataset as excel files accessed through its [data explorer](https://wits.worldbank.org).
5 |
6 | Data for exports can be downloaded with the following link:
7 | https://wits.worldbank.org/CountryProfile/en/Country/WLD/StartYear/1988/EndYear/2019/TradeFlow/Export/Indicator/XPRT-PRDCT-SHR/Partner/ALL/Product/16-24_FoodProd
8 |
9 | Data for imports can be downloaded with the following link:
10 | https://wits.worldbank.org/CountryProfile/en/Country/WLD/StartYear/1988/EndYear/2019/TradeFlow/Import/Indicator/MPRT-PRDCT-SHR/Partner/ALL/Product/16-24_FoodProd#
11 |
12 | The following options were selected from the source website dropdown menu:
13 | 1. Country/region: World
14 | 2. Year: 1988-2019
15 | 3. Trade flow: Import/Export
16 | 4. Indicators: Import Product share(%)/Export Product share(%)
17 | 5. View By: Product
18 | 6. Product: Food Products
19 | 7. Partner: By country and region
20 |
21 | Below, we describe the main actions performed to process the excel files:
22 | 1. Import the data as a pandas dataframe.
23 | 2. Convert the dataframe from wide form to long form, in which one column indicates the year and the other columns indicate the percentage of imports or exports related to food products for the year.
24 | 3. Convert years in the 'year' column to datetime objects and store them in a new column 'datetime'.
25 | 4. Rename the column headers to be more descriptive and to remove special characters so that the data can be uploaded to Carto without losing information.
26 |
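A minimal sketch of these steps, using a placeholder file name for the WITS export and guessing at the shape of its headers:

```
import pandas as pd

# 'exports.xlsx' stands in for the spreadsheet downloaded from the data explorer
df = pd.read_excel('exports.xlsx')

# 2. wide -> long: the year columns become a single 'year' column of share values
id_cols = [c for c in df.columns if not str(c).isdigit()]
df = df.melt(id_vars=id_cols, var_name='year',
             value_name='export_product_share_pct')

# 3. store the year as a datetime object
df['datetime'] = pd.to_datetime(df['year'].astype(str), format='%Y')

# 4. descriptive, Carto-safe headers without spaces or special characters
df.columns = [str(c).strip().lower().replace(' ', '_') for c in df.columns]
```
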
27 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_066_rw0_food_product_shares/foo_066_rw0_food_product_shares.py) for more details on this processing.
28 |
29 | You can also download the original datasets [from the source website](https://wits.worldbank.org).
30 |
31 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes).
32 |
--------------------------------------------------------------------------------
/foo_067_rw0_crop_suitability_class/README.md:
--------------------------------------------------------------------------------
1 | ## Crop Suitability Class Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Crop Suitability Class](https://gaez-data-portal-hqfao.hub.arcgis.com/pages/theme-details-theme-4).
3 |
4 | The source provided the data in multiple TIFF files available for download via URL. The TIFFs show crop suitability index for multiple timeframes and input technologies. Only files encompassing cotton, coffee, and rice were downloaded.
5 |
6 | Below, we describe the steps used to reformat the raster so that it is formatted correctly for upload to Google Earth Engine.
7 |
8 | 1. Run the [rice_ensemble_processing.py](https://github.com/resource-watch/data-pre-processing/blob/master/foo_067_rw0_crop_suitability_class/rice_ensemble_processing.py) script first.
9 | 1. This script downloads wetland and dryland rice GeoTIFFs that were produced from various climate models from the source.
10 | 2. It computes the average of the input GeoTIFFs. This is done for each timeframe, RCP scenario, watering regime, and crop.
11 | 3. Saves the average of the models to a new GeoTIFF that has "ensemble" at the beginning of its name.
12 | 2. Run the [foo_067_rw0_crop_suitability_class_processing.py](https://github.com/resource-watch/data-pre-processing/blob/master/foo_067_rw0_crop_suitability_class/foo_067_rw0_crop_suitability_class_processing.py) script.
13 | 1. This script downloads historic GeoTIFFs for all crops, and ensemble GeoTIFFs for cotton and coffee from the source.
14 | 2. Adds in the ensemble GeoTIFFs that were created from [rice_ensemble_processing.py](https://github.com/resource-watch/data-pre-processing/blob/master/foo_067_rw0_crop_suitability_class/rice_ensemble_processing.py) for bulk upload.
15 | 3. Upload files to GEE.
16 |
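The ensemble-averaging step can be sketched with rasterio and numpy as below; the file names are hypothetical examples of the per-model rice GeoTIFFs for one timeframe/RCP/watering-regime combination:

```
import numpy as np
import rasterio

# hypothetical per-model inputs for one combination; the real script loops
# over every timeframe, RCP scenario, watering regime, and crop it downloads
model_files = ['rice_gfdl.tif', 'rice_hadgem.tif', 'rice_ipsl.tif']

arrays, profile = [], None
for path in model_files:
    with rasterio.open(path) as src:
        arrays.append(src.read(1).astype('float32'))
        profile = src.profile  # reuse the inputs' georeferencing

# pixel-wise mean across the climate models
ensemble = np.mean(np.stack(arrays), axis=0)

# save the average to a new GeoTIFF with "ensemble" at the beginning of its name
profile.update(dtype='float32', count=1)
with rasterio.open('ensemble_rice.tif', 'w', **profile) as dst:
    dst.write(ensemble, 1)
```
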
17 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_067_rw0_crop_suitability_class/foo_067_rw0_crop_suitability_class_processing.py) for more details on this processing.
18 |
19 | You can also download the original dataset [from the source website](https://gaez-data-portal-hqfao.hub.arcgis.com/pages/data-viewer).
20 |
21 | ###### Note: This dataset processing was done by [Alex Sweeney](https://github.com/alxswny) and [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Chris Rowe](https://www.wri.org/profile/chris-rowe).
22 |
--------------------------------------------------------------------------------
/foo_068_rw0_agro_ecological_zones/README.md:
--------------------------------------------------------------------------------
1 | ## Agro-Ecological Zones Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the GAEZ v4 AEZ Classification by climate, soil, terrain, LC (57 classes)](https://gaez-data-portal-hqfao.hub.arcgis.com/pages/theme-details-theme-1).
3 |
4 | The source provided the data in multiple TIFF files available for download via URL. The TIFFs show Agro-Ecological Zone classification for the world.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_068_rw0_agro_ecological_zones/foo_068_rw0_agro_ecological_zones_processing.py) for more details on this processing.
7 |
8 | You can also download the original dataset [from the source website](https://gaez-data-portal-hqfao.hub.arcgis.com/pages/data-viewer).
9 |
10 | ###### Note: This dataset processing was done by [Alex Sweeney](https://github.com/alxswny) and [Chris Rowe](https://www.wri.org/profile/chris-rowe), and QC'd by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou).
11 |
--------------------------------------------------------------------------------
/foo_069_rw0_relative_change_crop_yield/README.md:
--------------------------------------------------------------------------------
1 | ## Relative Change in Crop Yield Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Relative change in crop yield](http://climate-impact-explorer.climateanalytics.org/).
3 |
4 | The data was retrieved from Climate Analytics' Impact Data Explorer API, which provides a CSV file for each administrative unit (e.g., country or province).
5 |
6 | Below, we describe the steps used to combine CSVs for each country/province into one global table on Carto.
7 |
8 | 1. Loop through the CSVs and append a column containing each country's ISO3 code.
9 | 2. Loop through the CSVs and append a column containing each province's admin1 code.
10 | 3. Create a Pandas dataframe for each CSV.
11 | 4. Combine all the newly created dataframes together into one dataframe.
12 | 5. Create a new column 'datetime' to store years as datetime objects.
13 | 6. Drop the unnamed column.
14 | 7. Filter out CAT emissions scenario columns.
15 | 8. Filter out NGFS emissions scenario columns.
16 | 9. Replace periods in column headers with underscores.
17 | 10. Replace whitespaces with underscores in column headers.
18 | 11. Replace NaN with None.
19 | 12. Save the processed dataframe as a CSV.
20 |
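The looping and cleanup steps might look roughly like this sketch; the file list, ISO3/admin1 codes, and the 'year' column name are placeholders for what the API actually returns:

```
import pandas as pd

# hypothetical mapping of downloaded CSVs to their ISO3 and admin1 codes;
# the real script derives this from the Impact Data Explorer API responses
files = [('BRA.csv', 'BRA', None), ('BRA_123.csv', 'BRA', 'BRA.123')]

frames = []
for path, iso3, admin1 in files:
    df = pd.read_csv(path)
    df['iso3'] = iso3        # 1. country column
    df['admin1'] = admin1    # 2. province column
    frames.append(df)        # 3. one dataframe per CSV

combined = pd.concat(frames, ignore_index=True)  # 4. combine into one dataframe
combined['datetime'] = pd.to_datetime(combined['year'].astype(str), format='%Y')  # 5.

# 6-8. drop the unnamed index column and the CAT/NGFS scenario columns
drop_cols = [c for c in combined.columns
             if c.startswith('Unnamed') or 'CAT' in c or 'NGFS' in c]
combined = combined.drop(columns=drop_cols)

# 9-11. Carto-safe headers and NaN -> None
combined.columns = [c.replace('.', '_').replace(' ', '_') for c in combined.columns]
combined = combined.where(pd.notnull(combined), None)

# 12. save the processed dataframe
combined.to_csv('foo_069_rw0_relative_change_crop_yield_edit.csv', index=False)
```
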
21 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/foo_069_rw0_relative_change_crop_yield/foo_069_rw0_relative_change_crop_yield_processing.py) for more details on this processing.
22 |
23 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/foo_069_rw0_relative_change_crop_yield.zip), or [from the source website](https://cie-api.climateanalytics.org/api/).
24 |
25 | ###### Note: This dataset processing was done by [Alex Sweeney](https://github.com/alxswny) and [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Chris Rowe](https://www.wri.org/profile/chris-rowe).
26 |
--------------------------------------------------------------------------------
/for_001_rw2_tree_cover/README.md:
--------------------------------------------------------------------------------
1 | ## Tree Cover Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Forest Change 2000–2019](http://earthenginepartners.appspot.com/science-2013-global-forest) for [display on Resource Watch](https://resourcewatch.org/data/explore/027799db-c161-462a-9c20-aeb97b84e06e).
3 |
4 | The source provided the data as a Google Earth Engine asset. The asset was not modified from the original version for display on Resource Watch.
5 |
6 | You can view the processed Tree Cover dataset [on Resource Watch](https://resourcewatch.org/data/explore/027799db-c161-462a-9c20-aeb97b84e06e).
7 |
8 | You can also download the original dataset [from the source website](http://earthenginepartners.appspot.com/science-2013-global-forest/download_v1.7.html).
9 |
10 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
11 |
--------------------------------------------------------------------------------
/for_005a_mangrove/README.md:
--------------------------------------------------------------------------------
1 | ## Mangrove Forests Data Pre-Processing
2 | This file describes the data pre-processing that was done to [Mangrove Forests dataset](https://data.unep-wcmc.org/datasets/45) for [display on Resource Watch](https://resourcewatch.org/data/explore/386314c4-ab42-47a7-b2cd-596b788e114d).
3 |
4 | This dataset was provided by the source as a series of shapefiles, one for each year of data, including 1996, 2007, 2008, 2009, 2010, 2015, and 2016.
5 |
6 | In order to upload the selected data on Resource Watch, the following steps were taken:
7 |
8 | 1. Download and unzip source data.
9 | 2. Combine the shapefiles for each year into one shapefile that includes all years of data, and add a column named 'year' to indicate which year each row represents.
10 | 3. Overwrite the unique IDs in the table so that there will not be duplicates in the combined file.
11 |
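A minimal geopandas sketch of the combination, using placeholder shapefile names for the yearly layers:

```
import pandas as pd
import geopandas as gpd

# placeholder shapefile names, one per year of data
years = {1996: 'GMW_1996.shp', 2007: 'GMW_2007.shp', 2016: 'GMW_2016.shp'}

frames = []
for year, path in years.items():
    gdf = gpd.read_file(path)
    gdf['year'] = year            # record which year each polygon represents
    frames.append(gdf)

# combine all years into a single GeoDataFrame
combined = pd.concat(frames, ignore_index=True)

# overwrite the unique IDs so they do not collide across the merged years
# ('gmw_id' is an assumed name for the ID field)
combined['gmw_id'] = range(1, len(combined) + 1)
```
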
12 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/for_005a_mangrove/for_005a_mangrove_processing.py) for more details on this processing.
13 |
14 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/386314c4-ab42-47a7-b2cd-596b788e114d).
15 |
16 | You can also download original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/for_005a_mangrove.zip), or [from the source website](https://data.unep-wcmc.org/datasets/45).
17 |
18 | ###### Note: This dataset processing was done by [Kristine Lister](https://www.wri.org/profile/kristine-lister), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
19 |
--------------------------------------------------------------------------------
/for_005b_rw0_mangrove_extent_change/README.md:
--------------------------------------------------------------------------------
1 | ## Mangrove Extent Change Data Pre-Processing
2 | This file describes the data pre-processing that was done to [Global Mangrove Watch (1996-2016)](https://data.unep-wcmc.org/datasets/45) for [display on Resource Watch](https://resourcewatch.org/data/explore/f31dece0-9256-428a-84de-3a59f5c06bb7).
3 |
4 | This dataset was provided by the source as two shapefiles, one for 1996 and one for 2016.
5 |
6 | To calculate the mangrove extent change from 1996 to 2016 for display on Resource Watch, the following steps were performed in Esri Modelbuilder in ArcMap:
7 |
8 | 1. Open the shapefiles as layers in ArcMap.
9 | 2. Convert the 1996 and 2016 layers to a Lambert cylindrical equal-area projection (8287).
10 | 3. Calculate the geometric intersection of the 1996 and 2016 layers.
11 | 4. Erase the geometric intersection from the 1996 layer to create a loss layer.
12 | 5. Erase the geometric intersection from the 2016 layer to create a gain layer.
13 | 6. Merge the gain and loss layers.
14 | 7. Export the merged layer as a shapefile.
15 |
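The published change layer was produced with the ArcMap workflow above. As a rough open-source analogue (not the workflow actually used), the same gain/loss logic could be expressed in geopandas; the file names and the equal-area CRS code here are assumptions:

```
import pandas as pd
import geopandas as gpd

# read the two extent shapefiles (placeholder names) and reproject to a
# cylindrical equal-area CRS before comparing geometries
m1996 = gpd.read_file('GMW_1996.shp').to_crs('ESRI:54034')
m2016 = gpd.read_file('GMW_2016.shp').to_crs('ESRI:54034')

# loss: 1996 extent absent in 2016; gain: 2016 extent absent in 1996
loss = gpd.overlay(m1996, m2016, how='difference')
gain = gpd.overlay(m2016, m1996, how='difference')
loss['change'] = 'loss'
gain['change'] = 'gain'

# merge the gain and loss layers and export as a single shapefile
change = pd.concat([gain, loss], ignore_index=True)
change.to_file('mangrove_extent_change_1996_2016.shp')
```
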
16 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/f31dece0-9256-428a-84de-3a59f5c06bb7).
17 |
18 | You can also download original underlying datasets [from the source website](https://data.unep-wcmc.org/datasets/45).
19 |
20 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms), and QC'd by []().
21 |
--------------------------------------------------------------------------------
/for_007_rw2_tree_cover_gain/README.md:
--------------------------------------------------------------------------------
1 | ## Tree Cover Gain Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Forest Change 2000–2019](http://earthenginepartners.appspot.com/science-2013-global-forest) for [display on Resource Watch](https://resourcewatch.org/data/explore/426fe3f0-12de-4adb-b3aa-850eb04f75b4).
3 |
4 | The source provided the data as a Google Earth Engine asset. The asset was not modified from the original version for display on Resource Watch.
5 |
6 | You can view the processed Tree Cover Gain dataset [on Resource Watch](https://resourcewatch.org/data/explore/426fe3f0-12de-4adb-b3aa-850eb04f75b4).
7 |
8 | You can also download the original dataset [from the source website](http://earthenginepartners.appspot.com/science-2013-global-forest/download_v1.7.html).
9 |
10 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
11 |
--------------------------------------------------------------------------------
/for_008_rw2_tree_cover_loss/README.md:
--------------------------------------------------------------------------------
1 | ## Tree Cover Loss Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Forest Change 2000–2020](http://earthenginepartners.appspot.com/science-2013-global-forest) for [display on Resource Watch](https://resourcewatch.org/data/explore/5c5e654e-182b-4ab4-8a3c-edff79cc68ea).
3 |
4 | The source provided the data as a Google Earth Engine asset. The asset was not modified from the original version for display on Resource Watch.
5 |
6 | You can view the processed Tree Cover Loss dataset [on Resource Watch](https://resourcewatch.org/data/explore/5c5e654e-182b-4ab4-8a3c-edff79cc68ea).
7 |
8 | You can also download the original dataset [from the source website](https://storage.googleapis.com/earthenginepartners-hansen/GFC-2020-v1.8/download.html).
9 |
10 | ###### Note: This dataset processing was done by [Kristine Lister](https://www.wri.org/profile/kristine-lister), and QC'd by [Kristine Lister](https://www.wri.org/profile/kristine-lister).
11 |
--------------------------------------------------------------------------------
/for_014_rw1_internationally_important_wetlands/README.md:
--------------------------------------------------------------------------------
1 | ## Ramsar Internationally Important Wetlands Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Wetlands of International Importance (the Ramsar List)](http://www.ramsar.org/about/wetlands-of-international-importance-ramsar-sites) for [display on Resource Watch](https://resourcewatch.org/data/explore/2ac1f43c-30bc-4ed9-846c-8443ce4987a9).
3 |
4 | The data source provided the dataset as one csv file.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Import the data as a pandas dataframe.
9 | 2. Create a new column 'wetland_type_general' of general wetland type for visualization.
10 | 3. Replace the 'nan' in column 'wetland_type_general' with 'Other'.
11 | 4. Replace NaN in the table with None.
12 | 5. Replace whitespaces in column names with underscores and convert the column names to lowercase to match Carto column name requirements.
13 |
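A minimal pandas sketch of these steps; the input file name, the detailed wetland-type column, and the rule for deriving the general type are illustrative assumptions:

```
import pandas as pd

df = pd.read_csv('ramsar_sites.csv')   # placeholder for the source CSV

# 2-3. derive a general wetland type for visualization and label missing values
# (the real derivation rule may differ)
df['wetland_type_general'] = df['Wetland Type'].str.split(',').str[0]
df['wetland_type_general'] = df['wetland_type_general'].fillna('Other')

# 4. replace NaN with None
df = df.where(pd.notnull(df), None)

# 5. Carto-safe headers: underscores and lowercase
df.columns = [c.lower().replace(' ', '_') for c in df.columns]
```
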
14 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/for_014_rw1_internationally_important_wetlands/for_014_rw1_internationally_important_wetlands_processing.py) for more details on this processing.
15 |
16 | You can view the processed Ramsar Internationally Important Wetlands dataset [on Resource Watch](https://resourcewatch.org/data/explore/2ac1f43c-30bc-4ed9-846c-8443ce4987a9).
17 |
18 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/for_014_rw1_internationally_important_wetlands.zip), or [from the source website](https://rsis.ramsar.org/?pagetab=3).
19 |
20 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
21 |
--------------------------------------------------------------------------------
/for_018_rw1_bonn_challenge_restoration_commitment/README.md:
--------------------------------------------------------------------------------
1 | ## Bonn Challenge Restoration Commitment Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Bonn Challenge Restoration Commitment](http://www.bonnchallenge.org/) for [display on Resource Watch](https://resourcewatch.org/data/explore/fb5edc45-b105-4b13-a6c3-5f3e314a4086).
3 |
4 | The source provided the data as a table on its website.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Create a new column 'unit' to store the unit of the pledged areas.
9 | 2. Remove 'hectare' from all the entries in the column 'pledged_area'.
10 | 3. Convert the data type of the 'pledged_area' column to integer.
11 | 4. Convert the values in the column 'pledged_area' to millions of hectares.
12 | 5. Subset the dataframe to remove data of pledgers that are not countries.
13 |
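A small pandas sketch of the cleanup, with made-up rows standing in for the scraped pledge table (the 'pledger' column name and country list are assumptions):

```
import pandas as pd

# illustrative rows in the shape of the scraped table
df = pd.DataFrame({
    'pledger': ['Country A', 'Country B', 'Example Company'],
    'pledged_area': ['15000000 hectares', '1000000 hectares', '500000 hectares'],
})

df['unit'] = 'million hectares'                                        # 1.
df['pledged_area'] = df['pledged_area'].str.replace(r'\s*hectares?', '',
                                                    regex=True)        # 2.
df['pledged_area'] = df['pledged_area'].astype(int)                    # 3.
df['pledged_area'] = df['pledged_area'] / 1e6                          # 4.

# 5. keep only pledgers that are countries (membership list is illustrative)
countries = ['Country A', 'Country B']
df = df[df['pledger'].isin(countries)]
```
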
14 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/for_018_rw1_bonn_challenge_restoration_commitment/for_018_rw1_bonn_challenge_restoration_commitment_processing.py) for more details on this processing.
15 |
16 | You can view the processed Bonn Challenge Restoration Commitment dataset [on Resource Watch](https://resourcewatch.org/data/explore/fb5edc45-b105-4b13-a6c3-5f3e314a4086).
17 |
18 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/for_018_rw1_bonn_challenge_restoration_commitment.zip), or [from the source website](https://www.bonnchallenge.org/pledges).
19 |
20 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
21 |
--------------------------------------------------------------------------------
/for_021_rw1_certified_forest/README.md:
--------------------------------------------------------------------------------
1 | ## Certified Forest Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Certified Forest](https://unstats-undesa.opendata.arcgis.com/datasets/13a621d222c243dc906d7ee53d13eff3) dataset for [display on Resource Watch](https://bit.ly/3xqZ391).
3 |
4 | The data was accessed through the source's API and downloaded as a GeoJson file. Below we describe the main steps taken to process the data so that it is formatted correctly to be uploaded to Carto.
5 |
6 | 1. Read in the data as a geopandas dataframe and subset the dataframe to select only columns of interest.
7 | 2. Convert the table from wide to long form by melting the columns displaying year values into a single year column.
8 | 3. Convert the certified area values from thousands of hectares to hectares.
9 | 4. Convert the years in the 'Year' column to datetime objects and store them in a new column 'datetime'.
10 | 5. Convert column names to lowercase and replace spaces with underscores to match Carto column name requirements.
11 |
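A rough sketch of these steps with geopandas and pandas; the GeoJSON file name and the column names are placeholders for what the source API returns:

```
import geopandas as gpd
import pandas as pd

# placeholder for the GeoJSON returned by the source API
gdf = gpd.read_file('certified_forest.geojson')

# 1. keep only the columns of interest (names are illustrative)
year_cols = [c for c in gdf.columns if str(c).isdigit()]
df = pd.DataFrame(gdf[['geoAreaName'] + year_cols])

# 2. wide -> long with a single year column
df = df.melt(id_vars=['geoAreaName'], var_name='year',
             value_name='certified_area')

# 3-5. thousands of hectares -> hectares, datetime column, Carto-safe headers
df['certified_area'] = df['certified_area'] * 1000
df['datetime'] = pd.to_datetime(df['year'].astype(str), format='%Y')
df.columns = [c.lower().replace(' ', '_') for c in df.columns]
```
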
12 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/for_021_rw1_certified_forest/for_021_rw1_certified_forest_processing.py) for more details on this processing.
13 |
14 | You can view the processed dataset for [display on Resource Watch](https://bit.ly/3xqZ391).
15 |
16 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/for_021_rw1_certified_forest.zip), or [from the source website](https://unstats-undesa.opendata.arcgis.com/datasets/13a621d222c243dc906d7ee53d13eff3).
17 |
18 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
19 |
--------------------------------------------------------------------------------
/for_029_peatlands/README.md:
--------------------------------------------------------------------------------
1 | ## Peatlands Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the PEATMAP dataset](http://archive.researchdata.leeds.ac.uk/251/) for [display on Resource Watch](https://resourcewatch.org/data/explore/for019-Peatland-SEA).
3 |
4 | The Peatlands dataset comes in zipped files for each of the following continents: Africa, Asia, Europe, North America, Oceania, and South America. For some continents, a single shapefile was provided for the entire continent. For others, several shapefiles at a finer level, such as at the national level, were provided. For example, the zipped file for North America includes shapefiles for Canada, the United States, and other North American peatlands.
5 |
6 | Below, we describe the steps used to process and combine these regional datasets into a global peatlands dataset.
7 |
8 | 1. Upload individual regional shapefile datasets to Carto.
9 | 2. For each regional dataset in Carto, add a column called "region", and fill it with the name of the shapefile region. For example, the following SQL statements were used for the Southeast Asia peatlands shapefile, which was stored in a table named for_029_peatland_sea:
10 |
11 | ```
12 | ALTER TABLE for_029_peatland_sea ADD COLUMN region VARCHAR;
13 | 
14 | UPDATE for_029_peatland_sea
15 | SET region = 'Southeast Asia';
17 | ```
18 | 3. Examine attribute columns and standardize area column names. Some shapefiles have an area column named “peat_area” and some have columns named “area”. The following SQL statement was used to standardize the area column name:
19 | ```
20 | ALTER TABLE for_029_peatland_africa
21 | RENAME peat_area TO area;
22 | ```
23 | 4. Combine regional shapefiles into a global peatlands dataset, called for_029_peatlands. For example, the following SQL statement was used to insert the Southeast Asia peatlands shapefile into the new, global table:
24 | ```
25 | INSERT INTO for_029_peatlands(the_geom, area, region) SELECT the_geom, area, region
26 | FROM for_029_peatland_sea;
27 | ```
28 | You can view the processed, global peatland dataset [here](https://resourcewatch.carto.com/u/wri-rw/dataset/for_029_peatlands).
29 |
30 | You can also download original dataset, by region, [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/for_029_peatlands.zip), or [from the source website](http://archive.researchdata.leeds.ac.uk/251/).
31 |
32 | ###### Note: This dataset processing was done by [Tina Huang](https://www.wri.org/profile/tina-huang), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
33 |
--------------------------------------------------------------------------------
/for_031_rw0_forest_landscape_integrity_index/README.md:
--------------------------------------------------------------------------------
1 | ## Forest Landscape Integrity Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Forest Landscape Integrity Index](https://www.forestintegrity.com/) for [display on Resource Watch](https://resourcewatch.org/data/explore/e044a0d7-b3d4-4612-9848-71322b1772cc).
3 |
4 | This dataset is provided by the source as a GeoTIFF file. The following variable is shown on Resource Watch:
5 | - 2019 Forest Landscape Integrity Index (flii): Continuous index representing the degree of forest anthropogenic modification for the beginning of 2019
6 |
7 | The data source multiplied the data by 1000 to store values in integer format. The data were divided by 1000 to recover the proper values (range 0-10) for display on Resource Watch.
8 |
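The rescaling can be sketched with rasterio as below (the input file name is a placeholder, and nodata handling is omitted for brevity):

```
import rasterio

# read the integer-packed index and rescale it back to its 0-10 range
with rasterio.open('flii_earth.tif') as src:
    data = src.read(1).astype('float32') / 1000.0
    profile = src.profile

profile.update(dtype='float32')
with rasterio.open('for_031_rw0_forest_landscape_integrity_index.tif', 'w',
                   **profile) as dst:
    dst.write(data, 1)
```
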
9 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/for_031_rw0_forest_landscape_integrity_index/for_031_rw0_forest_landscape_integrity_index_processing.py) for more details on this processing.
10 |
11 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/e044a0d7-b3d4-4612-9848-71322b1772cc).
12 |
13 | You can also download the original dataset [from the source website](https://www.forestintegrity.com/download-data).
14 |
15 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms) and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
16 |
--------------------------------------------------------------------------------
/ocn_001_gebco_bathymetry/README.md:
--------------------------------------------------------------------------------
1 | ## GEBCO Bathymetry Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [GEBCO_2020 Grid](https://www.gebco.net/data_and_products/gridded_bathymetry_data/) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_001_gebco_bathymetry/ocn_001_gebco_bathymetry_processing.py) for more details on this processing.
5 |
6 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
7 |
8 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/ocn_001_gebco_bathymetry.zip), or [from the source website](https://www.gebco.net/data_and_products/gridded_bathymetry_data/), by downloading the GEBCO_2020 Grid [Data GeoTiff](https://www.bodc.ac.uk/data/open_download/gebco/gebco_2020/geotiff/)
9 | (4 Gbytes, 8 Gbytes uncompressed).
10 |
11 | ###### Note: This dataset processing was done by [Kristine Lister](https://www.wri.org/profile/kristine-lister).
12 |
--------------------------------------------------------------------------------
/ocn_003_projected_sea_level_rise/README.md:
--------------------------------------------------------------------------------
1 | ## Projected Sea Level Rise Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Total Ensemble Mean Sea Surface Height Time Series](https://icdc.cen.uni-hamburg.de/en/ar5-slr.html) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | This dataset is provided by the source as a single NetCDF file. The following variables are shown on Resource Watch:
5 | - 2010 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2010, relative to the average sea surface height during 1986–2005
6 | - 2020 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2020, relative to the average sea surface height during 1986–2005
7 | - 2030 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2030, relative to the average sea surface height during 1986–2005
8 | - 2040 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2040, relative to the average sea surface height during 1986–2005
9 | - 2050 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2050, relative to the average sea surface height during 1986–2005
10 | - 2060 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2060, relative to the average sea surface height during 1986–2005
11 | - 2070 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2070, relative to the average sea surface height during 1986–2005
12 | - 2080 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2080, relative to the average sea surface height during 1986–2005
13 | - 2090 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2090, relative to the average sea surface height during 1986–2005
14 | - 2100 Cumulative Sea Level Rise (m): Projected mean sea surface height in 2100, relative to the average sea surface height during 1986–2005
15 |
16 | To display these data on Resource Watch, a multiband GeoTIFF file was extracted from one subdataset within the source NetCDF file, with each band in the resulting GeoTIFF corresponding to one year in the time series. This GeoTIFF was then translated into the appropriate coordinates for web display.
17 |
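A minimal GDAL sketch of the extraction; the NetCDF file name and the subdataset (variable) name are assumptions:

```
from osgeo import gdal

# each band of the chosen subdataset corresponds to one year of the series
src = 'NETCDF:"total_ens_slr.nc":totslr_ens'

# extract the time series to a multiband GeoTIFF and assign georeferencing
# suitable for web display
gdal.Translate('ocn_003_projected_sea_level_rise.tif', src, format='GTiff',
               outputSRS='EPSG:4326', outputBounds=[-180, 90, 180, -90])
```
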
18 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_003_projected_sea_level_rise/ocn_003_projected_sea_level_rise_processing.py) for more details on this processing.
19 |
20 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
21 |
22 | You can also download the original dataset [from the source website](https://icdc.cen.uni-hamburg.de/en/ar5-slr.html).
23 |
24 | ###### Note: This dataset processing was done by [Peter Kerins](https://www.wri.org/profile/peter-kerins).
--------------------------------------------------------------------------------
/ocn_005_historical_cyclone_intensity/README.md:
--------------------------------------------------------------------------------
1 | ## Historical Cyclone Intensity Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Tropical cyclones wind speed buffers footprint 1970-2009](https://preview.grid.unep.ch/index.php?preview=data&events=cyclones&evcat=4&lang=eng) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | This dataset is provided by the source as a (zipped) GeoTIFF file. The following variables are shown on Resource Watch:
5 | - 1970-2009 Maximum Storm Intensity (cy_intensity): Highest estimated category on the Saffir-Simpson hurricane wind scale, which measures storm intensity, during 1970-2009.
6 |
7 | Because the data were already provided in a compatible format and projection, very little processing was necessary to display this data on Resource Watch.
8 |
9 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_005_historical_cyclone_intensity/ocn_005_historical_cyclone_intensity_processing.py) for more details on this processing.
10 |
11 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
12 |
13 | You can also download the original dataset [from the source website](https://preview.grid.unep.ch/index.php?preview=data&events=cyclones&evcat=4&lang=eng).
14 |
15 | ###### Note: This dataset processing was done by [Peter Kerins](https://www.wri.org/profile/peter-kerins).
--------------------------------------------------------------------------------
/ocn_006_projected_ocean_acidification/README.md:
--------------------------------------------------------------------------------
1 | ## Projected Ocean Acidification Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Projections of Coral Bleaching and Ocean Acidification for Coral Reef Areas](https://coralreefwatch.noaa.gov/climate/projections/piccc_oa_and_bleaching/index.php) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | This dataset is contained within a NetCDF file. The following variable is shown on Resource Watch, with a separate layer for the projected value at the start of each decade between 2010 and 2100:
5 | - 2XXX Aragonite Saturation State: Common measure of carbonate ion concentration, which indicates the availability of the calcium carbonate that is widely used by marine calcifiers, from lobsters to clams to starfish.
6 |
7 | To process this data for display on Resource Watch, the NetCDF file was acquired directly from the source organization. The time series data were extracted to a multiband GeoTIFF, with one band for each year 2006-2100. This GeoTIFF was then masked (all bands) by using [a Natural Earth set of polygons](https://www.naturalearthdata.com/downloads/10m-physical-vectors/10m-land/) to exclude land masses and freshwater bodies, since the data are only intended and valid for the ocean.
8 |
9 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_006_projected_ocean_acidification/ocn_006_projected_ocean_acidification.py) for more details on this processing.
10 |
11 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
12 |
13 | You can also download a representation of the data [from the source website](https://coralreefwatch.noaa.gov/climate/projections/piccc_oa_and_bleaching/index.php).
14 |
15 | ###### Note: This dataset processing was done by [Peter Kerins](https://www.wri.org/profile/peter-kerins).
--------------------------------------------------------------------------------
/ocn_006alt_projected_ocean_acidification_coral_reefs/README.md:
--------------------------------------------------------------------------------
1 | ## Projected Ocean Acidification for Coral Reefs Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Projections of Coral Bleaching and Ocean Acidification for Coral Reef Areas](https://coralreefwatch.noaa.gov/climate/projections/piccc_oa_and_bleaching/index.php) to produce a dataset for use with coral reef content, and perhaps at some point [display on Resource Watch](https://resourcewatch.org/data/explore/). This is an alternate dataset to [the global version](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_006_projected_ocean_acidification/).
3 |
4 | This dataset is contained within a NetCDF file. The following variable is shown on Resource Watch, with a separate layer for the projected value at the start of each decade between 2010 and 2100:
5 | - 2XXX Aragonite Saturation State: Common measure of carbonate ion concentration, which indicates the availability of the calcium carbonate that is widely used by marine calcifiers, from lobsters to clams to starfish. For this dataset, the variable has been masked to include only areas containing coral reefs, according to the internal mask used by some datasets in the source package.
6 |
7 | To process this data for display on Resource Watch, the NetCDF file was acquired directly from the source organization. The time series data were extracted to a multiband GeoTIFF, with one band for each year 2006-2100. This GeoTIFF was then masked (all bands), using other rasters within the source package to define the scope, in order to show projections only in pixels containing coral reefs.
8 |
9 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_006alt_projected_ocean_acidification_coral_reefs/ocn_006alt_projected_ocean_acidification_coral_reefs.py) for more details on this processing.
10 |
11 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
12 |
13 | You can also download a representation of the data [from the source website](https://coralreefwatch.noaa.gov/climate/projections/piccc_oa_and_bleaching/index.php).
14 |
15 | ###### Note: This dataset processing was done by [Peter Kerins](https://www.wri.org/profile/peter-kerins).
--------------------------------------------------------------------------------
/ocn_007_coral_bleaching_monitoring/README.md:
--------------------------------------------------------------------------------
1 | ## Coral Bleaching Monitoring Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Daily Global 5km Satellite Coral Bleaching Heat Stress Monitoring](https://coralreefwatch.noaa.gov/product/5km/index.php) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | This dataset is provided by the source in a number of separate NetCDF files. The following variables are shown on Resource Watch:
5 | - Bleaching Alert Area (bleaching_alert_area_7d): Areas where bleaching thermal stress satisfies specific criteria based on HotSpot and Degree Heating Week (DHW) values
6 | - HotSpot (hotspots): Measures the magnitude of daily thermal stress that can lead to coral bleaching
7 | - Degree Heating Week (degree_heating_week): Indicates duration and magnitude of thermal stress (one DHW is equal to one week of SST 1°C warmer than the historical average for the warmest month of the year)
8 | - Sea Surface Temperature Anomaly (sea_surface_temperature_anomaly): Difference between the daily SST and the corresponding daily SST climatology
9 | - Sea Surface Temperature (sea_surface_temperature): Night-time temperature of ocean at surface
10 | - Sea Surface Temperature Trend (sea_surface_temperature_trend_7d): Pace and direction of the SST variation and thus coral bleaching heat stress
11 |
12 | To process this data for display on Resource Watch, each NetCDF file was downloaded, and the relevant subdatasets were extracted to individual, single-band GeoTIFFs. These GeoTIFF files were then merged into a single multi-band GeoTIFF, where each band corresponds to one variable.
13 |
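A sketch of the extract-and-merge pattern using GDAL, with assumed file and variable names for two of the products (the real script handles every variable listed above):

```
from osgeo import gdal

# assumed NetCDF file names and variable names
layers = [
    ('ct5km_baa_max_7d.nc', 'bleaching_alert_area_7d'),
    ('ct5km_hs.nc', 'hotspot'),
]

# extract each variable into its own single-band GeoTIFF
tifs = []
for nc_file, var in layers:
    out = f'{var}.tif'
    gdal.Translate(out, f'NETCDF:"{nc_file}":{var}', format='GTiff')
    tifs.append(out)

# merge the single-band GeoTIFFs into one multiband GeoTIFF (one band per variable)
vrt = gdal.BuildVRT('merged.vrt', tifs, separate=True)
gdal.Translate('ocn_007_coral_bleaching_monitoring.tif', vrt)
vrt = None  # close the VRT dataset
```
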
14 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_007_coral_bleaching_monitoring/ocn_007_coral_bleaching_monitoring_processing.py) for more details on this processing.
15 |
16 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
17 |
18 | You can also download the original data [from the source website](https://coralreefwatch.noaa.gov/product/5km/index.php).
19 |
20 | ###### Note: This dataset processing was done by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid) and [Peter Kerins](https://www.wri.org/profile/peter-kerins).
--------------------------------------------------------------------------------
/ocn_008_historical_coral_bleaching_stress_frequency/README.md:
--------------------------------------------------------------------------------
1 | ## Historical Coral Bleaching Stress Frequency Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Thermal History - Stress Frequency (1985-2018), Version 2.1](https://coralreefwatch.noaa.gov/product/thermal_history/stress_frequency.php) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | This dataset is provided by the source as a NetCDF file. The following variables are shown on Resource Watch:
5 | - n_gt0: The number of events for which the thermal stress, measured by Degree Heating Weeks, exceeded 0 degC-weeks.
6 | - n_ge4: The number of events for which the thermal stress, measured by Degree Heating Weeks, reached or exceeded 4 degC-weeks.
7 | - n_ge8: The number of events for which the thermal stress, measured by Degree Heating Weeks, reached or exceeded 8 degC-weeks.
8 | - rp_gt0: The average time between events for which the thermal stress, measured by Degree Heating Weeks, exceeded 0 degC-weeks.
9 | - rp_ge4: The average time between events for which the thermal stress, measured by Degree Heating Weeks, reached or exceeded 4 degC-weeks.
10 | - rp_ge8: The average time between events for which the thermal stress, measured by Degree Heating Weeks, reached or exceeded 8 degC-weeks.
11 |
12 | To process this data for display on Resource Watch, the data in each of these NetCDF variables were first converted to individual GeoTIFF files. These GeoTIFF files were then merged into a single GeoTIFF with one band for each variable.
13 |
14 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_008_historic_coral_bleaching_stress_frequency/ocn_008_historic_coral_bleaching_stress_frequency_processing.py) for more details on this processing.
15 |
16 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/4c64fb3d-a05e-45ef-b886-2a75f418e00b).
17 |
18 | You can also download original dataset [from the source website](https://coralreefwatch.noaa.gov/product/thermal_history/stress_frequency.php).
19 |
20 | ###### Note: This dataset processing was done by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
21 |
--------------------------------------------------------------------------------
/ocn_009_sea_surface_temperature_variability/README.md:
--------------------------------------------------------------------------------
1 | ## Sea Surface Temperature Variability Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Thermal History - SST Variability (1985-2018)](https://coralreefwatch.noaa.gov/product/thermal_history/sst_variability.php) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | This dataset is contained within a NetCDF file. The following variables are shown on Resource Watch:
5 | - Sea Surface Temperature - Warmest Month Variability (stdv_maxmonth): Fluctuation in average SST during the warmest month of each year, from year to year (1985-2018), in areas containing coral reefs
6 | - Sea Surface Temperature - Annual Variability (stdv_annual): Fluctuation in average SST across the entire year, from year to year (1985-2018), in areas containing coral reefs
7 |
8 | To process this data for display on Resource Watch, the NetCDF file was downloaded. The relevant subdatasets were extracted to individual, single-band GeoTIFFs, as was a "mask" showing the extent of the dataset (which covers only areas containing coral reefs). These GeoTIFF files were masked accordingly, then merged into a single multi-band GeoTIFF, with separate bands containing the data for each variable.
9 |
10 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_009_sea_surface_temperature_variability/ocn_009_sea_surface_temperature_variability_processing.py) for more details on this processing.
11 |
12 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
13 |
14 | You can also download the original data [from the source website](https://coralreefwatch.noaa.gov/product/thermal_history/sst_variability.php).
15 |
16 | ###### Note: This dataset processing was done by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid) and [Peter Kerins](https://www.wri.org/profile/peter-kerins).
--------------------------------------------------------------------------------
/ocn_010_projected_coral_bleaching/README.md:
--------------------------------------------------------------------------------
1 | ## Projected Coral Bleaching Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Downscaled climate model projections of coral bleaching condition](https://coralreefwatch.noaa.gov/climate/projections/downscaled_bleaching_4km/index.php) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | This dataset is contained within an ArcGIS Layer Package file (and available for viewing in Google Earth via a Keyhole Markup Language zip archive). The following variables are shown on Resource Watch:
5 | - Onset of 2x per Decade Severe Coral Bleaching Conditions (per_decade_2x): First year when severe bleaching conditions are projected to occur at least twice per decade. These projections correspond to the RCP 8.5 "business as usual" scenario, under which current emissions trends continue.
6 | - Onset of 10x per Decade Severe Coral Bleaching Conditions (per_decade_10x): First year when severe bleaching conditions are projected to occur at least ten times per decade. These projections correspond to the RCP 8.5 "business as usual" scenario, under which current emissions trends continue.
7 |
8 | To process this data for display on Resource Watch, the ArcGIS Layer Package was downloaded, and its components were saved as GeoTIFFs. The relevant GeoTIFFs were merged into a single multi-band GeoTIFF, with separate bands containing the data for each variable.
9 |
10 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_010_projected_coral_bleaching/ocn_010_projected_coral_bleaching_processing.py) for more details on this processing.
11 |
12 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
13 |
14 | You can also download the original data [from the source website](https://coralreefwatch.noaa.gov/climate/projections/downscaled_bleaching_4km/index.php).
15 |
16 | ###### Note: This dataset processing was done by [Peter Kerins](https://www.wri.org/profile/peter-kerins).
17 |
--------------------------------------------------------------------------------
/ocn_012_coral_reef_tourism_value/README.md:
--------------------------------------------------------------------------------
1 | ## Coral Reef Tourism Value Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Mapped estimates of the dollar values of coral reefs to the tourism sector](http://maps.oceanwealth.org/) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | The dataset is contained within a GeoDataBase, containing many vector layers, many of which are available for viewing on the Mapping Ocean Wealth platform. The following variables are shown on Resource Watch:
5 | - Annual Value of Coral Reef Tourism (2013 USD/km²): Estimated total annual economic value per square kilometer of coral reefs to the tourism sector.
6 | - Annual Value of On-Reef Coral Tourism (2013 USD/km²): Estimated annual economic value per square kilometer of on-reef activities to the tourism sector.
7 | - Annual Value of Reef-Adjacent Coral Tourism (2013 USD/km²): Estimated annual economic value per square kilometer of reef-adjacent activities and services to the tourism sector.
8 |
9 | To process this data for display on Resource Watch, the GeoDataBase was downloaded, and the layers of interest were extracted into individual shapefiles. Existing attributes were used to calculate a new attribute describing value per square kilometer. This attribute was then rasterized into a single-band GeoTIFF with geospatial properties identical to the originating vector files. These GeoTIFF files were then merged into one multi-band GeoTIFF, with separate bands containing the data for each variable.
10 |
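The attribute calculation and rasterization can be sketched with geopandas and rasterio; the layer, field names, and grid resolution are assumptions for illustration:

```
import geopandas as gpd
import numpy as np
import rasterio
from rasterio import features
from rasterio.transform import from_origin

# assumed shapefile and attribute names
gdf = gpd.read_file('coral_reef_tourism.shp')
gdf['value_per_km2'] = gdf['total_value_usd'] / gdf['area_km2']

# build a grid covering the layer's extent at a placeholder resolution
res = 0.01  # degrees
minx, miny, maxx, maxy = gdf.total_bounds
width = int(np.ceil((maxx - minx) / res))
height = int(np.ceil((maxy - miny) / res))
transform = from_origin(minx, maxy, res, res)

# burn the new attribute into a single-band raster
raster = features.rasterize(
    zip(gdf.geometry, gdf['value_per_km2']),
    out_shape=(height, width), transform=transform, fill=0, dtype='float32')

with rasterio.open('coral_reef_tourism_value.tif', 'w', driver='GTiff',
                   height=height, width=width, count=1, dtype='float32',
                   crs=gdf.crs, transform=transform) as dst:
    dst.write(raster, 1)
```
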
11 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_012_coral_reef_tourism_value/ocn_012_coral_reef_tourism_value.py) for more details on this processing.
12 |
13 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/ocn013-Coral-Reef-Tourism-Value).
14 |
15 | The original data can be viewed on the [Mapping Ocean Wealth platform](http://maps.oceanwealth.org/), and are available for download upon request.
16 |
17 | ###### Note: This dataset processing was done by [Peter Kerins](https://www.wri.org/profile/peter-kerins).
18 |
--------------------------------------------------------------------------------
/ocn_013_coral_reef_fisheries_relative_catch/README.md:
--------------------------------------------------------------------------------
1 | ## Coral Reef Fisheries Relative Catch Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [estimated relative coral reef-associated fisheries catch](http://maps.oceanwealth.org/) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | The dataset is contained within a GeoDataBase, containing many vector layers, many of which are available for viewing on the Mapping Ocean Wealth platform. The following variable is shown on Resource Watch:
5 | - Coral Reef Fisheries Catch: Estimate of the relative size of coral reef fisheries catch (by weight) grouped by decile, based on estimated reef productivity and fishing effort, as well as the presence of protected "no-take" fishing areas.
6 |
7 | To process this data for display on Resource Watch, the GeoDataBase was downloaded, and the layer of interest was extracted into a shapefile. The appropriate decile attribute was then rasterized into a single-band GeoTIFF with geospatial properties identical to the originating vector file.
8 |
9 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_013_coral_reef_fisheries_relative_catch/ocn_013_coral_reef_fisheries_relative_catch.py) for more details on this processing.
10 |
11 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
12 |
13 | The original data can be viewed on the [Mapping Ocean Wealth platform](http://maps.oceanwealth.org/), and are available for download upon request.
14 |
15 | ###### Note: This dataset processing was done by [Peter Kerins](https://www.wri.org/profile/peter-kerins).
--------------------------------------------------------------------------------
/ocn_014_index_of_coastal_protection_by_coral_reefs/README.md:
--------------------------------------------------------------------------------
1 | ## Index of Coastal Protection by Coral Reefs Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Global Coral Protection Index](http://maps.oceanwealth.org/) for [display on Resource Watch](https://resourcewatch.org/data/explore/).
3 |
4 | The dataset is contained within a GeoDataBase, containing many vector layers, many of which are available for viewing on the Mapping Ocean Wealth platform. The following variable is shown on Resource Watch:
5 | - 2014 Relative Value of Coral Reef Shoreline Protection: Relative value of protection against waves provided to coastlines by coral reefs through reduction of wave height and wave energy, estimated as a function of exposed populations and infrastructure receiving some protection.
6 |
7 | To process this data for display on Resource Watch, the GeoDataBase was downloaded, and the layer of interest was extracted into a shapefile. The appropriate decile attribute was then rasterized into a single-band GeoTIFF with geospatial properties identical to the originating vector file.
8 |
9 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_014_index_of_coastal_protection_by_coral_reefs/ocn_014_index_of_coastal_protection_by_coral_reefs.py) for more details on this processing.
10 |
11 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/).
12 |
13 | The original data can be viewed on the [Mapping Ocean Wealth platform](http://maps.oceanwealth.org/), and are available for download upon request.
14 |
15 | ###### Note: This dataset processing was done by [Peter Kerins](https://www.wri.org/profile/peter-kerins).
--------------------------------------------------------------------------------
/ocn_016_rw0_ocean_plastics/README.md:
--------------------------------------------------------------------------------
1 | ## Plastic Density in the Oceans Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Modelled count and mass concentrations of plastic debris in the global ocean](https://app.dumpark.com/seas-of-plastic-2/#) for [display on Resource Watch](https://resourcewatch.org/data/explore/d43690a2-75cc-473c-bf41-7af938ccf280).
3 |
4 | The data was provided as GeoTIFF files within a zipped folder in [Google Drive](https://drive.google.com/file/d/0B4XxjklEZhMtOEVHLXc1WlM5Wm8/view) from the data provider to the Resource Watch team.
5 |
6 | To process the data, we used a Python script to download the data from Google Drive, shift the longitude bounds of the data to run from -180 to 180, and upload the data to Google Earth Engine.
7 |
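A rough sketch of the longitude shift is included below, assuming the source GeoTIFFs span 0 to 360 degrees; the file names are hypothetical, and the linked script shows the exact processing.

```python
# Sketch only: shift a global GeoTIFF whose columns run 0-360 degrees so they
# run -180 to 180 degrees instead. File names are hypothetical.
import numpy as np
import rasterio
from rasterio.transform import from_origin

# read the raster grid and its metadata
with rasterio.open('data/plastic_count_density.tif') as src:
    data = src.read(1)
    profile = src.profile.copy()
    res_x, res_y = src.res
    top = src.bounds.top

# roll the grid halfway around the globe so the dateline seam moves to the map edges
ncols = data.shape[1]
shifted = np.roll(data, ncols // 2, axis=1)

# rebuild the transform so the left edge of the raster sits at -180 degrees
profile.update(transform=from_origin(-180.0, top, res_x, res_y))

# write the shifted grid to a new GeoTIFF
with rasterio.open('data/plastic_count_density_shifted.tif', 'w', **profile) as dst:
    dst.write(shifted, 1)
```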
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/ocn_016_rw0_ocean_plastics/ocn_016_rw0_ocean_plastics_processing.py) for more details on this processing.
9 |
10 | You can view the processed Plastic Density in the Oceans dataset [on Resource Watch](https://resourcewatch.org/data/explore/d43690a2-75cc-473c-bf41-7af938ccf280).
11 |
12 | You can also download the original dataset [from the source website](https://drive.google.com/file/d/0B4XxjklEZhMtOEVHLXc1WlM5Wm8/view).
13 |
14 | ###### Note: This dataset processing was done by [Kristine Lister](https://www.wri.org/profile/kristine-lister), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
15 |
--------------------------------------------------------------------------------
/ocn_017_coral_reef_connectivity/README.md:
--------------------------------------------------------------------------------
1 | ## Coral Reef Connectivity Pre-processing
2 | This file describes the data pre-processing that was done to the [50 Reefs Global Coral Ocean Warming, Connectivity and Cyclone Dataset](https://conbio.onlinelibrary.wiley.com/doi/full/10.1111/conl.12587) for [display on Resource Watch](https://resourcewatch.org/data/explore/2e7527a0-c601-4e5a-a205-492314501744).
3 |
4 | The dataset is contained within a shapefile that contains several attributes. The following variable is shown on Resource Watch:
5 | - Coral Reef Connectivity: The estimated degree to which coral reefs are able to exchange eggs, larvae, juveniles, or adults with other coral reefs. The degree of connectivity may be due to location, ocean currents, species present, etc.
6 |
7 | One attribute within the shapefile was visualized on the Resource Watch platform without any additional processing or manipulation.
8 |
9 | You can view the processed dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/2e7527a0-c601-4e5a-a205-492314501744).
10 |
11 | You can also download the original dataset [from the provider](https://espace.library.uq.edu.au/view/UQ:0928a6a).
12 |
13 | ###### Note: This dataset processing was done by [Peter Kerins](https://www.wri.org/profile/peter-kerins).
14 |
--------------------------------------------------------------------------------
/pull_request_template.md:
--------------------------------------------------------------------------------
1 | ## Checklist for Reviewing a Pre-Processing Script
2 |
3 | - [ ] Does the python script contain the following 4 sections: Download data and save to your data directory, Process data, Upload processed data to Carto/Upload processed data to Google Earth Engine, Upload original data and processed data to Amazon S3 storage?
4 | - [ ] Does the script have the standardized variable names for: dataset_name, raw_data_file, processed_data_file?
5 | - [ ] Does the script create and use a 'data' directory?
6 | - [ ] Does the script use a python module to automatically download the data? If this is not possible, are there explicit instructions for how to download the data (step by step with instructions about every input parameter and button to click to find the exact data) and does it use shutil to move the data from 'Downloads' into the data directory?
7 | - [ ] Is the script automated as much as possible to minimize rewriting code the next time the dataset updates? (Ex 1: can you automatically pull out column names instead of typing them yourself? Ex 2: if you are dropping columns with no data, did you use pandas to find the nodata columns instead of dropping the column by name?)
8 | - [ ] Are there comments on almost every line of code to explicitly state what is being done and why?
9 | - [ ] Are you uploading to the correct AWS location?
10 | - [ ] For GEE scripts, did you explicitly define the band manifest with a pyramiding policy so that we can easily change it later if we need to?
11 | - [ ] Does the README contain all relevant links?
12 | - [ ] Does the README state the original file type?
13 | - [ ] Does the README list all of the processing steps that were taken in the script?
14 | - [ ] For netcdfs, does the README state which variables were pulled from the netcdf?
15 | - [ ] Did you use the util functions whenever possible?
16 | - [ ] Have you checked the processed data file on your computer to make sure it matches the data you uploaded to Carto? (Spaces or symbols in column titles often get changed in the upload process; please change these before uploading so that the backed-up processed data matches the data on Carto.)
17 | - [ ] Do the folder name and python script file name match the dataset_name variable in the script? Are they all lowercase? Does the processing script end in '_processing.py'? (this part is often forgotten)
18 |
--------------------------------------------------------------------------------
/req_017_thailand_flooding/README.md:
--------------------------------------------------------------------------------
1 | ## Thailand Flooding Pre-processing
2 |
3 | This file describes the data pre-processing that was done to create the [Thailand Flooding dataset](https://earthobservatory.nasa.gov/images/76234/floods-swamp-historic-city-in-thailand) for display on [Resource Watch](https://resourcewatch.org/embed/map-swipe?zoom=10.124765919920378&lat=14.348293168896923&lng=100.57237782742156&layers=065b68fa-19af-4381-b5bf-8dd31d0a1309,23f38c8a-d7dc-4549-9e72-da5c576f4c31) to be included in the [Climate-related Physical Risks dashboard](https://resourcewatch.org/dashboards/climate-related-physical-risks).
4 |
5 | This dataset contains satellite imagery, provided by the source as geotiff files. These files come from the Advanced Land Imager (ALI) on NASA's Earth Observing-1 satellite and the United States Geological Survey. These images were modified to include two white rectangles and text, highlighting the Sony and Honda factories that were affected by the October 2011 floods in Thailand. These modifications were done by [Lihuan Zhou](https://www.wri.org/profile/lihuan-zhou) and shared with the Resource Watch team via e-mail. The modified images were then uploaded to Google Earth Engine so that they could be displayed on Resource Watch. The original images are available for display and download on NASA's Earth Observatory [Image of the Day](https://earthobservatory.nasa.gov/images/76234/floods-swamp-historic-city-in-thailand) for October 27, 2011.
6 |
7 | ###### Note: This dataset processing was done by [Daniel Gonzalez](https://www.wri.org/profile/daniel-gonzalez) and [Lihuan Zhou](https://www.wri.org/profile/lihuan-zhou).
8 |
--------------------------------------------------------------------------------
/soc_002_rw1_gender_development_index/README.md:
--------------------------------------------------------------------------------
1 | ## Gender Development Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Gender Development Index (GDI)](http://hdr.undp.org/en/content/gender-development-index-gdi) for [display on Resource Watch](https://bit.ly/3ks6hUb).
3 |
4 | The source provided this dataset as a csv file accessed through its [website](http://hdr.undp.org/en/indicators/137906#).
5 |
6 | Below, we describe the main actions performed to process the csv file:
7 | 1. Convert the dataframe from wide form to long form, in which one column indicates the year.
8 | 2. Convert the years in the 'Year' column to datetime objects and store them in a new column 'datetime'.
9 | 3. Rename column headers to remove special characters and spaces so that the table can be uploaded to Carto without losing information.
10 |
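A minimal pandas sketch of the steps above follows; the column names ('Country', 'Year') are illustrative, not the exact headers from the source csv.

```python
# Sketch only: wide-to-long reshaping, datetime column, and header cleanup.
import pandas as pd

df = pd.read_csv('data/gender_development_index.csv')

# 1. convert from wide form to long form, with one column indicating the year
df = df.melt(id_vars=['Country'], var_name='Year', value_name='yr_data')

# 2. store the years as datetime objects in a new 'datetime' column
df['datetime'] = pd.to_datetime(df['Year'], format='%Y')

# 3. remove special characters and spaces from headers before uploading to Carto
df.columns = [col.lower().strip().replace(' ', '_').replace('(', '').replace(')', '')
              for col in df.columns]
```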
11 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_002_rw1_gender_development_index/soc_002_rw1_gender_development_index_processing.py) for more details on this processing.
12 |
13 | You can view the processed dataset for [display on Resource Watch](https://bit.ly/3ks6hUb).
14 |
15 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_002_rw1_gender_development_index.zip), or [from the source website](http://hdr.undp.org/en/indicators/137906#).
16 |
17 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
18 |
--------------------------------------------------------------------------------
/soc_004_rw1_human_development_index/README.md:
--------------------------------------------------------------------------------
1 | ## Human Development Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Human Development Index (HDI)](http://hdr.undp.org/en/2019-report) for [display on Resource Watch](https://resourcewatch.org/data/explore/fc6dea95-37a6-41a0-8c99-38b7a2ea7301).
3 |
4 | The source provided the data in a csv format.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Read in the data as a pandas dataframe.
9 | 2. Remove empty columns from the dataframe.
10 | 3. Remove empty rows and rows that only contain metadata.
11 | 4. Replace the '..', which is used to indicate no-data, in the dataframe with None.
12 | 5. Convert the data type of the column 'HDI Rank' to integer.
13 | 6. Convert the dataframe from wide to long format so there will be one column indicating the year and another column indicating the index.
14 | 7. Rename the 'variable' and 'value' columns created in the previous step to 'year' and 'yr_data'.
15 | 8. Convert the data type of the 'year' column to integer.
16 | 9. Convert the years in the 'year' column to datetime objects and store them in a new column 'datetime'.
17 | 10. Convert the data type of column 'yr_data' to float.
18 | 11. Rename the column 'HDI Rank' to 'hdi_rank' since space within column names is not supported by Carto.
19 | 12. Rename the column 'Country' to 'country_region' since the column contains both countries and regions.
20 |
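A rough pandas sketch of the steps above is shown below; the '..' marker and the 'HDI Rank'/'Country' columns follow this README, while the file name and metadata-row handling are simplified assumptions.

```python
# Sketch only: the main cleaning and reshaping steps for the HDI table.
import numpy as np
import pandas as pd

# 1-3. read the csv and drop empty columns and rows (metadata-only rows are
# filtered more carefully in the full script)
df = pd.read_csv('data/human_development_index.csv')
df = df.dropna(axis='columns', how='all').dropna(axis='index', how='all')

# 4-5. replace the '..' no-data marker and make 'HDI Rank' a nullable integer column
df = df.replace('..', np.nan)
df['HDI Rank'] = pd.to_numeric(df['HDI Rank'], errors='coerce').astype('Int64')

# 6-10. wide to long, with integer years, a datetime column, and float values
df = df.melt(id_vars=['HDI Rank', 'Country'], var_name='year', value_name='yr_data')
df['year'] = df['year'].astype(int)
df['datetime'] = pd.to_datetime(df['year'], format='%Y')
df['yr_data'] = pd.to_numeric(df['yr_data'], errors='coerce')

# 11-12. Carto-friendly column names
df = df.rename(columns={'HDI Rank': 'hdi_rank', 'Country': 'country_region'})
```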
21 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_004_rw1_human_development_index/soc_004_rw1_human_development_index_processing.py) for more details on this processing.
22 |
23 | You can view the processed Human Development Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/fc6dea95-37a6-41a0-8c99-38b7a2ea7301).
24 |
25 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_004_rw1_human_development_index.zip), or [from the source website](http://hdr.undp.org/en/indicators/137506#).
26 |
27 | ###### Note: This dataset processing was done by [Matthew Iceland](https://github.com/miceland2) and [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
28 |
--------------------------------------------------------------------------------
/soc_005_rw1_political_rights_civil_liberties_index/README.md:
--------------------------------------------------------------------------------
1 | ## Political Rights and Civil Liberties Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Freedom in the World: The Annual Survey of Political Rights and Civil Liberties](https://freedomhouse.org/report-types/freedom-world) for [display on Resource Watch](https://resourcewatch.org/data/explore/8eafc054-a350-43b5-af61-a64a9a7f8ffe).
3 |
4 | The source provided the data as an excel file.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 | 1. Read in the data as a pandas dataframe and remove all empty columns.
8 | 2. Add '_aggr' to the end of the column names 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'CL', 'PR', and 'Total' to match the column names in the previous Carto table.
9 | 3. Rename the 'country/territory' and 'c/t?' columns to replace characters unsupported by Carto with underscores.
10 | 4. Convert the column names to lowercase letters and replace spaces with underscores.
11 | 5. Create a new column 'year_reviewed' to indicate the year of development the data are based on.
12 | 6. Convert the years in the 'year_reviewed' column to datetime objects and store them in a new column 'datetime'.
13 |
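A short sketch of the column handling above follows; the header names other than the aggregate score columns, as well as the 'edition' column and the year offset, are assumptions about the source file.

```python
# Sketch only: column renaming and year handling for the Freedom in the World table.
import pandas as pd

df = pd.read_excel('data/freedom_in_the_world.xlsx')
df = df.dropna(axis='columns', how='all')

# 2. add '_aggr' to the aggregate score columns to match the previous Carto table
aggr_cols = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'CL', 'PR', 'Total']
df = df.rename(columns={col: col + '_aggr' for col in aggr_cols})

# 3-4. replace characters unsupported by Carto, then lowercase and underscore the rest
df = df.rename(columns={'Country/Territory': 'country_territory', 'C/T?': 'c_t_'})
df.columns = [col.lower().replace(' ', '_') for col in df.columns]

# 5-6. record the year of development the data are based on (hypothetical 'edition'
# column and offset) and store it as a datetime
df['year_reviewed'] = df['edition'] - 1
df['datetime'] = pd.to_datetime(df['year_reviewed'], format='%Y')
```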
14 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_005_rw1_political_rights_civil_liberties_index/soc_005_rw1_political_rights_civil_liberties_index_processing.py) for more details on this processing.
15 |
16 | You can view the processed Political Rights and Civil Liberties Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/8eafc054-a350-43b5-af61-a64a9a7f8ffe).
17 |
18 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_005_rw1_political_rights_civil_liberties_index.zip), or [from the source website](https://freedomhouse.org/report/freedom-world/2020/leaderless-struggle-democracy).
19 |
20 | ###### Note: This dataset processing was done by [Matthew Iceland](https://github.com/miceland2) and [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid).
21 |
--------------------------------------------------------------------------------
/soc_006_rw1_multidimensional_poverty_index/README.md:
--------------------------------------------------------------------------------
1 | ## Multidimensional Poverty Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Multidimensional Poverty Index](http://hdr.undp.org/en/2020-MPI) for [display on Resource Watch](https://resourcewatch.org/data/explore/d3486db9-5da4-4aee-a363-f71b643a7ce1).
3 |
4 | The source provided the data as an excel file.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Read in the data as a pandas dataframe and remove any notes or empty rows/columns from the data table.
9 | 2. Rename column headers to be more descriptive and to remove special characters so that it can be uploaded to Carto without losing information.
10 | 3. Split the column 'Year and survey' into two new columns:
11 | - 'yr_survey', which contains the year
12 | - 'survey', which contains the survey codes
13 | 4. Subset the dataframe to only include data of countries, removing any rows corresponding to regions.
14 | 5. Replace the '..', which is used to indicate no-data in the dataframe, with None.
15 | 6. Create a new column 'release_dt' to store the year the data was released in as the first date in that year (January 1, XXXX).
16 |
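A compact sketch of the steps above is included below; the split on a space assumes the 'Year and survey' values look like '2018 DHS', and the file name and 'release_year' column are hypothetical.

```python
# Sketch only: cleaning and column-splitting steps for the MPI table.
import numpy as np
import pandas as pd

df = pd.read_excel('data/multidimensional_poverty_index.xlsx')

# 1. drop empty rows and columns (note rows are handled in the full script)
df = df.dropna(axis='columns', how='all').dropna(axis='index', how='all')

# 3. split 'Year and survey' into the survey year and the survey code
df[['yr_survey', 'survey']] = df['Year and survey'].str.split(' ', n=1, expand=True)

# 5. replace the '..' no-data marker
df = df.replace('..', np.nan)

# 6. store the release year as January 1 of that year (hypothetical 'release_year' column)
df['release_dt'] = pd.to_datetime(df['release_year'], format='%Y')
```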
17 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_006_rw1_multidimensional_poverty_index/soc_006_rw1_multidimensional_poverty_index_processing.py) for more details on this processing.
18 |
19 | You can view the processed Multidimensional Poverty Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/d3486db9-5da4-4aee-a363-f71b643a7ce1).
20 |
21 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_006_rw1_multidimensional_poverty_index.zip), or [from the source website](http://hdr.undp.org/en/2020-MPI).
22 |
23 | ###### Note: This dataset processing was done by [Matthew Iceland](https://github.com/miceland2) and [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
24 |
--------------------------------------------------------------------------------
/soc_021_rw1_environmental_performance_index/README.md:
--------------------------------------------------------------------------------
1 | ## Environmental Performance Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Environmental Performance Index](https://epi.yale.edu/) for [display on Resource Watch](http://resourcewatch.org/data/explore/faaf0a0f-7475-49f3-bb8b-68edfaad4b77).
3 |
4 | The source provided the data as a csv file. Columns were renamed to be more descriptive and to remove special characters so that it can be uploaded to Carto without losing information.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_021_rw1_environmental_performance_index/soc_021_rw1_environmental_performance_index_processing.py) for more details on this processing.
7 |
8 | You can view the processed Environmental Performance Index dataset [on Resource Watch](http://resourcewatch.org/data/explore/faaf0a0f-7475-49f3-bb8b-68edfaad4b77).
9 |
10 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_021_rw1_environmental_performance_index.zip), or [from the source website](https://epi.yale.edu/downloads).
11 |
12 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu)
13 |
--------------------------------------------------------------------------------
/soc_023_rw1_fragile_states_index/README.md:
--------------------------------------------------------------------------------
1 | # Fragile States Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Fragile States Index](https://fragilestatesindex.org/) for [display on Resource Watch](https://bit.ly/2O6Qv4F).
3 |
4 | The source provided this dataset as a set of excel files, each containing data for a different year.
5 |
6 | The excel files were concatenated into a single pandas dataframe. Column headers were renamed to remove special characters and spaces so that the table can be uploaded to Carto without losing information.
7 |
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_023_rw1_fragile_states_index/soc_023_rw1_fragile_states_index_processing.py) for more details on this processing.
9 |
10 | You can view the processed dataset for [display on Resource Watch](https://bit.ly/2O6Qv4F).
11 |
12 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_023_rw1_fragile_states_index.zip), or [from the source website](https://fragilestatesindex.org/excel/).
13 |
14 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
15 |
--------------------------------------------------------------------------------
/soc_025a_gender_inequality_index/README.md:
--------------------------------------------------------------------------------
1 | ## Gender Inequality Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Gender Inequality Index](http://hdr.undp.org/en/content/gender-inequality-index-gii) for [display on Resource Watch](https://resourcewatch.org/data/explore/soc025-Gender-Inequality-Index).
3 |
4 | The original data is downloadable in a .xlsx format.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto. We read in the data as a pandas dataframe, deleted columns and rows without data, and renamed the columns. Then we converted the '..' no-data markers to 'None'.
7 |
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_025a_gender_inequality_index/soc.025%20Gender%20Inequality%20Index.ipynb) for more details on this processing.
9 |
10 | You can view the processed Gender Inequality Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/soc025-Gender-Inequality-Index).
11 |
12 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_025a_gender_inequality_index.zip), or [from the source website](http://hdr.undp.org/en/content/table-5-gender-inequality-index-gii).
13 |
14 | ###### Note: This dataset processing was done by [Liz Saccoccia](https://www.wri.org/profile/liz-saccoccia), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
15 |
--------------------------------------------------------------------------------
/soc_026_rw0_global_gender_gap/README.md:
--------------------------------------------------------------------------------
1 | ## Gender Gap Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Gender Gap Index](http://www3.weforum.org/docs/WEF_GGGR_2021.pdf) for [display on Resource Watch](https://resourcewatch.org/data/explore/0be2ce12-79b3-434b-b557-d6ea92d787fe).
3 |
4 | The source provides the data in a pdf, which we reformatted into a table. Below, we describe the steps used to extract the Global Gender Gap Index and its four subindices from the pdf and upload the reformatted table to Carto.
5 |
6 | 1. Read in the data tables for the global gender gap index and its four subindices from the Global Gender Gap Report pdf using the tabula-py module. Both the 2020 and 2021 datasets were pulled in.
7 | 2. Transform the data for each index/subindex into a pandas dataframe.
8 | 3. Add a year column to all dataframes to indicate the year of the report.
9 | 4. Process country names to fix inconsistencies in naming.
10 | 5. Merge the dataframes containing the main index and the subindices into one table using the columns 'year' and 'country'.
11 | 6. Append this dataframe to the existing data we had previously processed by reading in the existing Carto table as a dataframe and concatenating it with the new data.
12 |
13 |
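An illustrative sketch of the pdf extraction and merge follows; the page numbers, column names, and file paths are assumptions, and tabula-py needs a local Java runtime to run.

```python
# Sketch only: read report tables from the pdf, merge them, and append to existing data.
import pandas as pd
import tabula

# 1-2. read the overall index and one subindex table from the report pdf
index_tables = tabula.read_pdf('data/WEF_GGGR_2021.pdf', pages='10-13', multiple_tables=True)
sub_tables = tabula.read_pdf('data/WEF_GGGR_2021.pdf', pages='14-17', multiple_tables=True)
index_df = pd.concat(index_tables, ignore_index=True)
sub_df = pd.concat(sub_tables, ignore_index=True)

# 3. add the report year to both dataframes
index_df['year'] = 2021
sub_df['year'] = 2021

# 4. fix naming inconsistencies so countries match across tables
index_df['country'] = index_df['country'].replace({'Korea, Rep.': 'South Korea'})

# 5. merge the main index with the subindex on 'year' and 'country'
merged = index_df.merge(sub_df, on=['year', 'country'], how='outer')

# 6. append the new rows to the previously processed data, read here from a
# hypothetical csv export of the existing Carto table
existing = pd.read_csv('data/soc_026_rw0_global_gender_gap_existing.csv')
combined = pd.concat([existing, merged], ignore_index=True)
```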
14 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_026_rw0_global_gender_gap/soc_026_rw0_global_gender_gap.py) for more details on this processing.
15 |
16 | You can view the processed Gender Gap Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/0be2ce12-79b3-434b-b557-d6ea92d787fe).
17 |
18 | You can also download the original dataset [from the source website](https://www.weforum.org/reports/the-global-gender-gap-report-2021).
19 |
20 | ###### Note: This dataset processing was done by [Jason Winik](https://www.wri.org/profile/jason-winik), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
21 |
--------------------------------------------------------------------------------
/soc_037_rw1_malaria_extent/README.md:
--------------------------------------------------------------------------------
1 | ## Malaria Extent Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Malaria Extent](https://map.ox.ac.uk/explorer/#/explorer) for [display on Resource Watch](https://resourcewatch.org/data/explore/3db2f914-2c70-431c-8dce-5dd961bccbd5).
3 |
4 | The source provided the data as GeoTIFF files within a zipped folder. The GeoTIFF files inside the zipped folder were not modified from the original version for display on Resource Watch.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_037_rw1_malaria_extent/soc_037_rw1_malaria_extent_processing.py) for more details on this processing.
7 |
8 | You can view the processed Malaria Extent dataset [on Resource Watch](https://resourcewatch.org/data/explore/3db2f914-2c70-431c-8dce-5dd961bccbd5).
9 |
10 | You can also download the original dataset [from the source website](https://map.ox.ac.uk/explorer/#/explorer).
11 |
12 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid).
13 |
--------------------------------------------------------------------------------
/soc_039_rw1_out_of_school_rate/README.md:
--------------------------------------------------------------------------------
1 | ## Out-Of-School Rate Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Out-of-school Rate for Children, Adolescents and Youth of Primary, Lower Secondary and Upper Secondary School Age](http://data.uis.unesco.org/index.aspx) for [display on Resource Watch](https://resourcewatch.org/data/explore/b2483333-693a-44e2-ae00-47f21c6a00bd).
3 |
4 | The data source provided the dataset as one csv file.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Import the data as a pandas dataframe.
9 | 2. Subset the dataframe based on the 'Indicator' column to obtain the number of out-of-school children, adolescents and youth of primary and secondary school age for both sexes.
10 | 3. Subset the dataframe based on the 'LOCATION' column to only retain country-level data.
11 | 4. Remove column 'TIME' since it contains the same information as the column 'Time'.
12 | 5. Remove column 'Flag Codes' since it contains the same information as the column 'Flags'.
13 | 6. Replace NaN in the table with None.
14 | 7. Remove rows where the 'Value' column is None.
15 | 8. Remove data in 2020 since it only has data for one country.
16 | 9. Convert years in the 'Time' column to datetime objects and store them in a new column 'datetime'.
17 | 10. Convert the column names to lowercase to match Carto column name requirements.
18 |
19 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_039_rw1_out_of_school_rate/soc_039_rw1_out_of_school_rate_processing.py) for more details on this processing.
20 |
21 | You can view the processed Out-Of-School Rate dataset [on Resource Watch](https://resourcewatch.org/data/explore/b2483333-693a-44e2-ae00-47f21c6a00bd).
22 |
23 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_039_rw1_out_of_school_rate.zip), or [from the source website](http://data.uis.unesco.org/index.aspx).
24 |
25 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
26 |
--------------------------------------------------------------------------------
/soc_043_rw0_refugees_internally_displaced_persons/README.MD:
--------------------------------------------------------------------------------
1 | ## Refugees and Internally Displaced Persons Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [IDMC data on Internally Displaced Persons and the demographic data for the population and solutions datasets](https://www.unhcr.org/refugee-statistics/download) for display on Resource Watch as the following datasets:
3 |
4 | - [Host Countries of Refugees and Internally Displaced Persons](https://resourcewatch.org/data/explore/c856396d-d0f2-4aae-9671-4903b2ebed4d)
5 | - [Origin Countries of Refugees and Internally Displaced Persons](https://resourcewatch.org/data/explore/7a8b5296-d283-4832-9be1-edd760bbb58f)
6 |
7 | The original data is downloadable in a CSV format.
8 |
9 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto:
10 | 1. Merge the "population.csv" and "solution.csv" files into a single pandas dataframe based on the columns "year", "country of origin", and "country of asylum".
11 | 2. Sum into "refugees_incl_refugee_like_situations" the following columns: "refugees under unhcr_s mandate", "venezuelans displaced abroad", "resettlement arrivals".
12 | 3. Create a column "total_population" encompassing the sum of the other columns.
13 | 4. Trim unwanted characters and spaces to comply with Carto's column name requirements.
14 |
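A simplified sketch of the merge and aggregation steps above is shown below; the column names are shortened, already-cleaned versions of those in the source files, and the list of population columns is an assumption.

```python
# Sketch only: merge the two UNHCR files and build the combined columns.
import pandas as pd

population = pd.read_csv('data/population.csv')
solutions = pd.read_csv('data/solution.csv')

# 1. merge the two files on year, country of origin, and country of asylum
df = population.merge(solutions,
                      on=['year', 'country_of_origin', 'country_of_asylum'],
                      how='outer')

# 2. sum the refugee-related columns into one combined column
refugee_cols = ['refugees_under_unhcr_s_mandate',
                'venezuelans_displaced_abroad',
                'resettlement_arrivals']
df['refugees_incl_refugee_like_situations'] = df[refugee_cols].sum(axis=1)

# 3. total population as the sum of the population columns (hypothetical list)
pop_cols = ['refugees_incl_refugee_like_situations', 'asylum_seekers', 'idps_of_concern']
df['total_population'] = df[pop_cols].sum(axis=1)

# 4. clean column names to comply with Carto requirements
df.columns = [c.lower().strip().replace(' ', '_').replace('?', '') for c in df.columns]
```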
15 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_043_rw0_refugees_internally_displaced_persons/soc_043_rw0_refugees_internally_displaced_persons_processing.py) for more details on this processing.
16 |
17 | You can view the processed datasets on Resource Watch:
18 | - [Host Countries of Refugees and Internally Displaced Persons](https://resourcewatch.org/data/explore/c856396d-d0f2-4aae-9671-4903b2ebed4d)
19 | - [Origin Countries of Refugees and Internally Displaced Persons](https://resourcewatch.org/data/explore/7a8b5296-d283-4832-9be1-edd760bbb58f)
20 |
21 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_043_rw0_refugees_internally_displaced_persons.zip), or [from the source website](https://www.unhcr.org/refugee-statistics/download).
22 |
23 | ###### Note: This dataset processing and QC were done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes).
24 |
--------------------------------------------------------------------------------
/soc_045_rw1_women_political_representation/README.md:
--------------------------------------------------------------------------------
1 | ## Social Institutions and Gender Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Social Institutions and Gender Index](https://stats.oecd.org/Index.aspx?DataSetCode=GIDDB2019#) for [display on Resource Watch](https://bit.ly/3ejWtuc).
3 |
4 | The source provided this dataset as a csv file accessed through its [website](https://stats.oecd.org/Index.aspx?DataSetCode=GIDDB2019#).
5 |
6 | Below, we describe the main actions performed to process the csv file:
7 | 1. Convert the dataframe from long to wide form, converting the variables into columns.
8 | 2. Convert the years in the 'Year' column to datetime objects and store them in a new column 'datetime'.
9 | 3. Drop rows for data aggregated by region or income group (labeled 'all regions' or 'all income groups') so that only disaggregated data remains and data is not duplicated.
10 |
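A brief sketch of the reshaping steps above follows; the 'Country', 'Variables', 'Year', 'Value', and 'Region' column names are illustrative, not the exact OECD headers.

```python
# Sketch only: long-to-wide pivot, datetime column, and dropping aggregated rows.
import pandas as pd

df = pd.read_csv('data/social_institutions_and_gender_index.csv')

# 1. long to wide: turn each variable into its own column
wide = df.pivot_table(index=['Country', 'Year', 'Region'],
                      columns='Variables', values='Value').reset_index()

# 2. store the year as a datetime object
wide['datetime'] = pd.to_datetime(wide['Year'], format='%Y')

# 3. keep only disaggregated rows, dropping regional and income-group aggregates
wide = wide[~wide['Region'].isin(['All regions', 'All income groups'])]
```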
11 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_045_rw1_women_political_representation/soc_045_rw1_women_political_representation_processing.py) for more details on this processing.
12 |
13 | You can view the processed Women's Political Representation dataset [on Resource Watch](https://bit.ly/3ejWtuc).
14 |
15 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_045_rw1_women_political_representation.zip), or [from the source website](https://stats.oecd.org/Index.aspx?DataSetCode=GIDDB2019).
16 |
17 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
18 |
--------------------------------------------------------------------------------
/soc_048_rw0_organized_violence_events/README.md:
--------------------------------------------------------------------------------
1 | ## Organized Violence Events Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [UCDP Georeferenced Event Dataset (GED) Global version 21.1](https://ucdp.uu.se/downloads/ged/ged211.pdf) for [display on Resource Watch](http://resourcewatch.org/data/explore/9b6e6bce-efce-49a5-b603-385b8dae29e0).
3 |
4 | The source provided the data as a CSV file. This data file was not modified from the original version for display on Resource Watch.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_048_rw0_organized_violence_events/soc_048_rw0_organized_violence_events_processing.py) for more details on this processing.
7 |
8 | You can view the processed Organized Violence Events dataset [on Resource Watch](http://resourcewatch.org/data/explore/9b6e6bce-efce-49a5-b603-385b8dae29e0).
9 |
10 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_048_rw0_organized_violence_events.zip), or [from the source website](http://ucdp.uu.se/downloads/).
11 |
12 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
13 |
--------------------------------------------------------------------------------
/soc_048_rw0_organized_violence_events/soc_048_rw0_organized_violence_events_processing.py:
--------------------------------------------------------------------------------
1 | import os
2 | import pandas as pd
3 | from zipfile import ZipFile
4 | import sys
5 | utils_path = os.path.join(os.path.abspath(os.getenv('PROCESSING_DIR')),'utils')
6 | if utils_path not in sys.path:
7 |     sys.path.append(utils_path)
8 | import util_files
9 | import util_cloud
10 | import util_carto
11 | import urllib
12 | import logging
13 |
14 | # Set up logging
15 | # Get the top-level logger object
16 | logger = logging.getLogger()
17 | for handler in logger.handlers: logger.removeHandler(handler)
18 | logger.setLevel(logging.INFO)
19 | # make it print to the console.
20 | console = logging.StreamHandler()
21 | logger.addHandler(console)
22 | logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
23 |
24 | # name of table on Carto where you want to upload data
25 | # this should be a table name that is not currently in use
26 | dataset_name = 'soc_048_rw0_organized_violence_events' #check
27 |
28 | logger.info('Executing script for dataset: ' + dataset_name)
29 | # create a new sub-directory within your specified dir called 'data'
30 | # within this directory, create files to store raw and processed data
31 | data_dir = util_files.prep_dirs(dataset_name)
32 |
33 | '''
34 | Download data and save to your data directory
35 | '''
36 | # url of the source data to download
37 | url = 'https://ucdp.uu.se/downloads/ged/ged211-csv.zip'
38 |
39 | # download the data from the source
40 | raw_data_file = os.path.join(data_dir, os.path.basename(url))
41 | urllib.request.urlretrieve(url, raw_data_file)
42 |
43 | # unzip source data
44 | raw_data_file_unzipped = raw_data_file.split('.')[0]
45 | zip_ref = ZipFile(raw_data_file, 'r')
46 | zip_ref.extractall(raw_data_file_unzipped)
47 | zip_ref.close()
48 |
49 | '''
50 | Process the data
51 | '''
52 | # read the data into a pandas dataframe
53 | df = pd.read_csv(os.path.join(raw_data_file_unzipped, 'ged211.csv'), encoding='utf-8', header=0)
54 |
55 | # save dataset to csv
56 | processed_data_file = os.path.join(data_dir, dataset_name+'_edit.csv')
57 | df.to_csv(processed_data_file, index=False)
58 |
59 | '''
60 | Upload processed data to Carto
61 | '''
62 | logger.info('Uploading processed data to Carto.')
63 | util_carto.upload_to_carto(processed_data_file, 'LINK')
64 |
65 | '''
66 | Upload original data and processed data to Amazon S3 storage
67 | '''
68 | # initialize AWS variables
69 | aws_bucket = 'wri-public-data'
70 | s3_prefix = 'resourcewatch/'
71 |
72 | logger.info('Uploading original data to S3.')
73 | # Upload raw data file to S3
74 |
75 | # Copy the raw data into a zipped file to upload to S3
76 | raw_data_dir = os.path.join(data_dir, dataset_name+'.zip')
77 | with ZipFile(raw_data_dir,'w') as zip:
78 |     zip.write(raw_data_file, os.path.basename(raw_data_file))
79 | # Upload raw data file to S3
80 | uploaded = util_cloud.aws_upload(raw_data_dir, aws_bucket, s3_prefix+os.path.basename(raw_data_dir))
81 |
82 | logger.info('Uploading processed data to S3.')
83 | # Copy the processed data into a zipped file to upload to S3
84 | processed_data_dir = os.path.join(data_dir, dataset_name+'_edit.zip')
85 | with ZipFile(processed_data_dir,'w') as zip:
86 |     zip.write(processed_data_file, os.path.basename(processed_data_file))
87 | # Upload processed data file to S3
88 | uploaded = util_cloud.aws_upload(processed_data_dir, aws_bucket, s3_prefix+os.path.basename(processed_data_dir))
--------------------------------------------------------------------------------
/soc_049_rw0_water_conflict_map/README.md:
--------------------------------------------------------------------------------
1 | ## Water Conflict Map Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the World Water Conflict Chronology Map](https://www.worldwater.org/water-conflict/) for [display on Resource Watch](https://resourcewatch.org/data/explore/24928aa3-28d3-457c-ad2a-62f3c83ef663).
3 |
4 | The source provided the data as a PHP file.
5 |
6 | Below, we describe the steps used to reformat the table to upload it to Carto.
7 |
8 | 1. Rename the 'Start' and 'End' columns to 'start_year' and 'end_year' since 'end' is a reserved word in PostgreSQL.
9 | 2. Convert the start and end year of the conflicts to datetime objects using first day of January to fill day and month for each conflict and store them in two new columns 'start_dt' and 'end_dt'.
10 | 3. Convert column headers to lowercase and replace spaces within them with underscores so that the table can be uploaded to Carto without losing information.
11 |
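A short sketch of the reformatting above is shown below; column names other than 'Start' and 'End', and the csv export of the PHP data, are illustrative.

```python
# Sketch only: rename reserved-word columns, build datetime columns, clean headers.
import pandas as pd

df = pd.read_csv('data/water_conflict_chronology.csv')   # hypothetical export of the PHP data

# 1. avoid the PostgreSQL reserved word 'end' in column names
df = df.rename(columns={'Start': 'start_year', 'End': 'end_year'})

# 2. store the start and end years as January 1 datetime objects
df['start_dt'] = pd.to_datetime(df['start_year'], format='%Y')
df['end_dt'] = pd.to_datetime(df['end_year'], format='%Y')

# 3. lowercase headers and replace spaces with underscores for the Carto upload
df.columns = [c.lower().replace(' ', '_') for c in df.columns]
```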
12 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_049_rw0_water_conflict_map/soc_049_rw0_water_conflict_map_processing.py) for more details on this processing.
13 |
14 | You can view the processed Water Conflict Map dataset [on Resource Watch](https://resourcewatch.org/data/explore/24928aa3-28d3-457c-ad2a-62f3c83ef663).
15 |
16 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_049_rw0_water_conflict_map.zip), or [from the source website](https://www.worldwater.org/water-conflict/).
17 |
18 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
19 |
--------------------------------------------------------------------------------
/soc_067_rw1_climate_risk_index/README.md:
--------------------------------------------------------------------------------
1 | ## Climate Risk Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Climate Risk Index 2020](https://germanwatch.org/en/17307) for [display on Resource Watch](https://resourcewatch.org/data/explore/7e98607d-23d8-42f8-9662-5658f349bf0f).
3 |
4 | The source provided the data as a table in a pdf file.
5 |
6 | Below, we describe the steps used to reformat the table so that it is formatted correctly to upload to Carto.
7 |
8 | 1. Read in the data as a pandas dataframe.
9 | 2. Rename column headers to be more descriptive and to remove special characters so that it can be uploaded to Carto without losing information.
10 | 3. Create a new column 'datetime' to store the time period of the data as the first date of the year.
11 |
12 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/tree/master/soc_067_rw1_climate_risk_index) for more details on this processing.
13 |
14 | You can view the processed Climate Risk Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/7e98607d-23d8-42f8-9662-5658f349bf0f).
15 |
16 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_067_rw1_climate_risk_index.zip), or [from the source website](https://germanwatch.org/en/cri).
17 |
18 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
19 |
--------------------------------------------------------------------------------
/soc_068b_rw2_global_land_cover/README.md:
--------------------------------------------------------------------------------
1 | ## Global Land Cover (IPCC Classification) Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Land Cover Maps - v2.1.1](https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-land-cover?tab=overview) for [display on Resource Watch](https://resourcewatch.org/data/explore/0851d568-0960-4e21-b954-0d4f9d8854f9).
3 |
4 | Each year of data is provided by the source as a NetCDF file within a zipped folder. In order to display this data on Resource Watch, the lccs-class variable in each NetCDF was converted to a GeoTIFF.
5 |
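A minimal sketch of the NetCDF-to-GeoTIFF conversion is shown below, assuming the variable is stored as 'lccs_class' and using the rioxarray package; the file name is hypothetical, and the linked script shows the exact approach used.

```python
# Sketch only: pull the land cover class variable from a NetCDF and write a GeoTIFF.
import xarray as xr
import rioxarray  # noqa: F401  (registers the .rio accessor)

# open one year of data and pull out the land cover class variable
ds = xr.open_dataset('data/land_cover_2019.nc')
lccs = ds['lccs_class'].squeeze()

# tag the CRS and write the variable to a single-band GeoTIFF
lccs = lccs.rio.write_crs('EPSG:4326')
lccs.rio.to_raster('data/soc_068b_rw2_global_land_cover_2019.tif')
```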
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_068b_rw2_global_land_cover/soc_068b_rw2_global_land_cover_processing.py) for more details on this processing.
7 |
8 | You can view the processed Global Land Cover (IPCC Classification) dataset [on Resource Watch](https://resourcewatch.org/data/explore/0851d568-0960-4e21-b954-0d4f9d8854f9).
9 |
10 | You can also download the original dataset [from the source website](http://maps.elie.ucl.ac.be/CCI/viewer/).
11 |
12 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
13 |
--------------------------------------------------------------------------------
/soc_075_male_female_population_densities/README.md:
--------------------------------------------------------------------------------
1 | ## Female & Male Population Densities Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Gridded Population of the World (GPW), v4: Basic Demographic Characteristics, v4.10 (2010)](https://sedac.ciesin.columbia.edu/data/set/gpw-v4-basic-demographic-characteristics-rev11) for [display on Resource Watch](https://resourcewatch.org/data/explore/soc075-Broad-Age-Groups).
3 |
4 | The original data contains two layers, male population density and female population density in people per square kilometer.
5 |
6 | To create the layer "Number of females per 100 males", we divide the female population density by the male population density and then multiply by 100.
7 |
8 | This calculation was done using Google Earth Engine, a free geospatial analysis system by Google. While the system is free, you need to sign up with a Google account, which can be done [here](https://earthengine.google.com/).
9 |
10 | The code used to preprocess this layer in Google Earth Engine can be found [here](https://code.earthengine.google.com/69705398b91fdcbdad2298f08ada5da4) and is copied below.
11 | ```
12 | var female = ee.Image('projects/resource-watch-gee/soc_075_female_male_populations/female_density')
13 | var male = ee.Image('projects/resource-watch-gee/soc_075_female_male_populations/male_density')
14 |
15 | var scale = female.projection().nominalScale().getInfo()
16 |
17 | var m_f_ratio = male.divide(female).multiply(100)
18 |
19 | Export.image.toAsset({
20 | image: m_f_ratio,
21 | description: 'Male_Female_Ratio',
22 | assetId: 'projects/resource-watch-gee/soc_075_female_male_populations/male_female_ratio_density',
23 | scale: scale,
24 | maxPixels:1e13})
25 | ```
26 |
27 | You can view the processed Female & Male Population Densities dataset [on Resource Watch](https://resourcewatch.org/data/explore/soc075-Broad-Age-Groups).
28 |
29 | You can also download the original dataset [from the source website](https://sedac.ciesin.columbia.edu/data/set/gpw-v4-basic-demographic-characteristics-rev11/data-download).
30 |
31 | ###### Note: This dataset processing was done by [Kristine Lister](https://www.wri.org/profile/kristine-lister), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
32 |
--------------------------------------------------------------------------------
/soc_085_rw1_elevation/README.md:
--------------------------------------------------------------------------------
1 | ## Elevation Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Advanced Land Observing Satellite (ALOS) Global Digital Surface Model (DSM) 30 m, Version 3.2](https://www.eorc.jaxa.jp/ALOS/en/aw3d30/aw3d30v3.2_product_e_e1.2.pdf) for [display on Resource Watch](https://resourcewatch.org/data/explore/cef9d930-61c3-4641-87b1-9f3072210d84).
3 |
4 | The source provided the data as a Google Earth Engine image collection. Each image in the collection is a 1°x1° tile. The tiles were mosaicked together to create a global dataset for display on Resource Watch.
5 |
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_085_rw1_elevation/soc_085_rw1_elevation_processing.py) for more details on this processing.
7 |
8 | You can view the processed Elevation dataset [on Resource Watch](https://resourcewatch.org/data/explore/cef9d930-61c3-4641-87b1-9f3072210d84).
9 |
10 | You can also download the original dataset [from the source website](https://developers.google.com/earth-engine/datasets/catalog/JAXA_ALOS_AW3D30_V3_2).
11 |
12 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
13 |
--------------------------------------------------------------------------------
/soc_085_rw1_elevation/soc_085_rw1_elevation_processing.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | utils_path = os.path.join(os.path.abspath(os.getenv('PROCESSING_DIR')),'utils')
4 | if utils_path not in sys.path:
5 |     sys.path.append(utils_path)
6 | import util_files
7 | import util_cloud
8 | import urllib
9 | import ee
10 | import logging
11 | import time
12 |
13 | # set up logging
14 | # get the top-level logger object
15 | logger = logging.getLogger()
16 | for handler in logger.handlers: logger.removeHandler(handler)
17 | logger.setLevel(logging.INFO)
18 | # make it print to the console.
19 | console = logging.StreamHandler()
20 | logger.addHandler(console)
21 | logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
22 |
23 | # name of asset on GEE where you want to upload data
24 | # this should be an asset name that is not currently in use
25 | dataset_name = 'soc_085_rw1_elevation' #check
26 |
27 | logger.info('Executing script for dataset: ' + dataset_name)
28 |
29 | '''
30 | Process data
31 | '''
32 | # initialize ee and eeUtil modules for exporting to Google Earth Engine
33 | auth = ee.ServiceAccountCredentials(os.getenv('GEE_SERVICE_ACCOUNT'), os.getenv('GOOGLE_APPLICATION_CREDENTIALS'))
34 | ee.Initialize(auth)
35 |
36 | # name of the image collection in GEE where the original data is stored
37 | EE_COLLECTION_ORI = 'JAXA/ALOS/AW3D30/V3_2'
38 |
39 | # mosaic the 'DSM' band of the images in the image collection
40 | mosaicked = ee.ImageCollection(EE_COLLECTION_ORI).select('DSM').mosaic()
41 |
42 | # set asset name to be used in GEE
43 | asset_name = f'projects/resource-watch-gee/{dataset_name}'
44 |
45 | # create a task to export the mosaiced image to the asset
46 | task = ee.batch.Export.image.toAsset(image = mosaicked,
47 | description = 'export mosaicked image to asset',
48 | region = ee.Geometry.Rectangle([-179.999, -90, 180, 90], 'EPSG:4326', False),
49 | pyramidingPolicy = {'DSM': 'MEAN'},
50 | scale = 30,
51 | maxPixels = 1e13,
52 | assetId = asset_name)
53 | # start the task
54 | task.start()
55 |
56 | # check task status
57 | # set the state to 'RUNNING' because we have started the task
58 | state = 'RUNNING'
59 | # set a start time to track the time it takes to upload the image
60 | start = time.time()
61 | # wait for task to complete, but quit if it takes more than 1800 seconds (30 minutes)
62 | while state == 'RUNNING' and (time.time() - start) < 1800:
63 |     # wait for 10 minutes before checking the state
64 |     time.sleep(600)
65 |     # check the status of the upload
66 |     status = task.status()['state']
67 |     logging.info('Current Status: ' + status +', run time (min): ' + str((time.time() - start)/60))
68 |     # log if the task is completed and change the state
69 |     if status == 'COMPLETED':
70 |         state = status
71 |         logging.info(status)
72 |     # log an error if the task fails and change the state
73 |     elif status == 'FAILED':
74 |         state = status
75 |         logging.error(task.status()['error_message'])
76 |         logging.debug(task.status())
77 |
78 |
--------------------------------------------------------------------------------
/soc_086_subnational_hdi/README.md:
--------------------------------------------------------------------------------
1 | ## Subnational Human Development Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Subnational Human Development Index](https://globaldatalab.org/shdi/view/shdi/) for [display on Resource Watch](https://bit.ly/3tCh7JU).
3 |
4 | The source provided two files, a csv file containing the SHDI data and a shapefile containing the subnational boundaries used to calculate the index.
5 |
6 | The following steps describe the processing performed by the Resource Watch team to the dataset:
7 |
8 | 1. The shapefile was read as a geopandas dataframe.
9 | 2. The csv file was read as a pandas dataframe. Then it was converted from wide to long form by storing the available years into a single column.
10 | 3. A column containing the year information as a datetime value was created.
11 | 4. The pandas dataframe was formatted to comply with Carto format requirements.
12 |
13 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_086_subnational_hdi/soc_086_subnational_hdi_processing.py) for more details on this processing.
14 |
15 | You can view the processed dataset for [display on Resource Watch](https://bit.ly/3tCh7JU).
16 |
17 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_086_subnational_hdi.zip), or [from the source website](https://globaldatalab.org/shdi/view/shdi/).
18 |
19 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
20 |
--------------------------------------------------------------------------------
/soc_091_global_peace_index/README.md:
--------------------------------------------------------------------------------
1 | ## Global Peace Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Global Peace Index 2019 dataset](http://visionofhumanity.org/app/uploads/2020/02/GPI-2019-overall-scores-2008-2019.xlsx) for [display on Resource Watch](https://resourcewatch.org/data/explore/soc091-Global-Peace-Index).
3 |
4 | This dataset was provided by the source as an excel file. The file includes the overall Global Peace Index score for 163 countries or independent territories from 2008 to 2019.
5 |
6 | The spreadsheet was read into Python as a dataframe. The data was cleaned, and the table was converted from wide to long form.
7 |
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_091_global_peace_index/soc_091_global_peace_index_processing.py) for more details on this processing.
9 |
10 | You can view the processed Global Peace Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/soc091-Global-Peace-Index).
11 |
12 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_091_global_peace_index.zip), or [from the source website](http://visionofhumanity.org/app/uploads/2020/02/GPI-2019-overall-scores-2008-2019.xlsx).
13 |
14 | ###### Note: This dataset processing was done by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
15 |
--------------------------------------------------------------------------------
/soc_092_positive_peace_index/README.md:
--------------------------------------------------------------------------------
1 | ## Positive Peace Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Positive Peace Report 2019 dataset](http://visionofhumanity.org/app/uploads/2020/02/PPI-2019-overall-scores-2009-2018.xlsx) for [display on Resource Watch](https://resourcewatch.org/data/explore/soc092-Positive-Peace-Index).
3 |
4 | This dataset was provided by the source as an excel file. The file includes the overall Positive Peace Index score for 163 countries or independent territories from 2009 to 2018.
5 |
6 | The spreadsheet was read into Python as a dataframe. The data was cleaned, and the table was converted from wide to long form.
7 |
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_092_positive_peace_index/soc_092_positive_peace_index_processing.py) for more details on this processing.
9 |
10 | You can view the processed Positive Peace Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/soc092-Positive-Peace-Index).
11 |
12 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_092_positive_peace_index.zip), or [from the source website](http://visionofhumanity.org/app/uploads/2020/02/PPI-2019-overall-scores-2009-2018.xlsx).
13 |
14 | ###### Note: This dataset processing was done by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
15 |
--------------------------------------------------------------------------------
/soc_093_global_terrorism_index/README.md:
--------------------------------------------------------------------------------
1 | ## Global Terrorism Index Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Global Terrorism Index 2019 dataset](http://visionofhumanity.org/app/uploads/2020/02/GTI-2019-overall-scores-2002-2018.xlsx) for [display on Resource Watch](https://resourcewatch.org/data/explore/soc093-Global-Terrorism-Index).
3 |
4 | This dataset was provided by the source as an excel file. The file includes the overall Global Terrorism Index score for 163 countries or independent territories from 2002 to 2018.
5 |
6 | The spreadsheet was read into Python as a dataframe. The data was cleaned, and the table was converted from wide to long form.
7 |
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/soc_093_global_terrorism_index/soc_093_global_terrorism_index_processing.py) for more details on this processing.
9 |
10 | You can view the processed Global Terrorism Index dataset [on Resource Watch](https://resourcewatch.org/data/explore/soc093-Global-Terrorism-Index).
11 |
12 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/soc_093_global_terrorism_index.zip), or [from the source website](http://visionofhumanity.org/app/uploads/2020/02/GTI-2019-overall-scores-2002-2018.xlsx).
13 |
14 | ###### Note: This dataset processing was done by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
15 |
--------------------------------------------------------------------------------
/soc_104_rw0_global_land_cover/README.md:
--------------------------------------------------------------------------------
1 | ## Global Land Cover (Copernicus) Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Land Cover map at 100m resolution (LC100)](https://land.copernicus.eu/global/products/lc) for [display on Resource Watch](https://resourcewatch.org/data/explore/b2f00f99-46ed-43e6-a7a1-a5809d9369d4).
3 |
4 | The source provided the data as a [Google Earth Engine asset](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_Landcover_100m_Proba-V-C3_Global). The asset was not modified from the original version for display on Resource Watch.
5 |
6 | You can view the processed Global Land Cover dataset [on Resource Watch](https://resourcewatch.org/data/explore/b2f00f99-46ed-43e6-a7a1-a5809d9369d4).
7 |
8 | You can also download the original dataset [from the source website](https://land.copernicus.eu/global/products/lc).
9 |
10 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
11 |
--------------------------------------------------------------------------------
/soc_107_rw0_population/README.md:
--------------------------------------------------------------------------------
1 | ## Population (Grid, 100m) Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [Population Count - Constrained Individual Countries](https://www.worldpop.org/geodata/listing?id=78) for [display on Resource Watch](https://resourcewatch.org/data/explore/d6e42176-90c4-429d-8cae-7619c545a458).
3 |
4 | The source provided the data as a [Google Earth Engine image collection](https://developers.google.com/earth-engine/datasets/catalog/WorldPop_GP_100m_pop). The images in the image collection were mosaiced together for display on Resource Watch using the following code.
5 |
6 | ```javascript
7 | // Purpose: Mosaic all images for all years within an image collection and export it to an asset
8 | // with each year of data as a band
9 | //Load in an image collection
10 | var worldpop = ee.ImageCollection('WorldPop/GP/100m/pop');
11 |
12 | // Save the dates of the time series
13 | var yearStart = 2000;
14 | var yearEnd = 2020;
15 |
16 | // Filter images for the year 2000 and mosaic all the images for the year to a single image. This image will serve as the base band for the compiled set of images.
17 | var yearStack = worldpop.filterMetadata('year','equals',yearStart)
18 | .mosaic()
19 | .rename('Y'+ [yearStart.toString()]);
20 |
21 | // For each year in the time series, filter the data to that year, mosaic all the images to a single image, and add the mosaicked image as a band to the compiled image.
22 | for (var i = yearStart; i <= yearEnd; i++) {
23 | var yearCollection = worldpop.filterMetadata('year', 'equals', i);
24 | var yearImage = yearCollection.mosaic().rename('Y'+ [i.toString()]);
25 | yearStack = yearStack.addBands(yearImage, null, true);
26 | }
27 |
28 | //Save the boundaries. The dataset does not extend below -60 degrees. Clip the extent to avoid unwanted zeros below this boundary.
29 | var rect = [-180, -60, 180, 89.9];
30 | var bounds = ee.Geometry.Rectangle(rect,null,false);
31 |
32 | //Export mosaicked image to an asset with boundaries obtained in the previous step.
33 | Export.image.toAsset({
34 | image: yearStack,
35 | description: 'Population_grid_100m',
36 | assetId: 'projects/resource-watch-gee/soc_107_population',
37 | region: bounds,
38 | maxPixels: 1e13
39 | });
40 |
41 | ```
42 |
43 | You can view the processed Population (Grid, 100m) dataset [on Resource Watch](https://resourcewatch.org/data/explore/d6e42176-90c4-429d-8cae-7619c545a458).
44 |
45 | You can also download the original dataset [from the source website](https://www.worldpop.org/geodata/listing?id=78).
46 |
47 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
48 |
--------------------------------------------------------------------------------
/soc_108_rw0_anthropogenic_biomes/README.md:
--------------------------------------------------------------------------------
1 | ## Anthropogenic Biomes Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Anthropogenic Biomes dataset](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/G0QDNQ) for [display on Resource Watch](https://resourcewatch.org/data/explore/c375a384-1aed-4809-8fa1-be48a0f9889b).
3 |
4 | The source provided the data as a Google Earth Engine asset. The asset was not modified from the original version for display on Resource Watch.
5 |
6 | You can view the processed Anthropogenic Biomes dataset [on Resource Watch](https://resourcewatch.org/data/explore/c375a384-1aed-4809-8fa1-be48a0f9889b).
7 |
8 | You can also download the original dataset [from the source website](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/G0QDNQ).
9 |
10 | ###### Note: This dataset processing was done by [Eduardo Castillero Reyes](https://wrimexico.org/profile/eduardo-castillero-reyes), and QC'd by...
--------------------------------------------------------------------------------
/wat_008_rw3_annual_surface_water_coverage/README.md:
--------------------------------------------------------------------------------
1 | ## Annual Surface Water Coverage Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the JRC Yearly Water Classification History, v1.3](https://global-surface-water.appspot.com) for [display on Resource Watch](https://resourcewatch.org/data/explore/79f16b9a-a062-4820-822d-7858609a8fd5).
3 |
4 | The source provided the data as a Google Earth Engine asset. The asset was not modified from the original version for display on Resource Watch.
5 |
6 | You can view the processed Annual Surface Water Coverage dataset [on Resource Watch](https://resourcewatch.org/data/explore/79f16b9a-a062-4820-822d-7858609a8fd5).
7 |
8 | You can also download the original dataset [from the source website](https://global-surface-water.appspot.com/download).
9 |
10 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
11 |
--------------------------------------------------------------------------------
/wat_026_rw1_wastewater_treatment_plants/README.md:
--------------------------------------------------------------------------------
1 | ## Wastewater Treatment Plants (U.S.) Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Environmental Protection Agency (EPA) Facility Registry Service (FRS): Wastewater Treatment Plants](https://catalog.data.gov/dataset/epa-facility-registry-service-frs-wastewater-treatment-plants) for [display on Resource Watch](https://resourcewatch.org/data/explore/a8581e62-63dd-4973-bb2a-b29552ad9e37).
3 |
4 | The source provided the data as a table within a geodatabase.
5 |
6 | Below, we describe the steps used to reformat the table to upload it to Carto.
7 |
8 | 1. Read in the table as a geopandas data frame.
9 | 2. Convert the column names to lowercase.
10 | 3. Convert the data type of the column 'registry_id' to integer.
11 | 4. Project the data so its coordinate system is WGS84.
12 | 5. Create 'latitude' and 'longitude' columns based on the 'geometry' column of the geopandas dataframe.
13 | 6. Drop the 'geometry' column from the geopandas dataframe.
14 |
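Roughly, these steps could be expressed with geopandas as in the sketch below; the geodatabase and layer names are assumptions rather than the actual values used in the script.

```python
import geopandas as gpd

# Read the wastewater treatment plant table from the source geodatabase
# (geodatabase and layer names are assumptions)
gdf = gpd.read_file('FRS_Wastewater.gdb', layer='WASTEWATER_TREATMENT_PLANTS')

# Convert the column names to lowercase
gdf.columns = [col.lower() for col in gdf.columns]

# Convert the data type of the 'registry_id' column to integer
gdf['registry_id'] = gdf['registry_id'].astype('int64')

# Project the data to WGS84 (EPSG:4326)
gdf = gdf.to_crs(epsg=4326)

# Create 'latitude' and 'longitude' columns from the point geometries,
# then drop the 'geometry' column
gdf['longitude'] = gdf.geometry.x
gdf['latitude'] = gdf.geometry.y
df = gdf.drop(columns='geometry')
```
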
15 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/wat_026_rw1_wastewater_treatment_plants/wat_026_rw1_wastewater_treatment_plants_processing.py) for more details on this processing.
16 |
17 | You can view the processed Wastewater Treatment Plants (U.S.) dataset [on Resource Watch](https://resourcewatch.org/data/explore/a8581e62-63dd-4973-bb2a-b29552ad9e37).
18 |
19 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/wat_026_rw1_wastewater_treatment_plants.zip), or [from the source website](https://hifld-geoplatform.opendata.arcgis.com/datasets/environmental-protection-agency-epa-facility-registry-service-frs-wastewater-treatment-plants/data).
20 |
21 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
22 |
--------------------------------------------------------------------------------
/wat_036_rw1_water_stress_country_ranking/README.md:
--------------------------------------------------------------------------------
1 | ## Water Stress Country Ranking Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Aqueduct Country and River Basin Rankings](https://www.wri.org/publication/aqueduct-30) for [display on Resource Watch](https://resourcewatch.org/data/explore/47053e23-7808-40d9-b4ae-1af73c5c8bab).
3 |
4 | The source provided the data as an excel file.
5 |
6 | Below, we describe the steps used to reformat the table so that it could be uploaded to Carto; a rough sketch of these steps in pandas follows the list.
7 |
8 | 1. Read in the data from the 'results country' sheet of the excel file as a pandas dataframe.
9 | 2. Subset the dataframe based on the 'indicator_name' column to select the baseline water stress indicator.
10 | 3. Replace the '-9999' in the dataframe with nans since they indicate invalid hydrology.
11 | 4. Rename the column 'primary' to 'primary_country' since 'primary' is a reserved word in PostgreSQL.
12 | 5. Convert the dataframe from long to wide format so the score, risk category, and country ranking calculated using different weights will be stored in separate columns.
13 | 6. Rename the columns created in the previous step to indicate the weight used in the calculation of each column.
14 |
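These steps might look roughly like the pandas sketch below; the sheet name comes from the description above, but the file name and the indicator, index, and value column names are illustrative placeholders.

```python
import numpy as np
import pandas as pd

# Read the country-level results sheet (file name is an assumption)
df = pd.read_excel('aqueduct_country_rankings.xlsx', sheet_name='results country')

# Keep only the baseline water stress indicator (label is illustrative)
df = df[df['indicator_name'] == 'baseline water stress']

# -9999 indicates invalid hydrology, so replace it with NaN
df = df.replace(-9999, np.nan)

# 'primary' is a reserved word in PostgreSQL, so rename that column
df = df.rename(columns={'primary': 'primary_country'})

# Pivot from long to wide so the score, risk category, and country ranking
# calculated with each weighting scheme end up in separate columns
wide = df.pivot_table(index=['name_0', 'primary_country'],
                      columns='weight',
                      values=['score', 'cat', 'ranking'],
                      aggfunc='first')

# Rename the new columns to indicate the weight used for each one
wide.columns = [f'{value}_{weight}' for value, weight in wide.columns]
wide = wide.reset_index()
```
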
15 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/wat_036_rw1_water_stress_country_ranking/wat_036_rw1_water_stress_country_ranking_processing.py) for more details on this processing.
16 |
17 | You can view the processed Water Stress Country Ranking dataset [on Resource Watch](https://resourcewatch.org/data/explore/47053e23-7808-40d9-b4ae-1af73c5c8bab).
18 |
19 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/wat_036_rw1_water_stress_country_ranking.zip), or [from the source website](https://www.wri.org/applications/aqueduct/country-rankings/).
20 |
21 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid).
22 |
--------------------------------------------------------------------------------
/wat_039_rw0_wetlands/README.md:
--------------------------------------------------------------------------------
1 | ## Wetlands Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Lakes and Wetlands Database: Lakes and Wetlands Grid (Level 3)](https://www.worldwildlife.org/pages/global-lakes-and-wetlands-database) for [display on Resource Watch](https://resourcewatch.org/data/explore/098b3d64-3679-4448-bf05-039dc0224dd5).
3 |
4 | The source provided the data as an ESRI Grid file within a zipped folder. In order to display this data on Resource Watch, the ESRI Grid file was converted to a GeoTIFF.
5 |
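One way to do that conversion is with GDAL's Python bindings, as in the sketch below; the input and output paths are illustrative rather than the actual ones used in the script.

```python
from osgeo import gdal

# Convert the ESRI Grid (a folder-based raster format) to a GeoTIFF;
# paths here are illustrative
gdal.Translate('wat_039_rw0_wetlands.tif', 'GLWD-level3/glwd_3', format='GTiff')
```
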
6 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/wat_039_rw0_wetlands/wat_039_rw0_wetlands_processing.py) for more details on this processing.
7 |
8 | You can view the processed Wetlands dataset [on Resource Watch](https://resourcewatch.org/data/explore/098b3d64-3679-4448-bf05-039dc0224dd5).
9 |
10 | You can also download the original dataset [from the source website](https://www.worldwildlife.org/publications/global-lakes-and-wetlands-database-lakes-and-wetlands-grid-level-3).
11 |
12 | ###### Note: This dataset processing was done by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
13 |
--------------------------------------------------------------------------------
/wat_064_cost_of_sustainable_water_management/README.md:
--------------------------------------------------------------------------------
1 | ## Cost of Sustainable Water Management Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Achieving Abundance: Understanding the Cost of a Sustainable Water Future dataset](https://www.wri.org/resources/data-sets/achieving-abundance) for [display on Resource Watch](https://resourcewatch.org/data/explore/wat064-Cost-of-Sustainable-Water-Management).
3 |
4 | This dataset was provided by the source as two excel files. Each file includes the estimated total cost of delivering sustainable water management by 2030, along with the breakdown of how that cost is distributed across different aspects of sustainable water management. One file shows these costs at the country level, and the other shows them at the basin level. Resource Watch displays the country-level data from this dataset.
5 |
6 | The country-level spreadsheet was read into Python as a dataframe. New columns were added to show the percentage of the total cost that would come from each aspect. These columns were calculated by dividing the cost associated with each aspect of sustainable water management by the total cost, then multiplying them by 100. These percentages were rounded to the nearest integer.
7 |
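A minimal sketch of that calculation in pandas, using illustrative file and column names (the actual names come from the source spreadsheet and the processing script linked below):

```python
import pandas as pd

# Read the country-level spreadsheet (file name is an assumption)
df = pd.read_excel('achieving_abundance_country_costs.xlsx')

# For each aspect of sustainable water management, add a column with its
# share of the total cost as a percentage, rounded to the nearest integer;
# the column names below are illustrative
aspect_columns = ['drinking_water_cost', 'sanitation_cost', 'water_quality_cost']
for col in aspect_columns:
    df[col + '_percent'] = (df[col] / df['total_cost'] * 100).round()
```
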
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/wat_064_cost_of_sustainable_water_management/wat_064_cost_of_sustainable_water_management_processing.py) for more details on this processing.
9 |
10 | You can view the processed Cost of Sustainable Water Management dataset [on Resource Watch](https://resourcewatch.org/data/explore/wat064-Cost-of-Sustainable-Water-Management).
11 |
12 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/wat_064_cost_of_sustainable_water_management.zip), or [from the source website](https://www.wri.org/resources/data-sets/achieving-abundance).
13 |
14 | ###### Note: This dataset processing was done by [Taufiq Rashid](https://www.wri.org/profile/taufiq-rashid), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
15 |
--------------------------------------------------------------------------------
/wat_065_rw0_hydropoli_tension_and_institu_vulnerability/README.md:
--------------------------------------------------------------------------------
1 | ## Hydropolitical Tension and Institutional Vulnerability Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Relative Risk Category for Hydropolitical Tension and Institutional Vulnerability](http://transboundarywaters.science.oregonstate.edu/content/transboundary-waters-assessment-programme-river-basins-component) for [display on Resource Watch](http://resourcewatch.org/data/explore/bc30d648-2a4f-4f5f-8a1c-b7a471e412bd).
3 |
4 | The source provided the data in a zipped file containing two shapefiles - one of which contains the data for 310 international basins and the other contains the data for 812 basin country units. Both shapefiles were used.
5 |
6 | Below, we describe the steps used to reformat the shapefiles to upload them to Carto.
7 |
8 | 1. Read in the two polygon shapefiles as geopandas dataframes.
9 | 2. Rename the columns in the dataframe for basin country units so column names are consistent across dataframes.
10 | 3. Create a new column 'type' in both dataframes to store whether the polygon is a basin or basin country unit.
11 | 4. Concatenate the two dataframes.
12 | 5. Project the data so its coordinate system is WGS84.
13 | 6. Convert the data type of numeric columns to float.
14 | 7. Convert the column names to lowercase to meet the column name requirements of Carto.
15 | 8. Convert the geometries of the data from shapely objects to geojsons.
16 | 9. Create a new column from the index of the dataframe to use as a unique id column (cartodb_id) in Carto.
17 | 10. Convert the NaNs in numeric columns to None to meet the import requirements of Carto.
18 |
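These steps might look roughly like the geopandas sketch below; the file names and the column rename are assumptions rather than the actual values used in the script.

```python
import json

import geopandas as gpd
import pandas as pd

# Read the two polygon shapefiles and project both to WGS84 (EPSG:4326);
# file names are illustrative
basins = gpd.read_file('international_basins.shp').to_crs(epsg=4326)
bcus = gpd.read_file('basin_country_units.shp').to_crs(epsg=4326)

# Rename columns in the basin country unit dataframe so column names are
# consistent across dataframes (the mapping shown is illustrative)
bcus = bcus.rename(columns={'BCU_NAME': 'NAME'})

# Store whether each polygon is a basin or a basin country unit, then
# concatenate the two dataframes
basins['type'] = 'basin'
bcus['type'] = 'basin country unit'
gdf = pd.concat([basins, bcus], ignore_index=True)

# Convert numeric columns to float and column names to lowercase for Carto
for col in gdf.select_dtypes('number').columns:
    gdf[col] = gdf[col].astype(float)
gdf.columns = [col.lower() for col in gdf.columns]

# Convert the geometries from shapely objects to geojsons, create a unique
# id column from the index, and convert NaNs to None for the Carto import
df = pd.DataFrame(gdf)
df['geometry'] = df['geometry'].apply(lambda geom: json.dumps(geom.__geo_interface__))
df['cartodb_id'] = df.index
df = df.astype(object).where(pd.notnull(df), None)
```
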
19 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/wat_065_rw0_hydropoli_tension_and_institu_vulnerability/wat_065_rw0_hydropoli_tension_and_institu_vulnerability_processing.py) for more details on this processing.
20 |
21 | You can view the processed Hydropolitical Tension and Institutional Vulnerability dataset [on Resource Watch](http://resourcewatch.org/data/explore/bc30d648-2a4f-4f5f-8a1c-b7a471e412bd).
22 |
23 | You can also download the original dataset [directly through Resource Watch](https://wri-public-data.s3.amazonaws.com/resourcewatch/wat_065_rw0_hydropoli_tension_and_institu_vulnerability.zip), or [from the source website](http://transboundarywaters.science.oregonstate.edu/sites/transboundarywaters.science.oregonstate.edu/files/Database/Data/Spatial/TFDDSpatialData_Public_20181119.zip).
24 |
25 | ###### Note: This dataset processing was done by [Yujing Wu](https://www.wri.org/profile/yujing-wu), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
26 |
--------------------------------------------------------------------------------
/wat_066_rw0_conflict_forecast/README.md:
--------------------------------------------------------------------------------
1 | ## Conflict Forecast Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Water, Peace and Security Global Early Warning Tool](https://waterpeacesecurity.org/info/methodology) for [display on Resource Watch](https://resourcewatch.org/data/explore/7b3e6674-9a1b-4e06-92e4-4548117aa59e).
3 |
4 | The source provides the data as a shapefile. This data file was not modified from the original version for display on Resource Watch.
5 |
6 | You can view the processed Conflict Forecast dataset [on Resource Watch](https://resourcewatch.org/data/explore/7b3e6674-9a1b-4e06-92e4-4548117aa59e).
7 |
8 | You can also download the original dataset [from the source website](https://waterpeacesecurity.org/map).
9 |
10 | ###### Note: This dataset processing was done by [Liz Saccoccia](https://www.wri.org/profile/liz-saccoccia), and QC'd by [Liz Saccoccia](https://www.wri.org/profile/liz-saccoccia).
11 |
--------------------------------------------------------------------------------
/wat_067_rw0_aqueduct_riverine_flood_hazard/README.md:
--------------------------------------------------------------------------------
1 | ## Aqueduct Riverine Flood Hazard Dataset Pre-processing
2 |
3 | This file describes the data pre-processing that was done to [the Aqueduct Riverine Flood Hazards](https://www.wri.org/publication/aqueduct-floods-methodology) for [display on Resource Watch](https://resourcewatch.org/data/explore/765e5c3f-f569-4d91-806e-5056c5d1663e).
4 |
5 | The data was provided by the source as four GeoTIFF files. The files show the risk of riverine floods under a business-as-usual climate scenario for a 100-year flood magnitude in 2010 (baseline), 2030, 2050, and 2080.
6 |
7 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/wat_067_rw0_aqueduct_riverine_flood_hazard/wat_067_rw0_aqueduct_riverine_flood_hazard_processing.py) for more details on this processing.
8 |
9 | You can view the processed Aqueduct Riverine Flood Hazard dataset [on Resource Watch](https://resourcewatch.org/data/explore/765e5c3f-f569-4d91-806e-5056c5d1663e).
10 |
11 | You can also download the original dataset [from the source website](http://wri-projects.s3.amazonaws.com/AqueductFloodTool/download/v2/index.html).
12 |
13 | ###### Note: This dataset processing was done by [Daniel Gonzalez](https://www.wri.org/profile/daniel-gonzalez), and QC'd by [Amelia Snyder](https://www.wri.org/profile/amelia-snyder).
14 |
--------------------------------------------------------------------------------
/wat_068_rw0_watersheds/README.md:
--------------------------------------------------------------------------------
1 | ## Watersheds Data Pre-Processing
2 | This file describes the data pre-processing for the [HydroBASINS](https://www.hydrosheds.org/page/hydrobasins) dataset for [display on Resource Watch](https://resourcewatch.org/data/explore/ab6216ee-9a2b-412f-b538-8644d5834c7a).
3 |
4 | The source provided the data as regional tiles in individual polygon shapefiles, one for each
5 | region and each Pfafstetter level.
6 |
7 | Due to the resolution of the sub-basin delineations, the breakdowns provided by levels 3-8 are most relevant for analysis, and were thus selected for display on Resource Watch.
8 |
9 | Below, we describe the steps used to combine the regional shapefiles by level and insert them into a single Carto table; a sketch of the per-level combination follows the list.
10 |
11 | For each relevant basin level:
12 | 1. Read in the shapefiles for the nine regional tiles as a geopandas dataframe.
13 | 2. Convert the column names to lowercase to match Carto column name requirements.
14 | 3. Combine shapefiles for all regional tiles at the given level into one shapefile, with a column for 'level' to indicate which level the file represents.
15 | 4. Upload the combined shapefile to Carto.
16 |
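A rough sketch of steps 1-3 for a single level, assuming the standard HydroBASINS tile naming (the actual file paths used in the script may differ):

```python
import glob

import geopandas as gpd
import pandas as pd

# Combine the nine regional tiles for one Pfafstetter level
level = 4
tiles = []
for shp in sorted(glob.glob(f'hybas_*_lev{level:02d}_v1c.shp')):
    tile = gpd.read_file(shp)
    # Convert the column names to lowercase to match Carto requirements
    tile.columns = [col.lower() for col in tile.columns]
    # Record which Pfafstetter level this tile belongs to
    tile['level'] = level
    tiles.append(tile)

# Merge the tiles into a single shapefile for upload to Carto
combined = gpd.GeoDataFrame(pd.concat(tiles, ignore_index=True), crs=tiles[0].crs)
combined.to_file(f'wat_068_rw0_watersheds_lev{level:02d}.shp')
```
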
17 | The shapefiles for each level were then combined into a single table on Carto. For example, the following SQL statement was used to insert the level 4 shapefile into the new, combined table:
18 | ```
19 | INSERT INTO wat_068_rw0_watersheds_edit(the_geom, hybas_id, next_down, next_sink, main_bas, dist_sink, dist_main, sub_area, up_area, pfaf_id, endo, coast, _order, sort, level) SELECT the_geom, hybas_id, next_down, next_sink, main_bas, dist_sink, dist_main, sub_area, up_area, pfaf_id, endo, coast, _order, sort, level
20 | FROM wat_068_rw0_watersheds_lev04_edit
21 | ```
22 |
23 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/wat_068_rw0_watersheds/wat_068_rw0_watersheds_processing.py) for more details on this processing.
24 |
25 | You can view the processed Watersheds dataset [on Resource Watch](https://resourcewatch.org/data/explore/ab6216ee-9a2b-412f-b538-8644d5834c7a).
26 |
27 | You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/wat_068_rw0_watersheds.zip), or [from the source website](https://www.dropbox.com/sh/hmpwobbz9qixxpe/AACPCyoHHAQUt_HNdIbWOFF4a/HydroBASINS/standard?dl=0&subfolder_nav_tracking=1).
28 |
29 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
30 |
--------------------------------------------------------------------------------
/wat_069_rw0_saltmarshes/README.md:
--------------------------------------------------------------------------------
1 | ## Saltmarshes Dataset Pre-processing
2 | This file describes the data pre-processing that was done to [the Global Distribution of Saltmarshes](https://data.unep-wcmc.org/datasets/43%20(v.6)) for [display on Resource Watch](https://resourcewatch.org/data/explore/6a22c67d-9e3c-4f8c-94d2-d90fe886b476).
3 |
4 | The source provided this dataset as two shapefiles - one of which contains polygon data, and the other contains point data. Resource Watch only displays the polygon layer.
5 |
6 | Below, we describe the steps used to reformat the polygon shapefile:
7 | 1. Read in the polygon shapefile as a geopandas data frame.
8 | 2. Convert the data type of column 'METADATA_I' to integer.
9 | 3. Convert the column names to lowercase so the table can be uploaded to Carto without losing information.
10 | 4. Project the data so its coordinate system is WGS84.
11 |
12 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/wat_069_rw0_saltmarshes/wat_069_rw0_saltmarshes_processing.py) for more details on this processing.
13 |
14 | You can view the processed Saltmarshes dataset [on Resource Watch](https://resourcewatch.org/data/explore/6a22c67d-9e3c-4f8c-94d2-d90fe886b476).
15 |
16 | You can also download the original dataset [directly through Resource Watch](http://wri-public-data.s3.amazonaws.com/resourcewatch/wat_069_rw0_saltmarshes.zip), or [from the source website](https://data.unep-wcmc.org/datasets/43%20(v.6)).
17 |
18 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms), and QC'd by [Yujing Wu](https://www.wri.org/profile/yujing-wu).
19 |
--------------------------------------------------------------------------------
/wat_070_rw0_soil_erosion/README.md:
--------------------------------------------------------------------------------
1 | ## Soil Erosion Dataset Pre-processing
2 | This file describes the data pre-processing that was done to the [Soil Erosion Prevalence](http://landscapeportal.org/blog/30/) for [display on Resource Watch](https://resourcewatch.org/data/explore/719c3024-9f06-47c8-b646-9c2f26dd9737).
3 |
4 | The source provided this dataset to the Resource Watch data team as a series of GeoTIFF files.
5 |
6 | To display these data on Resource Watch, each GeoTIFF was translated into the appropriate projection for web display and uploaded to Google Earth Engine.
7 |
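As an illustration, the reprojection step could be done with GDAL as sketched below; the file names and target projection (WGS84 here) are assumptions rather than the actual choices made in the script.

```python
from osgeo import gdal

# Reproject one of the source GeoTIFFs for web display; paths and the
# target projection are assumptions
gdal.Warp('soil_erosion_wgs84.tif', 'soil_erosion_source.tif', dstSRS='EPSG:4326')
```
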
8 | Please see the [Python script](https://github.com/resource-watch/data-pre-processing/blob/master/wat_070_rw0_soil_erosion/wat_070_rw0_soil_erosion_processing.py) for more details on this processing.
9 |
10 | You can view the processed Soil Erosion Prevalence dataset [on Resource Watch](https://resourcewatch.org/data/explore/719c3024-9f06-47c8-b646-9c2f26dd9737).
11 |
12 | You can also contact the source to get access to the original dataset [from the source website](http://landscapeportal.org/blog/30/).
13 |
14 | ###### Note: This dataset processing was done by [Rachel Thoms](https://www.wri.org/profile/rachel-thoms) and QC'd by [Weiqi Zhou](https://www.wri.org/profile/weiqi-zhou).
15 |
--------------------------------------------------------------------------------