├── LICENSE.md
├── README.md
├── _config.yml
└── img
    ├── dataset_teaser.png
    └── dataset_teaser_small.png


/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2022 Vision Systems, Inc
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## OpenSentinelMap
2 | 
3 | The OpenSentinelMap dataset contains Sentinel-2 imagery and per-pixel semantic label masks derived from OpenStreetMap. It is described in [this paper](https://openaccess.thecvf.com/content/CVPR2022W/EarthVision/papers/Johnson_OpenSentinelMap_A_Large-Scale_Land_Use_Dataset_Using_OpenStreetMap_and_Sentinel-2_CVPRW_2022_paper.pdf).
4 | 
5 | ![this is an overview image](/img/dataset_teaser.png)
6 | 
7 | ### Data Access
8 | 
9 | #### Azure Blob Storage (Free)
10 | 
11 | The dataset may be freely downloaded from Azure Blob Storage:
12 | 
13 | [Spatial Cell Metadata](https://vsipublic.blob.core.usgovcloudapi.net/vsi-open-sentinel-map/spatial_cell_info.csv)
14 | 
15 | [OSM Label Categories](https://vsipublic.blob.core.usgovcloudapi.net/vsi-open-sentinel-map/osm_categories.json)
16 | 
17 | [OSM Rasterized Label Images](https://vsipublic.blob.core.usgovcloudapi.net/vsi-open-sentinel-map/osm_label_images.tgz)
18 | 
19 | 
20 | [OSM Sentinel Imagery 2017](https://vsipublic.blob.core.usgovcloudapi.net/vsi-open-sentinel-map/osm_sentinel_imagery_2017.tgz)
21 | 
22 | [OSM Sentinel Imagery 2018](https://vsipublic.blob.core.usgovcloudapi.net/vsi-open-sentinel-map/osm_sentinel_imagery_2018.tgz)
23 | 
24 | [OSM Sentinel Imagery 2019](https://vsipublic.blob.core.usgovcloudapi.net/vsi-open-sentinel-map/osm_sentinel_imagery_2019.tgz)
25 | 
26 | [OSM Sentinel Imagery 2020](https://vsipublic.blob.core.usgovcloudapi.net/vsi-open-sentinel-map/osm_sentinel_imagery_2020.tgz)
27 | 
28 | 
29 | [EuroSAT Sentinel L2A Resamples](https://vsipublic.blob.core.usgovcloudapi.net/vsi-open-sentinel-map/EuroSAT_sentinel2.tar.gz)
30 | 
31 | 
32 | #### AWS (Paid)
33 | 
34 | As a backup option, or for faster download speeds, the dataset is also available on Amazon S3. You can use the following command to download it, but beware that Amazon will charge your AWS account about $40 in data transfer fees (about 9 cents per GB, for 445 GB in total). NOTE: This option will be deprecated soon in favor of Azure Blob Storage.
35 | 
36 | ```
37 | aws s3 cp s3://vsi-open-sentinel-map/ ./open-sentinel-map --recursive --request-payer
38 | ```
39 | 
40 | ### Data Format
41 | 
42 | #### Imagery
43 | 
44 | Image data is separated by year from 2017 to 2020. Each year's worth of Sentinel imagery is compressed into an osm_sentinel_imagery_{YEAR}.tgz file.
These files can be untarred using the following command.
45 | ```
46 | tar -xvzf osm_sentinel_imagery_{YEAR}.tgz
47 | ```
48 | The untarred folders of Sentinel imagery will have the format
49 | ```
50 | MGRS_TILE/
51 |   SPATIAL_CELL/
52 |     {ID}_{YEAR}.npz
53 | ```
54 | where each .npz file is a compressed NumPy file containing the 32-bit float Bottom-of-Atmosphere imagery data. This file can be loaded from Python using the numpy.load function, and the bands accessed via their keys. The bands are grouped by spatial resolution, and accessible using the key "gsd_{RESOLUTION}" (i.e. "gsd_10", "gsd_20", "gsd_60").
55 | 
56 | Note: The original Sentinel-2 data is stored as unsigned 16-bit integers. Our dataset converts to 32-bit floats and applies the Sentinel-2 scaling factor (division by 10,000) to retrieve surface reflectance values. Although this should result in values from 0 to 1, some values will exceed 1 due to small errors in the data. We decided to keep these values greater than 1 for training robustness.
57 | 
58 | The "gsd_10" array bands have the order blue, green, red, and then NIR. The "gsd_20" bands have 4 vegetation red edge bands, followed by two SWIR bands. The "gsd_60" array consists of the coastal aerosol and water vapour bands. The exact corresponding bands from the Sentinel-2 platform are listed in the table below. Find more information about these spectral bands [here](https://gisgeography.com/sentinel-2-bands-combinations/).
59 | 
60 | | Data Key | Sentinel-2 Bands |
61 | | -------- | ---------------- |
62 | | gsd_10 | B02, B03, B04, B08 |
63 | | gsd_20 | B05, B06, B07, B8A, B11, B12 |
64 | | gsd_60 | B01, B09 |
65 | 
66 | In addition to the "gsd_*" bands, the image files contain an "scl" band. The "scl" band contains the Scene Classification Layer values, which indicate the quality of each pixel at 20 m resolution. These values are described in the table below.
67 | 
68 | | Label | Classification |
69 | | ----- | -------------- |
70 | | 0 | NO_DATA |
71 | | 1 | SATURATED_OR_DEFECTIVE |
72 | | 2 | CAST_SHADOWS |
73 | | 3 | CLOUD_SHADOWS |
74 | | 4 | VEGETATION |
75 | | 5 | NOT_VEGETATED |
76 | | 6 | WATER |
77 | | 7 | UNCLASSIFIED |
78 | | 8 | CLOUD_MEDIUM_PROBABILITY |
79 | | 9 | CLOUD_HIGH_PROBABILITY |
80 | | 10 | THIN_CIRRUS |
81 | | 11 | SNOW or ICE |
82 | 
83 | The image files also contain a "bad_percent" key, which is a float value between 0 and 1 giving the fraction of pixels within the "scl" band which we've determined to be bad data. Currently we filter out images in which more than 25% of the pixels are bad data. You can use this key to filter the dataset using a different threshold.
84 | 
85 | #### Annotations
86 | 
87 | The label images can be untarred using the command
88 | ```
89 | tar -xvzf osm_label_images.tgz
90 | ```
91 | 
92 | These images are in PNG format, with label values as described in the osm_categories.json file.
93 | 
94 | #### Auxiliary Data
95 | 
96 | The spatial_cell_info CSV file contains metadata for each spatial cell: the lon/lat bounds, the MGRS tile it is within, and the training split it belongs to. Note that the current data split was performed at the MGRS tile level to prevent data leakage. Use caution if performing your own train/test split.
97 | 
98 | The osm_categories JSON file details the exact mapping from OpenStreetMap tags to OpenSentinelMap labels.
99 | 
100 | ### Licenses
101 | 
102 | This dataset is made available under the MIT license, freely available for both academic and commercial use.
103 | 
104 | Access to Sentinel data is free, full and open for the broad Regional, National, European and International user community. View [Terms and Conditions](https://scihub.copernicus.eu/twiki/do/view/SciHubWebPortal/TermsConditions).
105 | 
106 | OpenStreetMap® is _open data_, licensed under the [Open Data Commons Open Database License](https://opendatacommons.org/licenses/odbl/) (ODbL) by the [OpenStreetMap Foundation](https://wiki.osmfoundation.org/wiki/Main_Page) (OSMF).
107 | 
108 | ### Contact
109 | 
110 | [email](mailto:dan@visionsystemsinc.com)
111 | 
112 | ### How to Cite
113 | 
114 | bibtex:
115 | ```
116 | @InProceedings{Johnson_2022_CVPR,
117 |     author = {Johnson, Noah and Treible, Wayne and Crispell, Daniel},
118 |     title = {OpenSentinelMap: A Large-Scale Land Use Dataset Using OpenStreetMap and Sentinel-2 Imagery},
119 |     booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
120 |     month = {June},
121 |     year = {2022},
122 |     pages = {1333-1341}
123 | }
124 | ```
125 | 
126 | ### Acknowledgements
127 | 
128 | This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2021-2011000004. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
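### Example: Loading an Image File (unofficial sketch)

The Data Format section says that each `.npz` file can be read with `numpy.load` and exposes the keys "gsd_10", "gsd_20", "gsd_60", "scl" and "bad_percent". The sketch below shows one way this might look in practice. It is not part of the official tooling. The helper names (`load_cell`, `bad_fraction`, `write_synthetic_cell`), the array shapes in the synthetic file, and the set of SCL labels treated as "bad" are all assumptions made for illustration; the README does not document the axis layout or which labels the authors counted as bad data.

```python
import os
import tempfile

import numpy as np

# Keys documented in the README.
BAND_KEYS = ("gsd_10", "gsd_20", "gsd_60")

# ASSUMPTION: an illustrative set of "bad" SCL labels (no data, defective,
# cloud shadow, cloud medium/high probability, thin cirrus). The README does
# not state which labels the authors used.
ASSUMED_BAD_SCL = (0, 1, 3, 8, 9, 10)


def load_cell(path, max_bad=0.25):
    """Load one {ID}_{YEAR}.npz file, or return None if it is too noisy."""
    with np.load(path) as npz:
        # np.max copes with either a scalar or a per-image array.
        if float(np.max(npz["bad_percent"])) > max_bad:
            return None
        cell = {key: npz[key] for key in BAND_KEYS}
        cell["scl"] = npz["scl"]
    return cell


def bad_fraction(scl, bad_labels=ASSUMED_BAD_SCL):
    """Fraction of SCL pixels whose label is in bad_labels."""
    return float(np.isin(scl, bad_labels).mean())


def write_synthetic_cell(path, bad_percent):
    """Write a tiny stand-in file; shapes are placeholders, not the real layout."""
    rng = np.random.default_rng(0)
    np.savez_compressed(
        path,
        gsd_10=rng.random((12, 12, 4), dtype=np.float32),  # B02, B03, B04, B08
        gsd_20=rng.random((6, 6, 6), dtype=np.float32),    # B05..B8A, B11, B12
        gsd_60=rng.random((2, 2, 2), dtype=np.float32),    # B01, B09
        scl=np.full((6, 6), 4, dtype=np.uint8),            # all VEGETATION
        bad_percent=np.float32(bad_percent),
    )


with tempfile.TemporaryDirectory() as tmp:
    clean_path = os.path.join(tmp, "clean_2019.npz")
    cloudy_path = os.path.join(tmp, "cloudy_2019.npz")
    write_synthetic_cell(clean_path, bad_percent=0.0)
    write_synthetic_cell(cloudy_path, bad_percent=0.5)

    clean = load_cell(clean_path)    # kept: below the 25% threshold
    cloudy = load_cell(cloudy_path)  # dropped: above the 25% threshold

print({key: value.shape for key, value in clean.items()})
print("cloudy cell kept:", cloudy is not None)
```

To apply a stricter or looser filter than the default 25%, pass a different `max_bad` value. Before relying on the axis order, inspect the shapes of a real file from the dataset.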
129 | 
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-slate
--------------------------------------------------------------------------------
/img/dataset_teaser.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VisionSystemsInc/open-sentinel-map/e0da8a4a81c990ad4dd6e87e30a890bbbe0d4d0d/img/dataset_teaser.png
--------------------------------------------------------------------------------
/img/dataset_teaser_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VisionSystemsInc/open-sentinel-map/e0da8a4a81c990ad4dd6e87e30a890bbbe0d4d0d/img/dataset_teaser_small.png
--------------------------------------------------------------------------------