├── .gitignore
├── README.md
├── data
│   └── catalog.json
├── environment.yaml
├── explore_dataset.ipynb
├── prepare
│   ├── __init__.py
│   ├── preproc_s1_snap.xml
│   ├── split.py
│   └── utils.py
├── s1s2_water.py
└── settings.toml
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | data/*
3 | tiles/*
4 | !data/catalog.json
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images
2 | This repository provides tools to work with the [S1S2-Water dataset](https://zenodo.org/records/11278238).
3 | 
4 | The [S1S2-Water dataset](https://zenodo.org/records/11278238) is a global reference dataset for training, validation and testing of convolutional neural networks for semantic segmentation of surface water bodies in publicly available Sentinel-1 and Sentinel-2 satellite images. The dataset consists of 65 triplets of Sentinel-1 and Sentinel-2 images with quality-checked binary water masks. Samples are drawn globally on the basis of the Sentinel-2 tile grid (100 x 100 km), taking into account predominant land cover and the availability of water bodies. Each sample is complemented with STAC-compliant metadata and Digital Elevation Model (DEM) rasters from the Copernicus DEM.
5 | 
6 | The following article describes the dataset:
7 | 
8 | > Wieland, M., Fichtner, F., Martinis, S., Groth, S., Krullikowski, C., Plank, S., Motagh, M. (2023). S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images. *IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing*, doi: [10.1109/JSTARS.2023.3333969](https://dx.doi.org/10.1109/JSTARS.2023.3333969).
9 | 
10 | ## Dataset version update (2024-05-24)
11 | The dataset on Zenodo has been updated to a new version. Sentinel-1 scenes for samples #31 and #59 were missing in [v1.0.0](https://zenodo.org/records/8314175) and are now included in [v1.0.1](https://zenodo.org/records/11278238) along with all relevant masks and metadata.
12 | 
13 | ## Dataset access
14 | The dataset (~170 GB) is available for download at: https://zenodo.org/records/11278238 (v1.0.1)
15 | 
16 | Download the dataset parts and extract them into a single data directory as follows.
17 | 
18 | ```
19 | .
20 | └── data/
21 |     ├── 1/
22 |     │   ├── sentinel12_copdem30_1_elevation.tif
23 |     │   ├── sentinel12_copdem30_1_slope.tif
24 |     │   ├── sentinel12_s1_1_img.tif
25 |     │   ├── sentinel12_s1_1_msk.tif
26 |     │   ├── sentinel12_s1_1_valid.tif
27 |     │   ├── sentinel12_s2_1_img.tif
28 |     │   ├── sentinel12_s2_1_msk.tif
29 |     │   ├── sentinel12_s2_1_valid.tif
30 |     │   └── sentinel12_1_meta.json
31 |     ├── 5/
32 |     │   ├── sentinel12_copdem30_5_elevation.tif
33 |     │   ├── sentinel12_copdem30_5_slope.tif
34 |     │   ├── sentinel12_s1_5_img.tif
35 |     │   ├── sentinel12_s1_5_msk.tif
36 |     │   ├── sentinel12_s1_5_valid.tif
37 |     │   ├── sentinel12_s2_5_img.tif
38 |     │   ├── sentinel12_s2_5_msk.tif
39 |     │   ├── sentinel12_s2_5_valid.tif
40 |     │   └── sentinel12_5_meta.json
41 |     ├── .../
42 |     │   └── ...
43 |     └── catalog.json
44 | ```
45 | 
46 | ## Dataset information
47 | Each file follows the naming scheme `sentinel12_SENSOR_ID_LAYER.tif` (e.g. `sentinel12_s1_5_img.tif`). Raster layers are stored as Cloud Optimized GeoTIFF (COG) and are projected to Universal Transverse Mercator (UTM).
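For a quick look at a sample, the following is a minimal sketch of reading the rasters with rasterio (listed in [environment.yaml](environment.yaml)) and undoing the scale factors from the table below; the sample ID `5` and the relative paths are illustrative:

```python
import numpy as np
import rasterio

sample_id = 5  # illustrative sample ID

# Sentinel-2 L1C: UInt16 TOA reflectance, scaled by factor 10000 (see table below)
with rasterio.open(f"data/{sample_id}/sentinel12_s2_{sample_id}_img.tif") as src:
    s2 = src.read().astype(np.float32) / 10000.0  # shape: (bands, rows, cols)

# Sentinel-1 GRD: Int16 backscatter in dB, scaled by factor 100
with rasterio.open(f"data/{sample_id}/sentinel12_s1_{sample_id}_img.tif") as src:
    s1_db = src.read().astype(np.float32) / 100.0

# hand-labelled water mask: 0 = no water, 1 = water
with rasterio.open(f"data/{sample_id}/sentinel12_s2_{sample_id}_msk.tif") as src:
    water = src.read(1)
```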
48 | 
49 | | Sensor | Layer | Description | Values | Format | Bands |
50 | | - | - | - | - | - | - |
51 | | S1 | IMG | Sentinel-1 image<br>GRD product | Unit: dB (scaled by factor 100) | GeoTIFF<br>10980 x 10980 px<br>2 bands<br>Int16 | 0: VV<br>1: VH |
52 | | S2 | IMG | Sentinel-2 image<br>L1C product | Unit: TOA reflectance (scaled by factor 10000) | GeoTIFF<br>10980 x 10980 px<br>6 bands<br>UInt16 | 0: Blue<br>1: Green<br>2: Red<br>3: NIR<br>4: SWIR1<br>5: SWIR2 |
53 | | S1 / S2 | MSK | Annotation mask<br>Hand-labelled water mask | 0: No Water<br>1: Water | GeoTIFF<br>10980 x 10980 px<br>1 band<br>UInt8 | 0: Water mask |
54 | | S1 / S2 | VALID | Valid pixel mask<br>Hand-labelled valid pixel mask | 0: Invalid (cloud, cloud-shadow, nodata)<br>1: Valid | GeoTIFF<br>10980 x 10980 px<br>1 band<br>UInt8 | 0: Valid mask |
55 | | COPDEM30 | ELEVATION | Copernicus DEM elevation | Unit: Meters | GeoTIFF<br>3660 x 3660 px<br>1 band<br>Int16 | 0: Elevation |
56 | | COPDEM30 | SLOPE | Copernicus DEM slope | Unit: Degrees | GeoTIFF<br>3660 x 3660 px<br>1 band<br>Int16 | 0: Slope |
57 | | N.a. | META | Sample metadata | STAC metadata item | JSON | N.a. |
58 | 
59 | ## Data preparation
60 | Make sure to download the dataset as described above. Clone this repository, adjust [settings.toml](settings.toml) and run [s1s2_water.py](s1s2_water.py) to prepare the dataset according to your desired settings.
61 | 
62 | The following command splits the images and masks of a specific sensor (Sentinel-1 or Sentinel-2) into training, validation and test tiles with a predefined shape and band combination. Slope information can be appended to the image band stack if required.
63 | 
64 | ```shell
65 | $ python s1s2_water.py --settings settings.toml
66 | ```
67 | 
68 | Data preparation parameters are defined in a [settings TOML file](settings.toml) (**--settings**):
69 | 
70 | ```toml
71 | SENSOR = "s2" # prepare Sentinel-1 or Sentinel-2 data ["s1", "s2"]
72 | TILE_SHAPE = [256, 256] # desired tile shape in pixels
73 | IMG_BANDS_IDX = [0, 1, 2, 3, 4, 5] # desired image band combination
74 | SLOPE = true # append slope band to image bands
75 | EXCLUDE_NODATA = true # exclude tiles with nodata values
76 | DATA_DIR = "/path/to/data_directory" # data directory that holds the original images
77 | OUT_DIR = "/path/to/output_directory" # output directory that stores the prepared train, val and test tiles
78 | 
79 | # Sentinel-1 image bands
80 | # {"VV": 0, "VH": 1}
81 | 
82 | # Sentinel-2 image bands
83 | # {"Blue": 0, "Green": 1, "Red": 2, "NIR": 3, "SWIR1": 4, "SWIR2": 5}
84 | ```
85 | 
86 | Information on the preprocessing steps applied to the Sentinel-1 imagery can be found in the [SNAP GPT graph file](prepare/preproc_s1_snap.xml).
87 | 
88 | ## Installation
89 | ```shell
90 | $ conda env create --file environment.yaml
91 | $ conda activate s1s2_water
92 | ```
93 | 
--------------------------------------------------------------------------------
/data/catalog.json:
--------------------------------------------------------------------------------
1 | {
2 |   "id": "s1s2_water",
3 |   "type": "Catalog",
4 |   "title": "s1s2_water",
5 |   "stac_version": "1.0.0",
6 |   "description": "S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images",
7 |   "links": [
8 |     {
9 |       "rel": "root",
10 |       "href": "./catalog.json",
11 |       "type": "application/json",
12 |       "title": "catalog"
13 |     },
14 |     {
15 |       "rel": "self",
16 |       "href": "./catalog.json",
17 |       "type": "application/json"
18 |     },
19 |     {
20 |       "rel": "item",
21 |       "href": "./93/sentinel12_93_meta.json",
22 |       "type": "application/json",
23 |       "title": "metadata"
24 |     },
25 |     {
26 |       "rel": "item",
27 |       "href": "./91/sentinel12_91_meta.json",
28 |       "type": "application/json",
29 |       "title": "metadata"
30 |     },
31 |     {
32 |       "rel": "item",
33 |       "href": "./9/sentinel12_9_meta.json",
34 |       "type": "application/json",
35 |       "title": "metadata"
36 |     },
37 |     {
38 |       "rel": "item",
39 |       "href": "./89/sentinel12_89_meta.json",
40 |       "type": "application/json",
41 |       "title": "metadata"
42 |     },
43 |     {
44 |       "rel": "item",
45 |       "href": "./88/sentinel12_88_meta.json",
46 |       "type": "application/json",
47 |       "title": "metadata"
48 |     },
49 |     {
50 |       "rel": "item",
51 |       "href": "./87/sentinel12_87_meta.json",
52 |       "type": "application/json",
53 |       "title": "metadata"
54 |     },
55 |     {
56 |       "rel": "item",
57 |       "href": "./86/sentinel12_86_meta.json",
58 |       "type": "application/json",
59 |       "title": "metadata"
60 |     },
61 |     {
62 |       "rel": "item",
63 |       "href": "./85/sentinel12_85_meta.json",
64 |       "type": "application/json",
65 |       "title": "metadata"
66 |     },
67 |     {
68 |       "rel": "item",
69 |       "href": "./84/sentinel12_84_meta.json",
70 |       "type": "application/json",
71 |       "title": "metadata"
72 |     },
73 |     {
74 |       "rel": "item",
75 |       "href": "./83/sentinel12_83_meta.json",
76 |       "type": "application/json",
77 |       "title": "metadata"
78 |     },
79 |     {
80 |       "rel": "item",
81 |       "href": "./82/sentinel12_82_meta.json",
82 |       "type": "application/json",
83 |       "title": "metadata"
84 |     },
85 |     {
86 |       "rel": "item",
87 |       "href": "./80/sentinel12_80_meta.json",
88 |       "type": "application/json",
89 |       "title": "metadata"
90 |     },
91 |     {
92 |       "rel": "item",
93 |       "href": "./8/sentinel12_8_meta.json",
94 |       "type": "application/json",
95 |       "title": "metadata"
96 |     },
97 |     {
98 |       "rel": "item",
99 |       "href": "./78/sentinel12_78_meta.json",
100 |       "type": "application/json",
101 |       "title": "metadata"
102 |     },
103 |     {
104 |       "rel": "item",
105 |       "href": "./77/sentinel12_77_meta.json",
106 |       "type": "application/json",
107 |       "title": "metadata"
108 |     },
109 |     {
110 |       "rel": "item",
111 |       "href": "./76/sentinel12_76_meta.json",
112 |       "type": "application/json",
113 |       "title": "metadata"
114 |     },
115 |     {
116 |       "rel": "item",
117 |       "href": "./75/sentinel12_75_meta.json",
118 |       "type": "application/json",
119 |       "title": "metadata"
120 |     },
121 |     {
122 |       "rel": "item",
123 |       "href": "./73/sentinel12_73_meta.json",
124 |       "type": "application/json",
125 |       "title": "metadata"
126 |     },
127 |     {
128 |       "rel": "item",
129 |       "href": "./71/sentinel12_71_meta.json",
130 |       "type": "application/json",
131 |       "title": "metadata"
132 |     },
133 |     {
134 |       "rel": "item",
135 |       "href": "./7/sentinel12_7_meta.json",
136 |       "type": "application/json",
137 |       "title": "metadata"
138 |     },
139 |     {
140 |       "rel": "item",
141 |       "href": "./69/sentinel12_69_meta.json",
142 |       "type": "application/json",
143 |       "title": "metadata"
144 |     },
145 |     {
146 |       "rel": "item",
147 |       "href": "./68/sentinel12_68_meta.json",
148 |       "type": "application/json",
149 |       "title": "metadata"
150 |     },
151 |     {
152 |       "rel": "item",
153 |       "href": "./67/sentinel12_67_meta.json",
154 |       "type": "application/json",
155 |       "title": "metadata"
156 |     },
157 |     {
158 |       "rel": "item",
159 |       "href": "./65/sentinel12_65_meta.json",
160 |       "type": "application/json",
161 |       "title": "metadata"
162 |     },
163 |     {
164 |       "rel": "item",
165 |       "href": "./64/sentinel12_64_meta.json",
166 |       "type": "application/json",
167 |       "title": "metadata"
168 |     },
169 |     {
170 |       "rel": "item",
171 |       "href": "./62/sentinel12_62_meta.json",
172 |       "type": "application/json",
173 |       "title": "metadata"
174 |     },
175 |     {
176 |       "rel": "item",
177 |       "href": "./6/sentinel12_6_meta.json",
178 |       "type": "application/json",
179 |       "title": "metadata"
180 |     },
181 |     {
182 |       "rel": "item",
183 |       "href": "./59/sentinel12_59_meta.json",
184 |       "type": "application/json",
185 |       "title": "metadata"
186 |     },
187 |     {
188 |       "rel": "item",
189 |       "href": "./57/sentinel12_57_meta.json",
190 |       "type": "application/json",
191 |       "title": "metadata"
192 |     },
193 |     {
194 |       "rel": "item",
195 |       "href": "./55/sentinel12_55_meta.json",
196 |       "type": "application/json",
197 |       "title": "metadata"
198 |     },
199 |     {
200 |       "rel": "item",
201 |       "href": "./54/sentinel12_54_meta.json",
202 |       "type": "application/json",
203 |       "title": "metadata"
204 |     },
205 |     {
206 |       "rel": "item",
207 |       "href": "./53/sentinel12_53_meta.json",
208 |       "type": "application/json",
209 |       "title": "metadata"
210 |     },
211 |     {
212 |       "rel": "item",
213 |       "href": "./52/sentinel12_52_meta.json",
214 |       "type": "application/json",
215 |       "title": "metadata"
216 |     },
217 |     {
218 |       "rel": "item",
219 |       "href": "./51/sentinel12_51_meta.json",
220 |       "type": "application/json",
221 |       "title": "metadata"
222 |     },
223 |     {
224 |       "rel": "item",
225 |       "href": "./5/sentinel12_5_meta.json",
226 |       "type": "application/json",
227 |       "title": "metadata"
228 |     },
229 |     {
230 |       "rel": "item",
231 |       "href": "./49/sentinel12_49_meta.json",
232 |       "type": "application/json",
233 |       "title": "metadata"
234 |     },
235 |     {
236 |       "rel": "item",
237 |       "href": "./48/sentinel12_48_meta.json",
238 |       "type": "application/json",
239 |       "title": "metadata"
240 |     },
241 |     {
242 |       "rel": "item",
243 |       "href": "./47/sentinel12_47_meta.json",
244 |       "type": "application/json",
245 |       "title": "metadata"
246 |     },
247 |     {
248 |       "rel": "item",
249 |       "href": "./41/sentinel12_41_meta.json",
250 |       "type": "application/json",
251 |       "title": "metadata"
252 |     },
253 |     {
254 |       "rel": "item",
255 |       "href": "./40/sentinel12_40_meta.json",
256 |       "type": "application/json",
257 |       "title": "metadata"
258 |     },
259 |     {
260 |       "rel": "item",
261 |       "href": "./39/sentinel12_39_meta.json",
262 |       "type": "application/json",
263 |       "title": "metadata"
264 |     },
265 |     {
266 |       "rel": "item",
267 |       "href": "./37/sentinel12_37_meta.json",
268 |       "type": "application/json",
269 |       "title": "metadata"
270 |     },
271 |     {
272 |       "rel": "item",
273 |       "href": "./36/sentinel12_36_meta.json",
274 |       "type": "application/json",
275 |       "title": "metadata"
276 |     },
277 |     {
278 |       "rel": "item",
279 |       "href": "./35/sentinel12_35_meta.json",
280 |       "type": "application/json",
281 |       "title": "metadata"
282 |     },
283 |     {
284 |       "rel": "item",
285 |       "href": "./33/sentinel12_33_meta.json",
286 |       "type": "application/json",
287 |       "title": "metadata"
288 |     },
289 |     {
290 |       "rel": "item",
291 |       "href": "./32/sentinel12_32_meta.json",
292 |       "type": "application/json",
293 |       "title": "metadata"
294 |     },
295 |     {
296 |       "rel": "item",
297 |       "href": "./31/sentinel12_31_meta.json",
298 |       "type": "application/json",
299 |       "title": "metadata"
300 |     },
301 |     {
302 |       "rel": "item",
303 |       "href": "./30/sentinel12_30_meta.json",
304 |       "type": "application/json",
305 |       "title": "metadata"
306 |     },
307 |     {
308 |       "rel": "item",
309 |       "href": "./29/sentinel12_29_meta.json",
310 |       "type": "application/json",
311 |       "title": "metadata"
312 |     },
313 |     {
314 |       "rel": "item",
315 |       "href": "./28/sentinel12_28_meta.json",
316 |       "type": "application/json",
317 |       "title": "metadata"
318 |     },
319 |     {
320 |       "rel": "item",
321 |       "href": "./27/sentinel12_27_meta.json",
322 |       "type": "application/json",
323 |       "title": "metadata"
324 |     },
325 |     {
326 |       "rel": "item",
327 |       "href": "./26/sentinel12_26_meta.json",
328 |       "type": "application/json",
329 |       "title": "metadata"
330 |     },
331 |     {
332 |       "rel": "item",
333 |       "href": "./25/sentinel12_25_meta.json",
334 |       "type": "application/json",
335 |       "title": "metadata"
336 |     },
337 |     {
338 |       "rel": "item",
339 |       "href": "./23/sentinel12_23_meta.json",
340 |       "type": "application/json",
341 |       "title": "metadata"
342 |     },
343 |     {
344 |       "rel": "item",
345 |       "href": "./22/sentinel12_22_meta.json",
346 |       "type": "application/json",
347 |       "title": "metadata"
348 |     },
349 |     {
350 |       "rel": "item",
351 |       "href": "./21/sentinel12_21_meta.json",
352 |       "type": "application/json",
353 |       "title": "metadata"
354 |     },
355 |     {
356 |       "rel": "item",
357 |       "href": "./19/sentinel12_19_meta.json",
358 |       "type": "application/json",
359 |       "title": "metadata"
360 |     },
361 |     {
362 |       "rel": "item",
363 |       "href": "./17/sentinel12_17_meta.json",
364 |       "type": 
"application/json", 365 | "title": "metadata" 366 | }, 367 | { 368 | "rel": "item", 369 | "href": "./16/sentinel12_16_meta.json", 370 | "type": "application/json", 371 | "title": "metadata" 372 | }, 373 | { 374 | "rel": "item", 375 | "href": "./15/sentinel12_15_meta.json", 376 | "type": "application/json", 377 | "title": "metadata" 378 | }, 379 | { 380 | "rel": "item", 381 | "href": "./13/sentinel12_13_meta.json", 382 | "type": "application/json", 383 | "title": "metadata" 384 | }, 385 | { 386 | "rel": "item", 387 | "href": "./12/sentinel12_12_meta.json", 388 | "type": "application/json", 389 | "title": "metadata" 390 | }, 391 | { 392 | "rel": "item", 393 | "href": "./11/sentinel12_11_meta.json", 394 | "type": "application/json", 395 | "title": "metadata" 396 | }, 397 | { 398 | "rel": "item", 399 | "href": "./10/sentinel12_10_meta.json", 400 | "type": "application/json", 401 | "title": "metadata" 402 | }, 403 | { 404 | "rel": "item", 405 | "href": "./1/sentinel12_1_meta.json", 406 | "type": "application/json", 407 | "title": "metadata" 408 | } 409 | ] 410 | } -------------------------------------------------------------------------------- /environment.yaml: -------------------------------------------------------------------------------- 1 | name: s1s2_water 2 | dependencies: 3 | - python=3.11 4 | - rasterio>=1.3.8 5 | - pip: 6 | - ukis-pysat[raster]>=1.5.1 7 | - pydantic>=2.3.0 8 | - toml>=0.10.2 9 | - tifffile>=2023.8.30 10 | - pystac==1.8.3 11 | - pystac_client==0.6.1 12 | - geopandas>=0.13.2 13 | - folium>=0.14.0 14 | - tqdm>=4.66.1 15 | 16 | -------------------------------------------------------------------------------- /explore_dataset.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "25de2f43", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "import folium\n", 11 | "import geopandas as gpd\n", 12 | "\n", 13 | "from pystac_client import Client" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 2, 19 | "id": "8308622a", 20 | "metadata": {}, 21 | "outputs": [ 22 | { 23 | "data": { 24 | "text/plain": [ 25 | "'s1s2_water'" 26 | ] 27 | }, 28 | "execution_count": 2, 29 | "metadata": {}, 30 | "output_type": "execute_result" 31 | } 32 | ], 33 | "source": [ 34 | "# open static STAC catalog\n", 35 | "catalog = Client.open(\"data/catalog.json\")\n", 36 | "catalog.title" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 3, 42 | "id": "082a5d51", 43 | "metadata": { 44 | "scrolled": true 45 | }, 46 | "outputs": [ 47 | { 48 | "data": { 49 | "text/html": [ 50 | "
\n", 51 | "\n", 64 | "\n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | "
geometryfloodsplitdatetimelandcovers1_srcidss2_srcidscopdem30_slopecopdem30_elevationdate_s1date_s2
0POLYGON ((166.27203 -45.14621, 167.66760 -45.1...Falseval2020-01-01T00:00:00ZTree cover, broadleaved, evergreen, closed to ...[S1B_IW_GRDH_1SDV_20200206T075413_20200206T075...[S2B_MSIL1C_20200209T224759_N0209_R115_T58GFQ_...sentinel12_copdem30_93_slopesentinel12_copdem30_93_elevation2020020620200209
1POLYGON ((106.16716 39.74440, 107.44788 39.724...Falsetrain2020-01-01T00:00:00ZGrassland[S1A_IW_GRDH_1SDV_20201108T104645_20201108T104...[S2A_MSIL1C_20201107T033941_N0209_R061_T48SXJ_...sentinel12_copdem30_91_slopesentinel12_copdem30_91_elevation2020110820201107
2POLYGON ((-14.09034 8.95716, -14.09034 9.94593...Falsetrain2020-01-01T00:00:00ZWater bodies[S1A_IW_GRDH_1SDV_20190422T190812_20190422T190...[S2A_MSIL1C_20190422T110621_N0207_R137_T28PFR_...sentinel12_copdem30_9_slopesentinel12_copdem30_9_elevation2019042220190422
3POLYGON ((50.99980 25.31673, 52.09064 25.31270...Falsetest2020-01-01T00:00:00ZWater bodies[S1A_IW_GRDH_1SDV_20200410T023157_20200410T023...[S2A_MSIL1C_20200414T070621_N0209_R106_T39RWH_...sentinel12_copdem30_89_slopesentinel12_copdem30_89_elevation2020041020200414
4POLYGON ((32.02016 23.50747, 33.09560 23.51053...Falsetest2020-01-01T00:00:00ZBare areas[S1B_IW_GRDH_1SDV_20201124T155420_20201124T155...[S2B_MSIL1C_20201126T082259_N0209_R121_T36QVL_...sentinel12_copdem30_88_slopesentinel12_copdem30_88_elevation2020112420201126
\n", 154 | "
" 155 | ], 156 | "text/plain": [ 157 | " geometry flood split \\\n", 158 | "0 POLYGON ((166.27203 -45.14621, 167.66760 -45.1... False val \n", 159 | "1 POLYGON ((106.16716 39.74440, 107.44788 39.724... False train \n", 160 | "2 POLYGON ((-14.09034 8.95716, -14.09034 9.94593... False train \n", 161 | "3 POLYGON ((50.99980 25.31673, 52.09064 25.31270... False test \n", 162 | "4 POLYGON ((32.02016 23.50747, 33.09560 23.51053... False test \n", 163 | "\n", 164 | " datetime landcover \\\n", 165 | "0 2020-01-01T00:00:00Z Tree cover, broadleaved, evergreen, closed to ... \n", 166 | "1 2020-01-01T00:00:00Z Grassland \n", 167 | "2 2020-01-01T00:00:00Z Water bodies \n", 168 | "3 2020-01-01T00:00:00Z Water bodies \n", 169 | "4 2020-01-01T00:00:00Z Bare areas \n", 170 | "\n", 171 | " s1_srcids \\\n", 172 | "0 [S1B_IW_GRDH_1SDV_20200206T075413_20200206T075... \n", 173 | "1 [S1A_IW_GRDH_1SDV_20201108T104645_20201108T104... \n", 174 | "2 [S1A_IW_GRDH_1SDV_20190422T190812_20190422T190... \n", 175 | "3 [S1A_IW_GRDH_1SDV_20200410T023157_20200410T023... \n", 176 | "4 [S1B_IW_GRDH_1SDV_20201124T155420_20201124T155... \n", 177 | "\n", 178 | " s2_srcids \\\n", 179 | "0 [S2B_MSIL1C_20200209T224759_N0209_R115_T58GFQ_... \n", 180 | "1 [S2A_MSIL1C_20201107T033941_N0209_R061_T48SXJ_... \n", 181 | "2 [S2A_MSIL1C_20190422T110621_N0207_R137_T28PFR_... \n", 182 | "3 [S2A_MSIL1C_20200414T070621_N0209_R106_T39RWH_... \n", 183 | "4 [S2B_MSIL1C_20201126T082259_N0209_R121_T36QVL_... \n", 184 | "\n", 185 | " copdem30_slope copdem30_elevation date_s1 \\\n", 186 | "0 sentinel12_copdem30_93_slope sentinel12_copdem30_93_elevation 20200206 \n", 187 | "1 sentinel12_copdem30_91_slope sentinel12_copdem30_91_elevation 20201108 \n", 188 | "2 sentinel12_copdem30_9_slope sentinel12_copdem30_9_elevation 20190422 \n", 189 | "3 sentinel12_copdem30_89_slope sentinel12_copdem30_89_elevation 20200410 \n", 190 | "4 sentinel12_copdem30_88_slope sentinel12_copdem30_88_elevation 20201124 \n", 191 | "\n", 192 | " date_s2 \n", 193 | "0 20200209 \n", 194 | "1 20201107 \n", 195 | "2 20190422 \n", 196 | "3 20200414 \n", 197 | "4 20201126 " 198 | ] 199 | }, 200 | "execution_count": 3, 201 | "metadata": {}, 202 | "output_type": "execute_result" 203 | } 204 | ], 205 | "source": [ 206 | "# get all items and convert them to geopandas dataframe\n", 207 | "items = [item.to_dict() for item in catalog.get_all_items()]\n", 208 | "gdf = gpd.GeoDataFrame.from_features(features=items, crs=3857).to_crs(4326)\n", 209 | "gdf.head()" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 4, 215 | "id": "70ccf360", 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "data": { 220 | "text/html": [ 221 | "
Make this Notebook Trusted to load map: File -> Trust Notebook
" 366 | ], 367 | "text/plain": [ 368 | "" 369 | ] 370 | }, 371 | "execution_count": 4, 372 | "metadata": {}, 373 | "output_type": "execute_result" 374 | } 375 | ], 376 | "source": [ 377 | "# visualize samples on map (train, val, test)\n", 378 | "m = folium.Map(tiles=\"Stamen Terrain\")\n", 379 | "\n", 380 | "popup = folium.GeoJsonPopup(\n", 381 | " fields=[\"split\", \"landcover\"],\n", 382 | " aliases=[\"split\", \"landcover\"],\n", 383 | " sticky=True,\n", 384 | " labels=True,\n", 385 | " style=\"background-color: #F0EFEF; border: 1px solid black; border-radius: 3px; box-shadow: 3px;\",\n", 386 | ")\n", 387 | "\n", 388 | "\n", 389 | "def style_function(feature):\n", 390 | " return {\n", 391 | " \"fillOpacity\": 0.9,\n", 392 | " \"weight\": 0,\n", 393 | " \"fillColor\": \"#00ff00\"\n", 394 | " if feature[\"properties\"][\"split\"] == \"train\"\n", 395 | " else \"#0000ff\"\n", 396 | " if feature[\"properties\"][\"split\"] == \"val\"\n", 397 | " else \"#ff0000\",\n", 398 | " }\n", 399 | "\n", 400 | "\n", 401 | "folium.GeoJson(gdf.to_json(), popup=popup, style_function=style_function).add_to(m)\n", 402 | "m.fit_bounds(m.get_bounds())\n", 403 | "m\n" 404 | ] 405 | } 406 | ], 407 | "metadata": { 408 | "kernelspec": { 409 | "display_name": "Python 3 (ipykernel)", 410 | "language": "python", 411 | "name": "python3" 412 | }, 413 | "language_info": { 414 | "codemirror_mode": { 415 | "name": "ipython", 416 | "version": 3 417 | }, 418 | "file_extension": ".py", 419 | "mimetype": "text/x-python", 420 | "name": "python", 421 | "nbconvert_exporter": "python", 422 | "pygments_lexer": "ipython3", 423 | "version": "3.11.5" 424 | } 425 | }, 426 | "nbformat": 4, 427 | "nbformat_minor": 5 428 | } 429 | -------------------------------------------------------------------------------- /prepare/__init__.py: -------------------------------------------------------------------------------- 1 | version = "1.0.1" 2 | -------------------------------------------------------------------------------- /prepare/preproc_s1_snap.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 1.0 5 | 6 | Read 7 | 8 | 9 | $(inname) 10 | 11 | 12 | 13 | Apply-Orbit-File 14 | 15 | 16 | 17 | 18 | Sentinel Precise (Auto Download) 19 | 3 20 | false 21 | 22 | 23 | 24 | ThermalNoiseRemoval 25 | 26 | 27 | 28 | 29 | VV,VH 30 | true 31 | false 32 | 33 | 34 | 35 | Calibration 36 | 37 | 38 | 39 | 40 | 41 | Product Auxiliary File 42 | 43 | false 44 | false 45 | false 46 | false 47 | VV,VH 48 | true 49 | false 50 | false 51 | 52 | 53 | 54 | Speckle-Filter 55 | 56 | 57 | 58 | 59 | Sigma0_VV,Sigma0_VH 60 | Median 61 | 3 62 | 3 63 | 2 64 | true 65 | 1.0 66 | 1 67 | 5x5 68 | 3x3 69 | 0.9 70 | 50 71 | 72 | 73 | 74 | Terrain-Correction 75 | 76 | 77 | 78 | 79 | 80 | External DEM 81 | S1A_IW_GRDH_1SDV_20221231T174827_20221231T174852_046579_059500/tmp/dem.tif 82 | -3.4028235e+38 83 | true 84 | BILINEAR_INTERPOLATION 85 | BILINEAR_INTERPOLATION 86 | 10.0 87 | 8.983152841195215e-05 88 | GEOGCS["WGS84(DD)", 89 | 90 | DATUM["WGS84", 91 | 92 | SPHEROID["WGS84", 6378137.0, 298.257223563]], 93 | 94 | PRIMEM["Greenwich", 0.0], 95 | 96 | UNIT["degree", 0.017453292519943295], 97 | 98 | AXIS["Geodetic longitude", EAST], 99 | 100 | AXIS["Geodetic latitude", NORTH]] 101 | 102 | false 103 | 0.0 104 | 0.0 105 | false 106 | false 107 | false 108 | false 109 | false 110 | false 111 | true 112 | false 113 | false 114 | false 115 | false 116 | Use projected local incidence angle from DEM 117 | Use projected local incidence angle 
from DEM
118 | Latest Auxiliary File
119 | 
120 | 
121 | 
122 | 
123 | LinearToFromdB
124 | 
125 | 
126 | 
127 | 
128 | Sigma0_VV,Sigma0_VH
129 | 
130 | 
131 | 
132 | Reproject
133 | 
134 | 
135 | 
136 | 
137 | 
138 | AUTO:42001
139 | Nearest
140 | 10.0
141 | 10.0
142 | false
143 | 
144 | 
145 | 
146 | Write
147 | 
148 | 
149 | 
150 | 
151 | $(outname)
152 | BEAM-DIMAP
153 | 
154 | 
155 | 
--------------------------------------------------------------------------------
/prepare/split.py:
--------------------------------------------------------------------------------
1 | import json
2 | import logging
3 | import numpy as np
4 | import sys
5 | import tifffile as tiff
6 | import tqdm
7 | 
8 | from pathlib import Path
9 | from prepare.utils import scale_min_max, tile_array
10 | from pystac_client import Client
11 | from ukis_pysat.raster import Image
12 | 
13 | 
14 | def run(data_dir, out_dir, sensor="s1", tile_shape=(256, 256), img_bands_idx=[0, 1], slope=False, exclude_nodata=False):
15 |     logging.info("Splitting training samples")
16 | 
17 |     if (Path(data_dir) / "catalog.json").is_file():
18 |         catalog = Client.open(Path(data_dir) / "catalog.json")
19 |     else:
20 |         raise FileNotFoundError("Cannot find catalog.json file in data_dir")
21 | 
22 |     # scale factors used to store the Int16/UInt16 rasters (see README table)
23 |     if sensor == "s1":
24 |         scale_min, scale_max = 0, 100.0
25 |     elif sensor == "s2":
26 |         scale_min, scale_max = 0, 10000.0
27 |     else:
28 |         raise NotImplementedError(f"Sensor {str(sensor)} not supported ['s1', 's2']")
29 | 
30 |     Path(out_dir).mkdir(parents=True, exist_ok=True)
31 |     Path(Path(out_dir) / "train/img").mkdir(parents=True, exist_ok=True)
32 |     Path(Path(out_dir) / "train/msk").mkdir(parents=True, exist_ok=True)
33 |     Path(Path(out_dir) / "test/img").mkdir(parents=True, exist_ok=True)
34 |     Path(Path(out_dir) / "test/msk").mkdir(parents=True, exist_ok=True)
35 |     Path(Path(out_dir) / "val/img").mkdir(parents=True, exist_ok=True)
36 |     Path(Path(out_dir) / "val/msk").mkdir(parents=True, exist_ok=True)
37 | 
38 |     items = [item.to_dict() for item in catalog.get_all_items()]
39 |     sys.stdout.flush()
40 |     for i, item in tqdm.tqdm(enumerate(items), total=len(items)):
41 |         split = item["properties"]["split"]
42 |         subdir = Path(item["assets"][f"{sensor}_img"]["href"]).parent.name
43 |         msk_file = Path(data_dir) / Path(subdir) / Path(item["assets"][f"{sensor}_msk"]["href"]).name
44 |         valid_file = Path(data_dir) / Path(subdir) / Path(item["assets"][f"{sensor}_valid"]["href"]).name
45 |         slope_file = Path(data_dir) / Path(subdir) / Path(item["assets"]["copdem30_slope"]["href"]).name
46 |         img_file = Path(data_dir) / Path(subdir) / Path(item["assets"][f"{sensor}_img"]["href"]).name
47 | 
48 |         msk = Image(data=msk_file, dimorder="last")
49 |         valid = Image(data=valid_file, dimorder="last")
50 |         slope = Image(data=slope_file, dimorder="last") if slope else None
51 |         img = Image(data=img_file, dimorder="last")
52 |         img_scaled = scale_min_max(img.arr[:, :, img_bands_idx], min=scale_min, max=scale_max)
53 | 
54 |         if slope:
55 |             slope.warp(resampling_method=2, dst_crs=img.dataset.crs, target_align=img)
56 |             img_scaled = np.append(img_scaled, slope.arr, axis=2)
57 | 
58 |         img_tiles = tile_array(img_scaled, xsize=tile_shape[0], ysize=tile_shape[1], overlap=0.0, padding=False)
59 |         msk_tiles = tile_array(msk.arr, xsize=tile_shape[0], ysize=tile_shape[1], overlap=0.0, padding=False)
60 |         valid_tiles = (
61 |             tile_array(valid.arr, xsize=tile_shape[0], ysize=tile_shape[1], overlap=0.0, padding=False)
62 |             if exclude_nodata
63 |             else None
64 |         )
65 | 
66 |         for j in range(len(img_tiles)):
67 |             if exclude_nodata:
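                # a tile is dropped if it contains any invalid pixel (cloud, cloud shadow or nodata)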
68 |                 if 0 in valid_tiles[j, :, :, :]:
69 |                     continue
70 |             tiff.imwrite(
71 |                 Path(out_dir) / f"{split}/img/{Path(img_file).stem}_{j}.tif",
72 |                 img_tiles[j, :, :, :],
73 |                 planarconfig="contig",
74 |             )
75 |             tiff.imwrite(Path(out_dir) / f"{split}/msk/{Path(msk_file).stem}_{j}.tif", msk_tiles[j, :, :, :])
--------------------------------------------------------------------------------
/prepare/utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | 
3 | 
4 | def scale_min_max(array, min=0, max=10000):
5 |     bands = []
6 |     for i in range(array.shape[2]):
7 |         bands.append((array[:, :, i].astype(np.float32) - min) / (max - min))
8 |     return np.dstack(bands)
9 | 
10 | 
11 | def rolling_window(array, window=(0,), asteps=None, wsteps=None, axes=None, toend=True):
12 |     array = np.asarray(array)
13 |     orig_shape = np.asarray(array.shape)
14 |     window = np.atleast_1d(window).astype(int)
15 | 
16 |     if axes is not None:
17 |         axes = np.atleast_1d(axes)
18 |         w = np.zeros(array.ndim, dtype=int)
19 |         for axis, size in zip(axes, window):
20 |             w[axis] = size
21 |         window = w
22 | 
23 |     # Check if window is legal:
24 |     if window.ndim > 1:
25 |         raise ValueError("`window` must be one-dimensional.")
26 |     if np.any(window < 0):
27 |         raise ValueError("All elements of `window` must be non-negative.")
28 |     if len(array.shape) < len(window):
29 |         raise ValueError("`window` length must be less than or equal to the `array` dimension.")
30 | 
31 |     _asteps = np.ones_like(orig_shape)
32 |     if asteps is not None:
33 |         asteps = np.atleast_1d(asteps)
34 |         if asteps.ndim != 1:
35 |             raise ValueError("`asteps` must be either a scalar or one dimensional.")
36 |         if len(asteps) > array.ndim:
37 |             raise ValueError("`asteps` cannot be longer than the `array` dimension.")
38 |         # does not enforce alignment, so that steps can be same as window too.
39 |         _asteps[-len(asteps) :] = asteps
40 | 
41 |         if np.any(asteps < 1):
42 |             raise ValueError("All elements of `asteps` must be at least 1.")
43 |     asteps = _asteps
44 | 
45 |     _wsteps = np.ones_like(window)
46 |     if wsteps is not None:
47 |         wsteps = np.atleast_1d(wsteps)
48 |         if wsteps.shape != window.shape:
49 |             raise ValueError("`wsteps` must have the same shape as `window`.")
50 |         if np.any(wsteps < 0):
51 |             raise ValueError("All elements of `wsteps` must be non-negative.")
52 | 
53 |         _wsteps[:] = wsteps
54 |         _wsteps[window == 0] = 1
55 |     wsteps = _wsteps
56 | 
57 |     # Check that the window would not be larger than the original:
58 |     if np.any(orig_shape[-len(window) :] < window * wsteps):
59 |         raise ValueError("`window` * `wsteps` larger than `array` in at least one dimension.")
60 | 
61 |     new_shape = orig_shape
62 | 
63 |     # For calculating the new shape 0s must act like 1s:
64 |     _window = window.copy()
65 |     _window[_window == 0] = 1
66 | 
67 |     new_shape[-len(window) :] += wsteps - _window * wsteps
68 |     new_shape = (new_shape + asteps - 1) // asteps
69 |     # make sure the new_shape is at least 1 in any "old" dimension (i.e. steps
70 |     # is (too) large), but we do not care.
71 |     new_shape[new_shape < 1] = 1
72 |     shape = new_shape
73 | 
74 |     strides = np.asarray(array.strides)
75 |     strides *= asteps
76 |     new_strides = array.strides[-len(window) :] * wsteps
77 | 
78 |     # The full new shape and strides:
79 |     if toend:
80 |         new_shape = np.concatenate((shape, window))
81 |         new_strides = np.concatenate((strides, new_strides))
82 |     else:
83 |         _ = np.zeros_like(shape)
84 |         _[-len(window) :] = window
85 |         _window = _.copy()
86 |         _[-len(window) :] = new_strides
87 |         _new_strides = _
88 | 
89 |         new_shape = np.zeros(len(shape) * 2, dtype=int)
90 |         new_strides = np.zeros(len(shape) * 2, dtype=int)
91 | 
92 |         new_shape[::2] = shape
93 |         new_strides[::2] = strides
94 |         new_shape[1::2] = _window
95 |         new_strides[1::2] = _new_strides
96 | 
97 |         new_strides = new_strides[new_shape != 0]
98 |         new_shape = new_shape[new_shape != 0]
99 | 
100 |     return np.lib.stride_tricks.as_strided(array, shape=new_shape, strides=new_strides)
101 | 
102 | 
103 | def tile_array(array, xsize=512, ysize=512, overlap=0.1, padding=True):
104 |     # get dtype, rows, cols and bands from the input array
105 |     dtype = array.dtype
106 |     rows = array.shape[0]
107 |     cols = array.shape[1]
108 |     if array.ndim == 2:
109 |         array = np.expand_dims(array, 2)
110 |     bands = array.shape[2]
111 | 
112 |     # get steps
113 |     xsteps = int(xsize - (xsize * overlap))
114 |     ysteps = int(ysize - (ysize * overlap))
115 | 
116 |     if padding is True:
117 |         # pad array on all sides to fit all tiles.
118 |         # replicate values here instead of filling with nan.
119 |         # nan padding would cause issues for standardization and classification later on.
120 |         ypad = ysize + 1
121 |         xpad = xsize + 1
122 |         array = np.pad(
123 |             array,
124 |             (
125 |                 (int(ysize * overlap), ypad + int(ysize * overlap)),
126 |                 (int(xsize * overlap), xpad + int(xsize * overlap)),
127 |                 (0, 0),
128 |             ),
129 |             mode="symmetric",
130 |         )
131 | 
132 |     # tile the data into overlapping patches
133 |     # this skips any tile at the end of row and col that exceeds the shape of the input array
134 |     # therefore padding the input array is needed beforehand
135 |     X_ = rolling_window(array, (xsize, ysize, bands), asteps=(xsteps, ysteps, bands))
136 | 
137 |     # access single tiles and write them to file and/or to ndarray of shape (tiles, rows, cols, bands)
138 |     X = []
139 |     for i in range(X_.shape[0]):
140 |         for j in range(X_.shape[1]):
141 |             X.append(X_[i, j, 0, :, :, :])
142 | 
143 |     return np.asarray(X, dtype=dtype)
--------------------------------------------------------------------------------
/s1s2_water.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import toml
3 | 
4 | from pathlib import Path
5 | from prepare.split import run as run_split
6 | 
7 | 
8 | parser = argparse.ArgumentParser(
9 |     description="Prepare s1s2_water images and masks for train, validation and test",
10 |     formatter_class=argparse.RawTextHelpFormatter,
11 | )
12 | parser.add_argument("--settings", help="Path to TOML file with settings", required=True)
13 | args = parser.parse_args()
14 | 
15 | if Path(args.settings).is_file():
16 |     with open(args.settings) as f:
17 |         settings = toml.load(f)
18 | else:
19 |     raise FileNotFoundError("Cannot find settings file.")
20 | 
21 | run_split(
22 |     data_dir=settings["DATA_DIR"],
23 |     out_dir=settings["OUT_DIR"],
24 |     sensor=settings["SENSOR"],
25 |     tile_shape=settings["TILE_SHAPE"],
26 |     img_bands_idx=settings["IMG_BANDS_IDX"],
27 |     slope=settings["SLOPE"],
28 |     exclude_nodata=settings["EXCLUDE_NODATA"],
29 | )
30 | 
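# Prepared tiles are written to OUT_DIR with the layout created in prepare/split.py,
# with image and mask tiles paired by sample ID and tile index:
#   OUT_DIR/
#   ├── train/ (img/, msk/)
#   ├── val/   (img/, msk/)
#   └── test/  (img/, msk/)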
--------------------------------------------------------------------------------
/settings.toml:
--------------------------------------------------------------------------------
1 | SENSOR = "s2" # prepare Sentinel-1 or Sentinel-2 data ["s1", "s2"]
2 | TILE_SHAPE = [256, 256] # desired tile shape in pixels
3 | IMG_BANDS_IDX = [0, 1, 2, 3, 4, 5] # desired image band combination
4 | SLOPE = true # append slope band to image bands
5 | EXCLUDE_NODATA = true # exclude tiles with nodata values
6 | DATA_DIR = "/path/to/data_directory" # data directory that holds the original images
7 | OUT_DIR = "/path/to/output_directory" # output directory that stores the prepared train, val and test tiles
8 | 
9 | # Sentinel-1 image bands
10 | # {"VV": 0, "VH": 1}
11 | 
12 | # Sentinel-2 image bands
13 | # {"Blue": 0, "Green": 1, "Red": 2, "NIR": 3, "SWIR1": 4, "SWIR2": 5}
--------------------------------------------------------------------------------
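As a quick sanity check after data preparation, the following is a minimal sketch of loading one prepared image/mask tile pair with tifffile; it assumes the `OUT_DIR` layout produced by `prepare/split.py`, and the `train` split and first tile are illustrative:

```python
from pathlib import Path

import numpy as np
import tifffile as tiff

out_dir = Path("/path/to/output_directory")  # OUT_DIR from settings.toml

# pick the first training image tile and its matching mask tile
img_path = sorted((out_dir / "train" / "img").glob("*.tif"))[0]
msk_path = out_dir / "train" / "msk" / img_path.name.replace("_img_", "_msk_")

img = tiff.imread(img_path)  # float32, rescaled by prepare/split.py (TOA reflectance for S2, dB for S1)
msk = tiff.imread(msk_path)  # UInt8 water mask, 0: no water / 1: water

assert img.shape[:2] == msk.shape[:2]
print(img.shape, img.dtype, np.unique(msk))
```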