├── .gitignore
├── README.md
├── data
│   └── catalog.json
├── environment.yaml
├── explore_dataset.ipynb
├── prepare
│   ├── __init__.py
│   ├── preproc_s1_snap.xml
│   ├── split.py
│   └── utils.py
├── s1s2_water.py
└── settings.toml
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | data/*
3 | tiles/*
4 | !data/catalog.json
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images
2 | This repository provides tools to work with the [S1S2-Water dataset](https://zenodo.org/records/11278238).
3 |
4 | The [S1S2-Water dataset](https://zenodo.org/records/11278238) is a global reference dataset for training, validation and testing of convolutional neural networks for semantic segmentation of surface water bodies in publicly available Sentinel-1 and Sentinel-2 satellite images. The dataset consists of 65 triplets of Sentinel-1 and Sentinel-2 images with quality-checked binary water masks. Samples are drawn globally on the basis of the Sentinel-2 tile grid (100 x 100 km), taking into account predominant land cover and the availability of water bodies. Each sample is complemented with STAC-compliant metadata and a Digital Elevation Model (DEM) raster from the Copernicus DEM.
5 |
6 | The following article describes the dataset:
7 |
8 | > Wieland, M., Fichtner, F., Martinis, S., Groth, S., Krullikowski, C., Plank, S., Motagh, M. (2023). S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images. *IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing*, doi: [10.1109/JSTARS.2023.3333969](https://dx.doi.org/10.1109/JSTARS.2023.3333969).
9 |
10 | ## Dataset version update (2024-05-24)
11 | The dataset on Zenodo has been updated to a new version. Sentinel-1 scenes for samples #31 and #59 were missing from [v1.0.0](https://zenodo.org/records/8314175) and are now included in [v1.0.1](https://zenodo.org/records/11278238), along with all relevant masks and metadata.
12 |
13 | ## Dataset access
14 | The dataset (~170 GB) is available for download at: https://zenodo.org/records/11278238 (v1.0.1)
15 |
16 | Download the dataset parts and extract them into a single data directory as follows.
17 |
18 | ```
19 | .
20 | └── data/
21 |     ├── 1/
22 |     │   ├── sentinel12_copdem30_1_elevation.tif
23 |     │   ├── sentinel12_copdem30_1_slope.tif
24 |     │   ├── sentinel12_s1_1_img.tif
25 |     │   ├── sentinel12_s1_1_msk.tif
26 |     │   ├── sentinel12_s1_1_valid.tif
27 |     │   ├── sentinel12_s2_1_img.tif
28 |     │   ├── sentinel12_s2_1_msk.tif
29 |     │   ├── sentinel12_s2_1_valid.tif
30 |     │   └── sentinel12_1_meta.json
31 |     ├── 5/
32 |     │   ├── sentinel12_copdem30_5_elevation.tif
33 |     │   ├── sentinel12_copdem30_5_slope.tif
34 |     │   ├── sentinel12_s1_5_img.tif
35 |     │   ├── sentinel12_s1_5_msk.tif
36 |     │   ├── sentinel12_s1_5_valid.tif
37 |     │   ├── sentinel12_s2_5_img.tif
38 |     │   ├── sentinel12_s2_5_msk.tif
39 |     │   ├── sentinel12_s2_5_valid.tif
40 |     │   └── sentinel12_5_meta.json
41 |     ├── .../
42 |     │   └── ...
43 |     └── catalog.json
44 | ```
45 |
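After extraction it can be useful to confirm that every sample folder is complete. The following is a minimal sketch, not part of this repository: the helper name `missing_files` and the `EXPECTED` list are illustrative assumptions derived from the directory listing above.

```python
from pathlib import Path

# File name templates per sample, taken from the directory listing above.
EXPECTED = [
    "copdem30_{id}_elevation.tif",
    "copdem30_{id}_slope.tif",
    "s1_{id}_img.tif",
    "s1_{id}_msk.tif",
    "s1_{id}_valid.tif",
    "s2_{id}_img.tif",
    "s2_{id}_msk.tif",
    "s2_{id}_valid.tif",
    "{id}_meta.json",
]


def missing_files(data_dir):
    """Return {sample_id: [missing file names]} for every numeric sample folder."""
    report = {}
    for sample in sorted(Path(data_dir).iterdir()):
        if not (sample.is_dir() and sample.name.isdigit()):
            continue  # skip catalog.json and anything that is not a sample folder
        expected = [f"sentinel12_{t.format(id=sample.name)}" for t in EXPECTED]
        missing = [name for name in expected if not (sample / name).exists()]
        if missing:
            report[sample.name] = missing
    return report
```

Calling `missing_files("/path/to/data_directory")` returns an empty dictionary when all sample folders contain the nine expected files.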
46 | ## Dataset information
47 | Each raster file follows the naming scheme `sentinel12_SENSOR_ID_LAYER.tif` (e.g. `sentinel12_s1_5_img.tif`); the metadata item is named `sentinel12_ID_meta.json`. Raster layers are stored as Cloud Optimized GeoTIFF (COG) and are projected to Universal Transverse Mercator (UTM).
48 |
49 | | Sensor | Layer | Description | Values | Format | Bands |
50 | | - | - | - | - | - | - |
51 | | S1 | IMG | Sentinel-1 image<br>GRD product | Unit: dB (scaled by factor 100) | GeoTIFF<br>10980 x 10980 px<br>2 bands<br>Int16 | 0: VV<br>1: VH |
52 | | S2 | IMG | Sentinel-2 image<br>L1C product | Unit: TOA reflectance (scaled by factor 10000) | GeoTIFF<br>10980 x 10980 px<br>6 bands<br>UInt16 | 0: Blue<br>1: Green<br>2: Red<br>3: NIR<br>4: SWIR1<br>5: SWIR2 |
53 | | S1 / S2 | MSK | Annotation mask<br>Hand-labelled water mask | 0: No Water<br>1: Water | GeoTIFF<br>10980 x 10980 px<br>1 band<br>UInt8 | 0: Water mask |
54 | | S1 / S2 | VALID | Valid pixel mask<br>Hand-labelled valid pixel mask | 0: Invalid (cloud, cloud-shadow, nodata)<br>1: Valid | GeoTIFF<br>10980 x 10980 px<br>1 band<br>UInt8 | 0: Valid mask |
55 | | COPDEM30 | ELEVATION | Copernicus DEM elevation | Unit: Meters | GeoTIFF<br>3660 x 3660 px<br>1 band<br>Int16 | 0: Elevation |
56 | | COPDEM30 | SLOPE | Copernicus DEM slope | Unit: Degrees | GeoTIFF<br>3660 x 3660 px<br>1 band<br>Int16 | 0: Slope |
57 | | N.a. | META | METADATA | STAC metadata item | JSON | N.a. |
58 |
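The naming scheme and the scale factors in the table above can be handled programmatically. The sketch below is illustrative only: `parse_name`, `to_physical`, `NAME_RE` and `SCALE` are hypothetical names that do not come from this repository, and the regular expression assumes lowercase sensor and layer tokens as seen in the example file names.

```python
import re

# Pattern for names such as sentinel12_s1_5_img.tif; the sensor part is
# absent for the metadata item (sentinel12_5_meta.json).
NAME_RE = re.compile(
    r"^sentinel12_(?:(?P<sensor>s1|s2|copdem30)_)?(?P<id>\d+)_(?P<layer>[a-z]+)\.(?:tif|json)$"
)

# Scale factors from the table above: stored integer / factor = physical value.
SCALE = {("s1", "img"): 100, ("s2", "img"): 10000}


def parse_name(filename):
    """Split a dataset file name into (sensor, sample_id, layer)."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match["sensor"], int(match["id"]), match["layer"]


def to_physical(value, sensor, layer):
    """Undo the integer scaling; layers without a factor are returned unchanged."""
    factor = SCALE.get((sensor, layer))
    return value / factor if factor else value
```

For example, `parse_name("sentinel12_s1_5_img.tif")` yields `("s1", 5, "img")`, and a stored Sentinel-1 value of `-1234` corresponds to `-12.34` dB.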
59 | ## Data preparation
60 | Make sure to download the dataset as described above. Clone this repository, adjust [settings.toml](settings.toml) and run [s1s2_water.py](s1s2_water.py) to prepare the dataset according to your desired settings.
61 |
62 | The following command splits images and masks for a specific sensor (Sentinel-1 or Sentinel-2) into training, validation and testing tiles with a predefined shape and band combination. Slope information can be appended to the image band stack if required.
63 |
64 | ```shell
65 | $ python s1s2_water.py --settings settings.toml
66 | ```
67 |
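To get a feel for how many tiles one image yields, the tiling can be sketched as a simple window generator. This is an assumption-laden illustration, not the logic of `s1s2_water.py`: the function name `tile_windows` is hypothetical, and the sketch simply drops partial tiles at the right and bottom edges, whereas the actual script may pad or handle edges differently.

```python
def tile_windows(height, width, tile_h, tile_w):
    """Yield (row_off, col_off, tile_h, tile_w) for every full, non-overlapping tile."""
    for row in range(0, height - tile_h + 1, tile_h):
        for col in range(0, width - tile_w + 1, tile_w):
            yield row, col, tile_h, tile_w


# A 10980 x 10980 px image cut into 256 x 256 px tiles.
windows = list(tile_windows(10980, 10980, 256, 256))
print(len(windows))  # 42 * 42 = 1764 full tiles per image
```

Under these assumptions, 228 pixels along each axis (10980 - 42 * 256) are not covered by full tiles.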
68 | Data preparation parameters are defined in a [settings TOML file](settings.toml), which is passed via **--settings**.
69 |
70 | ```toml
71 | SENSOR = "s2" # prepare Sentinel-1 or Sentinel-2 data ["s1", "s2"]
72 | TILE_SHAPE = [256, 256] # desired tile shape in pixel
73 | IMG_BANDS_IDX = [0, 1, 2, 3, 4, 5] # desired image band combination
74 | SLOPE = true # append slope band to image bands
75 | EXCLUDE_NODATA = true # exclude tiles with nodata values
76 | DATA_DIR = "/path/to/data_directory" # data directory that holds the original images
77 | OUT_DIR = "/path/to/output_directory" # output directory that stores the prepared train, val and test tiles
78 |
79 | # Sentinel-1 image bands
80 | # {"VV": 0, "VH": 1}
81 |
82 | # Sentinel-2 image bands
83 | # {"Blue": 0, "Green": 1, "Red": 2, "NIR": 3, "SWIR1": 4, "SWIR2": 5}
84 | ```
85 |
86 | Information on the preprocessing steps applied to Sentinel-1 imagery can be found in the [SNAP GPT file](prepare/preproc_s1_snap.xml).
87 |
88 | ## Installation
89 | ```shell
90 | $ conda env create --file environment.yaml
91 | $ conda activate s1s2_water
92 | ```
93 |
--------------------------------------------------------------------------------
/data/catalog.json:
--------------------------------------------------------------------------------
1 | {
2 | "id": "s1s2_water",
3 | "type": "Catalog",
4 | "title": "s1s2_water",
5 | "stac_version": "1.0.0",
6 | "description": "S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images",
7 | "links": [
8 | {
9 | "rel": "root",
10 | "href": "./catalog.json",
11 | "type": "application/json",
12 | "title": "catalog"
13 | },
14 | {
15 | "rel": "self",
16 | "href": "./catalog.json",
17 | "type": "application/json"
18 | },
19 | {
20 | "rel": "item",
21 | "href": "./93/sentinel12_93_meta.json",
22 | "type": "application/json",
23 | "title": "metadata"
24 | },
25 | {
26 | "rel": "item",
27 | "href": "./91/sentinel12_91_meta.json",
28 | "type": "application/json",
29 | "title": "metadata"
30 | },
31 | {
32 | "rel": "item",
33 | "href": "./9/sentinel12_9_meta.json",
34 | "type": "application/json",
35 | "title": "metadata"
36 | },
37 | {
38 | "rel": "item",
39 | "href": "./89/sentinel12_89_meta.json",
40 | "type": "application/json",
41 | "title": "metadata"
42 | },
43 | {
44 | "rel": "item",
45 | "href": "./88/sentinel12_88_meta.json",
46 | "type": "application/json",
47 | "title": "metadata"
48 | },
49 | {
50 | "rel": "item",
51 | "href": "./87/sentinel12_87_meta.json",
52 | "type": "application/json",
53 | "title": "metadata"
54 | },
55 | {
56 | "rel": "item",
57 | "href": "./86/sentinel12_86_meta.json",
58 | "type": "application/json",
59 | "title": "metadata"
60 | },
61 | {
62 | "rel": "item",
63 | "href": "./85/sentinel12_85_meta.json",
64 | "type": "application/json",
65 | "title": "metadata"
66 | },
67 | {
68 | "rel": "item",
69 | "href": "./84/sentinel12_84_meta.json",
70 | "type": "application/json",
71 | "title": "metadata"
72 | },
73 | {
74 | "rel": "item",
75 | "href": "./83/sentinel12_83_meta.json",
76 | "type": "application/json",
77 | "title": "metadata"
78 | },
79 | {
80 | "rel": "item",
81 | "href": "./82/sentinel12_82_meta.json",
82 | "type": "application/json",
83 | "title": "metadata"
84 | },
85 | {
86 | "rel": "item",
87 | "href": "./80/sentinel12_80_meta.json",
88 | "type": "application/json",
89 | "title": "metadata"
90 | },
91 | {
92 | "rel": "item",
93 | "href": "./8/sentinel12_8_meta.json",
94 | "type": "application/json",
95 | "title": "metadata"
96 | },
97 | {
98 | "rel": "item",
99 | "href": "./78/sentinel12_78_meta.json",
100 | "type": "application/json",
101 | "title": "metadata"
102 | },
103 | {
104 | "rel": "item",
105 | "href": "./77/sentinel12_77_meta.json",
106 | "type": "application/json",
107 | "title": "metadata"
108 | },
109 | {
110 | "rel": "item",
111 | "href": "./76/sentinel12_76_meta.json",
112 | "type": "application/json",
113 | "title": "metadata"
114 | },
115 | {
116 | "rel": "item",
117 | "href": "./75/sentinel12_75_meta.json",
118 | "type": "application/json",
119 | "title": "metadata"
120 | },
121 | {
122 | "rel": "item",
123 | "href": "./73/sentinel12_73_meta.json",
124 | "type": "application/json",
125 | "title": "metadata"
126 | },
127 | {
128 | "rel": "item",
129 | "href": "./71/sentinel12_71_meta.json",
130 | "type": "application/json",
131 | "title": "metadata"
132 | },
133 | {
134 | "rel": "item",
135 | "href": "./7/sentinel12_7_meta.json",
136 | "type": "application/json",
137 | "title": "metadata"
138 | },
139 | {
140 | "rel": "item",
141 | "href": "./69/sentinel12_69_meta.json",
142 | "type": "application/json",
143 | "title": "metadata"
144 | },
145 | {
146 | "rel": "item",
147 | "href": "./68/sentinel12_68_meta.json",
148 | "type": "application/json",
149 | "title": "metadata"
150 | },
151 | {
152 | "rel": "item",
153 | "href": "./67/sentinel12_67_meta.json",
154 | "type": "application/json",
155 | "title": "metadata"
156 | },
157 | {
158 | "rel": "item",
159 | "href": "./65/sentinel12_65_meta.json",
160 | "type": "application/json",
161 | "title": "metadata"
162 | },
163 | {
164 | "rel": "item",
165 | "href": "./64/sentinel12_64_meta.json",
166 | "type": "application/json",
167 | "title": "metadata"
168 | },
169 | {
170 | "rel": "item",
171 | "href": "./62/sentinel12_62_meta.json",
172 | "type": "application/json",
173 | "title": "metadata"
174 | },
175 | {
176 | "rel": "item",
177 | "href": "./6/sentinel12_6_meta.json",
178 | "type": "application/json",
179 | "title": "metadata"
180 | },
181 | {
182 | "rel": "item",
183 | "href": "./59/sentinel12_59_meta.json",
184 | "type": "application/json",
185 | "title": "metadata"
186 | },
187 | {
188 | "rel": "item",
189 | "href": "./57/sentinel12_57_meta.json",
190 | "type": "application/json",
191 | "title": "metadata"
192 | },
193 | {
194 | "rel": "item",
195 | "href": "./55/sentinel12_55_meta.json",
196 | "type": "application/json",
197 | "title": "metadata"
198 | },
199 | {
200 | "rel": "item",
201 | "href": "./54/sentinel12_54_meta.json",
202 | "type": "application/json",
203 | "title": "metadata"
204 | },
205 | {
206 | "rel": "item",
207 | "href": "./53/sentinel12_53_meta.json",
208 | "type": "application/json",
209 | "title": "metadata"
210 | },
211 | {
212 | "rel": "item",
213 | "href": "./52/sentinel12_52_meta.json",
214 | "type": "application/json",
215 | "title": "metadata"
216 | },
217 | {
218 | "rel": "item",
219 | "href": "./51/sentinel12_51_meta.json",
220 | "type": "application/json",
221 | "title": "metadata"
222 | },
223 | {
224 | "rel": "item",
225 | "href": "./5/sentinel12_5_meta.json",
226 | "type": "application/json",
227 | "title": "metadata"
228 | },
229 | {
230 | "rel": "item",
231 | "href": "./49/sentinel12_49_meta.json",
232 | "type": "application/json",
233 | "title": "metadata"
234 | },
235 | {
236 | "rel": "item",
237 | "href": "./48/sentinel12_48_meta.json",
238 | "type": "application/json",
239 | "title": "metadata"
240 | },
241 | {
242 | "rel": "item",
243 | "href": "./47/sentinel12_47_meta.json",
244 | "type": "application/json",
245 | "title": "metadata"
246 | },
247 | {
248 | "rel": "item",
249 | "href": "./41/sentinel12_41_meta.json",
250 | "type": "application/json",
251 | "title": "metadata"
252 | },
253 | {
254 | "rel": "item",
255 | "href": "./40/sentinel12_40_meta.json",
256 | "type": "application/json",
257 | "title": "metadata"
258 | },
259 | {
260 | "rel": "item",
261 | "href": "./39/sentinel12_39_meta.json",
262 | "type": "application/json",
263 | "title": "metadata"
264 | },
265 | {
266 | "rel": "item",
267 | "href": "./37/sentinel12_37_meta.json",
268 | "type": "application/json",
269 | "title": "metadata"
270 | },
271 | {
272 | "rel": "item",
273 | "href": "./36/sentinel12_36_meta.json",
274 | "type": "application/json",
275 | "title": "metadata"
276 | },
277 | {
278 | "rel": "item",
279 | "href": "./35/sentinel12_35_meta.json",
280 | "type": "application/json",
281 | "title": "metadata"
282 | },
283 | {
284 | "rel": "item",
285 | "href": "./33/sentinel12_33_meta.json",
286 | "type": "application/json",
287 | "title": "metadata"
288 | },
289 | {
290 | "rel": "item",
291 | "href": "./32/sentinel12_32_meta.json",
292 | "type": "application/json",
293 | "title": "metadata"
294 | },
295 | {
296 | "rel": "item",
297 | "href": "./31/sentinel12_31_meta.json",
298 | "type": "application/json",
299 | "title": "metadata"
300 | },
301 | {
302 | "rel": "item",
303 | "href": "./30/sentinel12_30_meta.json",
304 | "type": "application/json",
305 | "title": "metadata"
306 | },
307 | {
308 | "rel": "item",
309 | "href": "./29/sentinel12_29_meta.json",
310 | "type": "application/json",
311 | "title": "metadata"
312 | },
313 | {
314 | "rel": "item",
315 | "href": "./28/sentinel12_28_meta.json",
316 | "type": "application/json",
317 | "title": "metadata"
318 | },
319 | {
320 | "rel": "item",
321 | "href": "./27/sentinel12_27_meta.json",
322 | "type": "application/json",
323 | "title": "metadata"
324 | },
325 | {
326 | "rel": "item",
327 | "href": "./26/sentinel12_26_meta.json",
328 | "type": "application/json",
329 | "title": "metadata"
330 | },
331 | {
332 | "rel": "item",
333 | "href": "./25/sentinel12_25_meta.json",
334 | "type": "application/json",
335 | "title": "metadata"
336 | },
337 | {
338 | "rel": "item",
339 | "href": "./23/sentinel12_23_meta.json",
340 | "type": "application/json",
341 | "title": "metadata"
342 | },
343 | {
344 | "rel": "item",
345 | "href": "./22/sentinel12_22_meta.json",
346 | "type": "application/json",
347 | "title": "metadata"
348 | },
349 | {
350 | "rel": "item",
351 | "href": "./21/sentinel12_21_meta.json",
352 | "type": "application/json",
353 | "title": "metadata"
354 | },
355 | {
356 | "rel": "item",
357 | "href": "./19/sentinel12_19_meta.json",
358 | "type": "application/json",
359 | "title": "metadata"
360 | },
361 | {
362 | "rel": "item",
363 | "href": "./17/sentinel12_17_meta.json",
364 | "type": "application/json",
365 | "title": "metadata"
366 | },
367 | {
368 | "rel": "item",
369 | "href": "./16/sentinel12_16_meta.json",
370 | "type": "application/json",
371 | "title": "metadata"
372 | },
373 | {
374 | "rel": "item",
375 | "href": "./15/sentinel12_15_meta.json",
376 | "type": "application/json",
377 | "title": "metadata"
378 | },
379 | {
380 | "rel": "item",
381 | "href": "./13/sentinel12_13_meta.json",
382 | "type": "application/json",
383 | "title": "metadata"
384 | },
385 | {
386 | "rel": "item",
387 | "href": "./12/sentinel12_12_meta.json",
388 | "type": "application/json",
389 | "title": "metadata"
390 | },
391 | {
392 | "rel": "item",
393 | "href": "./11/sentinel12_11_meta.json",
394 | "type": "application/json",
395 | "title": "metadata"
396 | },
397 | {
398 | "rel": "item",
399 | "href": "./10/sentinel12_10_meta.json",
400 | "type": "application/json",
401 | "title": "metadata"
402 | },
403 | {
404 | "rel": "item",
405 | "href": "./1/sentinel12_1_meta.json",
406 | "type": "application/json",
407 | "title": "metadata"
408 | }
409 | ]
410 | }
--------------------------------------------------------------------------------
/environment.yaml:
--------------------------------------------------------------------------------
1 | name: s1s2_water
2 | dependencies:
3 |   - python=3.11
4 |   - rasterio>=1.3.8
5 |   - pip:
6 |     - ukis-pysat[raster]>=1.5.1
7 |     - pydantic>=2.3.0
8 |     - toml>=0.10.2
9 |     - tifffile>=2023.8.30
10 |     - pystac==1.8.3
11 |     - pystac_client==0.6.1
12 |     - geopandas>=0.13.2
13 |     - folium>=0.14.0
14 |     - tqdm>=4.66.1
15 |
16 |
--------------------------------------------------------------------------------
/explore_dataset.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "id": "25de2f43",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "import folium\n",
11 | "import geopandas as gpd\n",
12 | "\n",
13 | "from pystac_client import Client"
14 | ]
15 | },
16 | {
17 | "cell_type": "code",
18 | "execution_count": 2,
19 | "id": "8308622a",
20 | "metadata": {},
21 | "outputs": [
22 | {
23 | "data": {
24 | "text/plain": [
25 | "'s1s2_water'"
26 | ]
27 | },
28 | "execution_count": 2,
29 | "metadata": {},
30 | "output_type": "execute_result"
31 | }
32 | ],
33 | "source": [
34 | "# open static STAC catalog\n",
35 | "catalog = Client.open(\"data/catalog.json\")\n",
36 | "catalog.title"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 3,
42 | "id": "082a5d51",
43 | "metadata": {
44 | "scrolled": true
45 | },
46 | "outputs": [
47 | {
48 | "data": {
49 | "text/html": [
50 | "|   | geometry | flood | split | datetime | landcover | s1_srcids | s2_srcids | copdem30_slope | copdem30_elevation | date_s1 | date_s2 |\n",
51 | "|---|---|---|---|---|---|---|---|---|---|---|---|\n",
52 | "| 0 | POLYGON ((166.27203 -45.14621, 167.66760 -45.1... | False | val | 2020-01-01T00:00:00Z | Tree cover, broadleaved, evergreen, closed to ... | [S1B_IW_GRDH_1SDV_20200206T075413_20200206T075... | [S2B_MSIL1C_20200209T224759_N0209_R115_T58GFQ_... | sentinel12_copdem30_93_slope | sentinel12_copdem30_93_elevation | 20200206 | 20200209 |\n",
53 | "| 1 | POLYGON ((106.16716 39.74440, 107.44788 39.724... | False | train | 2020-01-01T00:00:00Z | Grassland | [S1A_IW_GRDH_1SDV_20201108T104645_20201108T104... | [S2A_MSIL1C_20201107T033941_N0209_R061_T48SXJ_... | sentinel12_copdem30_91_slope | sentinel12_copdem30_91_elevation | 20201108 | 20201107 |\n",
54 | "| 2 | POLYGON ((-14.09034 8.95716, -14.09034 9.94593... | False | train | 2020-01-01T00:00:00Z | Water bodies | [S1A_IW_GRDH_1SDV_20190422T190812_20190422T190... | [S2A_MSIL1C_20190422T110621_N0207_R137_T28PFR_... | sentinel12_copdem30_9_slope | sentinel12_copdem30_9_elevation | 20190422 | 20190422 |\n",
55 | "| 3 | POLYGON ((50.99980 25.31673, 52.09064 25.31270... | False | test | 2020-01-01T00:00:00Z | Water bodies | [S1A_IW_GRDH_1SDV_20200410T023157_20200410T023... | [S2A_MSIL1C_20200414T070621_N0209_R106_T39RWH_... | sentinel12_copdem30_89_slope | sentinel12_copdem30_89_elevation | 20200410 | 20200414 |\n",
56 | "| 4 | POLYGON ((32.02016 23.50747, 33.09560 23.51053... | False | test | 2020-01-01T00:00:00Z | Bare areas | [S1B_IW_GRDH_1SDV_20201124T155420_20201124T155... | [S2B_MSIL1C_20201126T082259_N0209_R121_T36QVL_... | sentinel12_copdem30_88_slope | sentinel12_copdem30_88_elevation | 20201124 | 20201126 |\n",