├── .gitkeep
├── 00_quickstart.ipynb
├── 10_model_diagnostics.ipynb
├── LICENSE
├── test_imports.py
├── requirements.txt
├── .gitignore
├── Makefile
├── pyproject.toml
├── CITATION.cff
├── ci.yml
├── README.md
└── git_hub_repo_predictive_mapping_of_oil_spill_induced_mangrove_degradation.md


/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/00_quickstart.ipynb:
--------------------------------------------------------------------------------
{}
--------------------------------------------------------------------------------
/10_model_diagnostics.ipynb:
--------------------------------------------------------------------------------
{}
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025

Permission is hereby granted, free of charge, to any person obtaining a copy ...
--------------------------------------------------------------------------------
/test_imports.py:
--------------------------------------------------------------------------------
def test_imports():
    import pmd
    import pmd.indices
    import pmd.features
    import pmd.model
    assert True

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
affine
geopandas
rasterio
rasterstats
numpy
pandas
scikit-learn
xgboost
shap
matplotlib
pyyaml
scipy
joblib
tqdm
pyproj
rtree

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.venv/
__pycache__/
*.egg-info/
data/raw/*
!data/raw/.gitkeep
data/interim/*
!data/interim/.gitkeep
data/processed/*
!data/processed/.gitkeep
figures/*
!figures/.gitkeep
.ipynb_checkpoints/

--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
.PHONY: format lint test

format:
	python -m pip install ruff black || true
	ruff check --select I --fix . || true
	black . || true

lint:
	python -m pip install ruff || true
	ruff check . || true

test:
	python -m pytest -q

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
[project]
name = "pmd"
version = "1.0.0"
description = "Predictive mapping of oil spill-induced mangrove degradation (Nigeria)"
authors = [{name="Desmond R. Eteh"}, {name="U. C. Akajiaku"}]
readme = "README.md"
requires-python = ">=3.9"

[project.scripts]
pmd = "pmd.cli:app"

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
cff-version: 1.2.0
message: If you use this work, please cite it.
title: Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria
authors:
  - family-names: Eteh
    given-names: Desmond Rowland
  - family-names: Akajiaku
    given-names: Ugochukwu Charles
  - family-names: Mogo
    given-names: Felicia Chinwe
version: "1.0.0"
date-released: 2025-10-14
license: MIT

--------------------------------------------------------------------------------
/ci.yml:
--------------------------------------------------------------------------------
name: CI
on: {push: {branches: [main, master]}, pull_request: {}}
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: '3.11'}
      - run: pip install -r requirements.txt
      - run: python -m pip install ruff black pytest
      - run: ruff check .
      - run: black --check .
      - run: pytest -q

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria
**Remote Sensing + Machine Learning (Python + GEE)**

This repository contains a reproducible workflow for mapping and **predicting oil spill‑induced mangrove degradation**
in the Niger Delta (Rivers State, Nigeria) using **Sentinel‑2 / Landsat 8**, **SRTM**, **NOSDRA oil‑spill records**,
and **Gradient Boosted Decision Trees (XGBoost)** with **SHAP** explainability.

> Paper context and methods are adapted from the authors' project draft.

## Highlights
- Compute **NDVI / Red‑Edge NDVI (RENDVI)** for two epochs (2020 and 2024) and their change
- Supervised **LULC classification** (Random Forest in **Google Earth Engine**) for 2020 & 2024
- **Oil spill hotspot severity** using Kernel Density Estimation (KDE) + **K‑Means**
- Feature stack: ΔNDVI, ΔRENDVI, spill density, **ESI** rank, elevation class, LULC transition
- Train **XGBoost** classifier; evaluate **Accuracy, Precision, Recall, F1, ROC‑AUC**
- **Explain predictions** with **SHAP**; export risk probability map GeoTIFF + figures

## Repository structure
```
predictive-mangrove-degradation/
├─ README.md
├─ LICENSE
├─ CITATION.cff
├─ pyproject.toml
├─ requirements.txt
├─ Makefile
├─ .gitignore
├─ .github/workflows/ci.yml
├─ configs/
│  └─ rivers_state_example.yaml
├─ data/
│  ├─ raw/            # put input rasters/vectors here
│  ├─ interim/        # intermediate outputs
│  └─ processed/      # final maps & model artifacts
├─ notebooks/
│  ├─ 00_quickstart.ipynb
│  └─ 10_model_diagnostics.ipynb
├─ scripts/
│  ├─ gee_lulc_classifier.js   # RF in GEE for 2020/2024
│  └─ prepare_shapefile_grid.py
├─ src/pmd/
│  ├─ __init__.py
│  ├─ cli.py          # command line interface
│  ├─ io.py           # loading/saving
│  ├─ indices.py      # NDVI/RENDVI + delta
│  ├─ geoutils.py     # raster/vector helpers
│  ├─ spills.py       # KDE + clustering
│  ├─ features.py     # stack features for ML
│  ├─ model.py        # train/eval xgboost + shap
│  ├─ visualize.py    # plots & maps
│  └─ esi_zones.py    # ESI handling
└─ tests/
   └─ test_imports.py
```

## Quick start
1. **Clone** this repo and create a Python env:
   ```bash
   uv venv && source .venv/bin/activate
   uv pip install -r requirements.txt
   ```
   (Or use `pip` / `conda` as you prefer.)

2. **Data placement** (`data/raw/`):
   - `sentinel_2020.tif`, `sentinel_2024.tif` – atmospherically‑corrected surface reflectance (bands: B4, B5, B6, B8)
   - `landsat8_2020.tif`, `landsat8_2024.tif` – surface reflectance (B4, B5)
   - `srtm_30m.tif` – elevation
   - `lulc_2020.tif`, `lulc_2024.tif` – exported from **GEE** (see `scripts/gee_lulc_classifier.js`)
   - `spills_2023_2025.geojson` – **NOSDRA** events (point features: date, barrels, cause)
   - `esi.shp` – Environmental Sensitivity Index polygons (with rank field)
   - `rivers_state_boundary.shp` – study boundary

   All rasters should share **CRS = EPSG:32632** and **resolution = 30 m**.

3. **Configure** paths in `configs/rivers_state_example.yaml`.

4. **Run the pipeline** (end‑to‑end):
   ```bash
   python -m pmd.cli compute-indices --cfg configs/rivers_state_example.yaml
   python -m pmd.cli build-spill-features --cfg configs/rivers_state_example.yaml
   python -m pmd.cli stack-features --cfg configs/rivers_state_example.yaml
   python -m pmd.cli train-model --cfg configs/rivers_state_example.yaml
   python -m pmd.cli predict-map --cfg configs/rivers_state_example.yaml
   python -m pmd.cli explain-model --cfg configs/rivers_state_example.yaml
   ```

5. **Outputs** land here:
   - `data/processed/ndvi_2020.tif`, `ndvi_2024.tif`, `delta_ndvi.tif`
   - `data/processed/rendvi_2020.tif`, `rendvi_2024.tif`, `delta_rendvi.tif`
   - `data/interim/spill_kde.tif`, `spill_clusters.geojson`
   - `data/processed/feature_stack.parquet`
   - `data/processed/models/xgb_model.json`, `scaler.pkl`, `metrics.json`
   - `data/processed/prediction_prob.tif`, `predicted_classes.tif`
   - `figures/roc_curve.png`, `figures/shap_summary.png`, `figures/feature_importance.png`

## Google Earth Engine (LULC)
Use `scripts/gee_lulc_classifier.js` in the **GEE Code Editor** to export 2020/2024 LULC (6 classes) as GeoTIFF.

## CLI help
```bash
python -m pmd.cli --help
python -m pmd.cli compute-indices --help
```

## License
MIT (see `LICENSE`).

## Citation
If you use this repository, please cite it using `CITATION.cff` and, if applicable, cite the accompanying manuscript.

---

### Acknowledgement
Methods and study framing are aligned with the project manuscript *Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria Using Remote Sensing and Machine Learning*.

--------------------------------------------------------------------------------
/git_hub_repo_predictive_mapping_of_oil_spill_induced_mangrove_degradation.md:
--------------------------------------------------------------------------------
# Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria

> **Repository template** for remote sensing + machine learning workflow to map, monitor, and predict oil‑spill driven mangrove degradation in the Niger Delta (Nigeria). Designed for full reproducibility and easy extension to other coastal regions.

---

## 📁 Repository Structure

```
Predictive-Mangrove-Degradation/
├─ README.md
├─ LICENSE
├─ CITATION.cff
├─ .gitignore
├─ requirements.txt
├─ environment.yml
├─ Makefile
├─ dvc.yaml                      # optional if you use DVC
├─ pyproject.toml                # optional; for packaging if needed
├─ .pre-commit-config.yaml
├─ .github/
│  └─ workflows/
│     └─ ci.yml                  # lint + tests
├─ configs/
│  ├─ study_area.geojson         # AOI polygon (placeholder)
│  ├─ params.yaml                # all hyperparameters & data paths
│  └─ classes.json               # LULC class map
├─ data/
│  ├─ raw/                       # (gitignored) raw downloads
│  ├─ interim/                   # (gitignored) cleaned/intermediate
│  └─ processed/                 # (gitignored) features/tiles ready for ML
├─ docs/
│  ├─ methodology.md
│  ├─ data_sources.md
│  ├─ model_report.md
│  └─ governance.md
├─ notebooks/
│  ├─ 00_explore_AOI.ipynb
│  ├─ 10_build_indices.ipynb
│  ├─ 20_LULC_RF.ipynb
│  ├─ 30_spill_hotspots.ipynb
│  ├─ 40_train_xgboost.ipynb
│  └─ 50_shap_interpretation.ipynb
├─ src/
│  ├─ __init__.py
│  ├─ utils/
│  │  ├─ io.py
│  │  ├─ geoutils.py
│  │  └─ viz.py
│  ├─ data/
│  │  ├─ download_sat.py
│  │  ├─ download_spills.py
│  │  ├─ build_dem.py
│  │  └─ tiles.py
│  ├─ features/
│  │  ├─ indices.py              # NDVI/RENDVI/ΔNDVI
│  │  ├─ lulc.py                 # RF classification
│  │  └─ sensitivity.py          # ESI + elevation features
│  ├─ modeling/
│  │  ├─ dataset.py              # tabular feature assembly
│  │  ├─ train_gbdt.py           # XGBoost training + CV + metrics
│  │  ├─ predict.py
│  │  └─ shap_report.py
│  └─ spill/
│     ├─ kde.py                  # kernel density for hotspots
│     └─ clusters.py             # k‑means severity clusters
├─ models/
│  ├─ artifacts/                 # (gitignored) trained models
│  └─ reports/                   # auto‑generated metrics/plots
└─ figures/                      # key PNG/SVG figures for README & papers
```

> **Tip:** clone as a template, then replace AOI and parameters in `configs/params.yaml`.

---

## 🚀 Quickstart

### 1) Create the environment

```bash
# conda (recommended)
conda env create -f environment.yml
conda activate mangrove-ml

# OR pip (pre-commit and pytest are not listed in requirements.txt, so install them explicitly)
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt pre-commit pytest
pre-commit install
```

### 2) Configure the study area & parameters

- Put your Area of Interest polygon in `configs/study_area.geojson` (WGS84).
- Edit `configs/params.yaml` to set:
  - time windows (e.g., 2020 vs 2024),
  - sensors (Sentinel‑2, Landsat‑8),
  - DEM source (SRTM 30 m),
  - NOSDRA oil‑spill API query filters,
  - ML hyperparameters (XGBoost/GBDT),
  - output tiling size & CRS (UTM 32N for Rivers State).
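
Before running the data and feature steps, it can help to sanity-check the edited configuration. The sketch below is one possible check, not part of the template: `validate_params` and `REQUIRED_SECTIONS` are hypothetical names, the section and key names are taken from the example `configs/params.yaml` in this document, and the specific rules (an `EPSG:` prefix, `test_size` between 0 and 1, sorted `risk_bins`) are assumptions about what a valid file looks like.

```python
from typing import Any, Dict, List

# Top-level sections used by the example configs/params.yaml in this document.
REQUIRED_SECTIONS = ("project", "satellite", "dem", "spills", "esi", "lulc", "model", "outputs")


def validate_params(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in cfg]

    # Assumed rule: the CRS is given as an EPSG code string.
    crs = str(cfg.get("project", {}).get("crs", ""))
    if not crs.upper().startswith("EPSG:"):
        problems.append(f"project.crs should look like 'EPSG:32632', got {crs!r}")

    # Assumed rule: the hold-out fraction is a proper fraction.
    test_size = cfg.get("model", {}).get("test_size", 0.2)
    if not 0 < test_size < 1:
        problems.append("model.test_size must be strictly between 0 and 1")

    # Assumed rule: risk-tier edges are sorted probabilities inside (0, 1).
    bins = list(cfg.get("outputs", {}).get("risk_bins", []))
    if bins != sorted(bins) or any(not 0 < b < 1 for b in bins):
        problems.append("outputs.risk_bins must be sorted values inside (0, 1)")

    return problems


if __name__ == "__main__":
    demo = {
        "project": {"crs": "EPSG:32632"},
        "model": {"test_size": 0.2},
        "outputs": {"risk_bins": [0.2, 0.4, 0.6, 0.8]},
    }
    for problem in validate_params(demo):
        print(problem)
```

In practice the dictionary would come from `yaml.safe_load` on `configs/params.yaml`, the same way the example scripts in this document load their configuration.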

### 3) Pull data (satellite, DEM, spills)

```bash
# sentinel-2 / landsat-8 / SRTM using open APIs or GEE exports
python -m src.data.download_sat --cfg configs/params.yaml

# NOSDRA spills (2023–2025) JSON → GeoPackage
python -m src.data.download_spills --cfg configs/params.yaml

# Build DEM stack
python -m src.data.build_dem --cfg configs/params.yaml
```

### 4) Build features

```bash
# NDVI, RENDVI, ΔNDVI, ΔRENDVI rasters
python -m src.features.indices --cfg configs/params.yaml

# LULC (Random Forest) and change detection
python -m src.features.lulc --cfg configs/params.yaml

# KDE spill intensity, severity clusters, ESI weights, elevation bands
python -m src.spill.kde --cfg configs/params.yaml
python -m src.spill.clusters --cfg configs/params.yaml
python -m src.features.sensitivity --cfg configs/params.yaml
```

### 5) Train model & explain

```bash
# Assemble tabular dataset from raster stacks + vector overlays
python -m src.modeling.dataset --cfg configs/params.yaml

# Train Gradient Boosted Decision Trees (XGBoost)
python -m src.modeling.train_gbdt --cfg configs/params.yaml

# Generate probability maps + class maps (risk tiers)
python -m src.modeling.predict --cfg configs/params.yaml

# SHAP feature attribution report
python -m src.modeling.shap_report --cfg configs/params.yaml
```

### 6) Reproduce figures & paper tables

Jupyter notebooks in `notebooks/` regenerate all exploratory charts and publication figures. Final plots are saved to `figures/`.
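
As an illustration of the "paper tables" part, the sketch below turns per-fold cross-validation metrics into a mean and standard deviation table. It assumes a CSV with one row per fold and one column per metric, such as the `models/reports/cv_metrics.csv` file written by the example training script in this document; the function names `summarize_cv_metrics` and `to_markdown` are hypothetical and not part of the template.

```python
import csv
import io
import statistics
from typing import Dict, Tuple


def summarize_cv_metrics(csv_text: str) -> Dict[str, Tuple[float, float]]:
    """Map each metric column to (mean, sample standard deviation) across folds."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no fold rows found")
    summary = {}
    for name in rows[0]:
        values = [float(row[name]) for row in rows]
        # stdev needs at least two folds; report 0.0 for a single fold.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[name] = (statistics.mean(values), std)
    return summary


def to_markdown(summary: Dict[str, Tuple[float, float]]) -> str:
    """Render the summary as a small Markdown table."""
    lines = ["| metric | mean | std |", "|---|---|---|"]
    for name, (mean, std) in summary.items():
        lines.append(f"| {name} | {mean:.3f} | {std:.3f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers purely to show the call pattern, not real results.
    demo = "acc,auc\n0.8,0.9\n0.6,0.7\n"
    print(to_markdown(summarize_cv_metrics(demo)))
```

To use it on real output, read the CSV file as text (for example with `pathlib.Path(...).read_text()`) and pass the string to `summarize_cv_metrics`.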

---

## 🧠 Scientific Overview

**Goal.** Predict and map mangrove degradation risk driven by oil spills in the Niger Delta by integrating multi‑temporal satellite indices (NDVI, RENDVI), LULC change, DEM‑based elevation, Environmental Sensitivity Index (ESI), and oil‑spill hotspot metrics into an explainable ML model (GBDT/XGBoost).

**Core signals.**
- **ΔNDVI/ΔRENDVI:** vegetation stress and canopy loss between two epochs.
- **KDE spill intensity & k‑means severity:** spatial pressure of spills.
- **Elevation bands (e.g., <5 m):** low‑lying retention areas.
- **ESI class (e.g., 10b mangroves):** intrinsic ecological vulnerability.
- **LULC transitions:** e.g., flooded vegetation → bare/built.

**Model.** Gradient‑boosted trees with spatially stratified CV, tuned via `configs/params.yaml`. Model cards and SHAP explanations are auto‑exported to `models/reports/`.

---

## 🔧 Key Configuration (`configs/params.yaml`)

```yaml
project:
  name: predictive-mangrove-degradation
  crs: "EPSG:32632"  # UTM 32N
  tile_size: 512
  aoi_file: configs/study_area.geojson
  out_dir: data/processed

satellite:
  sensors: ["sentinel2", "landsat8"]
  s2:
    level: L2A
    date_start: "2020-01-01"
    date_end: "2020-03-31"  # dry season example
    date_start_2: "2024-01-01"
    date_end_2: "2024-03-31"
    cloud_pct: 20
  l8:
    date_start: "2020-01-01"
    date_end: "2020-12-31"

dem:
  source: "SRTM_30m"
  bands: ["elevation", "slope"]
  elevation_bins: [0,5,10,100]

spills:
  source: "NOSDRA"
  years: [2023,2024,2025]
  min_volume_bbl: 1
  kde_bandwidth_m: 2500
  clusters_k: 5

esi:
  source_file: data/raw/ESI/esi_nigeria.gpkg
  class_weights:
    "10b": 3
    "10a": 2
    "9b": 2
    default: 1

lulc:
  classes: [water, trees, rangeland, flooded_veg, built, bare]
  rf_trees: 100

model:
  algo: xgboost
  test_size: 0.2
  cv_folds: 5
  params:
    n_estimators: 400
    learning_rate: 0.05
    max_depth: 6
    subsample: 0.8
    colsample_bytree: 0.8
    reg_lambda: 1.0

outputs:
  prob_threshold: 0.5
  risk_bins: [0.2, 0.4, 0.6, 0.8]
```

---

## 🧩 Example Scripts

### `src/features/indices.py`

```python
import argparse
from pathlib import Path

import numpy as np
import rasterio
import yaml

# Minimal NDVI/RENDVI builder from pre-downloaded surface reflectance stacks
# Expects aligned rasters: NIR, RED, RE1 (705 nm), RE2 (740 nm)
# (RE2 is not read by this minimal script.)


def ndvi(nir, red):
    # Cast first: integer reflectance (e.g. uint16) would wrap around on subtraction
    nir, red = nir.astype(np.float32), red.astype(np.float32)
    return (nir - red) / (nir + red + 1e-6)


def rendvi(re, nir):
    re, nir = re.astype(np.float32), nir.astype(np.float32)
    return (nir - re) / (nir + re + 1e-6)


def read_band(path):
    # Read band 1 and close the dataset when done
    with rasterio.open(path) as src:
        return src.read(1)


def write_like(src_path, out_path, arr):
    # Reuse the georeferencing of src_path for a single-band float32 output
    with rasterio.open(src_path) as src:
        profile = src.profile
    profile.update(dtype=rasterio.float32, count=1, compress="lzw")
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(arr.astype(np.float32), 1)


def main(cfg):
    in_dir = Path(cfg["project"]["out_dir"]) / "sr_stacks"
    out_dir = Path(cfg["project"]["out_dir"]) / "features"
    out_dir.mkdir(parents=True, exist_ok=True)

    # Example file naming convention (adjust to your pipeline)
    nir_2020 = in_dir / "nir_2020.tif"
    red_2020 = in_dir / "red_2020.tif"
    re1_2020 = in_dir / "re1_2020.tif"

    nir_2024 = in_dir / "nir_2024.tif"
    red_2024 = in_dir / "red_2024.tif"
    re1_2024 = in_dir / "re1_2024.tif"

    ndvi_2020 = ndvi(read_band(nir_2020), read_band(red_2020))
    ndvi_2024 = ndvi(read_band(nir_2024), read_band(red_2024))

    write_like(nir_2020, out_dir / "ndvi_2020.tif", ndvi_2020)
    write_like(nir_2024, out_dir / "ndvi_2024.tif", ndvi_2024)
    write_like(nir_2024, out_dir / "d_ndvi_2020_2024.tif", ndvi_2024 - ndvi_2020)

    rendvi_2020 = rendvi(read_band(re1_2020), read_band(nir_2020))
    rendvi_2024 = rendvi(read_band(re1_2024), read_band(nir_2024))

    write_like(re1_2020, out_dir / "rendvi_2020.tif", rendvi_2020)
    write_like(re1_2024, out_dir / "rendvi_2024.tif", rendvi_2024)
    write_like(re1_2024, out_dir / "d_rendvi_2020_2024.tif", rendvi_2024 - rendvi_2020)


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--cfg", required=True)
    args = p.parse_args()
    with open(args.cfg) as f:
        cfg = yaml.safe_load(f)
    main(cfg)
```

### `src/spill/kde.py`

```python
import geopandas as gpd
import numpy as np
import rasterio
from sklearn.neighbors import KernelDensity

# Build a KDE raster of spill intensity from point events (bbl‑weighted)


def build_kde(points_gpkg, bandwidth_m, out_raster, template_raster):
    gdf = gpd.read_file(points_gpkg).to_crs("EPSG:32632")
    X = np.vstack([gdf.geometry.x.values, gdf.geometry.y.values]).T
    weights = gdf["volume_bbl"].values

    kde = KernelDensity(bandwidth=bandwidth_m, kernel="gaussian", metric="euclidean")
    kde.fit(X, sample_weight=weights)

    with rasterio.open(template_raster) as src:
        profile = src.profile
        # Cell-centre coordinates of a north-up template; rows run from top to
        # bottom so the output matches the template grid (height x width)
        xs = src.bounds.left + (np.arange(src.width) + 0.5) * src.res[0]
        ys = src.bounds.top - (np.arange(src.height) + 0.5) * src.res[1]
    xx, yy = np.meshgrid(xs, ys)
    grid = np.vstack([xx.ravel(), yy.ravel()]).T
    z = np.exp(kde.score_samples(grid)).reshape(yy.shape)

    profile.update(count=1, dtype=rasterio.float32, compress="lzw")
    with rasterio.open(out_raster, "w", **profile) as dst:
        dst.write(z.astype("float32"), 1)
```

### `src/modeling/train_gbdt.py`

```python
import argparse
from pathlib import Path

import pandas as pd
import xgboost as xgb
import yaml
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Tabular dataset must include columns listed in cfg["model"]["features"]


def kfold_train(X, y, params, cv_folds, num_boost_round=400):
    # StratifiedKFold balances class labels across folds; it is not a spatial split
    skf = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=42)
    metrics = []
    models = []
    for tr, va in skf.split(X, y):
        dtr = xgb.DMatrix(X.iloc[tr], label=y.iloc[tr])
        dva = xgb.DMatrix(X.iloc[va], label=y.iloc[va])
        model = xgb.train(params, dtr, num_boost_round=num_boost_round)
        p = model.predict(dva)
        yhat = (p >= 0.5).astype(int)
        metrics.append({
            "acc": accuracy_score(y.iloc[va], yhat),
            "prec": precision_score(y.iloc[va], yhat),
            "rec": recall_score(y.iloc[va], yhat),
            "f1": f1_score(y.iloc[va], yhat),
            "auc": roc_auc_score(y.iloc[va], p),
        })
        models.append(model)
    return models, pd.DataFrame(metrics)


def main(cfg):
    df = pd.read_parquet("data/processed/training_dataset.parquet")
    features = cfg["model"].get("features", [
        "d_ndvi", "d_rendvi", "spill_kde", "elevation_bin", "esi_weight", "lulc_change"
    ])
    X = df[features]
    y = df["label_degraded"].astype(int)

    # Booster parameters for the native xgb.train API; the number of trees is
    # passed separately as num_boost_round rather than as "n_estimators"
    params = {
        "objective": "binary:logistic",
        "eval_metric": "logloss",
        "eta": cfg["model"]["params"]["learning_rate"],
        "max_depth": cfg["model"]["params"]["max_depth"],
        "subsample": cfg["model"]["params"]["subsample"],
        "colsample_bytree": cfg["model"]["params"]["colsample_bytree"],
        "lambda": cfg["model"]["params"]["reg_lambda"],
        "verbosity": 0
    }
    n_rounds = cfg["model"]["params"]["n_estimators"]

    models, metr = kfold_train(X, y, params, cfg["model"]["cv_folds"], n_rounds)
    outdir = Path("models/artifacts")
    outdir.mkdir(parents=True, exist_ok=True)
    for i, m in enumerate(models):
        m.save_model(str(outdir / f"gbdt_fold{i}.json"))
    # Make sure the report folder exists before writing the metrics table
    reports = Path("models/reports")
    reports.mkdir(parents=True, exist_ok=True)
    metr.to_csv(reports / "cv_metrics.csv", index=False)

    print(metr.describe())


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--cfg", required=True)
    args = p.parse_args()
    with open(args.cfg) as f:
        cfg = yaml.safe_load(f)
    main(cfg)
```

---

## 🧪 Testing & CI

- Run lint & tests locally:

```bash
pre-commit run --all-files
pytest -q
```

- CI (`.github/workflows/ci.yml`) runs pre‑commit and tests on pushes and PRs.

---

## 🗃️ Data Sources (placeholders)

- **Sentinel‑2 L2A** (Copernicus Data Space Ecosystem, which replaced the Copernicus Open Access Hub)
- **Landsat‑8 SR** (USGS EarthExplorer)
- **NOSDRA Oil Spill Monitor** (API/CSV exports)
- **SRTM 30 m DEM** (USGS)
- **Environmental Sensitivity Index (ESI)** shapefiles (NOAA / national agency)
- **Global Mangrove Watch (GMW)** for baseline mangrove extent

> See `docs/data_sources.md` for exact links and access tips.

---

## 📊 Outputs

- **Degradation probability raster** (GeoTIFF)
- **Risk tiers** (very low → very high) vectorized for planning
- **Hotspot maps** (KDE & clusters)
- **Model card** with CV metrics + **SHAP** feature attributions
- **Change maps**: LULC, ΔNDVI/ΔRENDVI, and overlays with ESI/elevation

---

## 🧩 Reproducibility & Data Governance

- All parameters are tracked in `configs/params.yaml`.
- Optional data versioning with **DVC** (see `dvc.yaml`).
- Scripts are idempotent and safe to re‑run when inputs change.
- Include `docs/governance.md` to describe data licenses, consent, and ethical use.

---

## 🙌 Contributing

1. Fork & branch (`feat/…`, `fix/…`).
2. Run `pre-commit` hooks; add tests.
3. Open a PR with a clear description and screenshots of maps where relevant.

See `CONTRIBUTING.md` and `CODE_OF_CONDUCT.md` for details.

---

## 📜 License

MIT (see `LICENSE`).

---

## ✍️ Citation

Use the generated `CITATION.cff` or cite the repository as:

> YourName et al. (2025). *Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria*. GitHub repository. https://github.com/Akajiaku11

---

## 🔖 File: `README.md`

(This README content mirrors the sections above, adapted for GitHub formatting with badges.)

---

## 🔖 File: `requirements.txt`

```
geopandas
rasterio
rioxarray
xarray
numpy
pandas
scikit-learn
xgboost
shap
pyyaml
pyproj
tqdm
matplotlib
seaborn
contextily
requests
joblib
jupyter
```

---

## 🔖 File: `environment.yml`

```yaml
name: mangrove-ml
channels: [conda-forge]
dependencies:
  - python=3.11
  - geopandas
  - rasterio
  - rioxarray
  - xarray
  - numpy
  - pandas
  - scikit-learn
  - xgboost
  - shap
  - pyyaml
  - pyproj
  - tqdm
  - matplotlib
  - contextily
  - requests
  - joblib
  - jupyterlab
  - pip
  - pip:
      - pre-commit
      - pytest
```

---

## 🔖 File: `.gitignore`

```
# Python
__pycache__/
*.pyc
.venv/

# Jupyter
.ipynb_checkpoints/

# Data & models
/data/raw/
/data/interim/
/data/processed/
/models/artifacts/
/models/reports/*.tmp

# DVC (keep *.dvc pointer files and .dvc/config under version control)
/.dvc/cache/
/.dvc/tmp/

# OS
.DS_Store
Thumbs.db
```

---

## 🔖 File: `.github/workflows/ci.yml`

```yaml
name: CI
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install deps
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pre-commit pytest
      - name: Pre-commit
        run: pre-commit run --all-files
      - name: Tests
        run: pytest -q
```

---

## 🔖 File: `CITATION.cff`

```yaml
cff-version: 1.2.0
title: Predictive Mapping of Oil Spill-Induced Mangrove Degradation in Nigeria
authors:
  - family-names: Eteh
    given-names: Desmond Rowland
  - family-names: Akajiaku
    given-names: Ugochukwu Charles
  - name: Contributors
version: 1.0.0
license: MIT
date-released: 2025-10-14
repository-code: https://github.com/Akajiaku11/Predictive-Mangrove-Degradation
```

---

## 🔖 File: `LICENSE`

MIT License (insert standard boilerplate with your name and year).

---

## 🔖 File: `docs/methodology.md`

- End‑to‑end narrative of preprocessing, indices, LULC, KDE/Clusters, ESI/elevation integration, ML training, spatial CV, and SHAP explanation.
- Include equations for NDVI, RENDVI, KDE, k‑means objective, and GBDT loss.

---

## 🔖 File: `docs/data_sources.md`

- How to access Copernicus (Sentinel‑2), USGS (Landsat & SRTM), NOSDRA spill data, ESI shapefiles, and GMW.
- Data licenses and acceptable‑use notes.

---

## 🔖 File: `docs/model_report.md`

- Auto‑filled by training step with CV table (accuracy/precision/recall/F1/AUC), confusion matrix, PR/ROC curves, and SHAP bar/summary plots.

---

## 🔖 File: `docs/governance.md`

- Ethical use, environmental justice considerations, and guidance for communicating uncertainty.

---

## 🔖 File: `Makefile`

```make
.PHONY: all init data features train predict report

all: data features train predict report

init:
	pre-commit install

data:
	python -m src.data.download_sat --cfg configs/params.yaml
	python -m src.data.download_spills --cfg configs/params.yaml
	python -m src.data.build_dem --cfg configs/params.yaml

features:
	python -m src.features.indices --cfg configs/params.yaml
	python -m src.features.lulc --cfg configs/params.yaml
	python -m src.spill.kde --cfg configs/params.yaml
	python -m src.spill.clusters --cfg configs/params.yaml
	python -m src.features.sensitivity --cfg configs/params.yaml

train:
	python -m src.modeling.dataset --cfg configs/params.yaml
	python -m src.modeling.train_gbdt --cfg configs/params.yaml

predict:
	python -m src.modeling.predict --cfg configs/params.yaml

report:
	python -m src.modeling.shap_report --cfg configs/params.yaml
```

---

## 🧭 Next Steps for Your GitHub

1. Create a new repo under **`github.com/Akajiaku11`** named `Predictive-Mangrove-Degradation`.
2. Copy this scaffold into the repo, commit, and push.
3. Add AOI & parameters, then run `make all` to generate first results.
4. Upload key maps in `figures/` and publish a concise `docs/model_report.md`.
5. (Optional) Turn on GitHub Pages to host an interactive map or docs.

---

## ✉️ Contact & Acknowledgements

- Open an issue for questions/feature requests.
- Acknowledge Rivers State, NOSDRA, and open‑data providers in publications.

---

*This template is designed to be publication‑ready and policy‑useful (SDGs 13/14/15, blue‑carbon, environmental compliance). Replace placeholders with your study‑specific details before release.*

--------------------------------------------------------------------------------