├── .gitkeep
├── 00_quickstart.ipynb
├── 10_model_diagnostics.ipynb
├── LICENSE
├── test_imports.py
├── requirements.txt
├── .gitignore
├── Makefile
├── pyproject.toml
├── CITATION.cff
├── ci.yml
├── README.md
└── git_hub_repo_predictive_mapping_of_oil_spill_induced_mangrove_degradation.md


/.gitkeep:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/00_quickstart.ipynb:
--------------------------------------------------------------------------------
1 | {}


--------------------------------------------------------------------------------
/10_model_diagnostics.ipynb:
--------------------------------------------------------------------------------
1 | {}


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2025
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy ...


--------------------------------------------------------------------------------
/test_imports.py:
--------------------------------------------------------------------------------
1 | def test_imports():
2 |     import pmd
3 |     import pmd.indices
4 |     import pmd.features
5 |     import pmd.model
6 |     assert True
7 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | affine
 2 | geopandas
 3 | rasterio
 4 | rasterstats
 5 | numpy
 6 | pandas
 7 | scikit-learn
 8 | xgboost
 9 | shap
10 | matplotlib
11 | pyyaml
12 | scipy
13 | joblib
14 | tqdm
15 | pyproj
16 | rtree
17 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | .venv/
 2 | __pycache__/
 3 | *.egg-info/
 4 | data/raw/*
 5 | !data/raw/.gitkeep
 6 | data/interim/*
 7 | !data/interim/.gitkeep
 8 | data/processed/*
 9 | !data/processed/.gitkeep
10 | figures/*
11 | !figures/.gitkeep
12 | .ipynb_checkpoints/
13 | 


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
 1 | .PHONY: format lint test
 2 | 
 3 | format:
 4 | 	python -m pip install ruff black || true
 5 | 	ruff check --select I --fix . || true
 6 | 	black . || true
 7 | 
 8 | lint:
 9 | 	python -m pip install ruff || true
10 | 	ruff check . || true
11 | 
12 | test:
13 | 	python -m pytest -q
14 | 


--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
 1 | [project]
 2 | name = "pmd"
 3 | version = "1.0.0"
 4 | description = "Predictive mapping of oil spill-induced mangrove degradation (Nigeria)"
 5 | authors = [{name="Desmond R. Eteh"}, {name="U. C. Akajiaku"}]
 6 | readme = "README.md"
 7 | requires-python = ">=3.9"
 8 | 
 9 | [project.scripts]
10 | pmd = "pmd.cli:app"
11 | 
12 | [build-system]
13 | requires = ["setuptools", "wheel"]
14 | build-backend = "setuptools.build_meta"
15 | 


--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
 1 | cff-version: 1.2.0
 2 | message: If you use this work, please cite it.
 3 | title: Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria
 4 | authors:
 5 |   - family-names: Eteh
 6 |     given-names: Desmond Rowland
 7 |   - family-names: Akajiaku
 8 |     given-names: Ugochukwu Charles
 9 |   - family-names: Mogo
10 |     given-names: Felicia Chinwe
11 | version: "1.0.0"
12 | date-released: 2025-10-14
13 | license: MIT
14 | 


--------------------------------------------------------------------------------
/ci.yml:
--------------------------------------------------------------------------------
 1 | name: CI
 2 | on: {push: {branches: [main, master]}, pull_request: {}}
 3 | jobs:
 4 |   build:
 5 |     runs-on: ubuntu-latest
 6 |     steps:
 7 |       - uses: actions/checkout@v4
 8 |       - uses: actions/setup-python@v5
 9 |         with: {python-version: '3.11'}
10 |       - run: pip install -r requirements.txt
11 |       - run: python -m pip install -e . ruff black pytest
12 |       - run: ruff check .
13 |       - run: black --check .
14 |       - run: pytest -q
15 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria
  2 | **Remote Sensing + Machine Learning (Python + GEE)**
  3 | 
  4 | This repository contains a reproducible workflow for mapping and **predicting oil spill‑induced mangrove degradation**
  5 | in the Niger Delta (Rivers State, Nigeria) using **Sentinel‑2 / Landsat 8**, **SRTM**, **NOSDRA oil‑spill records**,
  6 | and **Gradient Boosted Decision Trees (XGBoost)** with **SHAP** explainability.
  7 | 
  8 | > Paper context and methods are adapted from the authors' project draft.
  9 | 
 10 | ## Highlights
 11 | - Compute **NDVI / Red‑Edge NDVI (RENDVI)** time series (2020 → 2024)
 12 | - Supervised **LULC classification** (Random Forest in **Google Earth Engine**) for 2020 & 2024
 13 | - **Oil spill hotspot severity** using Kernel Density Estimation (KDE) + **K‑Means**
 14 | - Feature stack: ΔNDVI, ΔRENDVI, spill density, **ESI** rank, elevation class, LULC transition
 15 | - Train **XGBoost** classifier; evaluate **Accuracy, Precision, Recall, F1, ROC‑AUC**
 16 | - **Explain predictions** with **SHAP**; export risk probability map GeoTIFF + figures
 17 | 
 18 | ## Repository structure
 19 | ```
 20 | predictive-mangrove-degradation/
 21 | ├─ README.md
 22 | ├─ LICENSE
 23 | ├─ CITATION.cff
 24 | ├─ pyproject.toml
 25 | ├─ requirements.txt
 26 | ├─ Makefile
 27 | ├─ .gitignore
 28 | ├─ .github/workflows/ci.yml
 29 | ├─ configs/
 30 | │  └─ rivers_state_example.yaml
 31 | ├─ data/
 32 | │  ├─ raw/           # put input rasters/vectors here
 33 | │  ├─ interim/       # intermediate outputs
 34 | │  └─ processed/     # final maps & model artifacts
 35 | ├─ notebooks/
 36 | │  ├─ 00_quickstart.ipynb
 37 | │  └─ 10_model_diagnostics.ipynb
 38 | ├─ scripts/
 39 | │  ├─ gee_lulc_classifier.js        # RF in GEE for 2020/2024
 40 | │  └─ prepare_shapefile_grid.py
 41 | ├─ src/pmd/
 42 | │  ├─ __init__.py
 43 | │  ├─ cli.py                         # command line interface
 44 | │  ├─ io.py                          # loading/saving
 45 | │  ├─ indices.py                     # NDVI/RENDVI + delta
 46 | │  ├─ geoutils.py                    # raster/vector helpers
 47 | │  ├─ spills.py                      # KDE + clustering
 48 | │  ├─ features.py                    # stack features for ML
 49 | │  ├─ model.py                       # train/eval xgboost + shap
 50 | │  ├─ visualize.py                   # plots & maps
 51 | │  └─ esi_zones.py                   # ESI handling
 52 | └─ tests/
 53 |    └─ test_imports.py
 54 | ```
 55 | 
 56 | ## Quick start
 57 | 1. **Clone** this repo and create a Python env:
 58 |    ```bash
 59 |    uv venv && source .venv/bin/activate
 60 |    uv pip install -r requirements.txt -e .
 61 |    ```
 62 |    (Or use `pip` / `conda` as you prefer.)
 63 | 
 64 | 2. **Data placement** (`data/raw/`):
 65 |    - `sentinel_2020.tif`, `sentinel_2024.tif`   – atmospherically‑corrected surface reflectance (bands: B4, B5, B6, B8)
 66 |    - `landsat8_2020.tif`, `landsat8_2024.tif`   – surface reflectance (B4, B5)
 67 |    - `srtm_30m.tif`                             – elevation
 68 |    - `lulc_2020.tif`, `lulc_2024.tif`           – exported from **GEE** (see `scripts/gee_lulc_classifier.js`)
 69 |    - `spills_2023_2025.geojson`                 – **NOSDRA** events (point features: date, barrels, cause)
 70 |    - `esi.shp`                                  – Environmental Sensitivity Index polygons (with rank field)
 71 |    - `rivers_state_boundary.shp`                – study boundary
 72 | 
 73 |    All rasters should share **CRS = EPSG:32632** and **resolution = 30 m**.
 74 | 
 75 | 3. **Configure** paths in `configs/rivers_state_example.yaml`.
 76 | 
 77 | 4. **Run the pipeline** (end‑to‑end):
 78 |    ```bash
 79 |    python -m pmd.cli compute-indices --cfg configs/rivers_state_example.yaml
 80 |    python -m pmd.cli build-spill-features --cfg configs/rivers_state_example.yaml
 81 |    python -m pmd.cli stack-features --cfg configs/rivers_state_example.yaml
 82 |    python -m pmd.cli train-model --cfg configs/rivers_state_example.yaml
 83 |    python -m pmd.cli predict-map --cfg configs/rivers_state_example.yaml
 84 |    python -m pmd.cli explain-model --cfg configs/rivers_state_example.yaml
 85 |    ```
 86 | 
 87 | 5. **Outputs** land here:
 88 |    - `data/processed/ndvi_2020.tif`, `ndvi_2024.tif`, `delta_ndvi.tif`
 89 |    - `data/processed/rendvi_2020.tif`, `rendvi_2024.tif`, `delta_rendvi.tif`
 90 |    - `data/interim/spill_kde.tif`, `spill_clusters.geojson`
 91 |    - `data/processed/feature_stack.parquet`
 92 |    - `data/processed/models/xgb_model.json`, `scaler.pkl`, `metrics.json`
 93 |    - `data/processed/prediction_prob.tif`, `predicted_classes.tif`
 94 |    - `figures/roc_curve.png`, `figures/shap_summary.png`, `figures/feature_importance.png`
 95 | 
 96 | ## Google Earth Engine (LULC)
 97 | Use `scripts/gee_lulc_classifier.js` in the **GEE Code Editor** to export 2020/2024 LULC (6 classes) as GeoTIFF.
 98 | 
 99 | ## CLI help
100 | ```bash
101 | python -m pmd.cli --help
102 | python -m pmd.cli compute-indices --help
103 | ```
104 | 
105 | ## License
106 | MIT (see `LICENSE`).
107 | 
108 | ## Citation
109 | If you use this repository, please cite it using `CITATION.cff` and, if applicable, cite the accompanying manuscript.
110 | 
111 | ---
112 | 
113 | ### Acknowledgement
114 | Methods and study framing are aligned with the accompanying project manuscript, *Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria Using Remote Sensing and Machine Learning*.
115 | 


--------------------------------------------------------------------------------
/git_hub_repo_predictive_mapping_of_oil_spill_induced_mangrove_degradation.md:
--------------------------------------------------------------------------------
  1 | # Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria
  2 | 
  3 | > **Repository template** for a remote sensing + machine learning workflow that maps, monitors, and predicts oil‑spill‑driven mangrove degradation in the Niger Delta (Nigeria). Designed for full reproducibility and easy extension to other coastal regions.
  4 | 
  5 | ---
  6 | 
  7 | ## 📁 Repository Structure
  8 | 
  9 | ```
 10 | Predictive-Mangrove-Degradation/
 11 | ├─ README.md
 12 | ├─ LICENSE
 13 | ├─ CITATION.cff
 14 | ├─ .gitignore
 15 | ├─ requirements.txt
 16 | ├─ environment.yml
 17 | ├─ Makefile
 18 | ├─ dvc.yaml                 # optional if you use DVC
 19 | ├─ pyproject.toml           # optional; for packaging if needed
 20 | ├─ .pre-commit-config.yaml
 21 | ├─ .github/
 22 | │  └─ workflows/
 23 | │     └─ ci.yml             # lint + tests
 24 | ├─ configs/
 25 | │  ├─ study_area.geojson    # AOI polygon (placeholder)
 26 | │  ├─ params.yaml           # all hyperparameters & data paths
 27 | │  └─ classes.json          # LULC class map
 28 | ├─ data/
 29 | │  ├─ raw/                  # (gitignored) raw downloads
 30 | │  ├─ interim/              # (gitignored) cleaned/intermediate
 31 | │  └─ processed/            # (gitignored) features/tiles ready for ML
 32 | ├─ docs/
 33 | │  ├─ methodology.md
 34 | │  ├─ data_sources.md
 35 | │  ├─ model_report.md
 36 | │  └─ governance.md
 37 | ├─ notebooks/
 38 | │  ├─ 00_explore_AOI.ipynb
 39 | │  ├─ 10_build_indices.ipynb
 40 | │  ├─ 20_LULC_RF.ipynb
 41 | │  ├─ 30_spill_hotspots.ipynb
 42 | │  ├─ 40_train_xgboost.ipynb
 43 | │  └─ 50_shap_interpretation.ipynb
 44 | ├─ src/
 45 | │  ├─ __init__.py
 46 | │  ├─ utils/
 47 | │  │  ├─ io.py
 48 | │  │  ├─ geoutils.py
 49 | │  │  └─ viz.py
 50 | │  ├─ data/
 51 | │  │  ├─ download_sat.py
 52 | │  │  ├─ download_spills.py
 53 | │  │  ├─ build_dem.py
 54 | │  │  └─ tiles.py
 55 | │  ├─ features/
 56 | │  │  ├─ indices.py         # NDVI/RENDVI/ΔNDVI
 57 | │  │  ├─ lulc.py            # RF classification
 58 | │  │  └─ sensitivity.py     # ESI + elevation features
 59 | │  ├─ modeling/
 60 | │  │  ├─ dataset.py         # tabular feature assembly
 61 | │  │  ├─ train_gbdt.py      # XGBoost training + CV + metrics
 62 | │  │  ├─ predict.py
 63 | │  │  └─ shap_report.py
 64 | │  └─ spill/
 65 | │     ├─ kde.py             # kernel density for hotspots
 66 | │     └─ clusters.py        # k‑means severity clusters
 67 | ├─ models/
 68 | │  ├─ artifacts/            # (gitignored) trained models
 69 | │  └─ reports/              # auto‑generated metrics/plots
 70 | └─ figures/                 # key PNG/SVG figures for README & papers
 71 | ```
 72 | 
 73 | > **Tip:** clone as a template, then replace AOI and parameters in `configs/params.yaml`.
 74 | 
 75 | ---
 76 | 
 77 | ## 🚀 Quickstart
 78 | 
 79 | ### 1) Create the environment
 80 | 
 81 | ```bash
 82 | # conda (recommended)
 83 | conda env create -f environment.yml
 84 | conda activate mangrove-ml
 85 | 
 86 | # OR pip
 87 | python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
 88 | pip install -r requirements.txt
 89 | pre-commit install
 90 | ```
 91 | 
 92 | ### 2) Configure the study area & parameters
 93 | 
 94 | - Put your Area of Interest polygon in `configs/study_area.geojson` (WGS84).
 95 | - Edit `configs/params.yaml` to set:
 96 |   - time windows (e.g., 2020 vs 2024),
 97 |   - sensors (Sentinel‑2, Landsat‑8),
 98 |   - DEM source (SRTM 30 m),
 99 |   - NOSDRA oil‑spill API query filters,
100 |   - ML hyperparameters (XGBoost/GBDT),
101 |   - output tiling size & CRS (UTM 32N for Rivers State).
102 | 
103 | ### 3) Pull data (satellite, DEM, spills)
104 | 
105 | ```bash
106 | # Sentinel-2 / Landsat-8 / SRTM via open APIs or GEE exports
107 | python -m src.data.download_sat --cfg configs/params.yaml
108 | 
109 | # NOSDRA spills (2023–2025) JSON → GeoPackage
110 | python -m src.data.download_spills --cfg configs/params.yaml
111 | 
112 | # Build DEM stack
113 | python -m src.data.build_dem --cfg configs/params.yaml
114 | ```
115 | 
116 | ### 4) Build features
117 | 
118 | ```bash
119 | # NDVI, RENDVI, ΔNDVI, ΔRENDVI rasters
120 | python -m src.features.indices --cfg configs/params.yaml
121 | 
122 | # LULC (Random Forest) and change detection
123 | python -m src.features.lulc --cfg configs/params.yaml
124 | 
125 | # KDE spill intensity, severity clusters, ESI weights, elevation bands
126 | python -m src.spill.kde --cfg configs/params.yaml
127 | python -m src.spill.clusters --cfg configs/params.yaml
128 | python -m src.features.sensitivity --cfg configs/params.yaml
129 | ```
130 | 
131 | ### 5) Train model & explain
132 | 
133 | ```bash
134 | # Assemble tabular dataset from raster stacks + vector overlays
135 | python -m src.modeling.dataset --cfg configs/params.yaml
136 | 
137 | # Train Gradient Boosted Decision Trees (XGBoost)
138 | python -m src.modeling.train_gbdt --cfg configs/params.yaml
139 | 
140 | # Generate probability maps + class maps (risk tiers)
141 | python -m src.modeling.predict --cfg configs/params.yaml
142 | 
143 | # SHAP feature attribution report
144 | python -m src.modeling.shap_report --cfg configs/params.yaml
145 | ```
146 | 
147 | ### 6) Reproduce figures & paper tables
148 | 
149 | Jupyter notebooks in `notebooks/` regenerate all exploratory charts and publication figures. Final plots are saved to `figures/`.
150 | 
151 | ---
152 | 
153 | ## 🧠 Scientific Overview
154 | 
155 | **Goal.** Predict and map mangrove degradation risk driven by oil spills in the Niger Delta by integrating multi‑temporal satellite indices (NDVI, RENDVI), LULC change, DEM‑based elevation, Environmental Sensitivity Index (ESI), and oil‑spill hotspot metrics into an explainable ML model (GBDT/XGBoost).
156 | 
157 | **Core signals.**
158 | - **ΔNDVI/ΔRENDVI:** vegetation stress and canopy loss between two epochs.
159 | - **KDE spill intensity & k‑means severity:** spatial pressure of spills.
160 | - **Elevation bands (e.g., <5 m):** low‑lying retention areas.
161 | - **ESI class (e.g., 10b mangroves):** intrinsic ecological vulnerability.
162 | - **LULC transitions:** e.g., flooded vegetation → bare/built.
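
As a rough illustration of the first and last signals above, the sketch below computes ΔNDVI from two epochs of toy reflectance values and encodes a per-pixel LULC transition as a single integer. The array values, the `N_CLASSES` constant, and the `from * N_CLASSES + to` encoding are assumptions made for this example, not the repository's implementation.

```python
import numpy as np

N_CLASSES = 6  # assumed: matches the six LULC classes listed in the config


def ndvi(nir, red):
    """Normalised difference of NIR and red reflectance, computed in float32."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / (nir + red + 1e-6)


def lulc_transition(lulc_t0, lulc_t1, n_classes=N_CLASSES):
    """Encode (from, to) class pairs as from * n_classes + to (hypothetical scheme)."""
    return lulc_t0.astype(np.int32) * n_classes + lulc_t1.astype(np.int32)


# Toy 2x2 scene: integer surface reflectance for two epochs (made-up values).
nir_2020 = np.array([[4000, 3800], [3500, 3000]], dtype=np.uint16)
red_2020 = np.array([[500, 600], [700, 900]], dtype=np.uint16)
nir_2024 = np.array([[3900, 2000], [3400, 1500]], dtype=np.uint16)
red_2024 = np.array([[520, 1500], [720, 1400]], dtype=np.uint16)

# Negative values indicate a loss of greenness between the two epochs.
d_ndvi = ndvi(nir_2024, red_2024) - ndvi(nir_2020, red_2020)

# Class indices follow the config order: 0 water, 1 trees, 2 rangeland,
# 3 flooded_veg, 4 built, 5 bare.
lulc_2020 = np.array([[3, 3], [1, 3]])
lulc_2024 = np.array([[3, 5], [1, 4]])
transition = lulc_transition(lulc_2020, lulc_2024)
```

Casting to `float32` before subtracting matters when the inputs are unsigned integers, because `red > nir` would otherwise wrap around instead of going negative.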
163 | 
164 | **Model.** Gradient‑boosted trees with spatially stratified CV, tuned via `configs/params.yaml`. Model cards and SHAP explanations are auto‑exported to `models/reports/`.
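
One way to read "spatially stratified CV" is that whole spatial blocks, rather than individual pixels, are assigned to folds, so that neighbouring pixels never sit on both sides of a split. The sketch below shows that idea with NumPy only; the function name, the 5 km block size, and the coordinates are hypothetical and are not taken from the repository.

```python
import numpy as np


def spatial_block_folds(x, y, block_size_m, n_folds, seed=42):
    """Assign each sample to a CV fold via the square block that contains it.

    All samples in one block share a fold, so spatially adjacent pixels do not
    end up on both sides of a train/validation split.
    """
    bx = np.floor(np.asarray(x) / block_size_m).astype(np.int64)
    by = np.floor(np.asarray(y) / block_size_m).astype(np.int64)
    # One integer id per distinct (bx, by) block.
    blocks = np.stack([bx, by], axis=1)
    _, block_id = np.unique(blocks, axis=0, return_inverse=True)
    block_id = block_id.ravel()
    # Shuffle the blocks reproducibly, then deal them out round-robin to folds.
    rng = np.random.default_rng(seed)
    order = rng.permutation(int(block_id.max()) + 1)
    fold_of_block = np.empty_like(order)
    fold_of_block[order] = np.arange(order.size) % n_folds
    return fold_of_block[block_id], block_id


# Made-up projected coordinates in metres for 1000 samples.
rng = np.random.default_rng(0)
x = rng.uniform(250_000, 300_000, size=1000)
y = rng.uniform(500_000, 550_000, size=1000)
folds, block_id = spatial_block_folds(x, y, block_size_m=5_000, n_folds=5)
```

The returned `block_id` array could also be passed as `groups` to a group-aware splitter such as scikit-learn's `GroupKFold`, which enforces the same "no block on both sides" constraint.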
165 | 
166 | ---
167 | 
168 | ## 🔧 Key Configuration (`configs/params.yaml`)
169 | 
170 | ```yaml
171 | project:
172 |   name: predictive-mangrove-degradation
173 |   crs: "EPSG:32632"         # UTM 32N
174 |   tile_size: 512
175 |   aoi_file: configs/study_area.geojson
176 |   out_dir: data/processed
177 | 
178 | satellite:
179 |   sensors: ["sentinel2", "landsat8"]
180 |   s2:
181 |     level: L2A
182 |     date_start: "2020-01-01"
183 |     date_end:   "2020-03-31"  # dry season example
184 |     date_start_2: "2024-01-01"
185 |     date_end_2:   "2024-03-31"
186 |     cloud_pct: 20
187 |   l8:
188 |     date_start: "2020-01-01"
189 |     date_end:   "2020-12-31"
190 | 
191 | dem:
192 |   source: "SRTM_30m"
193 |   bands: ["elevation", "slope"]
194 |   elevation_bins: [0,5,10,100]
195 | 
196 | spills:
197 |   source: "NOSDRA"
198 |   years: [2023,2024,2025]
199 |   min_volume_bbl: 1
200 |   kde_bandwidth_m: 2500
201 |   clusters_k: 5
202 | 
203 | esi:
204 |   source_file: data/raw/ESI/esi_nigeria.gpkg
205 |   class_weights:
206 |     "10b": 3
207 |     "10a": 2
208 |     "9b": 2
209 |     default: 1
210 | 
211 | lulc:
212 |   classes: [water, trees, rangeland, flooded_veg, built, bare]
213 |   rf_trees: 100
214 | 
215 | model:
216 |   algo: xgboost
217 |   test_size: 0.2
218 |   cv_folds: 5
219 |   params:
220 |     n_estimators: 400
221 |     learning_rate: 0.05
222 |     max_depth: 6
223 |     subsample: 0.8
224 |     colsample_bytree: 0.8
225 |     reg_lambda: 1.0
226 | 
227 | outputs:
228 |   prob_threshold: 0.5
229 |   risk_bins: [0.2, 0.4, 0.6, 0.8]
230 | ```
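
To make the binning parameters above concrete, here is a minimal sketch of how `risk_bins`, `elevation_bins`, and `class_weights` might be applied. How the bin edges are interpreted (left-closed intervals, clipping outside the range) and the middle tier names are assumptions; the configuration itself does not specify them.

```python
import numpy as np

RISK_BINS = [0.2, 0.4, 0.6, 0.8]      # outputs.risk_bins
ELEVATION_BINS = [0, 5, 10, 100]      # dem.elevation_bins
ESI_WEIGHTS = {"10b": 3, "10a": 2, "9b": 2, "default": 1}  # esi.class_weights
# Assumed labels: the document only names the two extremes.
TIER_NAMES = ["very low", "low", "moderate", "high", "very high"]


def risk_tier(prob):
    """Map probabilities in [0, 1] to tier indices 0..4 using RISK_BINS as edges."""
    return np.digitize(prob, RISK_BINS)


def elevation_band(elev_m):
    """Band index: 0 for [0, 5) m, 1 for [5, 10) m, 2 for [10, 100) m (clipped)."""
    idx = np.digitize(elev_m, ELEVATION_BINS) - 1
    return np.clip(idx, 0, len(ELEVATION_BINS) - 2)


def esi_weight(esi_class):
    """Look up the weight for an ESI class, falling back to the default."""
    return ESI_WEIGHTS.get(esi_class, ESI_WEIGHTS["default"])


tiers = risk_tier(np.array([0.1, 0.2, 0.5, 0.79, 0.95]))
labels = [TIER_NAMES[i] for i in tiers]
bands = elevation_band(np.array([3.0, 5.0, 50.0, 150.0, -1.0]))
```

Four edges produce five tiers, which lines up with the "very low → very high" risk tiers listed under Outputs.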
231 | 
232 | ---
233 | 
234 | ## 🧩 Example Scripts
235 | 
236 | ### `src/features/indices.py`
237 | 
238 | ```python
239 | import argparse, yaml, numpy as np, rasterio
240 | from pathlib import Path
241 | 
242 | # Minimal NDVI/RENDVI builder from pre-downloaded, aligned surface reflectance rasters: NIR, RED, RE1 (705 nm)
243 | 
244 | 
245 | def ndvi(nir, red):
246 |     nir, red = nir.astype(np.float32), red.astype(np.float32)  # integer bands would wrap on subtraction
247 |     return (nir - red) / (nir + red + 1e-6)
248 | 
249 | 
250 | def rendvi(re, nir):
251 |     re, nir = re.astype(np.float32), nir.astype(np.float32)
252 |     return (nir - re) / (nir + re + 1e-6)
253 | 
254 | 
255 | def write_like(src_path, out_path, arr):
256 |     with rasterio.open(src_path) as src:
257 |         profile = src.profile
258 |     profile.update(dtype=rasterio.float32, count=1, compress="lzw")
259 |     with rasterio.open(out_path, "w", **profile) as dst:
260 |         dst.write(arr.astype(np.float32), 1)
261 | 
262 | 
263 | def main(cfg):
264 |     in_dir = Path(cfg["project"]["out_dir"]) / "sr_stacks"
265 |     out_dir = Path(cfg["project"]["out_dir"]) / "features"
266 |     out_dir.mkdir(parents=True, exist_ok=True)
267 | 
268 |     # Example file naming convention (adjust to your pipeline)
269 |     nir_2020 = in_dir / "nir_2020.tif"
270 |     red_2020 = in_dir / "red_2020.tif"
271 |     re1_2020 = in_dir / "re1_2020.tif"
272 | 
273 |     nir_2024 = in_dir / "nir_2024.tif"
274 |     red_2024 = in_dir / "red_2024.tif"
275 |     re1_2024 = in_dir / "re1_2024.tif"
276 | 
277 |     with rasterio.open(nir_2020) as n0, rasterio.open(red_2020) as r0:
278 |         ndvi_2020 = ndvi(n0.read(1), r0.read(1))
279 |     with rasterio.open(nir_2024) as n1, rasterio.open(red_2024) as r1:
280 |         ndvi_2024 = ndvi(n1.read(1), r1.read(1))
281 | 
282 |     write_like(nir_2020, out_dir / "ndvi_2020.tif", ndvi_2020)
283 |     write_like(nir_2024, out_dir / "ndvi_2024.tif", ndvi_2024)
284 |     write_like(nir_2024, out_dir / "d_ndvi_2020_2024.tif", ndvi_2024 - ndvi_2020)
285 | 
286 |     with rasterio.open(re1_2020) as re0, rasterio.open(nir_2020) as n0:
287 |         rendvi_2020 = rendvi(re0.read(1), n0.read(1))
288 |     with rasterio.open(re1_2024) as re1src, rasterio.open(nir_2024) as n1:
289 |         rendvi_2024 = rendvi(re1src.read(1), n1.read(1))
290 | 
291 |     write_like(re1_2020, out_dir / "rendvi_2020.tif", rendvi_2020)
292 |     write_like(re1_2024, out_dir / "rendvi_2024.tif", rendvi_2024)
293 |     write_like(re1_2024, out_dir / "d_rendvi_2020_2024.tif", rendvi_2024 - rendvi_2020)
294 | 
295 | 
296 | if __name__ == "__main__":
297 |     p = argparse.ArgumentParser()
298 |     p.add_argument("--cfg", required=True)
299 |     args = p.parse_args()
300 |     with open(args.cfg) as f:
301 |         cfg = yaml.safe_load(f)
302 |     main(cfg)
303 | ```
304 | 
305 | ### `src/spill/kde.py`
306 | 
307 | ```python
308 | import geopandas as gpd
309 | import numpy as np
310 | import rasterio
311 | from sklearn.neighbors import KernelDensity
312 | 
313 | # Build a KDE raster of spill intensity from point events (bbl‑weighted)
314 | 
315 | 
316 | def build_kde(points_gpkg, bandwidth_m, out_raster, template_raster):
317 |     gdf = gpd.read_file(points_gpkg).to_crs("EPSG:32632")
318 |     X = np.vstack([gdf.geometry.x.values, gdf.geometry.y.values]).T
319 |     weights = gdf["volume_bbl"].values
320 | 
321 |     kde = KernelDensity(bandwidth=bandwidth_m, kernel="gaussian", metric="euclidean")
322 |     kde.fit(X, sample_weight=weights)
323 | 
324 |     with rasterio.open(template_raster) as src:
325 |         profile = src.profile
326 |         # Pixel-centre coordinates sized to the template grid (assumes a north-up raster)
327 |         xs = src.bounds.left + (np.arange(src.width) + 0.5) * src.res[0]
328 |         # Rows run top-down in the raster, so y decreases with the row index
329 |         ys = src.bounds.top - (np.arange(src.height) + 0.5) * src.res[1]
330 |     xx, yy = np.meshgrid(xs, ys)
331 |     grid = np.vstack([xx.ravel(), yy.ravel()]).T
332 |     z = np.exp(kde.score_samples(grid)).reshape(yy.shape)
333 | 
334 |     profile.update(count=1, dtype=rasterio.float32, compress="lzw")
335 |     with rasterio.open(out_raster, "w", **profile) as dst:
336 |         dst.write(z.astype("float32"), 1)
337 | ```
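
For readers who want to see what the weighted kernel density amounts to numerically, the following NumPy-only sketch evaluates a volume-weighted 2-D Gaussian KDE on a small grid. It is intended to illustrate the technique rather than reproduce the script above; the point coordinates, volumes, grid extent, and 500 m bandwidth are all made up.

```python
import numpy as np


def weighted_gaussian_kde(points, weights, grid, bandwidth):
    """Evaluate a weighted 2-D Gaussian KDE at each grid location.

    density(g) = sum_i w_i * N(g | p_i, bandwidth^2 * I) / sum_i w_i
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Squared distances between every grid location and every point: (G, P).
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-d2 / (2.0 * bandwidth**2)) / (2.0 * np.pi * bandwidth**2)
    return kernel @ weights / weights.sum()


# Two made-up spill locations (metres); the second is ten times larger.
points = [(2_000.0, 2_000.0), (8_000.0, 8_000.0)]
volumes_bbl = [10.0, 100.0]

# 100 m pixel centres over a 10 km square, rows ordered top-down like a raster.
xs = np.arange(50.0, 10_000.0, 100.0)
ys = np.arange(9_950.0, 0.0, -100.0)
xx, yy = np.meshgrid(xs, ys)
grid = np.column_stack([xx.ravel(), yy.ravel()])

density = weighted_gaussian_kde(points, volumes_bbl, grid, bandwidth=500.0)
density = density.reshape(yy.shape)

# Location of the density peak; it should sit at the larger spill.
row, col = np.unravel_index(density.argmax(), density.shape)
```

Because the kernel is normalised, the density multiplied by the pixel area sums to roughly one over the grid, which is a convenient sanity check before writing the raster.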
338 | 
339 | ### `src/modeling/train_gbdt.py`
340 | 
341 | ```python
342 | import argparse, yaml, json
343 | import numpy as np
344 | import pandas as pd
345 | from sklearn.model_selection import StratifiedKFold
346 | from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
347 | import xgboost as xgb
348 | from pathlib import Path
349 | 
350 | # Tabular dataset must include columns listed in cfg["model"]["features"]
351 | 
352 | 
353 | def kfold_train(X, y, params, cv_folds):
354 |     params = dict(params); rounds = params.pop("n_estimators", 400)  # not a native xgb.train parameter
355 |     skf = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=42)  # class-stratified, not spatial
356 |     metrics, models = [], []
357 |     for tr, va in skf.split(X, y):
358 |         dtr = xgb.DMatrix(X.iloc[tr], label=y.iloc[tr])
359 |         dva = xgb.DMatrix(X.iloc[va], label=y.iloc[va])
360 |         model = xgb.train(params, dtr, num_boost_round=rounds)
361 |         p = model.predict(dva)
362 |         yhat = (p >= 0.5).astype(int)
363 |         metrics.append({
364 |             "acc": accuracy_score(y.iloc[va], yhat),
365 |             "prec": precision_score(y.iloc[va], yhat),
366 |             "rec": recall_score(y.iloc[va], yhat),
367 |             "f1": f1_score(y.iloc[va], yhat),
368 |             "auc": roc_auc_score(y.iloc[va], p),
369 |         })
370 |         models.append(model)
371 |     return models, pd.DataFrame(metrics)
372 | 
373 | 
374 | def main(cfg):
375 |     df = pd.read_parquet("data/processed/training_dataset.parquet")
376 |     features = cfg["model"].get("features", [
377 |         "d_ndvi", "d_rendvi", "spill_kde", "elevation_bin", "esi_weight", "lulc_change"
378 |     ])
379 |     X = df[features]
380 |     y = df["label_degraded"].astype(int)
381 | 
382 |     params = {
383 |         "objective": "binary:logistic",
384 |         "eval_metric": "logloss",
385 |         "eta": cfg["model"]["params"]["learning_rate"],
386 |         "max_depth": cfg["model"]["params"]["max_depth"],
387 |         "subsample": cfg["model"]["params"]["subsample"],
388 |         "colsample_bytree": cfg["model"]["params"]["colsample_bytree"],
389 |         "lambda": cfg["model"]["params"]["reg_lambda"],
390 |         "n_estimators": cfg["model"]["params"]["n_estimators"],
391 |         "verbosity": 0
392 |     }
393 | 
394 |     models, metr = kfold_train(X, y, params, cfg["model"]["cv_folds"])
395 |     outdir = Path("models/artifacts"); outdir.mkdir(parents=True, exist_ok=True)
396 |     repdir = Path("models/reports"); repdir.mkdir(parents=True, exist_ok=True)  # to_csv does not create missing directories
397 |     for i, m in enumerate(models):
398 |         m.save_model(outdir / f"gbdt_fold{i}.json")
399 |     metr.to_csv(repdir / "cv_metrics.csv", index=False)
400 |     print(metr.describe())
401 | 
402 | 
403 | if __name__ == "__main__":
404 |     p = argparse.ArgumentParser()
405 |     p.add_argument("--cfg", required=True)
406 |     args = p.parse_args()
407 |     with open(args.cfg) as f:
408 |         cfg = yaml.safe_load(f)
409 |     main(cfg)
410 | ```
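
The per-fold metrics in the training script come from scikit-learn. As a plain-Python illustration of what those numbers mean for the binary "degraded" label, the sketch below derives them from confusion counts. The labels and probabilities are invented, and the helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 for the positive (degraded) class."""
    y_pred = [int(p >= threshold) for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "acc": (tp + tn) / len(y_true),
        "prec": precision,
        "rec": recall,
        "f1": f1,
    }


# Made-up validation fold: 1 = degraded, 0 = not degraded.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

at_default = binary_metrics(y_true, y_prob)               # threshold 0.5
at_lower = binary_metrics(y_true, y_prob, threshold=0.3)  # favours recall
```

Lowering the threshold trades precision for recall, which is why the configuration exposes `prob_threshold` separately from the model hyperparameters.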
411 | 
412 | ---
413 | 
414 | ## 🧪 Testing & CI
415 | 
416 | - Run lint & tests locally:
417 | 
418 | ```bash
419 | pre-commit run --all-files
420 | pytest -q
421 | ```
422 | 
423 | - CI (`.github/workflows/ci.yml`) runs pre‑commit and tests on pushes and PRs.
424 | 
425 | ---
426 | 
427 | ## 🗃️ Data Sources (placeholders)
428 | 
429 | - **Sentinel‑2 L2A** (Copernicus Data Space Ecosystem, which replaced the Copernicus Open Access Hub)
430 | - **Landsat‑8 SR** (USGS EarthExplorer)
431 | - **NOSDRA Oil Spill Monitor** (API/CSV exports)
432 | - **SRTM 30 m DEM** (USGS)
433 | - **Environmental Sensitivity Index (ESI)** shapefiles (NOAA / national agency)
434 | - **Global Mangrove Watch (GMW)** for baseline mangrove extent
435 | 
436 | > See `docs/data_sources.md` for exact links and access tips.
437 | 
438 | ---
439 | 
440 | ## 📊 Outputs
441 | 
442 | - **Degradation probability raster** (GeoTIFF)
443 | - **Risk tiers** (very low → very high) vectorized for planning
444 | - **Hotspot maps** (KDE & clusters)
445 | - **Model card** with CV metrics + **SHAP** feature attributions
446 | - **Change maps**: LULC, ΔNDVI/ΔRENDVI, and overlays with ESI/elevation
447 | 
448 | ---
449 | 
450 | ## 🧩 Reproducibility & Data Governance
451 | 
452 | - All parameters tracked in `configs/params.yaml`.
453 | - Optional data versioning with **DVC** (see `dvc.yaml`).
454 | - Scripts are idempotent and safe to re‑run when inputs change.
455 | - Include `docs/governance.md` to describe data licenses, consent, and ethical use.
456 | 
457 | ---
458 | 
459 | ## 🙌 Contributing
460 | 
461 | 1. Fork & branch (`feat/…`, `fix/…`).
462 | 2. Run `pre-commit` hooks; add tests.
463 | 3. Open a PR with a clear description and screenshots of maps where relevant.
464 | 
465 | See `CONTRIBUTING.md` and `CODE_OF_CONDUCT.md` for details.
466 | 
467 | ---
468 | 
469 | ## 📜 License
470 | 
471 | MIT (see `LICENSE`).
472 | 
473 | ---
474 | 
475 | ## ✍️ Citation
476 | 
477 | Use the generated `CITATION.cff` or cite the repository as:
478 | 
479 | > YourName et al. (2025). *Predictive Mapping of Oil Spill‑Induced Mangrove Degradation in Nigeria*. GitHub repository. https://github.com/Akajiaku11
480 | 
481 | ---
482 | 
483 | ## 🔖 File: `README.md`
484 | 
485 | (This README content mirrors the sections above, adapted for GitHub formatting with badges.)
486 | 
487 | ---
488 | 
489 | ## 🔖 File: `requirements.txt`
490 | 
491 | ```
492 | geopandas
493 | rasterio
494 | rioxarray
495 | xarray
496 | numpy
497 | pandas
498 | scikit-learn
499 | xgboost
500 | shap
501 | pyyaml
502 | pyproj
503 | tqdm
504 | matplotlib
505 | seaborn
506 | contextily
507 | requests
508 | joblib
509 | jupyter
510 | ```
511 | 
512 | ---
513 | 
514 | ## 🔖 File: `environment.yml`
515 | 
516 | ```yaml
517 | name: mangrove-ml
518 | channels: [conda-forge]
519 | dependencies:
520 |   - python=3.11
521 |   - geopandas
522 |   - rasterio
523 |   - rioxarray
524 |   - xarray
525 |   - numpy
526 |   - pandas
527 |   - scikit-learn
528 |   - xgboost
529 |   - shap
530 |   - pyyaml
531 |   - pyproj
532 |   - tqdm
533 |   - matplotlib
534 |   - contextily
535 |   - requests
536 |   - joblib
537 |   - jupyterlab
538 |   - pip
539 |   - pip:
540 |       - pre-commit
541 |       - pytest
542 | ```
543 | 
544 | ---
545 | 
546 | ## 🔖 File: `.gitignore`
547 | 
548 | ```
549 | # Python
550 | __pycache__/
551 | *.pyc
552 | .venv/
553 | 
554 | # Jupyter
555 | .ipynb_checkpoints/
556 | 
557 | # Data & models
558 | /data/raw/
559 | /data/interim/
560 | /data/processed/
561 | /models/artifacts/
562 | /models/reports/*.tmp
563 | 
564 | # DVC (commit *.dvc pointer files and .dvc/config; ignore only local state)
565 | /.dvc/cache/
566 | /.dvc/tmp/
567 | /.dvc/config.local
568 | 
569 | # OS
570 | .DS_Store
571 | Thumbs.db
572 | ```
573 | 
574 | ---
575 | 
576 | ## 🔖 File: `.github/workflows/ci.yml`
577 | 
578 | ```yaml
579 | name: CI
580 | on: [push, pull_request]
581 | jobs:
582 |   build:
583 |     runs-on: ubuntu-latest
584 |     steps:
585 |       - uses: actions/checkout@v4
586 |       - uses: actions/setup-python@v5
587 |         with:
588 |           python-version: '3.11'
589 |       - name: Install deps
590 |         run: |
591 |           python -m pip install --upgrade pip
592 |           pip install -r requirements.txt
593 |           pip install pre-commit pytest
594 |       - name: Pre-commit
595 |         run: pre-commit run --all-files
596 |       - name: Tests
597 |         run: pytest -q
598 | ```
599 | 
600 | ---
601 | 
602 | ## 🔖 File: `CITATION.cff`
603 | 
604 | ```yaml
605 | cff-version: 1.2.0
606 | message: If you use this work, please cite it.
607 | title: Predictive Mapping of Oil Spill-Induced Mangrove Degradation in Nigeria
608 | authors:
609 |   - family-names: Eteh
610 |     given-names: Desmond Rowland
611 |   - family-names: Akajiaku
612 |     given-names: Ugochukwu Charles
613 |   - name: Contributors
614 | version: 1.0.0
615 | license: MIT
616 | date-released: 2025-10-14
617 | repository-code: https://github.com/Akajiaku11/Predictive-Mangrove-Degradation
618 | ```
619 | ---
620 | 
621 | ## 🔖 File: `LICENSE`
622 | 
623 | MIT License (insert standard boilerplate with your name and year).
624 | 
625 | ---
626 | 
627 | ## 🔖 File: `docs/methodology.md`
628 | 
629 | - End‑to‑end narrative of preprocessing, indices, LULC, KDE/Clusters, ESI/elevation integration, ML training, spatial CV, and SHAP explanation.
630 | - Include equations for NDVI, RENDVI, KDE, k‑means objective, and GBDT loss.
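
As a starting point for that file, the equations could be written as follows. The notation is an assumption chosen to match the example scripts: $\rho$ denotes surface reflectance (RE is the 705 nm red-edge band), $w_i$ is the spill volume at location $\mathbf{x}_i$, $h$ is the KDE bandwidth, $C_k$ and $\boldsymbol{\mu}_k$ are the k-means clusters and their centroids, and $f_t$ is the $t$-th tree with $T$ leaves and leaf weights $\mathbf{w}$.

```latex
\begin{align*}
\mathrm{NDVI} &= \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}} \\
\mathrm{RENDVI} &= \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{RE}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{RE}}} \\
\hat{f}_h(\mathbf{x}) &= \frac{1}{\sum_i w_i} \sum_i \frac{w_i}{2\pi h^2}
  \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}_i \rVert^2}{2h^2}\right) \\
J &= \sum_{k=1}^{K} \sum_{\mathbf{z} \in C_k} \lVert \mathbf{z} - \boldsymbol{\mu}_k \rVert^2 \\
\mathcal{L} &= \sum_i \ell(y_i, \hat{p}_i) + \sum_t \Omega(f_t), \quad
  \ell(y, p) = -\bigl[y \log p + (1 - y)\log(1 - p)\bigr], \quad
  \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert \mathbf{w} \rVert^2
\end{align*}
```

The example scripts add a small constant (`1e-6`) to the index denominators to avoid division by zero; it is omitted here for readability. The loss is the regularised binary logistic objective, with $\lambda$ corresponding to `reg_lambda` in the configuration.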
631 | 
632 | ---
633 | 
634 | ## 🔖 File: `docs/data_sources.md`
635 | 
636 | - How to access Copernicus (Sentinel‑2), USGS (Landsat & SRTM), NOSDRA spill data, ESI shapefiles, and GMW.
637 | - Data licenses and acceptable‑use notes.
638 | 
639 | ---
640 | 
641 | ## 🔖 File: `docs/model_report.md`
642 | 
643 | - Auto‑filled by training step with CV table (accuracy/precision/recall/F1/AUC), confusion matrix, PR/ROC curves, and SHAP bar/summary plots.
644 | 
645 | ---
646 | 
647 | ## 🔖 File: `docs/governance.md`
648 | 
649 | - Ethical use, environmental justice considerations, and guidance for communicating uncertainty.
650 | 
651 | ---
652 | 
653 | ## 🔖 File: `Makefile`
654 | 
655 | ```make
656 | .PHONY: all init data features train predict report
657 | 
658 | all: data features train predict report
659 | 
660 | init:
661 | 	pre-commit install
662 | 
663 | data:
664 | 	python -m src.data.download_sat --cfg configs/params.yaml
665 | 	python -m src.data.download_spills --cfg configs/params.yaml
666 | 	python -m src.data.build_dem --cfg configs/params.yaml
667 | 
668 | features:
669 | 	python -m src.features.indices --cfg configs/params.yaml
670 | 	python -m src.features.lulc --cfg configs/params.yaml
671 | 	python -m src.spill.kde --cfg configs/params.yaml
672 | 	python -m src.spill.clusters --cfg configs/params.yaml
673 | 	python -m src.features.sensitivity --cfg configs/params.yaml
674 | 
675 | train:
676 | 	python -m src.modeling.dataset --cfg configs/params.yaml
677 | 	python -m src.modeling.train_gbdt --cfg configs/params.yaml
678 | 
679 | predict:
680 | 	python -m src.modeling.predict --cfg configs/params.yaml
681 | 
682 | report:
683 | 	python -m src.modeling.shap_report --cfg configs/params.yaml
684 | ```
685 | 
686 | ---
687 | 
688 | ## 🧭 Next Steps for Your GitHub
689 | 
690 | 1. Create a new repo under **`github.com/Akajiaku11`** named `Predictive-Mangrove-Degradation`.
691 | 2. Copy this scaffold into the repo, commit, and push.
692 | 3. Add AOI & parameters, then run `make all` to generate first results.
693 | 4. Upload key maps in `figures/` and publish a concise `docs/model_report.md`.
694 | 5. (Optional) Turn on GitHub Pages to host an interactive map or docs.
695 | 
696 | ---
697 | 
698 | ## ✉️ Contact & Acknowledgements
699 | 
700 | - Open an issue for questions/feature requests.
701 | - Acknowledge Rivers State, NOSDRA, and open‑data providers in publications.
702 | 
703 | ---
704 | 
705 | *This template is designed to be publication‑ready and policy‑useful (SDGs 13/14/15, blue‑carbon, environmental compliance). Replace placeholders with your study‑specific details before release.*
706 | 
707 | 


--------------------------------------------------------------------------------