├── output.png
├── output (1).png
├── output (2).png
├── methane_classifier.joblib
├── README.md
├── file
└── methane_emissions_multispectral_demo.py


/output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors/HEAD/output.png


--------------------------------------------------------------------------------
/output (1).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors/HEAD/output (1).png


--------------------------------------------------------------------------------
/output (2).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors/HEAD/output (2).png


--------------------------------------------------------------------------------
/methane_classifier.joblib:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors/HEAD/methane_classifier.joblib


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Methane Emissions Detection with Multispectral Sensors

This repository demonstrates how **artificial intelligence (AI)** and **multispectral remote sensing** can be applied to detect and quantify methane (CH₄) emissions.
Using a **synthetic dataset** (simulated reflectance bands, thermal channels, and atmospheric variables), we train:

- A **regression model** to estimate methane column enhancement (ppm-eq).
- A **classification model** to detect the presence of methane leaks.

The pipeline integrates data generation, preprocessing, model training, evaluation, and visualization.

---

## 🚀 Features
- Synthetic **multispectral dataset** (1,500 samples) with simulated CH₄-driven spectral effects.
- Derived vegetation and spectral indices (NDVI, NDSI, NBR, Albedo).
- **Random Forest Regressor** for CH₄ enhancement prediction.
- **Gradient Boosting Classifier** for emission detection.
- Evaluation metrics: MAE, RMSE, R² (regression); Accuracy, ROC-AUC, Confusion Matrix (classification).
- Visual outputs: regression parity plot, ROC curve, feature importances, confusion matrix.
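
The metrics listed above can be computed with scikit-learn as in the following minimal sketch. The arrays are toy values invented for illustration and are not results from this repository; the 0.5 probability threshold mirrors the one used in the demo script.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, confusion_matrix, mean_absolute_error,
    mean_squared_error, r2_score, roc_auc_score,
)

# Toy regression targets and predictions (illustrative values only).
y_true_reg = np.array([2.0, 2.5, 3.0, 4.0])
y_pred_reg = np.array([2.0, 2.0, 3.5, 4.0])

mae = mean_absolute_error(y_true_reg, y_pred_reg)
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))  # root of the MSE
r2 = r2_score(y_true_reg, y_pred_reg)

# Toy classification labels and predicted leak probabilities.
y_true_cls = np.array([0, 0, 1, 1])
p_leak = np.array([0.1, 0.6, 0.4, 0.9])
y_pred_cls = (p_leak >= 0.5).astype(int)  # hard labels at a 0.5 threshold

acc = accuracy_score(y_true_cls, y_pred_cls)
auc = roc_auc_score(y_true_cls, p_leak)      # AUC uses probabilities, not labels
cm = confusion_matrix(y_true_cls, y_pred_cls)  # rows: true class, cols: predicted

print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
print(f"Accuracy={acc:.3f} ROC-AUC={auc:.3f}")
print(cm)
```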

---

## 📂 Project Structure
```
.
├── outputs/
│   ├── synthetic_methane_multispectral.csv
│   ├── methane_regressor.joblib
│   ├── methane_classifier.joblib
│   ├── regression_parity.png
│   ├── regression_feature_importance.png
│   ├── classification_roc.png
│   ├── classification_confusion.png
│   ├── classification_feature_importance.png
│   ├── classification_report.txt
│   ├── cv_results.txt
│   └── README_Methane_Emissions_Demo.txt
├── methane_emissions_multispectral_demo.py
└── README.md
```

---

## 📊 Dataset
The dataset is synthetically generated to mimic real-world methane absorption and scene variability.
It includes:

- **Bands**: Blue, Green, Red, NIR, SWIR1, SWIR2, TIR (brightness temperature, `tir_bt`)
- **Scene Variables**: Surface pressure, humidity, wind speed, solar zenith
- **Indices**: NDVI, NDSI (computed here as the normalized SWIR1/SWIR2 difference), NBR, Albedo
- **Targets**:
  - `ch4_ppm_eq`: methane column enhancement (continuous, ppm-eq)
  - `emission_label`: binary indicator (1 = emission present, 0 = none); the demo script assigns 1 to the top 40% of `ch4_ppm_eq` values
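
As a minimal sketch of how the index columns relate to the bands, the function below applies the same formulas as the demo script. The function name `spectral_indices` and the single-pixel example are illustrative assumptions; the band values are simply the mean reflectances used by the script's generator.

```python
import numpy as np

def spectral_indices(blue, green, red, nir, swir1, swir2, eps=1e-6):
    """Return NDVI, NDSI, NBR and albedo (hypothetical helper, formulas from the demo script).

    Inputs may be scalars or NumPy arrays of reflectances in 0..1.
    eps guards against division by zero for very dark pixels.
    """
    ndvi = (nir - red) / (nir + red + eps)
    ndsi = (swir1 - swir2) / (swir1 + swir2 + eps)  # SWIR1/SWIR2 normalized difference
    nbr = (nir - swir2) / (nir + swir2 + eps)
    # Weighted broadband albedo proxy; the weights sum to 1.0
    albedo = 0.1 * blue + 0.1 * green + 0.2 * red + 0.3 * nir + 0.15 * swir1 + 0.15 * swir2
    return {"ndvi": ndvi, "ndsi": ndsi, "nbr": nbr, "albedo": albedo}

# One hypothetical pixel using the generator's mean band reflectances.
idx = spectral_indices(blue=0.12, green=0.18, red=0.20, nir=0.40, swir1=0.26, swir2=0.22)
for name, value in idx.items():
    print(f"{name}: {value:.4f}")

# The same function also works element-wise on arrays.
arr = spectral_indices(*(np.full(3, v) for v in (0.12, 0.18, 0.20, 0.40, 0.26, 0.22)))
```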

---

## 🧪 Methodology
1. **Synthetic Data Generation**
   - Simulates reduced reflectance in the SWIR bands caused by methane plumes.
   - Includes scene drivers (wind speed, thermal brightness temperature) that modulate the simulated enhancement.

2. **Feature Engineering**
   - Computes vegetation and reflectance indices (NDVI, NDSI, NBR, Albedo) as additional model inputs.

3. **Model Training**
   - Regression → RandomForestRegressor.
   - Classification → GradientBoostingClassifier.

4. **Evaluation**
   - Regression: MAE, RMSE, R² + parity plot.
   - Classification: Accuracy, ROC-AUC, Confusion Matrix, Feature Importances (from a Random Forest surrogate).
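
The four steps above can be sketched end to end in a few lines. This is a deliberately reduced illustration, not the demo script itself: it assumes only two features (`swir2`, `wind_speed`), a smaller sample size, a fixed background of 2.0 ppm-eq, and a single shared train/test split, while borrowing the script's coefficients and its top-40% labeling rule.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestRegressor
from sklearn.metrics import r2_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400  # smaller than the demo's 1,500 samples, to keep the sketch fast

# Step 1: synthetic scenes. A leak lowers SWIR2 reflectance and raises CH4.
leak_strength = rng.binomial(1, 0.35, n) * rng.gamma(2.0, 1.2, n)
swir2 = np.clip(rng.normal(0.22, 0.06, n) - 0.25 * leak_strength, 0, 1)
wind_speed = np.clip(rng.gamma(2.0, 1.5, n), 0, 20)
ch4 = 2.0 + 0.9 * leak_strength - 0.05 * (wind_speed - 4) + rng.normal(0, 0.15, n)

# Binary label: the top 40% of enhancement values count as an emission.
label = (ch4 >= np.percentile(ch4, 60)).astype(int)

# Step 2: feature matrix (only two features in this sketch).
X = np.column_stack([swir2, wind_speed])

# Step 3: train both models on one shared, stratified split.
X_tr, X_te, ch4_tr, ch4_te, lab_tr, lab_te = train_test_split(
    X, ch4, label, test_size=0.2, random_state=0, stratify=label
)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, ch4_tr)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, lab_tr)

# Step 4: evaluate on the held-out scenes.
r2 = r2_score(ch4_te, reg.predict(X_te))
auc = roc_auc_score(lab_te, clf.predict_proba(X_te)[:, 1])
print(f"R2={r2:.3f}  ROC-AUC={auc:.3f}")
```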

---

## ⚙️ Installation
```bash
git clone https://github.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors.git
cd Methane-Emissions-Detection-with-Multispectral-Sensors
pip install -r requirements.txt
```

`requirements.txt`:

```
numpy
pandas
scikit-learn
matplotlib
joblib
```

---

## ▶️ Usage
Run the demo script:

```bash
python methane_emissions_multispectral_demo.py
```

This will generate:

- A synthetic dataset (`outputs/synthetic_methane_multispectral.csv`)
- Trained models (`.joblib`)
- Evaluation plots and reports in the `outputs/` folder

### Quick Inference
```python
from joblib import load
import pandas as pd

# Load dataset
df = pd.read_csv("outputs/synthetic_methane_multispectral.csv")
X = df[['blue','green','red','nir','swir1','swir2','tir_bt','surface_pressure',
        'humidity','wind_speed','solar_zenith','ndvi','ndsi','nbr','albedo']].values

# Load models
reg = load("outputs/methane_regressor.joblib")
clf = load("outputs/methane_classifier.joblib")

# Predict
y_ch4 = reg.predict(X)             # ppm-eq methane estimate
p_leak = clf.predict_proba(X)[:,1] # probability of leak
```

---

## 📈 Results (Synthetic Example)
**Regression**

- MAE ≈ 0.40 ppm-eq
- RMSE ≈ 0.57 ppm-eq
- R² ≈ 0.85

**Classification**

- Accuracy ≈ 89%
- ROC-AUC ≈ 0.93
- Cross-validated ROC-AUC ≈ 0.92 ± 0.01

---

## 🌍 Applications
- Methane leak detection in oil & gas facilities.
- Greenhouse gas monitoring from airborne/satellite sensors.
- Climate change mitigation & compliance monitoring.

---

## 📝 License
This project is released under the MIT License.

Author: Anslem Otutu  
GitHub: @Otutu11


--------------------------------------------------------------------------------
/methane_emissions_multispectral_demo.py:
--------------------------------------------------------------------------------
# file: methane_emissions_multispectral_demo.py
# Purpose: End-to-end synthetic demo for "Methane-Emissions-Detection-with-Multispectral-Sensors"
# Author: You :)
# Run: python methane_emissions_multispectral_demo.py

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor, GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, r2_score,
    roc_auc_score, roc_curve, confusion_matrix, classification_report
)
from joblib import dump

# -----------------------------
# 0) Setup
# -----------------------------
np.random.seed(42)
OUTDIR = "outputs"
os.makedirs(OUTDIR, exist_ok=True)

# -----------------------------
# 1) Generate synthetic data (>100)
# -----------------------------
N = 1500  # plenty more than 100

# Multispectral reflectances (0..1)
blue  = np.clip(np.random.normal(0.12, 0.03, N), 0, 1)
green = np.clip(np.random.normal(0.18, 0.04, N), 0, 1)
red   = np.clip(np.random.normal(0.20, 0.05, N), 0, 1)
nir   = np.clip(np.random.normal(0.40, 0.08, N), 0, 1)
swir1 = np.clip(np.random.normal(0.26, 0.06, N), 0, 1)   # ~1.6 μm
swir2 = np.clip(np.random.normal(0.22, 0.06, N), 0, 1)   # ~2.2 μm (CH4-sensitive proxy)

# Thermal + scene/atmos
tir_bt           = np.clip(np.random.normal(300, 6,  N), 250, 330)  # K
surface_pressure = np.random.normal(1013, 8, N)                     # hPa
humidity         = np.clip(np.random.normal(0.55, 0.15, N), 0, 1)   # 0..1
wind_speed       = np.clip(np.random.gamma(2.0, 1.5, N), 0, 20)     # m/s
solar_zenith     = np.clip(np.random.normal(35, 12, N), 0, 80)      # deg

# Leak presence and intensity
leak = np.random.binomial(1, 0.35, N)  # 35% of scenes contain a leak
base_ch4 = np.random.normal(2.0, 0.3, N)  # ppm-eq background enhancement
leak_strength = leak * np.random.gamma(shape=2.0, scale=1.2, size=N)  # emission intensity

# Spectral effects of methane (lower SWIR reflectance with more CH4)
swir2_effect = -0.25 * leak_strength + np.random.normal(0, 0.02, N)
swir2_obs = np.clip(swir2 + swir2_effect, 0, 1)

swir1_effect = -0.08 * leak_strength + np.random.normal(0, 0.015, N)
swir1_obs = np.clip(swir1 + swir1_effect, 0, 1)

# Continuous target: methane column enhancement (ppm-eq)
ch4_ppm_eq = (
    base_ch4
    + 0.9 * leak_strength
    + 0.02 * (tir_bt - 295)        # warmer scenes -> more buoyant plumes
    - 0.05 * (wind_speed - 4)      # high wind disperses signal
    + 0.3  * (0.25 - swir2_obs)    # optical absorption cue
    + np.random.normal(0, 0.15, N)
)

# Binary detection label: top 40% CH4 enhancement = emission present
thresh = np.percentile(ch4_ppm_eq, 60)
emission_label = (ch4_ppm_eq >= thresh).astype(int)

# Derived indices
ndvi   = (nir - red) / (nir + red + 1e-6)
ndsi   = (swir1_obs - swir2_obs) / (swir1_obs + swir2_obs + 1e-6)   # SWIR ratio
nbr    = (nir - swir2_obs) / (nir + swir2_obs + 1e-6)
albedo = 0.1*blue + 0.1*green + 0.2*red + 0.3*nir + 0.15*swir1_obs + 0.15*swir2_obs

df = pd.DataFrame({
    "blue": blue, "green": green, "red": red, "nir": nir,
    "swir1": swir1_obs, "swir2": swir2_obs, "tir_bt": tir_bt,
    "surface_pressure": surface_pressure, "humidity": humidity,
    "wind_speed": wind_speed, "solar_zenith": solar_zenith,
    "ndvi": ndvi, "ndsi": ndsi, "nbr": nbr, "albedo": albedo,
    "ch4_ppm_eq": ch4_ppm_eq, "emission_label": emission_label
})

csv_path = os.path.join(OUTDIR, "synthetic_methane_multispectral.csv")
df.to_csv(csv_path, index=False)

# -----------------------------
# 2) Split
# -----------------------------
features = [
    "blue","green","red","nir","swir1","swir2","tir_bt",
    "surface_pressure","humidity","wind_speed","solar_zenith",
    "ndvi","ndsi","nbr","albedo"
]
X = df[features].values
y_reg = df["ch4_ppm_eq"].values
y_cls = df["emission_label"].values

Xtr_r, Xte_r, ytr_r, yte_r = train_test_split(X, y_reg, test_size=0.2, random_state=42)
Xtr_c, Xte_c, ytr_c, yte_c = train_test_split(X, y_cls, test_size=0.2, random_state=42, stratify=y_cls)

# -----------------------------
# 3) Regression model
# -----------------------------
reg = RandomForestRegressor(n_estimators=400, random_state=42, n_jobs=-1)
reg.fit(Xtr_r, ytr_r)
yp_r = reg.predict(Xte_r)

mae  = mean_absolute_error(yte_r, yp_r)
# Take the square root explicitly: the squared=False argument is deprecated
# in newer scikit-learn releases, so this form works across versions.
rmse = np.sqrt(mean_squared_error(yte_r, yp_r))
r2   = r2_score(yte_r, yp_r)

# Save parity plot
plt.figure()
plt.scatter(yte_r, yp_r, alpha=0.6)
mn = min(yte_r.min(), yp_r.min()); mx = max(yte_r.max(), yp_r.max())
plt.plot([mn, mx], [mn, mx])
plt.xlabel("True CH4 enhancement (ppm-eq)")
plt.ylabel("Predicted CH4 enhancement (ppm-eq)")
plt.title(f"Regression Parity | MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "regression_parity.png")); plt.close()

# Feature importance (regression)
imp_r = pd.Series(reg.feature_importances_, index=features).sort_values()
plt.figure()
plt.barh(imp_r.index, imp_r.values)
plt.xlabel("Importance")
plt.title("Regression Feature Importances")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "regression_feature_importance.png")); plt.close()

# -----------------------------
# 4) Classification model
# -----------------------------
clf = GradientBoostingClassifier(random_state=42)
clf.fit(Xtr_c, ytr_c)
proba = clf.predict_proba(Xte_c)[:, 1]
yp_c  = (proba >= 0.5).astype(int)

acc = (yp_c == yte_c).mean()
auc = roc_auc_score(yte_c, proba)

# ROC
fpr, tpr, thr = roc_curve(yte_c, proba)
plt.figure()
plt.plot(fpr, tpr); plt.plot([0,1],[0,1])
plt.xlabel("False Positive Rate"); plt.ylabel("True Positive Rate")
plt.title(f"ROC Curve (AUC={auc:.3f})")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "classification_roc.png")); plt.close()

# Confusion Matrix
cm = confusion_matrix(yte_c, yp_c)
plt.figure()
im = plt.imshow(cm, interpolation="nearest")
plt.title("Confusion Matrix")
plt.xlabel("Predicted"); plt.ylabel("True")
plt.xticks([0,1], ["No Emission","Emission"])
plt.yticks([0,1], ["No Emission","Emission"])
for (i,j), v in np.ndenumerate(cm):
    plt.text(j, i, int(v), ha="center", va="center")
plt.colorbar(im, fraction=0.046, pad=0.04)
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "classification_confusion.png")); plt.close()

# Report
report = classification_report(yte_c, yp_c, target_names=["No Emission","Emission"])
with open(os.path.join(OUTDIR, "classification_report.txt"), "w") as f:
    f.write(report)

# Surrogate RF for feature importance (interpretability)
rf_sur = RandomForestClassifier(n_estimators=400, random_state=42, n_jobs=-1)
rf_sur.fit(Xtr_c, ytr_c)
imp_c = pd.Series(rf_sur.feature_importances_, index=features).sort_values()
plt.figure()
plt.barh(imp_c.index, imp_c.values)
plt.xlabel("Importance")
plt.title("Classification Feature Importances (RF surrogate)")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "classification_feature_importance.png")); plt.close()

# Cross-validation (AUC)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(clf, X, y_cls, cv=cv, scoring="roc_auc")
with open(os.path.join(OUTDIR, "cv_results.txt"), "w") as f:
    f.write(f"ROC-AUC (5-fold): mean={cv_auc.mean():.3f}, std={cv_auc.std():.3f}\n")
    f.write(f"Fold scores: {np.round(cv_auc, 3)}\n")

# -----------------------------
# 5) Save artifacts
# -----------------------------
dump(reg, os.path.join(OUTDIR, "methane_regressor.joblib"))
dump(clf, os.path.join(OUTDIR, "methane_classifier.joblib"))

with open(os.path.join(OUTDIR, "README_Methane_Emissions_Demo.txt"), "w") as f:
    f.write(
        "Methane-Emissions-Detection-with-Multispectral-Sensors (Synthetic Demo)\n\n"
        "Files:\n"
        "- synthetic_methane_multispectral.csv : synthetic dataset\n"
        "- methane_regressor.joblib : RandomForestRegressor for CH4 enhancement (ppm-eq)\n"
        "- methane_classifier.joblib : GradientBoostingClassifier for emission detection\n"
        "- regression_parity.png, regression_feature_importance.png\n"
        "- classification_roc.png, classification_confusion.png, classification_feature_importance.png\n"
        "- classification_report.txt, cv_results.txt\n\n"
        "Quick usage:\n"
        "from joblib import load\n"
        "import pandas as pd\n"
        "df = pd.read_csv('outputs/synthetic_methane_multispectral.csv')\n"
        "X = df[['blue','green','red','nir','swir1','swir2','tir_bt','surface_pressure',"
        "'humidity','wind_speed','solar_zenith','ndvi','ndsi','nbr','albedo']].values\n"
        "reg = load('outputs/methane_regressor.joblib'); y_ch4 = reg.predict(X)\n"
        "clf = load('outputs/methane_classifier.joblib'); p_leak = clf.predict_proba(X)[:,1]\n"
    )

# -----------------------------
# 6) Print summary to console
# -----------------------------
print("=== Regression ===")
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
print("\n=== Classification ===")
print(f"Accuracy={acc:.3f}  ROC-AUC={auc:.3f}")
print("\nClassification report:\n", report)
print(f"\n5-fold ROC-AUC: mean={cv_auc.mean():.3f} ± {cv_auc.std():.3f}")
print(f"\nArtifacts saved to: {OUTDIR}/")


--------------------------------------------------------------------------------