├── output.png
├── output (1).png
├── output (2).png
├── methane_classifier.joblib
├── README.md
├── file
└── methane_emissions_multispectral_demo.py

/output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors/HEAD/output.png
--------------------------------------------------------------------------------
/output (1).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors/HEAD/output (1).png
--------------------------------------------------------------------------------
/output (2).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors/HEAD/output (2).png
--------------------------------------------------------------------------------
/methane_classifier.joblib:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors/HEAD/methane_classifier.joblib
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Methane Emissions Detection with Multispectral Sensors

This repository demonstrates how **artificial intelligence (AI)** and **multispectral remote sensing** can be applied to detect and quantify methane (CH₄) emissions.
Using a **synthetic dataset** (simulated reflectance bands, thermal channels, and atmospheric variables), we train:

- A **regression model** to estimate methane column enhancement (ppm-eq).
- A **classification model** to detect the presence of methane leaks.

The pipeline integrates data generation, preprocessing, model training, evaluation, and visualization.

---

## 🚀 Features
- Synthetic **multispectral dataset** (1,500 samples) with CH₄-driven spectral effects.
- Derived vegetation and spectral indices (NDVI, NDSI, NBR, Albedo).
- **Random Forest Regressor** for CH₄ enhancement prediction.
- **Gradient Boosting Classifier** for emission detection.
- Evaluation metrics: MAE, RMSE, R² (regression); Accuracy, ROC-AUC, Confusion Matrix (classification).
- Visual outputs: regression parity plots, ROC curves, feature importances, confusion matrices.

---

## 📂 Project Structure
```
.
├── outputs/
│   ├── synthetic_methane_multispectral.csv
│   ├── methane_regressor.joblib
│   ├── methane_classifier.joblib
│   ├── regression_parity.png
│   ├── regression_feature_importance.png
│   ├── classification_roc.png
│   ├── classification_confusion.png
│   ├── classification_feature_importance.png
│   ├── classification_report.txt
│   ├── cv_results.txt
│   └── README_Methane_Emissions_Demo.txt
├── methane_emissions_multispectral_demo.py
└── README.md
```

---

## 📊 Dataset
The dataset is synthetically generated to mimic real-world methane absorption and scene variability.
It includes:

- **Bands**: Blue, Green, Red, NIR, SWIR1, SWIR2, TIR
- **Scene Variables**: Surface pressure, humidity, wind speed, solar zenith
- **Indices**: NDVI, NDSI, NBR, Albedo
- **Targets**:
  - `ch4_ppm_eq`: methane column enhancement (continuous, ppm-eq)
  - `emission_label`: binary indicator (1 = emission present, defined as the top 40% of `ch4_ppm_eq`; 0 = none)

---

## 🧪 Methodology
1. **Synthetic Data Generation**
   - Simulates spectral absorption in SWIR bands due to methane plumes.
   - Includes meteorological drivers (wind, temperature) affecting detection.

2. **Feature Engineering**
   - Computes vegetation and reflectance indices (NDVI, NDSI, NBR, Albedo) as additional model inputs.

3. **Model Training**
   - Regression → RandomForestRegressor.
   - Classification → GradientBoostingClassifier.

4. **Evaluation**
   - Regression: MAE, RMSE, R² + Parity plots.
   - Classification: Accuracy, ROC-AUC, Confusion Matrix, Feature Importances.

---

## ⚙️ Installation
```bash
git clone https://github.com/Otutu11/Methane-Emissions-Detection-with-Multispectral-Sensors.git
cd Methane-Emissions-Detection-with-Multispectral-Sensors
pip install -r requirements.txt
```

`requirements.txt` (create it with these contents if it is not present in your checkout):

```
numpy
pandas
scikit-learn
matplotlib
joblib
```

## ▶️ Usage
Run the demo script:

```bash
python methane_emissions_multispectral_demo.py
```

This will generate:

- A synthetic dataset (`outputs/synthetic_methane_multispectral.csv`)
- Trained models (`.joblib`)
- Evaluation plots and reports in the `outputs/` folder

### Quick Inference
```python
from joblib import load
import pandas as pd

# Load dataset
df = pd.read_csv("outputs/synthetic_methane_multispectral.csv")
X = df[['blue','green','red','nir','swir1','swir2','tir_bt','surface_pressure',
        'humidity','wind_speed','solar_zenith','ndvi','ndsi','nbr','albedo']].values

# Load models
reg = load("outputs/methane_regressor.joblib")
clf = load("outputs/methane_classifier.joblib")

# Predict
y_ch4 = reg.predict(X)               # ppm-eq methane estimate
p_leak = clf.predict_proba(X)[:, 1]  # probability of leak
```

## 📈 Results (Synthetic Example)
**Regression**

- MAE ≈ 0.40 ppm-eq
- RMSE ≈ 0.57 ppm-eq
- R² ≈ 0.85

**Classification**

- Accuracy ≈ 89%
- ROC-AUC ≈ 0.93
- Cross-validated ROC-AUC ≈ 0.92 ± 0.01

## 🌍 Applications
- Methane leak detection in oil & gas facilities.
- Greenhouse gas monitoring from airborne/satellite sensors.
- Climate change mitigation & compliance monitoring.

## 📝 License
This project is released under the MIT License.

Author: Anslem Otutu
GitHub: @Otutu11
--------------------------------------------------------------------------------
/file:
--------------------------------------------------------------------------------
# file: methane_emissions_multispectral_demo.py
# Purpose: End-to-end synthetic demo for "Methane-Emissions-Detection-with-Multispectral-Sensors"
# Author: You :)
# Run: python methane_emissions_multispectral_demo.py

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor, GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, r2_score,
    roc_auc_score, roc_curve, confusion_matrix, classification_report
)
from joblib import dump

# -----------------------------
# 0) Setup
# -----------------------------
np.random.seed(42)
OUTDIR = "outputs"
os.makedirs(OUTDIR, exist_ok=True)

# -----------------------------
# 1) Generate synthetic data (>100)
# -----------------------------
N = 1500  # plenty more than 100

# Multispectral reflectances (0..1)
blue = np.clip(np.random.normal(0.12, 0.03, N), 0, 1)
green = np.clip(np.random.normal(0.18, 0.04, N), 0, 1)
red = np.clip(np.random.normal(0.20, 0.05, N), 0, 1)
nir = np.clip(np.random.normal(0.40, 0.08, N), 0, 1)
swir1 = np.clip(np.random.normal(0.26, 0.06, N), 0, 1)  # ~1.6 μm
swir2 = np.clip(np.random.normal(0.22, 0.06, N), 0, 1)  # ~2.2 μm (CH4-sensitive proxy)

# Thermal + scene/atmos
tir_bt = np.clip(np.random.normal(300, 6, N), 250, 330)     # K
surface_pressure = np.random.normal(1013, 8, N)             # hPa
humidity = np.clip(np.random.normal(0.55, 0.15, N), 0, 1)   # 0..1
wind_speed = np.clip(np.random.gamma(2.0, 1.5, N), 0, 20)   # m/s
solar_zenith = np.clip(np.random.normal(35, 12, N), 0, 80)  # deg

# Leak presence and intensity
leak = np.random.binomial(1, 0.35, N)     # 35% of scenes contain a leak
base_ch4 = np.random.normal(2.0, 0.3, N)  # ppm-eq background enhancement
leak_strength = leak * np.random.gamma(shape=2.0, scale=1.2, size=N)  # emission intensity

# Spectral effects of methane (lower SWIR reflectance with more CH4)
swir2_effect = -0.25 * leak_strength + np.random.normal(0, 0.02, N)
swir2_obs = np.clip(swir2 + swir2_effect, 0, 1)

swir1_effect = -0.08 * leak_strength + np.random.normal(0, 0.015, N)
swir1_obs = np.clip(swir1 + swir1_effect, 0, 1)

# Continuous target: methane column enhancement (ppm-eq)
ch4_ppm_eq = (
    base_ch4
    + 0.9 * leak_strength
    + 0.02 * (tir_bt - 295)      # warmer scenes -> more buoyant plumes
    - 0.05 * (wind_speed - 4)    # high wind disperses signal
    + 0.3 * (0.25 - swir2_obs)   # optical absorption cue
    + np.random.normal(0, 0.15, N)
)

# Binary detection label: top 40% CH4 enhancement = emission present
thresh = np.percentile(ch4_ppm_eq, 60)
emission_label = (ch4_ppm_eq >= thresh).astype(int)

# Derived indices
ndvi = (nir - red) / (nir + red + 1e-6)
ndsi = (swir1_obs - swir2_obs) / (swir1_obs + swir2_obs + 1e-6)  # normalized SWIR1/SWIR2 difference
nbr = (nir - swir2_obs) / (nir + swir2_obs + 1e-6)
albedo = 0.1*blue + 0.1*green + 0.2*red + 0.3*nir + 0.15*swir1_obs + 0.15*swir2_obs

df = pd.DataFrame({
    "blue": blue, "green": green, "red": red, "nir": nir,
    "swir1": swir1_obs, "swir2": swir2_obs, "tir_bt": tir_bt,
    "surface_pressure": surface_pressure, "humidity": humidity,
    "wind_speed": wind_speed, "solar_zenith": solar_zenith,
    "ndvi": ndvi, "ndsi": ndsi, "nbr": nbr, "albedo": albedo,
    "ch4_ppm_eq": ch4_ppm_eq, "emission_label": emission_label
})

csv_path = os.path.join(OUTDIR, "synthetic_methane_multispectral.csv")
df.to_csv(csv_path, index=False)

# -----------------------------
# 2) Split
# -----------------------------
features = [
    "blue","green","red","nir","swir1","swir2","tir_bt",
    "surface_pressure","humidity","wind_speed","solar_zenith",
    "ndvi","ndsi","nbr","albedo"
]
X = df[features].values
y_reg = df["ch4_ppm_eq"].values
y_cls = df["emission_label"].values

Xtr_r, Xte_r, ytr_r, yte_r = train_test_split(X, y_reg, test_size=0.2, random_state=42)
Xtr_c, Xte_c, ytr_c, yte_c = train_test_split(X, y_cls, test_size=0.2, random_state=42, stratify=y_cls)

# -----------------------------
# 3) Regression model
# -----------------------------
reg = RandomForestRegressor(n_estimators=400, random_state=42, n_jobs=-1)
reg.fit(Xtr_r, ytr_r)
yp_r = reg.predict(Xte_r)

mae = mean_absolute_error(yte_r, yp_r)
# RMSE as sqrt(MSE); `squared=False` is deprecated/removed in newer scikit-learn releases
rmse = np.sqrt(mean_squared_error(yte_r, yp_r))
r2 = r2_score(yte_r, yp_r)

# Save parity plot
plt.figure()
plt.scatter(yte_r, yp_r, alpha=0.6)
mn = min(yte_r.min(), yp_r.min()); mx = max(yte_r.max(), yp_r.max())
plt.plot([mn, mx], [mn, mx])
plt.xlabel("True CH4 enhancement (ppm-eq)")
plt.ylabel("Predicted CH4 enhancement (ppm-eq)")
plt.title(f"Regression Parity | MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "regression_parity.png")); plt.close()

# Feature importance (regression)
imp_r = pd.Series(reg.feature_importances_, index=features).sort_values()
plt.figure()
plt.barh(imp_r.index, imp_r.values)
plt.xlabel("Importance")
plt.title("Regression Feature Importances")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "regression_feature_importance.png")); plt.close()

# -----------------------------
# 4) Classification model
# -----------------------------
clf = GradientBoostingClassifier(random_state=42)
clf.fit(Xtr_c, ytr_c)
proba = clf.predict_proba(Xte_c)[:, 1]
yp_c = (proba >= 0.5).astype(int)

acc = (yp_c == yte_c).mean()
auc = roc_auc_score(yte_c, proba)

# ROC
fpr, tpr, thr = roc_curve(yte_c, proba)
plt.figure()
plt.plot(fpr, tpr); plt.plot([0,1],[0,1])
plt.xlabel("False Positive Rate"); plt.ylabel("True Positive Rate")
plt.title(f"ROC Curve (AUC={auc:.3f})")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "classification_roc.png")); plt.close()

# Confusion Matrix
cm = confusion_matrix(yte_c, yp_c)
plt.figure()
im = plt.imshow(cm, interpolation="nearest")
plt.title("Confusion Matrix")
plt.xlabel("Predicted"); plt.ylabel("True")
plt.xticks([0,1], ["No Emission","Emission"])
plt.yticks([0,1], ["No Emission","Emission"])
for (i,j), v in np.ndenumerate(cm):
    plt.text(j, i, int(v), ha="center", va="center")
plt.colorbar(im, fraction=0.046, pad=0.04)
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "classification_confusion.png")); plt.close()

# Report
report = classification_report(yte_c, yp_c, target_names=["No Emission","Emission"])
with open(os.path.join(OUTDIR, "classification_report.txt"), "w") as f:
    f.write(report)

# Surrogate RF for feature importance (interpretability)
rf_sur = RandomForestClassifier(n_estimators=400, random_state=42, n_jobs=-1)
rf_sur.fit(Xtr_c, ytr_c)
imp_c = pd.Series(rf_sur.feature_importances_, index=features).sort_values()
plt.figure()
plt.barh(imp_c.index, imp_c.values)
plt.xlabel("Importance")
plt.title("Classification Feature Importances (RF surrogate)")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "classification_feature_importance.png")); plt.close()

# Cross-validation (AUC)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(clf, X, y_cls, cv=cv, scoring="roc_auc")
with open(os.path.join(OUTDIR, "cv_results.txt"), "w") as f:
    f.write(f"ROC-AUC (5-fold): mean={cv_auc.mean():.3f}, std={cv_auc.std():.3f}\n")
    f.write(f"Fold scores: {np.round(cv_auc, 3)}\n")

# -----------------------------
# 5) Save artifacts
# -----------------------------
dump(reg, os.path.join(OUTDIR, "methane_regressor.joblib"))
dump(clf, os.path.join(OUTDIR, "methane_classifier.joblib"))

with open(os.path.join(OUTDIR, "README_Methane_Emissions_Demo.txt"), "w") as f:
    f.write(
        "Methane-Emissions-Detection-with-Multispectral-Sensors (Synthetic Demo)\n\n"
        "Files:\n"
        "- synthetic_methane_multispectral.csv : synthetic dataset\n"
        "- methane_regressor.joblib : RandomForestRegressor for CH4 enhancement (ppm-eq)\n"
        "- methane_classifier.joblib : GradientBoostingClassifier for emission detection\n"
        "- regression_parity.png, regression_feature_importance.png\n"
        "- classification_roc.png, classification_confusion.png, classification_feature_importance.png\n"
        "- classification_report.txt, cv_results.txt\n\n"
        "Quick usage:\n"
        "from joblib import load\n"
        "import pandas as pd\n"
        "df = pd.read_csv('outputs/synthetic_methane_multispectral.csv')\n"
        "X = df[['blue','green','red','nir','swir1','swir2','tir_bt','surface_pressure',"
        "'humidity','wind_speed','solar_zenith','ndvi','ndsi','nbr','albedo']].values\n"
        "reg = load('outputs/methane_regressor.joblib'); y_ch4 = reg.predict(X)\n"
        "clf = load('outputs/methane_classifier.joblib'); p_leak = clf.predict_proba(X)[:,1]\n"
    )

# -----------------------------
# 6) Print summary to console
# -----------------------------
print("=== Regression ===")
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
print("\n=== Classification ===")
print(f"Accuracy={acc:.3f} ROC-AUC={auc:.3f}")
print("\nClassification report:\n", report)
print(f"\n5-fold ROC-AUC: mean={cv_auc.mean():.3f} ± {cv_auc.std():.3f}")
print(f"\nArtifacts saved to: {OUTDIR}/")
--------------------------------------------------------------------------------
/methane_emissions_multispectral_demo.py:
--------------------------------------------------------------------------------
# file: methane_emissions_multispectral_demo.py
# Purpose: End-to-end synthetic demo for "Methane-Emissions-Detection-with-Multispectral-Sensors"
# Author: You :)
# Run: python methane_emissions_multispectral_demo.py

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor, GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, r2_score,
    roc_auc_score, roc_curve, confusion_matrix, classification_report
)
from joblib import dump

# -----------------------------
# 0) Setup
# -----------------------------
np.random.seed(42)
OUTDIR = "outputs"
os.makedirs(OUTDIR, exist_ok=True)

# -----------------------------
# 1) Generate synthetic data (>100)
# -----------------------------
N = 1500  # plenty more than 100

# Multispectral reflectances (0..1)
blue = np.clip(np.random.normal(0.12, 0.03, N), 0, 1)
green = np.clip(np.random.normal(0.18, 0.04, N), 0, 1)
red = np.clip(np.random.normal(0.20, 0.05, N), 0, 1)
nir = np.clip(np.random.normal(0.40, 0.08, N), 0, 1)
swir1 = np.clip(np.random.normal(0.26, 0.06, N), 0, 1)  # ~1.6 μm
swir2 = np.clip(np.random.normal(0.22, 0.06, N), 0, 1)  # ~2.2 μm (CH4-sensitive proxy)

# Thermal + scene/atmos
tir_bt = np.clip(np.random.normal(300, 6, N), 250, 330)     # K
surface_pressure = np.random.normal(1013, 8, N)             # hPa
humidity = np.clip(np.random.normal(0.55, 0.15, N), 0, 1)   # 0..1
wind_speed = np.clip(np.random.gamma(2.0, 1.5, N), 0, 20)   # m/s
solar_zenith = np.clip(np.random.normal(35, 12, N), 0, 80)  # deg

# Leak presence and intensity
leak = np.random.binomial(1, 0.35, N)     # 35% of scenes contain a leak
base_ch4 = np.random.normal(2.0, 0.3, N)  # ppm-eq background enhancement
leak_strength = leak * np.random.gamma(shape=2.0, scale=1.2, size=N)  # emission intensity

# Spectral effects of methane (lower SWIR reflectance with more CH4)
swir2_effect = -0.25 * leak_strength + np.random.normal(0, 0.02, N)
swir2_obs = np.clip(swir2 + swir2_effect, 0, 1)

swir1_effect = -0.08 * leak_strength + np.random.normal(0, 0.015, N)
swir1_obs = np.clip(swir1 + swir1_effect, 0, 1)

# Continuous target: methane column enhancement (ppm-eq)
ch4_ppm_eq = (
    base_ch4
    + 0.9 * leak_strength
    + 0.02 * (tir_bt - 295)      # warmer scenes -> more buoyant plumes
    - 0.05 * (wind_speed - 4)    # high wind disperses signal
    + 0.3 * (0.25 - swir2_obs)   # optical absorption cue
    + np.random.normal(0, 0.15, N)
)

# Binary detection label: top 40% CH4 enhancement = emission present
thresh = np.percentile(ch4_ppm_eq, 60)
emission_label = (ch4_ppm_eq >= thresh).astype(int)

# Derived indices
ndvi = (nir - red) / (nir + red + 1e-6)
ndsi = (swir1_obs - swir2_obs) / (swir1_obs + swir2_obs + 1e-6)  # normalized SWIR1/SWIR2 difference
nbr = (nir - swir2_obs) / (nir + swir2_obs + 1e-6)
albedo = 0.1*blue + 0.1*green + 0.2*red + 0.3*nir + 0.15*swir1_obs + 0.15*swir2_obs

df = pd.DataFrame({
    "blue": blue, "green": green, "red": red, "nir": nir,
    "swir1": swir1_obs, "swir2": swir2_obs, "tir_bt": tir_bt,
    "surface_pressure": surface_pressure, "humidity": humidity,
    "wind_speed": wind_speed, "solar_zenith": solar_zenith,
    "ndvi": ndvi, "ndsi": ndsi, "nbr": nbr, "albedo": albedo,
    "ch4_ppm_eq": ch4_ppm_eq, "emission_label": emission_label
})

csv_path = os.path.join(OUTDIR, "synthetic_methane_multispectral.csv")
df.to_csv(csv_path, index=False)

# -----------------------------
# 2) Split
# -----------------------------
features = [
    "blue","green","red","nir","swir1","swir2","tir_bt",
    "surface_pressure","humidity","wind_speed","solar_zenith",
    "ndvi","ndsi","nbr","albedo"
]
X = df[features].values
y_reg = df["ch4_ppm_eq"].values
y_cls = df["emission_label"].values

Xtr_r, Xte_r, ytr_r, yte_r = train_test_split(X, y_reg, test_size=0.2, random_state=42)
Xtr_c, Xte_c, ytr_c, yte_c = train_test_split(X, y_cls, test_size=0.2, random_state=42, stratify=y_cls)

# -----------------------------
# 3) Regression model
# -----------------------------
reg = RandomForestRegressor(n_estimators=400, random_state=42, n_jobs=-1)
reg.fit(Xtr_r, ytr_r)
yp_r = reg.predict(Xte_r)

mae = mean_absolute_error(yte_r, yp_r)
# RMSE as sqrt(MSE); `squared=False` is deprecated/removed in newer scikit-learn releases
rmse = np.sqrt(mean_squared_error(yte_r, yp_r))
r2 = r2_score(yte_r, yp_r)

# Save parity plot
plt.figure()
plt.scatter(yte_r, yp_r, alpha=0.6)
mn = min(yte_r.min(), yp_r.min()); mx = max(yte_r.max(), yp_r.max())
plt.plot([mn, mx], [mn, mx])
plt.xlabel("True CH4 enhancement (ppm-eq)")
plt.ylabel("Predicted CH4 enhancement (ppm-eq)")
plt.title(f"Regression Parity | MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "regression_parity.png")); plt.close()

# Feature importance (regression)
imp_r = pd.Series(reg.feature_importances_, index=features).sort_values()
plt.figure()
plt.barh(imp_r.index, imp_r.values)
plt.xlabel("Importance")
plt.title("Regression Feature Importances")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "regression_feature_importance.png")); plt.close()

# -----------------------------
# 4) Classification model
# -----------------------------
clf = GradientBoostingClassifier(random_state=42)
clf.fit(Xtr_c, ytr_c)
proba = clf.predict_proba(Xte_c)[:, 1]
yp_c = (proba >= 0.5).astype(int)

acc = (yp_c == yte_c).mean()
auc = roc_auc_score(yte_c, proba)

# ROC
fpr, tpr, thr = roc_curve(yte_c, proba)
plt.figure()
plt.plot(fpr, tpr); plt.plot([0,1],[0,1])
plt.xlabel("False Positive Rate"); plt.ylabel("True Positive Rate")
plt.title(f"ROC Curve (AUC={auc:.3f})")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "classification_roc.png")); plt.close()

# Confusion Matrix
cm = confusion_matrix(yte_c, yp_c)
plt.figure()
im = plt.imshow(cm, interpolation="nearest")
plt.title("Confusion Matrix")
plt.xlabel("Predicted"); plt.ylabel("True")
plt.xticks([0,1], ["No Emission","Emission"])
plt.yticks([0,1], ["No Emission","Emission"])
for (i,j), v in np.ndenumerate(cm):
    plt.text(j, i, int(v), ha="center", va="center")
plt.colorbar(im, fraction=0.046, pad=0.04)
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "classification_confusion.png")); plt.close()

# Report
report = classification_report(yte_c, yp_c, target_names=["No Emission","Emission"])
with open(os.path.join(OUTDIR, "classification_report.txt"), "w") as f:
    f.write(report)

# Surrogate RF for feature importance (interpretability)
rf_sur = RandomForestClassifier(n_estimators=400, random_state=42, n_jobs=-1)
rf_sur.fit(Xtr_c, ytr_c)
imp_c = pd.Series(rf_sur.feature_importances_, index=features).sort_values()
plt.figure()
plt.barh(imp_c.index, imp_c.values)
plt.xlabel("Importance")
plt.title("Classification Feature Importances (RF surrogate)")
plt.tight_layout()
plt.savefig(os.path.join(OUTDIR, "classification_feature_importance.png")); plt.close()

# Cross-validation (AUC)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(clf, X, y_cls, cv=cv, scoring="roc_auc")
with open(os.path.join(OUTDIR, "cv_results.txt"), "w") as f:
    f.write(f"ROC-AUC (5-fold): mean={cv_auc.mean():.3f}, std={cv_auc.std():.3f}\n")
    f.write(f"Fold scores: {np.round(cv_auc, 3)}\n")

# -----------------------------
# 5) Save artifacts
# -----------------------------
dump(reg, os.path.join(OUTDIR, "methane_regressor.joblib"))
dump(clf, os.path.join(OUTDIR, "methane_classifier.joblib"))

with open(os.path.join(OUTDIR, "README_Methane_Emissions_Demo.txt"), "w") as f:
    f.write(
        "Methane-Emissions-Detection-with-Multispectral-Sensors (Synthetic Demo)\n\n"
        "Files:\n"
        "- synthetic_methane_multispectral.csv : synthetic dataset\n"
        "- methane_regressor.joblib : RandomForestRegressor for CH4 enhancement (ppm-eq)\n"
        "- methane_classifier.joblib : GradientBoostingClassifier for emission detection\n"
        "- regression_parity.png, regression_feature_importance.png\n"
        "- classification_roc.png, classification_confusion.png, classification_feature_importance.png\n"
        "- classification_report.txt, cv_results.txt\n\n"
        "Quick usage:\n"
        "from joblib import load\n"
        "import pandas as pd\n"
        "df = pd.read_csv('outputs/synthetic_methane_multispectral.csv')\n"
        "X = df[['blue','green','red','nir','swir1','swir2','tir_bt','surface_pressure',"
        "'humidity','wind_speed','solar_zenith','ndvi','ndsi','nbr','albedo']].values\n"
        "reg = load('outputs/methane_regressor.joblib'); y_ch4 = reg.predict(X)\n"
        "clf = load('outputs/methane_classifier.joblib'); p_leak = clf.predict_proba(X)[:,1]\n"
    )

# -----------------------------
# 6) Print summary to console
# -----------------------------
print("=== Regression ===")
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
print("\n=== Classification ===")
print(f"Accuracy={acc:.3f} ROC-AUC={auc:.3f}")
print("\nClassification report:\n", report)
print(f"\n5-fold ROC-AUC: mean={cv_auc.mean():.3f} ± {cv_auc.std():.3f}")
print(f"\nArtifacts saved to: {OUTDIR}/")
--------------------------------------------------------------------------------
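The saved models expect all 15 feature columns, including the four derived indices, in the order given by the script's `features` list. The following is a minimal sketch, not part of the repository, of how raw band and scene values for a new observation might be turned into that feature matrix. The helper name `add_indices` and the sample values are hypothetical. The index formulas and albedo weights are copied from the demo script, where `swir1` and `swir2` are the observed (post-absorption) reflectances.

```python
import numpy as np
import pandas as pd

# Feature order expected by the saved models (copied from the demo script).
FEATURES = [
    "blue", "green", "red", "nir", "swir1", "swir2", "tir_bt",
    "surface_pressure", "humidity", "wind_speed", "solar_zenith",
    "ndvi", "ndsi", "nbr", "albedo",
]

EPS = 1e-6  # same small constant the script adds to each denominator


def add_indices(raw: pd.DataFrame) -> pd.DataFrame:
    """Return the 15 model features, computing ndvi, ndsi, nbr and albedo.

    Hypothetical helper for illustration; it mirrors the script's formulas.
    """
    out = raw.copy()
    out["ndvi"] = (out["nir"] - out["red"]) / (out["nir"] + out["red"] + EPS)
    out["ndsi"] = (out["swir1"] - out["swir2"]) / (out["swir1"] + out["swir2"] + EPS)
    out["nbr"] = (out["nir"] - out["swir2"]) / (out["nir"] + out["swir2"] + EPS)
    out["albedo"] = (
        0.1 * out["blue"] + 0.1 * out["green"] + 0.2 * out["red"]
        + 0.3 * out["nir"] + 0.15 * out["swir1"] + 0.15 * out["swir2"]
    )
    return out[FEATURES]  # enforce the column order used at training time


# One made-up observation (values are illustrative only).
scene = pd.DataFrame([{
    "blue": 0.12, "green": 0.18, "red": 0.20, "nir": 0.40,
    "swir1": 0.26, "swir2": 0.22, "tir_bt": 300.0,
    "surface_pressure": 1013.0, "humidity": 0.55,
    "wind_speed": 3.0, "solar_zenith": 35.0,
}])

X_new = add_indices(scene).values  # shape (1, 15)
print(X_new.shape)
```

After running the demo script, `X_new` could be passed to `reg.predict(X_new)` or `clf.predict_proba(X_new)[:, 1]` using the models loaded from `outputs/`, as in the README's Quick Inference snippet.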