├── air_quality_climate_data.xlsx
├── README.md
└── First code


/air_quality_climate_data.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Geraldine-Winston/Air-Quality-Prediction-Under-Changing-Climate-using-deep-ensemble-models./HEAD/air_quality_climate_data.xlsx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Air Quality Prediction Under Changing Climate Using Deep Ensemble Models

## Overview
Air quality is a critical global concern, especially under the escalating impacts of climate change. Rising temperatures, shifting weather patterns, and increasing urbanization have intensified air pollution, which calls for more capable predictive models. This project predicts PM2.5 concentrations, a key indicator of air quality, using a deep ensemble learning framework. The ensemble averages the predictions of several independently trained deep learning models, which reduces the effect of any single model's bias and makes the predictions more robust. The aim is better generalization across varying climatic scenarios and a tool that policymakers and environmental agencies can use to anticipate air quality trends and plan timely interventions.

## Dataset
The dataset includes the following environmental and climatic features:
- Temperature
- Humidity
- Wind Speed
- CO2
- NO2
- SO2
- O3
- PM2.5 (Target Variable)

The data is stored in `air_quality_climate_data.xlsx`.

## Requirements
- Python 3.x
- TensorFlow
- Pandas
- NumPy
- Scikit-learn
- Joblib
- openpyxl (used by Pandas to read `.xlsx` files)

Install the required libraries:
```bash
pip install tensorflow pandas numpy scikit-learn joblib openpyxl
```

## Usage
1. Load the dataset.
2. Preprocess the data (standardization).
3. Train an ensemble of deep learning models.
4. Predict and evaluate using Mean Squared Error and R-squared metrics.
5. Save the trained models and scaler for future use.

## Model Architecture
Each model in the ensemble has the following architecture:
- Dense layer (128 units, ReLU activation)
- Dropout (30%)
- Dense layer (64 units, ReLU activation)
- Dropout (30%)
- Dense layer (32 units, ReLU activation)
- Output Dense layer (1 unit, no activation)

Optimizer: Adam with learning rate 0.001
Loss: Mean Squared Error

## Output
- Trained model files: `air_quality_model_0.h5`, `air_quality_model_1.h5`, etc.
- Scaler file: `scaler.pkl`
- Evaluation metrics printed to the console.

## Author
**Ayebawanaemi Geraldine Winston**


--------------------------------------------------------------------------------
/First code:
--------------------------------------------------------------------------------
import joblib
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

# Load dataset (the repository ships an Excel file; reading it requires openpyxl)
df = pd.read_excel('air_quality_climate_data.xlsx')

# Assume the target variable is 'PM2.5' and features are all other columns
X = df.drop(columns=['PM2.5'])
y = df['PM2.5']

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (fit on the training set only, then apply to the test set)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define a function to create a base model
def create_model():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(X_train_scaled.shape[1],)))
    model.add(Dropout(0.3))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
    return model

# Create an ensemble of models; each starts from its own random initialization
n_models = 5
models = []

for _ in range(n_models):
    model = create_model()
    model.fit(X_train_scaled, y_train, epochs=100, batch_size=32, verbose=0)
    models.append(model)

# Make predictions and average them across models (one column per model)
def ensemble_predict(models, X):
    predictions = np.column_stack([model.predict(X).flatten() for model in models])
    return np.mean(predictions, axis=1)

# Predict on the test set
y_pred = ensemble_predict(models, X_test_scaled)

# Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

# Save the models
for i, model in enumerate(models):
    model.save(f'air_quality_model_{i}.h5')

# Save the scaler
joblib.dump(scaler, 'scaler.pkl')

--------------------------------------------------------------------------------
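
As a rough illustration of what `ensemble_predict` and the two reported metrics compute, the following dependency-free sketch averages per-model predictions sample by sample and evaluates MSE and R-squared by hand. The prediction values and the helper names (`average_predictions`, `mse`, `r_squared`) are made up for illustration and are not part of the repository; the script itself relies on NumPy and scikit-learn for these steps.

```python
from statistics import fmean


def average_predictions(per_model_preds):
    """Average predictions across models, sample by sample.

    per_model_preds holds one equal-length list of predictions per model.
    This mirrors np.mean(np.column_stack(...), axis=1) in the script.
    """
    return [fmean(sample) for sample in zip(*per_model_preds)]


def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical PM2.5 predictions from three models for four samples.
preds = [
    [10.0, 20.0, 30.0, 40.0],
    [12.0, 18.0, 33.0, 38.0],
    [11.0, 22.0, 27.0, 42.0],
]
# Hypothetical ground-truth values for the same four samples.
y_true = [11.0, 21.0, 29.0, 41.0]

y_ens = average_predictions(preds)
print("Ensemble prediction:", y_ens)
print("MSE:", mse(y_true, y_ens))
print("R-squared:", r_squared(y_true, y_ens))
```

In this toy data the individual models err in different directions on each sample, so their average lands closer to the true value than most single predictions do. That is the effect the ensemble is meant to exploit.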