├── proxy_climate_data.xlsx
├── README.md
└── First code


/proxy_climate_data.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Geraldine-Winston/Paleoclimate-Reconstruction-using-ML-regression-on-proxy-data/HEAD/proxy_climate_data.xlsx
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Paleoclimate Reconstruction using ML Regression on Proxy Data

This project applies **Machine Learning regression models** to reconstruct past climate conditions, specifically **paleotemperatures**, using proxy data such as isotope ratios, pollen counts, and sediment properties.

## Project Overview

- **Input**: Proxy data including isotope ratios, pollen indices, sediment density, and magnetic susceptibility.
- **Target**: Reconstruct past temperature values.
- **Model**: Random Forest Regressor trained to predict paleotemperatures from proxy data.
- **Output**: Predicted past temperatures based on given proxy measurements.

## Project Structure

```
📁 Paleoclimate-Reconstruction-ML
├── proxy_climate_data.xlsx          # Synthetic proxy climate dataset
├── paleoclimate_rf_model.pkl        # Trained Random Forest model
├── paleoclimate_reconstruction.py   # Main training and evaluation script
├── README.md                        # Project documentation
├── requirements.txt                 # Python dependencies
└── .gitignore                       # Files to ignore in version control
```

## Requirements

Install all necessary dependencies using:

```bash
pip install -r requirements.txt
```

**Dependencies:**

```
pandas
numpy
scikit-learn
openpyxl
joblib
```

## How to Run

1. No input file is required: the script generates the synthetic dataset itself and writes it to `proxy_climate_data.xlsx` in the working directory.
2. Run the script (stored in this repository as `First code`; save it under the name below first):

```bash
python paleoclimate_reconstruction.py
```

3. After training:
   - Model evaluation metrics (MSE and R² Score) will be printed.
   - The trained model will be saved as `paleoclimate_rf_model.pkl`.
   - A reconstructed temperature for one example sample will be printed.

## Dataset Example

```csv
Isotope_Ratio,Pollen_Index,Sediment_Density,Magnetic_Susceptibility,Temperature
-1.56,45.2,2.65,800,13.7
2.33,60.1,2.55,600,14.5
...
```

## Model Performance

The Random Forest model achieves:
- **Test MSE**: ~1.02
- **Test R² Score**: ~0.91

*(Values may vary depending on data and random seed.)*

## Acknowledgements

- Inspired by real-world paleoclimate reconstruction studies.
- Future extensions could involve more complex models and the use of real-world datasets from the **NOAA Paleoclimatology Database** or **PANGAEA**.

## Author

**Ayebawanaemi Geraldine Winston**

--------------------------------------------------------------------------------
/First code:
--------------------------------------------------------------------------------
import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


# Create Synthetic Proxy Data (example simulation)
def generate_synthetic_proxy_data(n_samples=1000):
    np.random.seed(42)
    # Simulated proxy variables (e.g., isotope ratios, pollen %, sediment properties)
    isotope_ratio = np.random.uniform(-5, 5, n_samples)  # δ18O or δ13C proxy
    pollen_index = np.random.uniform(0, 100, n_samples)  # Pollen abundance
    sediment_density = np.random.uniform(2.0, 3.0, n_samples)  # g/cm³
    magnetic_susceptibility = np.random.uniform(0, 1000, n_samples)  # SI units

    # Target: Temperature (°C), a linear combination of the proxies plus Gaussian noise
    temperature = (
        15 - 0.8 * isotope_ratio
        + 0.05 * pollen_index
        - 1.2 * (sediment_density - 2.5)
        + 0.001 * magnetic_susceptibility
        + np.random.normal(0, 1, n_samples)  # Add some noise
    )

    df = pd.DataFrame({
        'Isotope_Ratio': isotope_ratio,
        'Pollen_Index': pollen_index,
        'Sediment_Density': sediment_density,
        'Magnetic_Susceptibility': magnetic_susceptibility,
        'Temperature': temperature
    })

    return df


# Generate Data
df = generate_synthetic_proxy_data()

# Features and Target
X = df.drop(columns=['Temperature'])
y = df['Temperature']

# Train/Test Split (split before scaling so the test set does not influence the scaler)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize Features (fit on the training set only, then apply to the test set)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train_scaled, y_train)

# Predictions
y_pred = model.predict(X_test_scaled)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test MSE: {mse:.4f}")
print(f"Test R² Score: {r2:.4f}")

# Save Data and Model (optional for further use)
# Note: the saved model expects inputs scaled with the same fitted scaler.
df.to_excel('proxy_climate_data.xlsx', index=False)
joblib.dump(model, 'paleoclimate_rf_model.pkl')

# Example Reconstruction
# Build the sample as a DataFrame so its column names match those the scaler was fitted on.
example_proxy = pd.DataFrame([[-2.1, 40, 2.7, 800]], columns=X.columns)  # A hypothetical sample
example_proxy_scaled = scaler.transform(example_proxy)
reconstructed_temp = model.predict(example_proxy_scaled)
print(f"Reconstructed Temperature (example sample): {reconstructed_temp[0]:.2f} °C")
--------------------------------------------------------------------------------
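The script saves only the Random Forest to `paleoclimate_rf_model.pkl`, so anyone reloading it later also needs the fitted `StandardScaler` to prepare new proxy measurements. One possible way around this, sketched below, is to wrap both steps in a scikit-learn `Pipeline` and persist that single object. This is an illustration under stated assumptions, not the repository's method: the helper names (`make_synthetic_proxies`, `build_pipeline`, `save_and_reload`), the file name `paleoclimate_pipeline.joblib`, and the smaller sample and forest sizes are all hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["Isotope_Ratio", "Pollen_Index", "Sediment_Density", "Magnetic_Susceptibility"]


def make_synthetic_proxies(n_samples=300, seed=0):
    """Hypothetical helper: same linear-plus-noise recipe as the script, smaller sample."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Isotope_Ratio": rng.uniform(-5, 5, n_samples),
        "Pollen_Index": rng.uniform(0, 100, n_samples),
        "Sediment_Density": rng.uniform(2.0, 3.0, n_samples),
        "Magnetic_Susceptibility": rng.uniform(0, 1000, n_samples),
    })
    df["Temperature"] = (
        15 - 0.8 * df["Isotope_Ratio"]
        + 0.05 * df["Pollen_Index"]
        - 1.2 * (df["Sediment_Density"] - 2.5)
        + 0.001 * df["Magnetic_Susceptibility"]
        + rng.normal(0, 1, n_samples)
    )
    return df


def build_pipeline(n_estimators=50, seed=42):
    """Scaler and forest in one estimator, so both are fitted and saved together."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("forest", RandomForestRegressor(n_estimators=n_estimators, random_state=seed)),
    ])


def save_and_reload(pipeline, path):
    """Round-trip the fitted pipeline through joblib (hypothetical helper)."""
    joblib.dump(pipeline, str(path))
    return joblib.load(str(path))


df = make_synthetic_proxies()
pipeline = build_pipeline().fit(df[FEATURES], df["Temperature"])

with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(pipeline, Path(tmp) / "paleoclimate_pipeline.joblib")

# Raw (unscaled) proxy values go straight in; the pipeline applies the scaler itself.
sample = pd.DataFrame([[-2.1, 40, 2.7, 800]], columns=FEATURES)
print(f"Reconstructed temperature: {reloaded.predict(sample)[0]:.2f} deg C")
```

With this layout the reloaded object accepts raw proxy values directly, and the scaler is refitted together with the forest whenever the pipeline is refitted, for example inside cross-validation.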