├── proxy_climate_data.xlsx
├── README.md
└── First code


/proxy_climate_data.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Geraldine-Winston/Paleoclimate-Reconstruction-using-ML-regression-on-proxy-data/HEAD/proxy_climate_data.xlsx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Paleoclimate Reconstruction using ML Regression on Proxy Data

This project applies **Machine Learning regression models** to reconstruct past climate conditions, specifically **paleotemperatures**, using proxy data such as isotope ratios, pollen counts, and sediment properties.

## Project Overview

- **Input**: Synthetic proxy data: isotope ratios, pollen indices, sediment density, and magnetic susceptibility.
- **Target**: Past temperature (°C).
- **Model**: Random Forest Regressor trained to predict paleotemperatures from proxy data.
- **Output**: Predicted past temperatures for given proxy measurements.
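
The input-to-output flow above can be sketched in a few lines. This is a minimal illustration, not the project's training script: the sample size, seed, and forest size are assumptions chosen to keep the example fast, and the linear relation between proxies and temperature mirrors the synthetic generator in the repository's script.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)  # seed chosen only for this sketch
n = 300                         # small sample so the example runs quickly

# Input: four proxy variables
proxies = pd.DataFrame({
    "Isotope_Ratio": rng.uniform(-5, 5, n),
    "Pollen_Index": rng.uniform(0, 100, n),
    "Sediment_Density": rng.uniform(2.0, 3.0, n),
    "Magnetic_Susceptibility": rng.uniform(0, 1000, n),
})

# Target: temperature (°C), linear in the proxies plus Gaussian noise
temperature = (
    15
    - 0.8 * proxies["Isotope_Ratio"]
    + 0.05 * proxies["Pollen_Index"]
    - 1.2 * (proxies["Sediment_Density"] - 2.5)
    + 0.001 * proxies["Magnetic_Susceptibility"]
    + rng.normal(0, 1, n)
)

# Model: Random Forest trained on 80% of the samples
X_train, X_test, y_train, y_test = train_test_split(
    proxies, temperature, test_size=0.2, random_state=0
)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Output: predicted temperatures for the held-out proxy measurements
predicted = model.predict(X_test)
score = model.score(X_test, y_test)  # R² on the held-out samples
print(predicted[:3], f"R² = {score:.2f}")
```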

## Project Structure

```
📁 Paleoclimate-Reconstruction-ML
├── proxy_climate_data.xlsx        # Synthetic proxy climate dataset
├── paleoclimate_rf_model.pkl      # Trained Random Forest model
├── paleoclimate_reconstruction.py # Main training and evaluation script
├── README.md                      # Project documentation
├── requirements.txt               # Python dependencies
└── .gitignore                      # Files to ignore in version control
```

## Requirements

Install all necessary dependencies using:

```bash
pip install -r requirements.txt
```

**Dependencies:**

```
pandas
numpy
scikit-learn
openpyxl
joblib
```

## How to Run

1. No input file is required: the script generates the synthetic dataset itself and writes it to `proxy_climate_data.xlsx`.
2. Run the script:

```bash
python paleoclimate_reconstruction.py
```

3. After training:
   - Model evaluation metrics (MSE and R² Score) will be printed.
   - A reconstructed temperature for one example sample will be printed.
   - The trained model will be saved as `paleoclimate_rf_model.pkl`.
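
Once a model file exists, it can be reloaded with `joblib` and applied to new proxy measurements. The sketch below is self-contained and rests on assumptions: it trains a small stand-in model on made-up data, stores it in a temporary directory under the file name used above, and bundles the scaler and the forest in a scikit-learn `Pipeline`. If the saved file holds only a forest that was trained on standardized features, the same fitted scaler has to be applied to new samples before calling `predict`.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

columns = ["Isotope_Ratio", "Pollen_Index", "Sediment_Density", "Magnetic_Susceptibility"]

# Stand-in training data (hypothetical values, only to have something to save)
rng = np.random.default_rng(1)
X = pd.DataFrame(
    rng.uniform([-5, 0, 2.0, 0], [5, 100, 3.0, 1000], size=(200, 4)),
    columns=columns,
)
y = 15 - 0.8 * X["Isotope_Ratio"] + 0.05 * X["Pollen_Index"] + rng.normal(0, 1, 200)

# Scaler and forest travel together, so the saved file is usable on raw proxy values
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=50, random_state=1),
).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "paleoclimate_rf_model.pkl"
    joblib.dump(pipeline, model_path)   # save
    reloaded = joblib.load(model_path)  # load in a later session

# New proxy measurements use the same column names and order as in training
new_sample = pd.DataFrame([[-2.1, 40, 2.7, 800]], columns=columns)
prediction = reloaded.predict(new_sample)
print(f"Reconstructed temperature: {prediction[0]:.2f} °C")
```

Only load `.pkl` files from sources you trust, since unpickling can execute arbitrary code.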

## Dataset Example

The dataset has four proxy columns and one target column (`Temperature`, in °C). The rows below are illustrative and shown as CSV for readability:

```csv
Isotope_Ratio,Pollen_Index,Sediment_Density,Magnetic_Susceptibility,Temperature
-1.56,45.2,2.65,800,13.7
2.33,60.1,2.55,600,14.5
...
```
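
A short sketch of how a table with this layout can be loaded and split into features and target. It parses the two example rows from a string so that it runs without any file. The `EXPECTED_COLUMNS` constant and the schema check are additions for illustration, not part of the project.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = [
    "Isotope_Ratio",
    "Pollen_Index",
    "Sediment_Density",
    "Magnetic_Susceptibility",
    "Temperature",
]

sample_csv = """Isotope_Ratio,Pollen_Index,Sediment_Density,Magnetic_Susceptibility,Temperature
-1.56,45.2,2.65,800,13.7
2.33,60.1,2.55,600,14.5
"""

df = pd.read_csv(io.StringIO(sample_csv))

# Fail early if a column is missing or misspelled
missing = set(EXPECTED_COLUMNS) - set(df.columns)
if missing:
    raise ValueError(f"Missing columns: {sorted(missing)}")

X = df.drop(columns=["Temperature"])  # proxy features
y = df["Temperature"]                 # target in °C
print(X.shape, y.tolist())  # (2, 4) [13.7, 14.5]
```

For the actual spreadsheet, `pd.read_excel("proxy_climate_data.xlsx")` returns the same kind of DataFrame; reading `.xlsx` files requires `openpyxl`, which is listed in the dependencies.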

## Model Performance

On the held-out 20% test split, the Random Forest model achieves approximately:
- **Test MSE**: ~1.02
- **Test R² Score**: ~0.91

*(Values may vary depending on data and random seed.)*
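
For readers unfamiliar with the two metrics: MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below computes both by hand on four made-up values (not project results) and checks them against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up temperatures (°C), chosen so the arithmetic is easy to follow
y_true = np.array([13.0, 14.0, 15.0, 16.0])
y_pred = np.array([13.5, 13.5, 15.5, 16.5])

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)                   # mean squared error
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination

# The hand-computed values agree with scikit-learn's implementations
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))

print(f"MSE = {mse:.2f}, R² = {r2:.2f}")  # MSE = 0.25, R² = 0.80
```

MSE is expressed in squared target units (°C² here), while R² is unitless and equals 1 for a perfect fit.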

## Acknowledgements

- Inspired by real-world paleoclimate reconstruction studies.
- Future extensions could involve more complex models and the use of real-world datasets from the **NOAA Paleoclimatology Database** or **PANGAEA**.

## Author

**Ayebawanaemi Geraldine Winston**


--------------------------------------------------------------------------------
/First code:
--------------------------------------------------------------------------------
"""Train a Random Forest regressor to reconstruct temperature from synthetic proxy data."""

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def generate_synthetic_proxy_data(n_samples=1000, seed=42):
    """Simulate proxy variables and a temperature that depends linearly on them."""
    # A local RandomState yields the same draws as np.random.seed(seed)
    # without modifying NumPy's global random state.
    rng = np.random.RandomState(seed)

    # Simulated proxy variables (e.g., isotope ratios, pollen %, sediment properties)
    isotope_ratio = rng.uniform(-5, 5, n_samples)               # δ18O or δ13C proxy
    pollen_index = rng.uniform(0, 100, n_samples)               # Pollen abundance
    sediment_density = rng.uniform(2.0, 3.0, n_samples)         # g/cm³
    magnetic_susceptibility = rng.uniform(0, 1000, n_samples)   # SI units

    # Target: temperature (°C), linear in the proxies plus Gaussian noise
    temperature = (
        15
        - 0.8 * isotope_ratio
        + 0.05 * pollen_index
        - 1.2 * (sediment_density - 2.5)
        + 0.001 * magnetic_susceptibility
        + rng.normal(0, 1, n_samples)  # noise with standard deviation 1 °C
    )

    return pd.DataFrame({
        'Isotope_Ratio': isotope_ratio,
        'Pollen_Index': pollen_index,
        'Sediment_Density': sediment_density,
        'Magnetic_Susceptibility': magnetic_susceptibility,
        'Temperature': temperature,
    })


def main():
    # Generate data
    df = generate_synthetic_proxy_data()

    # Features and target
    X = df.drop(columns=['Temperature'])
    y = df['Temperature']

    # Split before scaling so that test-set statistics do not leak into the scaler
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Scaler and forest in one pipeline: the scaler is fit on the training rows only,
    # and both steps are saved together below.
    model = make_pipeline(
        StandardScaler(),
        RandomForestRegressor(n_estimators=200, random_state=42),
    )
    model.fit(X_train, y_train)

    # Predictions and evaluation on the held-out 20%
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print(f"Test MSE: {mse:.4f}")
    print(f"Test R² Score: {r2:.4f}")

    # Save data and model (optional, for further use)
    df.to_excel('proxy_climate_data.xlsx', index=False)
    joblib.dump(model, 'paleoclimate_rf_model.pkl')

    # Example reconstruction for a hypothetical sample; a DataFrame with the
    # training column names avoids scikit-learn's feature-name warning.
    example_proxy = pd.DataFrame([[-2.1, 40, 2.7, 800]], columns=X.columns)
    reconstructed_temp = model.predict(example_proxy)
    print(f"Reconstructed Temperature (example sample): {reconstructed_temp[0]:.2f} °C")


if __name__ == "__main__":
    main()


--------------------------------------------------------------------------------