├── biodiversity_hotspot_clusters.xlsx
├── README.md
├── file
└── Biodiversity_Hotspot_Clustering.py

--------------------------------------------------------------------------------
/biodiversity_hotspot_clusters.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nelvinebi/Biodiversity-Hotspot-Prioritization-Using-Clustering-Techniques/HEAD/biodiversity_hotspot_clusters.xlsx

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## 📌 Overview
This project applies clustering algorithms to prioritize biodiversity hotspots based on synthetic environmental and species distribution data. It aims to support conservation planning by grouping areas with similar biodiversity characteristics and identifying regions that require urgent protection.

## 🎯 Objectives
- Simulate biodiversity data for different regions.
- Apply clustering techniques (e.g., K-Means, Hierarchical Clustering, DBSCAN).
- Identify high-priority biodiversity hotspots for conservation.
- Visualize results for decision-making support.

## 📂 Project Structure
```text
Biodiversity-Hotspot-Prioritization-Using-Clustering-Techniques/
│
├── data/               # Synthetic dataset (CSV/Excel format)
├── notebooks/          # Jupyter notebooks for data analysis & clustering
├── src/                # Python scripts for data preprocessing, clustering, and visualization
├── results/            # Output graphs, clustering maps, and reports
├── README.md           # Project documentation
└── requirements.txt    # Python dependencies
```

## 🛠️ Technologies Used
- Python (pandas, numpy, matplotlib, seaborn, scikit-learn)
- Jupyter Notebook for analysis and visualization
- Excel/CSV for synthetic data storage

## 📊 Methodology
1. Data Generation – Create synthetic biodiversity data including species richness, threat levels, and environmental variables.
2. Preprocessing – Handle missing values, normalize data, and select relevant features.
3. Clustering – Use algorithms such as K-Means, Hierarchical Clustering, and DBSCAN to group regions.
4. Evaluation – Analyze clustering performance using metrics such as the Silhouette Score (a minimal example sketch follows this list).
5. Visualization – Plot clusters to highlight biodiversity hotspot zones.
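
## 🧪 Example: Comparing Clustering Algorithms
The scripts in this repository implement K-Means only. The snippet below is a minimal, illustrative sketch of how the three algorithms named in the Methodology could be compared with the silhouette score on standardized features; the random stand-in matrix, the `n_init` setting, and the DBSCAN `eps`/`min_samples` values are assumptions made only to keep the snippet self-contained.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.metrics import silhouette_score

# Stand-in for the standardized feature matrix built in the project scripts;
# swap in the real `scaled_features` when running against the project data.
rng = np.random.default_rng(42)
scaled_features = rng.normal(size=(200, 4))

models = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=42),
    "Hierarchical": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.9, min_samples=5),  # eps/min_samples are illustrative guesses
}

for name, model in models.items():
    labels = model.fit_predict(scaled_features)
    # DBSCAN can mark points as noise (-1); the silhouette score needs at least 2 clusters
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    if n_clusters >= 2:
        print(f"{name}: {n_clusters} clusters, silhouette = {silhouette_score(scaled_features, labels):.3f}")
    else:
        print(f"{name}: too few clusters to compute a silhouette score")
```

Higher silhouette values (closer to 1) indicate better-separated clusters, which is one practical way to choose between the candidate algorithms.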
## 📈 Example Output
- Clustered hotspot maps
- Priority ranking tables
- Silhouette score charts

## 🚀 Installation & Usage
Clone this repository:

```bash
git clone https://github.com/yourusername/Biodiversity-Hotspot-Prioritization-Using-Clustering-Techniques.git
cd Biodiversity-Hotspot-Prioritization-Using-Clustering-Techniques
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run the Jupyter notebook:

```bash
jupyter notebook notebooks/Biodiversity_Hotspot_Prioritization.ipynb
```

## 📜 License
This project is licensed under the MIT License – feel free to use and adapt with proper attribution.

## ✨ Author
Ebingiye Nelvin Agbozu
*nelvinebingiye@gmail.com*
Environmental Science & Data Analysis Enthusiast

--------------------------------------------------------------------------------
/file:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# 1. Generate Synthetic Biodiversity Data
np.random.seed(42)
n_points = 200  # ≥150 data points

# Spatial coordinates
x_coord = np.random.uniform(0, 100, n_points)
y_coord = np.random.uniform(0, 100, n_points)

# Ecological features
species_richness = np.random.normal(loc=120, scale=30, size=n_points)    # number of species
endemic_species = np.random.normal(loc=30, scale=10, size=n_points)      # number of endemic species
threatened_species = np.random.normal(loc=15, scale=5, size=n_points)    # number of threatened species
habitat_quality = np.random.uniform(0.4, 1.0, n_points)                  # 0 (poor) to 1 (pristine)

# Clip implausibly low values introduced by the Gaussian noise
species_richness = np.clip(species_richness, 10, None)
endemic_species = np.clip(endemic_species, 0, None)
threatened_species = np.clip(threatened_species, 0, None)

# Create DataFrame
df = pd.DataFrame({
    'x_coord': x_coord,
    'y_coord': y_coord,
    'species_richness': species_richness,
    'endemic_species': endemic_species,
    'threatened_species': threatened_species,
    'habitat_quality': habitat_quality
})

# 2. Scale features before clustering
features = ['species_richness', 'endemic_species', 'threatened_species', 'habitat_quality']
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[features])

# 3. K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
df['hotspot_cluster'] = kmeans.fit_predict(scaled_features)

# Optional: map cluster ids to priority labels (Low / Medium / High).
# Note: K-Means cluster ids are arbitrary, so check the cluster means to confirm
# which id corresponds to which priority level before relying on this mapping.
priority_labels = {0: 'Medium', 1: 'Low', 2: 'High'}
df['priority'] = df['hotspot_cluster'].map(lambda x: priority_labels.get(x, 'Unknown'))

# 4. PCA for 2D visualization (optional)
pca = PCA(n_components=2)
df[['pca1', 'pca2']] = pca.fit_transform(scaled_features)

# 5. Output Summary
print(df.head())

# 6. Visualization: Biodiversity Hotspot Clusters (PCA view)
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='pca1', y='pca2', hue='priority', palette='Set1', s=60)
plt.title('Biodiversity Hotspot Prioritization (PCA View)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title='Priority Level')
plt.tight_layout()
plt.show()
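
# Optional (illustrative sketch, not part of the original workflow): derive the
# priority order from the data instead of hard-coding it, by ranking clusters on
# their mean species richness. The 'priority_ranked' column name is a suggestion.
richness_order = df.groupby('hotspot_cluster')['species_richness'].mean().sort_values().index
rank_to_priority = dict(zip(richness_order, ['Low', 'Medium', 'High']))
df['priority_ranked'] = df['hotspot_cluster'].map(rank_to_priority)
print(df.groupby('priority_ranked')[features].mean())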
# 7. Spatial Map
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='x_coord', y='y_coord', hue='priority', palette='Set2', s=60)
plt.title('Spatial Distribution of Biodiversity Priority Areas')
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.grid(True)
plt.tight_layout()
plt.show()

--------------------------------------------------------------------------------
/Biodiversity_Hotspot_Clustering.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# 1. Generate Synthetic Biodiversity Data
np.random.seed(42)
n_points = 200  # ≥150 data points

# Spatial coordinates
x_coord = np.random.uniform(0, 100, n_points)
y_coord = np.random.uniform(0, 100, n_points)

# Ecological features
species_richness = np.random.normal(loc=120, scale=30, size=n_points)    # number of species
endemic_species = np.random.normal(loc=30, scale=10, size=n_points)      # number of endemic species
threatened_species = np.random.normal(loc=15, scale=5, size=n_points)    # number of threatened species
habitat_quality = np.random.uniform(0.4, 1.0, n_points)                  # 0 (poor) to 1 (pristine)

# Clip implausibly low values introduced by the Gaussian noise
species_richness = np.clip(species_richness, 10, None)
endemic_species = np.clip(endemic_species, 0, None)
threatened_species = np.clip(threatened_species, 0, None)

# Create DataFrame
df = pd.DataFrame({
    'x_coord': x_coord,
    'y_coord': y_coord,
    'species_richness': species_richness,
    'endemic_species': endemic_species,
    'threatened_species': threatened_species,
    'habitat_quality': habitat_quality
})

# 2. Scale features before clustering
features = ['species_richness', 'endemic_species', 'threatened_species', 'habitat_quality']
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[features])

# 3. K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
df['hotspot_cluster'] = kmeans.fit_predict(scaled_features)

# Optional: map cluster ids to priority labels (Low / Medium / High).
# Note: K-Means cluster ids are arbitrary, so check the cluster means to confirm
# which id corresponds to which priority level before relying on this mapping.
priority_labels = {0: 'Medium', 1: 'Low', 2: 'High'}
df['priority'] = df['hotspot_cluster'].map(lambda x: priority_labels.get(x, 'Unknown'))

# 4. PCA for 2D visualization (optional)
pca = PCA(n_components=2)
df[['pca1', 'pca2']] = pca.fit_transform(scaled_features)

# 5. Output Summary
print(df.head())

# 6. Visualization: Biodiversity Hotspot Clusters (PCA view)
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='pca1', y='pca2', hue='priority', palette='Set1', s=60)
plt.title('Biodiversity Hotspot Prioritization (PCA View)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title='Priority Level')
plt.tight_layout()
plt.show()
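
# Optional (illustrative addition): quantify clustering quality with the
# silhouette score referenced in the README's Evaluation step; values closer
# to 1 indicate better-separated clusters.
from sklearn.metrics import silhouette_score
sil = silhouette_score(scaled_features, df['hotspot_cluster'])
print(f"Silhouette score for the 3-cluster K-Means solution: {sil:.3f}")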
# 7. Spatial Map
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='x_coord', y='y_coord', hue='priority', palette='Set2', s=60)
plt.title('Spatial Distribution of Biodiversity Priority Areas')
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.grid(True)
plt.tight_layout()
plt.show()

--------------------------------------------------------------------------------
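
The repository also ships biodiversity_hotspot_clusters.xlsx, but neither script above shows the export step. The snippet below is a minimal sketch of how the clustered DataFrame could be written to that file with pandas; the two-row stand-in DataFrame is only there to keep the snippet self-contained, and writing .xlsx files through pandas requires the openpyxl package.

```python
import pandas as pd

# Stand-in for the clustered DataFrame `df` built in Biodiversity_Hotspot_Clustering.py.
df = pd.DataFrame({
    'x_coord': [12.3, 45.6], 'y_coord': [7.8, 90.1],
    'species_richness': [130.0, 95.0], 'endemic_species': [35.0, 20.0],
    'threatened_species': [18.0, 10.0], 'habitat_quality': [0.9, 0.6],
    'hotspot_cluster': [2, 1], 'priority': ['High', 'Low'],
})

# Export the clustered points to Excel (requires openpyxl).
df.to_excel('biodiversity_hotspot_clusters.xlsx', index=False)
```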