├── biodiversity_hotspot_clusters.xlsx
├── README.md
├── file
└── Biodiversity_Hotspot_Clustering.py

--------------------------------------------------------------------------------
/biodiversity_hotspot_clusters.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nelvinebi/Biodiversity-Hotspot-Prioritization-Using-Clustering-Techniques/HEAD/biodiversity_hotspot_clusters.xlsx

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## 📌 Overview
This project applies clustering algorithms to prioritize biodiversity hotspots based on synthetic environmental and species distribution data. It aims to support conservation planning by grouping areas with similar biodiversity characteristics and identifying regions that require urgent protection.

## 🎯 Objectives
- Simulate biodiversity data for different regions.
- Apply clustering techniques (e.g., K-Means, Hierarchical Clustering, DBSCAN).
- Identify high-priority biodiversity hotspots for conservation.
- Visualize results for decision-making support.

## 📂 Project Structure
```text
Biodiversity-Hotspot-Prioritization-Using-Clustering-Techniques/
│
├── data/               # Synthetic dataset (CSV/Excel format)
├── notebooks/          # Jupyter notebooks for data analysis & clustering
├── src/                # Python scripts for data preprocessing, clustering, and visualization
├── results/            # Output graphs, clustering maps, and reports
├── README.md           # Project documentation
└── requirements.txt    # Python dependencies
```

## 🛠️ Technologies Used
- Python (pandas, numpy, matplotlib, seaborn, scikit-learn)
- Jupyter Notebook for analysis and visualization
- Excel/CSV for synthetic data storage

## 📊 Methodology
1. Data Generation – Create synthetic biodiversity data including species richness, threat levels, and environmental variables.
2. Preprocessing – Handle missing values, normalize data, and select relevant features.
3. Clustering – Use algorithms such as K-Means, Hierarchical Clustering, and DBSCAN to group regions.
4. Evaluation – Analyze clustering performance using metrics such as the Silhouette Score (a minimal example sketch follows this list).
5. Visualization – Plot clusters to highlight biodiversity hotspot zones.
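
## 🧪 Example: Comparing Clustering Algorithms
The scripts in this repository implement K-Means only. The snippet below is a minimal, illustrative sketch of how the three algorithms named in the Methodology could be compared with the silhouette score on standardized features; the random stand-in matrix, the `n_init` setting, and the DBSCAN `eps`/`min_samples` values are assumptions made only to keep the snippet self-contained.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.metrics import silhouette_score

# Stand-in for the standardized feature matrix built in the project scripts;
# swap in the real `scaled_features` when running against the project data.
rng = np.random.default_rng(42)
scaled_features = rng.normal(size=(200, 4))

models = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=42),
    "Hierarchical": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.9, min_samples=5),  # eps/min_samples are illustrative guesses
}

for name, model in models.items():
    labels = model.fit_predict(scaled_features)
    # DBSCAN can mark points as noise (-1); the silhouette score needs at least 2 clusters
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    if n_clusters >= 2:
        print(f"{name}: {n_clusters} clusters, silhouette = {silhouette_score(scaled_features, labels):.3f}")
    else:
        print(f"{name}: too few clusters to compute a silhouette score")
```

Higher silhouette values (closer to 1) indicate better-separated clusters, which is one practical way to choose between the candidate algorithms.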
## 📈 Example Output
- Clustered hotspot maps
- Priority ranking tables
- Silhouette score charts

## 🚀 Installation & Usage
Clone this repository:

```bash
git clone https://github.com/yourusername/Biodiversity-Hotspot-Prioritization-Using-Clustering-Techniques.git
cd Biodiversity-Hotspot-Prioritization-Using-Clustering-Techniques
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run the Jupyter notebook:

```bash
jupyter notebook notebooks/Biodiversity_Hotspot_Prioritization.ipynb
```

## 📜 License
This project is licensed under the MIT License – feel free to use and adapt with proper attribution.

## ✨ Author
Ebingiye Nelvin Agbozu
*nelvinebingiye@gmail.com*
Environmental Science & Data Analysis Enthusiast

--------------------------------------------------------------------------------
/file:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# 1. Generate Synthetic Biodiversity Data
np.random.seed(42)
n_points = 200  # ≥150 data points

# Spatial coordinates
x_coord = np.random.uniform(0, 100, n_points)
y_coord = np.random.uniform(0, 100, n_points)

# Ecological features
species_richness = np.random.normal(loc=120, scale=30, size=n_points)    # number of species
endemic_species = np.random.normal(loc=30, scale=10, size=n_points)      # number of endemic species
threatened_species = np.random.normal(loc=15, scale=5, size=n_points)    # number of threatened species
habitat_quality = np.random.uniform(0.4, 1.0, n_points)                  # 0 (poor) to 1 (pristine)

# Clip implausibly low values introduced by the Gaussian noise
species_richness = np.clip(species_richness, 10, None)
endemic_species = np.clip(endemic_species, 0, None)
threatened_species = np.clip(threatened_species, 0, None)

# Create DataFrame
df = pd.DataFrame({
    'x_coord': x_coord,
    'y_coord': y_coord,
    'species_richness': species_richness,
    'endemic_species': endemic_species,
    'threatened_species': threatened_species,
    'habitat_quality': habitat_quality
})

# 2. Scale features before clustering
features = ['species_richness', 'endemic_species', 'threatened_species', 'habitat_quality']
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[features])

# 3. K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
df['hotspot_cluster'] = kmeans.fit_predict(scaled_features)

# Optional: map cluster ids to priority labels (Low / Medium / High).
# Note: K-Means cluster ids are arbitrary, so check the cluster means to confirm
# which id corresponds to which priority level before relying on this mapping.
priority_labels = {0: 'Medium', 1: 'Low', 2: 'High'}
df['priority'] = df['hotspot_cluster'].map(lambda x: priority_labels.get(x, 'Unknown'))

# 4. PCA for 2D visualization (optional)
pca = PCA(n_components=2)
df[['pca1', 'pca2']] = pca.fit_transform(scaled_features)

# 5. Output Summary
print(df.head())

# 6. Visualization: Biodiversity Hotspot Clusters (PCA view)
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='pca1', y='pca2', hue='priority', palette='Set1', s=60)
plt.title('Biodiversity Hotspot Prioritization (PCA View)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title='Priority Level')
plt.tight_layout()
plt.show()
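
# Optional (illustrative sketch, not part of the original workflow): derive the
# priority order from the data instead of hard-coding it, by ranking clusters on
# their mean species richness. The 'priority_ranked' column name is a suggestion.
richness_order = df.groupby('hotspot_cluster')['species_richness'].mean().sort_values().index
rank_to_priority = dict(zip(richness_order, ['Low', 'Medium', 'High']))
df['priority_ranked'] = df['hotspot_cluster'].map(rank_to_priority)
print(df.groupby('priority_ranked')[features].mean())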
# 7. Spatial Map
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='x_coord', y='y_coord', hue='priority', palette='Set2', s=60)
plt.title('Spatial Distribution of Biodiversity Priority Areas')
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.grid(True)
plt.tight_layout()
plt.show()

--------------------------------------------------------------------------------
/Biodiversity_Hotspot_Clustering.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# 1. Generate Synthetic Biodiversity Data
np.random.seed(42)
n_points = 200  # ≥150 data points

# Spatial coordinates
x_coord = np.random.uniform(0, 100, n_points)
y_coord = np.random.uniform(0, 100, n_points)

# Ecological features
species_richness = np.random.normal(loc=120, scale=30, size=n_points)    # number of species
endemic_species = np.random.normal(loc=30, scale=10, size=n_points)      # number of endemic species
threatened_species = np.random.normal(loc=15, scale=5, size=n_points)    # number of threatened species
habitat_quality = np.random.uniform(0.4, 1.0, n_points)                  # 0 (poor) to 1 (pristine)

# Clip implausibly low values introduced by the Gaussian noise
species_richness = np.clip(species_richness, 10, None)
endemic_species = np.clip(endemic_species, 0, None)
threatened_species = np.clip(threatened_species, 0, None)

# Create DataFrame
df = pd.DataFrame({
    'x_coord': x_coord,
    'y_coord': y_coord,
    'species_richness': species_richness,
    'endemic_species': endemic_species,
    'threatened_species': threatened_species,
    'habitat_quality': habitat_quality
})

# 2. Scale features before clustering
features = ['species_richness', 'endemic_species', 'threatened_species', 'habitat_quality']
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[features])

# 3. K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
df['hotspot_cluster'] = kmeans.fit_predict(scaled_features)

# Optional: map cluster ids to priority labels (Low / Medium / High).
# Note: K-Means cluster ids are arbitrary, so check the cluster means to confirm
# which id corresponds to which priority level before relying on this mapping.
priority_labels = {0: 'Medium', 1: 'Low', 2: 'High'}
df['priority'] = df['hotspot_cluster'].map(lambda x: priority_labels.get(x, 'Unknown'))

# 4. PCA for 2D visualization (optional)
pca = PCA(n_components=2)
df[['pca1', 'pca2']] = pca.fit_transform(scaled_features)

# 5. Output Summary
print(df.head())

# 6. Visualization: Biodiversity Hotspot Clusters (PCA view)
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='pca1', y='pca2', hue='priority', palette='Set1', s=60)
plt.title('Biodiversity Hotspot Prioritization (PCA View)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title='Priority Level')
plt.tight_layout()
plt.show()
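
# Optional (illustrative addition): quantify clustering quality with the
# silhouette score referenced in the README's Evaluation step; values closer
# to 1 indicate better-separated clusters.
from sklearn.metrics import silhouette_score
sil = silhouette_score(scaled_features, df['hotspot_cluster'])
print(f"Silhouette score for the 3-cluster K-Means solution: {sil:.3f}")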
# 7. Spatial Map
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='x_coord', y='y_coord', hue='priority', palette='Set2', s=60)
plt.title('Spatial Distribution of Biodiversity Priority Areas')
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.grid(True)
plt.tight_layout()
plt.show()

--------------------------------------------------------------------------------
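
The repository also ships biodiversity_hotspot_clusters.xlsx, but neither script above shows the export step. The snippet below is a minimal sketch of how the clustered DataFrame could be written to that file with pandas; the two-row stand-in DataFrame is only there to keep the snippet self-contained, and writing .xlsx files through pandas requires the openpyxl package.

```python
import pandas as pd

# Stand-in for the clustered DataFrame `df` built in Biodiversity_Hotspot_Clustering.py.
df = pd.DataFrame({
    'x_coord': [12.3, 45.6], 'y_coord': [7.8, 90.1],
    'species_richness': [130.0, 95.0], 'endemic_species': [35.0, 20.0],
    'threatened_species': [18.0, 10.0], 'habitat_quality': [0.9, 0.6],
    'hotspot_cluster': [2, 1], 'priority': ['High', 'Low'],
})

# Export the clustered points to Excel (requires openpyxl).
df.to_excel('biodiversity_hotspot_clusters.xlsx', index=False)
```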