├── Bias_Detection_Results_Simulated.xlsx
├── README.md
├── first code
└── bias_detection_recruitment.py


/Bias_Detection_Results_Simulated.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Okes2024/Bias-Detection-in-Recruitment-Using-ML/HEAD/Bias_Detection_Results_Simulated.xlsx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Bias Detection in Recruitment Using Machine Learning

This project demonstrates how to detect potential bias in recruitment processes using machine learning and fairness metrics.

## Project Overview

We use a recruitment dataset containing features like:
- Gender
- Age
- Education Level
- Experience Years
- Hiring Decision (Hired or Not)

The workflow includes:
1. Preprocessing the data (One-Hot Encoding for categorical features).
2. Training a Random Forest Classifier to predict hiring decisions.
3. Analyzing feature importances to understand key drivers.
4. Detecting bias using:
   - Hiring Rate by Gender
   - Disparate Impact (80% Rule)
   - Statistical Parity Difference (SPD)
5. Exporting the results into an Excel file with multiple sheets.

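Step 1 of the workflow can be illustrated on a tiny table. The following is a minimal sketch, assuming pandas is installed; the three rows are hypothetical and exist only to show how `pd.get_dummies` with `drop_first=True` expands the `education_level` column.

```python
import pandas as pd

# Hypothetical toy rows; the real recruitment dataset is not part of this sketch.
toy = pd.DataFrame({
    "gender": [0, 1, 1],
    "education_level": ["Bachelors", "Masters", "PhD"],
    "hired": [1, 0, 1],
})

# One-hot encode the categorical column. drop_first=True drops the first
# level in sorted order ("Bachelors"), which becomes the implicit baseline.
encoded = pd.get_dummies(toy, columns=["education_level"], drop_first=True)

print(list(encoded.columns))
```

Dropping one level avoids a redundant dummy column: a row with zeros in both remaining columns is understood to be the baseline level.
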
## Files Generated

- `Bias_Detection_Results_With_Predictions.xlsx`: an Excel workbook containing:
  - **Hiring Rates** by gender
  - **Bias Metrics** summary (Selection Rates, Disparate Impact, SPD)
  - **Feature Importances** (model feature rankings)
  - **Predictions** (actual and predicted hiring decisions)

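The script included in this repository prints its metrics and does not show the export step itself, so the following is only a sketch of how a multi-sheet workbook like the one described above could be written. It assumes pandas with the `openpyxl` engine; the helper name `export_results`, the sheet names, and every value in the placeholder tables are hypothetical.

```python
import pandas as pd


def export_results(sheets, path):
    """Write each DataFrame in `sheets` to its own worksheet of one workbook."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for sheet_name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Hypothetical placeholder tables; real values would come from the analysis.
sheets = {
    "Hiring Rates": pd.DataFrame({"gender": [0, 1], "hiring_rate": [0.5, 0.3]}),
    "Bias Metrics": pd.DataFrame({
        "metric": ["Disparate Impact", "Statistical Parity Difference"],
        "value": [0.6, -0.2],
    }),
    "Feature Importances": pd.DataFrame({
        "feature": ["experience_years", "age"],
        "importance": [0.6, 0.4],
    }),
    "Predictions": pd.DataFrame({"actual": [1, 0], "predicted": [1, 1]}),
}
```

Calling `export_results(sheets, "Bias_Detection_Results_With_Predictions.xlsx")` would then produce one workbook with four sheets, in the order the dictionary lists them.
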
## How to Use

1. **Prepare the dataset**: Ensure your dataset has the following columns:
    - `gender` (0 = Male, 1 = Female)
    - `age` (numeric)
    - `education_level` (categorical: Bachelors, Masters, PhD, etc.)
    - `experience_years` (numeric)
    - `hired` (0 = Not Hired, 1 = Hired)

2. **Install the required Python libraries**:
```bash
pip install pandas numpy scikit-learn matplotlib seaborn openpyxl
```

3. **Run the script**:
- Save your recruitment dataset as `recruitment_data.csv` in the working directory; the script reads this file name.
- Run `python bias_detection_recruitment.py`.

4. **Check the generated Excel file**:
- Open `Bias_Detection_Results_With_Predictions.xlsx`.
- Review the hiring rates, bias metrics, feature importances, and prediction results.

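Before running the analysis, it can help to confirm that a dataset matches the column layout from step 1. The following is a minimal sketch, assuming pandas; the helper name `validate_dataset` and the specific checks are illustrative assumptions, not part of the project's script.

```python
import pandas as pd

# Column names taken from the "Prepare the dataset" step.
REQUIRED_COLUMNS = ["gender", "age", "education_level", "experience_years", "hired"]


def validate_dataset(df):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # Every required column must be present.
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    # 'gender' and 'hired' are expected to be coded as 0/1.
    for col in ("gender", "hired"):
        if col in df.columns and not df[col].isin([0, 1]).all():
            problems.append(f"'{col}' must contain only 0 and 1")

    # 'age' and 'experience_years' are expected to be numeric.
    for col in ("age", "experience_years"):
        if col in df.columns and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"'{col}' must be numeric")

    return problems
```

A typical use would be `problems = validate_dataset(pd.read_csv("recruitment_data.csv"))`, stopping early if the returned list is not empty.
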
## Key Fairness Metrics

- **Disparate Impact (DI)**: Ratio of the protected group's selection rate to the unprotected group's (here, female rate / male rate).
  - DI < 0.8 or DI > 1.25 indicates potential bias (based on the 80% rule; 1.25 is the reciprocal of 0.8, so the check is symmetric).

- **Statistical Parity Difference (SPD)**: The protected group's selection rate minus the unprotected group's.
  - SPD close to 0 indicates similar selection rates; a negative value means the protected group is selected less often.

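Both metrics reduce to simple arithmetic on two selection rates. The following sketch works through one example in plain Python; the function names and the applicant counts (30 of 100 women hired, 50 of 100 men hired) are hypothetical and chosen only to make the numbers easy to follow.

```python
def selection_rate(hired_count, total_count):
    """Fraction of a group that was hired."""
    return hired_count / total_count


def fairness_metrics(rate_protected, rate_unprotected):
    """Return (disparate impact, statistical parity difference, flagged)."""
    if rate_unprotected == 0:
        raise ValueError("DI is undefined when the unprotected rate is 0")
    di = rate_protected / rate_unprotected
    spd = rate_protected - rate_unprotected
    # 80% rule, applied symmetrically (1.25 is the reciprocal of 0.8).
    flagged = di < 0.8 or di > 1.25
    return di, spd, flagged


# Hypothetical counts, for illustration only.
rate_female = selection_rate(30, 100)  # protected group
rate_male = selection_rate(50, 100)    # unprotected group

di, spd, flagged = fairness_metrics(rate_female, rate_male)
print(f"DI = {di:.2f}, SPD = {spd:.2f}, flagged = {flagged}")
# prints: DI = 0.60, SPD = -0.20, flagged = True
```

Here DI is below 0.8, so the 80% rule flags the result, and the negative SPD shows the protected group's rate is 20 percentage points lower.
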
## Author

© 2025 Okes Imoni



--------------------------------------------------------------------------------
/first code:
--------------------------------------------------------------------------------
# Bias Detection in Recruitment Using Machine Learning

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt

# Example: Load Dataset
# Assume the dataset has these columns: 'gender', 'age', 'education_level', 'experience_years', 'hired'
# Gender: 0 = Male, 1 = Female; Hired: 0 = Not hired, 1 = Hired

df = pd.read_csv('recruitment_data.csv')

# Preprocessing
# One-hot encode the categorical 'education_level' column.
# drop_first=True drops one level so the dummy columns are not redundant.
df = pd.get_dummies(df, columns=['education_level'], drop_first=True)

# Features and Target
# Note: 'gender' stays in X, so the model can use it directly.
X = df.drop('hired', axis=1)
y = df['hired']

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Model Evaluation
print(classification_report(y_test, y_pred))

# Feature Importance
feature_importances = pd.Series(model.feature_importances_, index=X.columns)
feature_importances.sort_values().plot(kind='barh', figsize=(10, 6))
plt.title('Feature Importances')
plt.tight_layout()
plt.show()

# =========================
# BIAS DETECTION SECTION
# =========================

# Check Hiring Rate by Gender (observed outcomes in the full dataset)
hiring_by_gender = df.groupby('gender')['hired'].mean()
print("Hiring Rate by Gender:\n", hiring_by_gender)

# Calculate Disparate Impact
# 1 = Female (protected group), 0 = Male (unprotected group)
selection_rate_protected = df.loc[df['gender'] == 1, 'hired'].mean()
selection_rate_unprotected = df.loc[df['gender'] == 0, 'hired'].mean()

# Guard against a zero or missing male selection rate, which leaves DI undefined
if selection_rate_unprotected > 0:
    disparate_impact = selection_rate_protected / selection_rate_unprotected
else:
    disparate_impact = float('nan')
print(f"Disparate Impact (Female/Male): {disparate_impact:.2f}")

# Threshold interpretation (80% rule); 1.25 is the reciprocal of 0.8
if pd.isna(disparate_impact):
    print("Disparate Impact is undefined (a group is empty or the male selection rate is 0)")
elif disparate_impact < 0.8 or disparate_impact > 1.25:
    print("⚠️ Potential Bias Detected (Fails 80% Rule)")
else:
    print("✅ No significant bias detected (Passes 80% Rule)")

# Visualize Hiring Rates (groupby sorts by gender, so 0 = Male comes first)
hiring_by_gender.plot(kind='bar', color=['blue', 'pink'])
plt.xticks([0, 1], ['Male', 'Female'], rotation=0)
plt.ylabel('Hiring Rate')
plt.title('Hiring Rate by Gender')
plt.show()

# (Optional) Statistical Parity Difference
spd = selection_rate_protected - selection_rate_unprotected
print(f"Statistical Parity Difference: {spd:.2f}")



--------------------------------------------------------------------------------
/bias_detection_recruitment.py:
--------------------------------------------------------------------------------
# Bias Detection in Recruitment Using Machine Learning

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt

# Example: Load Dataset
# Assume the dataset has these columns: 'gender', 'age', 'education_level', 'experience_years', 'hired'
# Gender: 0 = Male, 1 = Female; Hired: 0 = Not hired, 1 = Hired

df = pd.read_csv('recruitment_data.csv')

# Preprocessing
# One-hot encode the categorical 'education_level' column.
# drop_first=True drops one level so the dummy columns are not redundant.
df = pd.get_dummies(df, columns=['education_level'], drop_first=True)

# Features and Target
# Note: 'gender' stays in X, so the model can use it directly.
X = df.drop('hired', axis=1)
y = df['hired']

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Model Evaluation
print(classification_report(y_test, y_pred))

# Feature Importance
feature_importances = pd.Series(model.feature_importances_, index=X.columns)
feature_importances.sort_values().plot(kind='barh', figsize=(10, 6))
plt.title('Feature Importances')
plt.tight_layout()
plt.show()

# =========================
# BIAS DETECTION SECTION
# =========================

# Check Hiring Rate by Gender (observed outcomes in the full dataset)
hiring_by_gender = df.groupby('gender')['hired'].mean()
print("Hiring Rate by Gender:\n", hiring_by_gender)

# Calculate Disparate Impact
# 1 = Female (protected group), 0 = Male (unprotected group)
selection_rate_protected = df.loc[df['gender'] == 1, 'hired'].mean()
selection_rate_unprotected = df.loc[df['gender'] == 0, 'hired'].mean()

# Guard against a zero or missing male selection rate, which leaves DI undefined
if selection_rate_unprotected > 0:
    disparate_impact = selection_rate_protected / selection_rate_unprotected
else:
    disparate_impact = float('nan')
print(f"Disparate Impact (Female/Male): {disparate_impact:.2f}")

# Threshold interpretation (80% rule); 1.25 is the reciprocal of 0.8
if pd.isna(disparate_impact):
    print("Disparate Impact is undefined (a group is empty or the male selection rate is 0)")
elif disparate_impact < 0.8 or disparate_impact > 1.25:
    print("⚠️ Potential Bias Detected (Fails 80% Rule)")
else:
    print("✅ No significant bias detected (Passes 80% Rule)")

# Visualize Hiring Rates (groupby sorts by gender, so 0 = Male comes first)
hiring_by_gender.plot(kind='bar', color=['blue', 'pink'])
plt.xticks([0, 1], ['Male', 'Female'], rotation=0)
plt.ylabel('Hiring Rate')
plt.title('Hiring Rate by Gender')
plt.show()

# (Optional) Statistical Parity Difference
spd = selection_rate_protected - selection_rate_unprotected
print(f"Statistical Parity Difference: {spd:.2f}")


--------------------------------------------------------------------------------