├── Bias_Detection_Results_Simulated.xlsx
├── README.md
├── first code
└── bias_detection_recruitment.py


/Bias_Detection_Results_Simulated.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Okes2024/Bias-Detection-in-Recruitment-Using-ML/HEAD/Bias_Detection_Results_Simulated.xlsx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Bias Detection in Recruitment Using Machine Learning

This project demonstrates how to detect potential bias in recruitment processes using machine learning and fairness metrics.

## Project Overview

We use a recruitment dataset containing features like:
- Gender
- Age
- Education Level
- Experience Years
- Hiring Decision (Hired or Not)

The workflow includes:
1. Preprocessing the data (One-Hot Encoding for categorical features).
2. Training a Random Forest Classifier to predict hiring decisions.
3. Analyzing feature importances to understand key drivers.
4. Detecting bias using:
   - Hiring Rate by Gender
   - Disparate Impact (80% Rule)
   - Statistical Parity Difference (SPD)
5. Exporting the results into an Excel file with multiple sheets.

## Files Generated

- `Bias_Detection_Results_With_Predictions.xlsx`, an Excel file containing:
  - **Hiring Rates** by gender
  - **Bias Metrics** summary (Selection Rates, Disparate Impact, SPD)
  - **Feature Importances** (model feature rankings)
  - **Predictions** (actual and predicted hiring decisions)

## How to Use

1. **Prepare Dataset**: Ensure your dataset has the following columns:
   - `gender` (0 = Male, 1 = Female)
   - `age` (numeric)
   - `education_level` (categorical: Bachelors, Masters, PhD, etc.)
   - `experience_years` (numeric)
   - `hired` (0 = Not Hired, 1 = Hired)

2. **Install the required Python libraries**:
   ```bash
   pip install pandas scikit-learn matplotlib seaborn openpyxl
   ```

3. **Run the script**:
   - Load your recruitment dataset (the script reads `recruitment_data.csv`).
   - Run the Python script.

4. **Check the generated Excel file**:
   - Open `Bias_Detection_Results_With_Predictions.xlsx`.
   - Review the hiring rates, bias metrics, feature importances, and prediction results.

## Key Fairness Metrics

- **Disparate Impact (DI)**: Ratio of the protected group's selection rate to the unprotected group's (here, Female / Male).
  - DI < 0.8 or DI > 1.25 indicates potential bias (the 80% rule applied in both directions, since 1 / 0.8 = 1.25).

- **Statistical Parity Difference (SPD)**: Difference in selection rates (protected minus unprotected).
  - SPD close to 0 indicates fairness.

## Author

© 2025 Okes Imoni


--------------------------------------------------------------------------------
/first code:
--------------------------------------------------------------------------------
# Bias Detection in Recruitment Using Machine Learning

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
import seaborn as sns

# Example: Load Dataset
# Assume the dataset has these columns: 'gender', 'age', 'education_level', 'experience_years', 'hired'
# Gender: 0 = Male, 1 = Female; Hired: 0 = Not hired, 1 = Hired

df = pd.read_csv('recruitment_data.csv')

# Preprocessing
# Let's say 'education_level' is categorical
df = pd.get_dummies(df, columns=['education_level'], drop_first=True)

# Features and Target
X = df.drop('hired', axis=1)
y = df['hired']

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Model Evaluation
print(classification_report(y_test, y_pred))

# Feature Importance
feature_importances = pd.Series(model.feature_importances_, index=X.columns)
feature_importances.sort_values().plot(kind='barh', figsize=(10,6))
plt.title('Feature Importances')
plt.show()

# =========================
# BIAS DETECTION SECTION
# =========================

# Check Hiring Rate by Gender
hiring_by_gender = df.groupby('gender')['hired'].mean()
print("Hiring Rate by Gender:\n", hiring_by_gender)

# Calculate Disparate Impact
# 1 = Female, 0 = Male
protected = df[df['gender'] == 1]
unprotected = df[df['gender'] == 0]

selection_rate_protected = protected['hired'].mean()
selection_rate_unprotected = unprotected['hired'].mean()

disparate_impact = selection_rate_protected / selection_rate_unprotected
print(f"Disparate Impact (Female/Male): {disparate_impact:.2f}")

# Threshold interpretation (80% rule)
if disparate_impact < 0.8 or disparate_impact > 1.25:
    print("⚠️ Potential Bias Detected (Fails 80% Rule)")
else:
    print("✅ No significant bias detected (Passes 80% Rule)")

# Visualize Hiring Rates
hiring_by_gender.plot(kind='bar', color=['blue', 'pink'])
plt.xticks([0, 1], ['Male', 'Female'], rotation=0)
plt.ylabel('Hiring Rate')
plt.title('Hiring Rate by Gender')
plt.show()

# (Optional) Statistical Parity Difference
spd = selection_rate_protected - selection_rate_unprotected
print(f"Statistical Parity Difference: {spd:.2f}")


--------------------------------------------------------------------------------
/bias_detection_recruitment.py:
--------------------------------------------------------------------------------
# Bias Detection in Recruitment Using Machine Learning

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
import seaborn as sns

# Example: Load Dataset
# Assume the dataset has these columns: 'gender', 'age', 'education_level', 'experience_years', 'hired'
# Gender: 0 = Male, 1 = Female; Hired: 0 = Not hired, 1 = Hired

df = pd.read_csv('recruitment_data.csv')

# Preprocessing
# Let's say 'education_level' is categorical
df = pd.get_dummies(df, columns=['education_level'], drop_first=True)

# Features and Target
X = df.drop('hired', axis=1)
y = df['hired']

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Model Evaluation
print(classification_report(y_test, y_pred))

# Feature Importance
feature_importances = pd.Series(model.feature_importances_, index=X.columns)
feature_importances.sort_values().plot(kind='barh', figsize=(10,6))
plt.title('Feature Importances')
plt.show()

# =========================
# BIAS DETECTION SECTION
# =========================

# Check Hiring Rate by Gender
hiring_by_gender = df.groupby('gender')['hired'].mean()
print("Hiring Rate by Gender:\n", hiring_by_gender)

# Calculate Disparate Impact
# 1 = Female, 0 = Male
protected = df[df['gender'] == 1]
unprotected = df[df['gender'] == 0]

selection_rate_protected = protected['hired'].mean()
selection_rate_unprotected = unprotected['hired'].mean()

disparate_impact = selection_rate_protected / selection_rate_unprotected
print(f"Disparate Impact (Female/Male): {disparate_impact:.2f}")

# Threshold interpretation (80% rule)
if disparate_impact < 0.8 or disparate_impact > 1.25:
    print("⚠️ Potential Bias Detected (Fails 80% Rule)")
else:
    print("✅ No significant bias detected (Passes 80% Rule)")

# Visualize Hiring Rates
hiring_by_gender.plot(kind='bar', color=['blue', 'pink'])
plt.xticks([0, 1], ['Male', 'Female'], rotation=0)
plt.ylabel('Hiring Rate')
plt.title('Hiring Rate by Gender')
plt.show()

# (Optional) Statistical Parity Difference
spd = selection_rate_protected - selection_rate_unprotected
print(f"Statistical Parity Difference: {spd:.2f}")


--------------------------------------------------------------------------------
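
The two fairness metrics used in the script (Disparate Impact and Statistical Parity Difference) can be checked by hand on a small sample. The following is a minimal sketch and not part of the repository: the function names `selection_rate` and `fairness_metrics`, the dictionary keys, and the sample outcome lists are all hypothetical, chosen only for illustration. It assumes the same conventions as the script (protected rate divided by unprotected rate, and the 0.8 to 1.25 band for the 80% rule) and uses only the standard library.

```python
from typing import Dict, Sequence


def selection_rate(outcomes: Sequence[int]) -> float:
    """Fraction of candidates hired (0 = not hired, 1 = hired)."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


def fairness_metrics(protected: Sequence[int],
                     unprotected: Sequence[int]) -> Dict[str, object]:
    """Compute DI and SPD for two groups of 0/1 hiring outcomes."""
    rate_p = selection_rate(protected)
    rate_u = selection_rate(unprotected)
    # DI is undefined when nobody in the unprotected group is hired;
    # use NaN instead of dividing by zero.
    di = rate_p / rate_u if rate_u > 0 else float("nan")
    spd = rate_p - rate_u
    # Any comparison with NaN is False, so an undefined DI never passes.
    passes = 0.8 <= di <= 1.25
    return {
        "selection_rate_protected": rate_p,
        "selection_rate_unprotected": rate_u,
        "disparate_impact": di,
        "statistical_parity_difference": spd,
        "passes_80_rule": passes,
    }


# Illustrative data only (assumption, not taken from the repository).
female_outcomes = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # 3 of 10 hired
male_outcomes = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]    # 5 of 10 hired

metrics = fairness_metrics(female_outcomes, male_outcomes)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

With these made-up lists the selection rates are 0.3 and 0.5, so the ratio is 0.3 / 0.5 = 0.6 and the difference is 0.3 - 0.5 = -0.2; the ratio falls below 0.8, so the 80% rule check fails. Returning NaN for an undefined ratio is one possible design choice; raising an error would be an equally reasonable alternative.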