├── README.md
└── project_final.py


/README.md:
--------------------------------------------------------------------------------
# INT-375-PYTHON-PROJECT-CRIME-MAPPING

# 🕵️‍♂️ Crime Mapping using Python

A data-driven project focused on analyzing and visualizing crime data to identify patterns, hotspots, and demographic insights using Python.

## 📌 Project Overview

This project performs a comprehensive **Exploratory Data Analysis (EDA)** on real-world crime data from **Los Angeles (2020–present)**. The goal is to uncover trends based on time, location, crime type, victim demographics, and resolution status using Python’s data science libraries.

## 📁 Files in the Repository

- `project_final.py` – Main Python script with full EDA pipeline and visualizations.
- `cleaned_python_dataset_ca.xlsx` – Cleaned version of the original crime dataset.
- `PYTHON_REPORT_12318105_2.pdf` – Full academic report for the project.
- `Updated_Conclusion_Crime_Project.docx` – Final updated conclusion section (optional).
- `README.md` – Project overview and usage instructions.

## 🧠 Key Insights

- **Evening hours** (6 PM – 12 AM) show a spike in crime.
- **Summer months** have slightly higher crime rates.
- Areas like **77th Street**, **Southwest**, and **Northeast** are the most crime-prone.
- **Theft**, **burglary**, and **vehicle-related crimes** dominate.
- **Victims aged 20–40** are most affected, with a slight majority being male.
- Many cases remain **under investigation**, especially theft-related crimes.
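
The time-of-day and area insights above come down to extracting an hour from `TIME_OCC` and counting rows per group. The following is a minimal sketch of that idea, not the project's exact code: the sample rows are made up, and the `time_band` helper and its six-hour band labels are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real script loads cleaned_python_dataset_ca.xlsx.
sample = pd.DataFrame({
    "TIME_OCC": [30, 915, 1830, 2145, 2359, 1200],
    "AREA_NAME": ["77th Street", "Southwest", "77th Street",
                  "Northeast", "77th Street", "Southwest"],
})

# Zero-pad the HHMM value to four digits, then keep the hour part.
sample["HOUR"] = sample["TIME_OCC"].astype(str).str.zfill(4).str[:2].astype(int)


def time_band(hour: int) -> str:
    """Hypothetical helper: bucket an hour (0-23) into a six-hour band."""
    if hour < 6:
        return "Night (12 AM - 6 AM)"
    if hour < 12:
        return "Morning (6 AM - 12 PM)"
    if hour < 18:
        return "Afternoon (12 PM - 6 PM)"
    return "Evening (6 PM - 12 AM)"


sample["BAND"] = sample["HOUR"].map(time_band)

# Count incidents per time band and per area.
band_counts = sample["BAND"].value_counts()
area_counts = sample["AREA_NAME"].value_counts()

print(band_counts)
print(area_counts.head(3))
```

`project_final.py` ranks areas with the same `value_counts()` call; the band bucketing is an extra step shown here only to make the "evening hours" grouping concrete.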

## 🔧 Tools & Libraries Used

- `pandas` – Data cleaning and manipulation
- `numpy` – Numerical analysis
- `matplotlib` & `seaborn` – Data visualization
- `Jupyter Notebook` / `Python 3.x`

## 📊 Project Features

- Clean and preprocess real-world crime data
- Visualize time-based crime trends
- Identify high-crime areas using area names
- Analyze top crime types and weapon usage
- Explore victim demographics (age, gender, ethnicity)
- Assess case resolution status

## 🚀 How to Run the Project

1. Clone this repo:
   ```bash
   git clone https://github.com/your-username/Crime-Mapping-Python.git
   cd Crime-Mapping-Python
   ```

--------------------------------------------------------------------------------
/project_final.py:
--------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set styles and defaults
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Load and Clean Dataset
print("\n📥 Loading and Cleaning Dataset...")
df = pd.read_excel("cleaned_python_dataset_ca.xlsx")
# Normalize column names: trim, upper-case, replace spaces with underscores
df.columns = df.columns.str.strip().str.upper().str.replace(" ", "_")

# Data Cleaning
df['DATE_OCC'] = pd.to_datetime(df['DATE_OCC'], errors='coerce')
df['DATE_RPTD'] = pd.to_datetime(df['DATE_RPTD'], errors='coerce')
# Zero-pad TIME_OCC to four digits (HHMM) so the first two characters are the hour
df['TIME_OCC'] = df['TIME_OCC'].astype(str).str.zfill(4)
df['HOUR'] = df['TIME_OCC'].str[:2].astype(int)
df['YEAR'] = df['DATE_OCC'].dt.year
df['MONTH'] = df['DATE_OCC'].dt.month
# Drop rows without coordinates
df = df.dropna(subset=['LAT', 'LON'])

# Precompute cleaned group data for later plots
crime_by_month = df.groupby(['YEAR', 'MONTH']).size().reset_index(name='INCIDENTS')
# Build a first-of-month date from YEAR and MONTH for the time-series plot
crime_by_month['DATE'] = pd.to_datetime(crime_by_month[['YEAR', 'MONTH']].assign(DAY=1))
heat_data = df.groupby(['YEAR', 'MONTH']).size().unstack(fill_value=0)

area_crime = df['AREA_NAME'].value_counts().head(10)
top_crimes = df['CRM_CD_DESC'].value_counts().head(10)
weapon_usage = df['WEAPON_DESC'].value_counts().head(10)
gender_count = df['VICT_SEX'].value_counts()
ethnicity = df['VICT_DESCENT'].value_counts().head(10)
status_counts = df['STATUS_DESC'].value_counts()

# Status breakdown for the five most frequent crime types
top_types = df['CRM_CD_DESC'].value_counts().nlargest(5).index
status_by_type = (
    df[df['CRM_CD_DESC'].isin(top_types)]
    .groupby(['CRM_CD_DESC', 'STATUS_DESC'])
    .size()
    .unstack(fill_value=0)
)

# Save the cleaned data next to the script (the original used a machine-specific absolute path)
output_path = "cleaned2_python_dataset_ca.xlsx"
df.to_excel(output_path, index=False)


# Basic EDA
print("\n🔎 Dataset Overview")
print(df)
print("\n🔎 Head of the dataset")
print(df.head())
print("\n🔎 Tail of the dataset")
print(df.tail())
print("\n🔎 Summary Statistics")
print(df.describe())
print("\n🔎 Info")
df.info()  # info() prints directly and returns None, so it is not wrapped in print()
print("\n🔎 Column Names")
print(df.columns)
print("\n🔎 Shape of Dataset")
print(df.shape)
print("\n🔎 Null Values")
print(df.isnull().sum())

# Correlation & Covariance
correlation = df.corr(numeric_only=True)
print("\n📊 Correlation Matrix")
print(correlation)

covariance = df.cov(numeric_only=True)
print("\n📊 Covariance Matrix")
print(covariance)

plt.figure()
sns.heatmap(correlation, annot=True, cmap="Blues", linewidths=0.5, fmt=".2f")
plt.title("Correlation Heatmap")
plt.tight_layout()
plt.show()

# 1. Crime Distribution and Trends Over Time
print("\n📈 Answer 1: Crime Distribution and Trends Over Time")

plt.figure()
sns.lineplot(data=crime_by_month, x='DATE', y='INCIDENTS')
plt.title('Crime Incidents Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Crimes')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

plt.figure()
# discrete=True centers one bar on each integer hour so bars line up with the ticks
sns.histplot(df['HOUR'], discrete=True, kde=True)
plt.title('Crime Frequency by Hour of the Day')
plt.xlabel('Hour')
plt.ylabel('Number of Crimes')
plt.xticks(range(0, 24))
plt.tight_layout()
plt.show()

plt.figure()
sns.heatmap(heat_data, cmap="Reds")
plt.title('Seasonal Crime Heatmap (Month vs Year)')
plt.tight_layout()
plt.show()

# 2. Geographic Crime Analysis (Crime Hotspots)
print("\n📍 Answer 2: Geographic Crime Analysis (Crime Hotspots)")

plt.figure()
sns.barplot(x=area_crime.values, y=area_crime.index)
plt.title('Top 10 High-Crime Areas')
plt.xlabel('Number of Crimes')
plt.tight_layout()
plt.show()

# 3. Crime Type Analysis
print("\n🔍 Answer 3: Crime Type Analysis")

plt.figure()
sns.barplot(x=top_crimes.values, y=top_crimes.index, palette='magma')
plt.title('Top 10 Crime Types')
plt.xlabel('Frequency')
plt.tight_layout()
plt.show()

plt.figure()
sns.barplot(x=weapon_usage.values, y=weapon_usage.index, palette='coolwarm')
plt.title('Top 10 Weapons Used')
plt.xlabel('Frequency')
plt.tight_layout()
plt.show()

# 4. Victim Demographics Breakdown
print("\n🧍 Answer 4: Victim Demographics Breakdown")

# Keep only realistic ages (drops values <= 0 or >= 100; missing ages fail both tests)
valid_ages = df.loc[(df['VICT_AGE'] > 0) & (df['VICT_AGE'] < 100), 'VICT_AGE']

plt.figure()
sns.histplot(valid_ages, bins=20, kde=True, color='purple')
plt.title('Victim Age Distribution')
plt.xlabel('Age')
plt.ylabel('Number of Victims')
plt.tight_layout()
plt.show()

plt.figure()
sns.barplot(x=gender_count.index, y=gender_count.values, palette='pastel')
plt.title('Gender Distribution of Victims')
plt.xlabel('Gender')
plt.ylabel('Number of Victims')
plt.tight_layout()
plt.show()

plt.figure()
sns.barplot(x=ethnicity.values, y=ethnicity.index, palette='BuGn_r')
plt.title('Top 10 Affected Ethnic Groups')
plt.xlabel('Number of Victims')
plt.tight_layout()
plt.show()

# 5. Crime Resolution Status Analysis
print("\n✅ Answer 5: Crime Resolution Status Analysis")

plt.figure()
# wedgeprops width < 1 leaves a hole in the middle, turning the pie into a donut
plt.pie(status_counts, labels=status_counts.index, startangle=140,
        autopct='%1.1f%%', wedgeprops={'width': 0.4})
plt.title('Crime Resolution Status (Donut Chart)')
plt.tight_layout()
plt.show()

status_by_type.plot(kind='bar', stacked=True, colormap='Set2', figsize=(10, 6))
plt.title('Crime Status Breakdown for Top 5 Crime Types')
plt.xlabel('Crime Type')
plt.ylabel('Number of Cases')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

--------------------------------------------------------------------------------
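
The status breakdown at the end of `project_final.py` plots raw counts per crime type, which can make a frequent crime type look "less resolved" simply because it has more rows. The following is a minimal sketch, assuming made-up rows with the same `CRM_CD_DESC` and `STATUS_DESC` columns, of how the counts could also be viewed as per-type shares using `pd.crosstab` with `normalize="index"`. The category labels in the sample are illustrative placeholders, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; labels are placeholders for illustration only.
rows = pd.DataFrame({
    "CRM_CD_DESC": ["THEFT", "THEFT", "THEFT", "THEFT", "BURGLARY", "BURGLARY"],
    "STATUS_DESC": ["Invest Cont", "Invest Cont", "Invest Cont", "Adult Arrest",
                    "Invest Cont", "Adult Arrest"],
})

# Raw counts: one row per crime type, one column per status.
counts = pd.crosstab(rows["CRM_CD_DESC"], rows["STATUS_DESC"])

# Row-normalized shares: each crime type's statuses sum to 1.0.
shares = pd.crosstab(rows["CRM_CD_DESC"], rows["STATUS_DESC"], normalize="index")

print(counts)
print(shares.round(2))
```

Plotting `shares` instead of `counts` with `plot(kind='bar', stacked=True)` would give bars of equal height, so the proportion still under investigation can be compared across crime types directly.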