├── README.md
└── project_final.py


/README.md:
--------------------------------------------------------------------------------
 1 | # INT-375-PYTHON-PROJECT-CRIME-MAPPING
 2 | 
 3 | # 🕵️‍♂️ Crime Mapping using Python
 4 | 
 5 | A data-driven project focused on analyzing and visualizing crime data to identify patterns, hotspots, and demographic insights using Python.
 6 | 
 7 | ## 📌 Project Overview
 8 | 
 9 | This project performs a comprehensive **Exploratory Data Analysis (EDA)** on real-world crime data from **Los Angeles (2020–present)**. The goal is to uncover trends based on time, location, crime type, victim demographics, and resolution status using Python’s data science libraries.
10 | 
11 | ## 📁 Files in the Repository
12 | 
13 | - `project_final.py` – Main Python script with full EDA pipeline and visualizations.
14 | - `cleaned_python_dataset_ca.xlsx` – Cleaned version of the original crime dataset.
15 | - `PYTHON_REPORT_12318105_2.pdf` – Full academic report for the project.
16 | - `Updated_Conclusion_Crime_Project.docx` – Final updated conclusion section (optional).
17 | - `README.md` – Project overview and usage instructions.
18 | 
19 | ## 🧠 Key Insights
20 | 
21 | - **Evening hours** (6 PM – 12 AM) show a spike in crime.
 22 | - **Summer months** show slightly higher incident counts.
23 | - Areas like **77th Street**, **Southwest**, and **Northeast** are the most crime-prone.
24 | - **Theft**, **burglary**, and **vehicle-related crimes** dominate.
25 | - **Victims aged 20–40** are most affected, with a slight majority being male.
26 | - Many cases remain **under investigation**, especially theft-related crimes.
27 | 
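The evening-hours insight depends on turning the `TIME_OCC` column into an hour of day. The sketch below shows one way that share could be computed, assuming `TIME_OCC` holds integers in HHMM form (as the zero-padding in `project_final.py` suggests). The rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up sample rows for illustration only; not real crime records.
sample = pd.DataFrame({"TIME_OCC": [30, 915, 1845, 2130, 2359, 1200]})

# Assumption: TIME_OCC is an integer in HHMM form, so 30 means 00:30.
# Pad to four digits, then take the first two as the hour.
hhmm = sample["TIME_OCC"].astype(str).str.zfill(4)
sample["HOUR"] = hhmm.str[:2].astype(int)

# Evening is taken here as 18:00 through 23:59.
is_evening = sample["HOUR"].between(18, 23)
evening_share = is_evening.mean()

print(f"Evening share of incidents: {evening_share:.1%}")
```

On the real data, the same two steps applied to the full `TIME_OCC` column would give the figure behind the evening-hours claim.
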
28 | ## 🔧 Tools & Libraries Used
29 | 
30 | - `pandas` – Data cleaning and manipulation
31 | - `numpy` – Numerical analysis
32 | - `matplotlib` & `seaborn` – Data visualization
33 | - `Jupyter Notebook` / `Python 3.x`
34 | 
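Because the script loads an `.xlsx` file with `pd.read_excel`, pandas also needs an Excel engine such as `openpyxl`, which is not listed above. A small check like the following could confirm that everything is importable before running the script. The `required` list is an assumption based on the imports in `project_final.py` plus `openpyxl`.

```python
import importlib.util

# Assumed requirement list: the script's imports plus openpyxl for .xlsx I/O.
required = ["pandas", "numpy", "matplotlib", "seaborn", "openpyxl"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```
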
35 | ## 📊 Project Features
36 | 
37 | - Clean and preprocess real-world crime data
38 | - Visualize time-based crime trends
39 | - Identify high-crime areas using area names
40 | - Analyze top crime types and weapon usage
41 | - Explore victim demographics (age, gender, ethnicity)
42 | - Assess case resolution status
43 | 
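To compare resolution status across crime types of very different volumes, row-normalised shares may be easier to read than raw counts. The sketch below uses `pd.crosstab` on made-up rows; the column names follow `project_final.py`, while the category values are placeholders chosen for illustration.

```python
import pandas as pd

# Made-up rows for illustration; values are placeholders, not real records.
sample = pd.DataFrame({
    "CRM_CD_DESC": ["THEFT", "THEFT", "THEFT", "BURGLARY", "BURGLARY", "ASSAULT"],
    "STATUS_DESC": ["Invest Cont", "Invest Cont", "Adult Arrest",
                    "Invest Cont", "Adult Other", "Adult Arrest"],
})

# normalize="index" divides each row by its total, so every row sums to 1.
status_share = pd.crosstab(
    sample["CRM_CD_DESC"], sample["STATUS_DESC"], normalize="index"
)

print(status_share.round(2))
```

The script itself plots raw counts with a stacked bar chart; this is an alternative view, not a description of what the script does.
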
44 | ## 🚀 How to Run the Project
45 | 
46 | 1. Clone this repo:
47 |    ```bash
48 |    git clone https://github.com/your-username/Crime-Mapping-Python.git
49 |    cd Crime-Mapping-Python
 50 |    ```


--------------------------------------------------------------------------------
/project_final.py:
--------------------------------------------------------------------------------
  1 | import pandas as pd
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | import seaborn as sns
  5 | 
  6 | # Set styles and defaults
  7 | sns.set_style(style='whitegrid')
  8 | plt.rcParams['figure.figsize'] = (10, 6)
  9 | 
 10 | # Load and Clean Dataset
 11 | print("\n📥 Loading and Cleaning Dataset...")
 12 | df = pd.read_excel("cleaned_python_dataset_ca.xlsx")
 13 | df.columns = df.columns.str.strip().str.upper().str.replace(" ", "_")
 14 | 
 15 | # Data Cleaning
 16 | df['DATE_OCC'] = pd.to_datetime(df['DATE_OCC'], errors='coerce')
 17 | df['DATE_RPTD'] = pd.to_datetime(df['DATE_RPTD'], errors='coerce')
 18 | df['TIME_OCC'] = df['TIME_OCC'].astype(str).str.zfill(4)
 19 | df['HOUR'] = df['TIME_OCC'].str[:2].astype(int)
 20 | df['YEAR'] = df['DATE_OCC'].dt.year
 21 | df['MONTH'] = df['DATE_OCC'].dt.month
 22 | df = df.dropna(subset=['LAT', 'LON'])
 23 | 
 24 | # Precompute cleaned group data for later plots
 25 | crime_by_month = df.groupby(['YEAR', 'MONTH']).size().reset_index(name='INCIDENTS')
 26 | crime_by_month['DATE'] = pd.to_datetime(crime_by_month[['YEAR', 'MONTH']].assign(DAY=1))
 27 | heat_data = df.groupby(['YEAR', 'MONTH']).size().unstack(fill_value=0)
 28 | 
 29 | area_crime = df['AREA_NAME'].value_counts().head(10)
 30 | top_crimes = df['CRM_CD_DESC'].value_counts().head(10)
 31 | weapon_usage = df['WEAPON_DESC'].value_counts().head(10)
 32 | gender_count = df['VICT_SEX'].value_counts()
 33 | ethnicity = df['VICT_DESCENT'].value_counts().head(10)
 34 | status_counts = df['STATUS_DESC'].value_counts()
 35 | 
 36 | top_types = df['CRM_CD_DESC'].value_counts().nlargest(5).index
 37 | status_by_type = df[df['CRM_CD_DESC'].isin(top_types)].groupby(['CRM_CD_DESC', 'STATUS_DESC']).size().unstack()
 38 | status_by_type = status_by_type.fillna(0)
 39 | 
 40 | output_path = "cleaned2_python_dataset_ca.xlsx"  # relative path so the script runs on any machine
 41 | df.to_excel(output_path, index=False)
 42 | 
 43 | 
 44 | # Basic EDA
 45 | print("\n🔎 Dataset Overview")
 46 | print(df)
 47 | print("\n🔎 Head of the dataset")
 48 | print(df.head())
 49 | print("\n🔎 Tail of the dataset")
 50 | print(df.tail())
 51 | print("\n🔎 Summary Statistics")
 52 | print(df.describe())
 53 | print("\n🔎 Info")
 54 | df.info()  # info() prints its report directly and returns None
 55 | print("\n🔎 Column Names")
 56 | print(df.columns)
 57 | print("\n🔎 Shape of Dataset")
 58 | print(df.shape)
 59 | print("\n🔎 Null Values")
 60 | print(df.isnull().sum())
 61 | 
 62 | # Correlation & Covariance
 63 | correlation = df.corr(numeric_only=True)
 64 | print("\n📊 Correlation Matrix")
 65 | print(correlation)
 66 | 
 67 | covariance = df.cov(numeric_only=True)
 68 | print("\n📊 Covariance Matrix")
 69 | print(covariance)
 70 | 
 71 | plt.figure()
 72 | sns.heatmap(correlation, annot=True, cmap="Blues", linewidths=0.5, fmt=".2f")
 73 | plt.title("Correlation Heatmap")
 74 | plt.tight_layout()
 75 | plt.show()
 76 | 
 77 | # 1. Crime Distribution and Trends Over Time
 78 | print("\n📈 Answer 1: Crime Distribution and Trends Over Time")
 79 | 
 80 | plt.figure()
 81 | sns.lineplot(data=crime_by_month, x='DATE', y='INCIDENTS')
 82 | plt.title('Crime Incidents Over Time')
 83 | plt.xlabel('Date')
 84 | plt.ylabel('Number of Crimes')
 85 | plt.xticks(rotation=45)
 86 | plt.tight_layout()
 87 | plt.show()
 88 | 
 89 | plt.figure()
 90 | sns.histplot(df['HOUR'], bins=24, kde=True)
 91 | plt.title('Crime Frequency by Hour of the Day')
 92 | plt.xlabel('Hour')
 93 | plt.ylabel('Number of Crimes')
 94 | plt.xticks(range(0, 24))
 95 | plt.tight_layout()
 96 | plt.show()
 97 | 
 98 | plt.figure()
 99 | sns.heatmap(heat_data, cmap="Reds")
100 | plt.title('Seasonal Crime Heatmap (Month vs Year)')
101 | plt.tight_layout()
102 | plt.show()
103 | 
104 | # 2. Geographic Crime Analysis (Crime Hotspots)
105 | print("\n📍 Answer 2: Geographic Crime Analysis (Crime Hotspots)")
106 | 
107 | plt.figure()
108 | sns.barplot(x=area_crime.values, y=area_crime.index)
109 | plt.title('Top 10 High-Crime Areas')
110 | plt.xlabel('Number of Crimes')
111 | plt.tight_layout()
112 | plt.show()
113 | 
114 | # 3. Crime Type Analysis
115 | print("\n🔍 Answer 3: Crime Type Analysis")
116 | 
117 | plt.figure()
118 | sns.barplot(x=top_crimes.values, y=top_crimes.index, palette='magma')
119 | plt.title('Top 10 Crime Types')
120 | plt.xlabel('Frequency')
121 | plt.tight_layout()
122 | plt.show()
123 | 
124 | plt.figure()
125 | sns.barplot(x=weapon_usage.values, y=weapon_usage.index, palette='coolwarm')
126 | plt.title('Top 10 Weapons Used')
127 | plt.xlabel('Frequency')
128 | plt.tight_layout()
129 | plt.show()
130 | 
131 | # 4. Victim Demographics Breakdown
132 | print("\n🧍 Answer 4: Victim Demographics Breakdown")
133 | 
134 | # Keep only plausible ages (drops 0 or negative values and 100+) without altering df
135 | valid_ages = df.loc[(df['VICT_AGE'] > 0) & (df['VICT_AGE'] < 100), 'VICT_AGE']
136 | 
137 | plt.figure()
138 | sns.histplot(valid_ages, bins=20, kde=True, color='purple')
139 | plt.title('Victim Age Distribution')
140 | plt.xlabel('Age')
141 | plt.ylabel('Number of Victims')
142 | plt.tight_layout()
143 | plt.show()
144 | 
145 | plt.figure()
146 | sns.barplot(x=gender_count.index, y=gender_count.values, palette='pastel')
147 | plt.title('Gender Distribution of Victims')
148 | plt.xlabel('Gender')
149 | plt.ylabel('Number of Victims')
150 | plt.tight_layout()
151 | plt.show()
152 | 
153 | plt.figure()
154 | sns.barplot(x=ethnicity.values, y=ethnicity.index, palette='BuGn_r')
155 | plt.title('Top 10 Affected Ethnic Groups')
156 | plt.xlabel('Number of Victims')
157 | plt.tight_layout()
158 | plt.show()
159 | 
160 | # 5. Crime Resolution Status Analysis
161 | print("\n✅ Answer 5: Crime Resolution Status Analysis")
162 | 
163 | plt.figure()
164 | plt.pie(status_counts, labels=status_counts.index, startangle=140, autopct='%1.1f%%', pctdistance=0.8, wedgeprops={'width': 0.4})  # width < 1 leaves a hole, making a donut
165 | plt.title('Crime Resolution Status (Donut Chart)')
166 | plt.tight_layout()
167 | plt.show()
168 | 
169 | status_by_type.plot(kind='bar', stacked=True, colormap='Set2', figsize=(10, 6))
170 | plt.title('Crime Status Breakdown for Top 5 Crime Types')
171 | plt.xlabel('Crime Type')
172 | plt.ylabel('Number of Cases')
173 | plt.xticks(rotation=45)
174 | plt.tight_layout()
175 | plt.show()
176 | 


--------------------------------------------------------------------------------