├── README.md
└── main.py

/README.md:
--------------------------------------------------------------------------------
# LAPD Crime Data Analysis Tool

This project performs exploratory data analysis (EDA) and visualization on Los Angeles crime data from 2020 to the present. It lets users interactively explore crime trends and patterns by time and location, as well as the most commonly reported crime types.

## 📂 Dataset

- **Source**: [Crime Data from 2020 to Present](https://data.lacity.org/)
- **Format**: CSV
- **Columns Used**:
  - `Date Rptd`: Date the crime was reported
  - `TIME OCC`: Time of occurrence in 24-hour HHMM format (pandas reads it as an integer, so leading zeros are lost, e.g. `0930` becomes `930`)
  - `Crm Cd Desc`: Description of the crime
  - `AREA NAME`: Name of the reporting LAPD area

> Make sure the dataset is downloaded and the path is set correctly in the script:

```python
DATA_PATH = "C:\\Users\\hp\\Downloads\\Crime_Data_from_2020_to_Present.csv"
```

## ⚙️ Features

### ✔️ Data Cleaning

- Drops rows with missing values in the key columns (`Date Rptd`, `TIME OCC`, `Crm Cd Desc`, `AREA NAME`).
- Converts `Date Rptd` to datetime.
- Zero-pads `TIME OCC` to a 4-digit HHMM string (e.g. `930` → `0930`), so the hour can be read from the first two characters.

### ✔️ Visualizations

- **Top Crimes**: Bar chart of the 10 most common crime types.
- **Crime Trend Over Time**: Monthly line graph of reported crimes, grouped by `Date Rptd`.
- **Crimes by Area**: Bar chart of the 10 areas with the highest crime counts.
- **Hourly Crime Distribution**: Line graph of the number of crimes by hour of occurrence.
- **Area vs Crime Heatmap**: Heatmap of how often each of the 15 most common crime types occurs in each of the 10 areas with the most crimes.

### ✔️ Interactive Menu

A console menu for exploring different aspects of the data at run time, without modifying the code.

## ▶️ Usage

1. Install the required libraries (if not already installed):

   ```bash
   pip install pandas numpy matplotlib seaborn
   ```

2. Update `DATA_PATH` in the script with your dataset path.

3. Run the script:

   ```bash
   python main.py
   ```

4. Choose whether to enter interactive mode or display all visualizations automatically. Answer `y` to open the menu; any other answer displays every chart in sequence.

   ```
   Do you want to run interactive analysis? (y/n):
   ```

## 📊 Example Visuals

**Top 10 Most Reported Crimes**

**Crime Trends Over Time**

> Note: Replace image URLs with your own if needed.

## 📁 Folder Structure

```
lapd_crime_analysis/
│
├── main.py                               # Main script
├── README.md                             # Project overview
└── Crime_Data_from_2020_to_Present.csv   # Dataset (not included here)
```

## 🧠 Future Improvements

- Add support for filtering by date range or area.
- Export summary statistics to CSV.
- Integrate with a simple GUI or web interface (e.g., Streamlit).
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# ---------------------- Load Dataset ------------------------
DATA_PATH = "C:\\Users\\hp\\Downloads\\Crime_Data_from_2020_to_Present.csv"  # Replace with actual path if needed

print("Loading CSV file...")
df = pd.read_csv(DATA_PATH)
print(f"Successfully loaded data with {df.shape[0]} rows and {df.shape[1]} columns")

# ---------------------- Basic Overview ----------------------
print("\nBasic Info:")
df.info()  # info() prints its own report and returns None, so it is not wrapped in print()

print("\nFirst 5 Rows:")
print(df.head())

print("\nMissing Values:")
missing = df.isnull().sum()
print(missing[missing > 0])

# ---------------------- Data Cleaning -----------------------
print("\nCleaning Data...")
# .copy() makes df_cleaned independent of df and avoids SettingWithCopyWarning below
df_cleaned = df.dropna(subset=["Date Rptd", "TIME OCC", "Crm Cd Desc", "AREA NAME"]).copy()
df_cleaned["Date Rptd"] = pd.to_datetime(df_cleaned["Date Rptd"], errors="coerce")
df_cleaned = df_cleaned.dropna(subset=["Date Rptd"])  # drop dates that could not be parsed (NaT)
# Cast to int first so a float column cannot produce strings like "930.0",
# then zero-pad to HHMM (e.g. 930 -> "0930")
df_cleaned["TIME OCC"] = df_cleaned["TIME OCC"].astype(int).astype(str).str.zfill(4)

# ---------------------- Set Plot Style -----------------------
sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)

# ---------------------- Visualization Functions -----------------------

def plot_top_crimes():
    print("\nTop 10 Reported Crimes...")
    top_crimes = df_cleaned["Crm Cd Desc"].value_counts().head(10)
    plt.figure()  # fresh figure so successive plots never draw on top of each other
    sns.barplot(x=top_crimes.values, y=top_crimes.index, color="salmon")
    plt.title("Top 10 Most Reported Crimes")
    plt.xlabel("Number of Reports")
    plt.ylabel("Crime Type")
    plt.tight_layout()
    plt.show()

def plot_crimes_over_time():
    print("\nCrime Trend Over Time...")
    # Count reports per calendar month, then turn Period labels into Timestamps for plotting
    crime_trend = df_cleaned.groupby(df_cleaned["Date Rptd"].dt.to_period("M")).size()
    crime_trend.index = crime_trend.index.to_timestamp()
    plt.figure()
    plt.plot(crime_trend.index, crime_trend.values, color="steelblue", marker="o")
    plt.title("Monthly Crime Trend")
    plt.ylabel("Crime Count")
    plt.xlabel("Date")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

def plot_crimes_by_area():
    print("\nCrime Count by Area...")
    area_counts = df_cleaned["AREA NAME"].value_counts().head(10)
    plt.figure()
    sns.barplot(x=area_counts.values, y=area_counts.index, color="skyblue")
    plt.title("Top 10 Crime-Prone Areas")
    plt.xlabel("Number of Crimes")
    plt.ylabel("Area")
    plt.tight_layout()
    plt.show()

def plot_hourly_distribution():
    print("\nCrimes by Hour of Day...")
    # The first two characters of the zero-padded HHMM string are the hour (00-23);
    # a local Series avoids adding an extra column to df_cleaned
    hours = df_cleaned["TIME OCC"].str[:2].astype(int)
    hourly = hours.value_counts().sort_index()
    plt.figure()
    plt.plot(hourly.index, hourly.values, color="green", marker="o")
    plt.xticks(range(0, 24))
    plt.title("Crimes by Hour of Day")
    plt.xlabel("Hour")
    plt.ylabel("Crime Count")
    plt.grid(True)
    plt.show()

def plot_heatmap_area_vs_crime():
    print("\nHeatmap of Area vs Crime Type...")

    # Get top 10 areas and top 15 crime types
    top_areas = df_cleaned["AREA NAME"].value_counts().head(10).index
    top_crimes = df_cleaned["Crm Cd Desc"].value_counts().head(15).index

    df_filtered = df_cleaned[df_cleaned["AREA NAME"].isin(top_areas) & df_cleaned["Crm Cd Desc"].isin(top_crimes)]

    # Rows = areas, columns = crime types, cells = number of reports;
    # astype(int) guarantees integer counts, which annot fmt="d" requires
    pivot = df_filtered.pivot_table(index="AREA NAME", columns="Crm Cd Desc", aggfunc="size", fill_value=0).astype(int)

    print("Pivot shape:", pivot.shape)  # Debug print to confirm data is present

    plt.figure()
    sns.heatmap(pivot, cmap="coolwarm", linewidths=0.5, annot=True, fmt="d")
    plt.title("Crime Types by Area (Heatmap)")
    plt.xlabel("Crime Type")
    plt.ylabel("Area Name")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.show()

def run_all_plots():
    plot_top_crimes()
    plot_crimes_over_time()
    plot_crimes_by_area()
    plot_hourly_distribution()
    plot_heatmap_area_vs_crime()

# ------------------ Interactive Menu -----------------------

def run_interactive_analysis():
    while True:
        print("\n" + "=" * 50)
        print("LAPD CRIME DATA ANALYSIS MENU")
        print("=" * 50)
        print("1. Basic Statistics")
        print("2. Top Crimes")
        print("3. Crime Trend Over Time")
        print("4. Crimes by Area")
        print("5. Hourly Crime Distribution")
        print("6. Area vs Crime Heatmap")
        print("7. Run All")
        print("0. Exit")

        choice = input("\nEnter your choice (0-7): ").strip()
        if choice == "1":
            print(df_cleaned.describe(include="all"))
        elif choice == "2":
            plot_top_crimes()
        elif choice == "3":
            plot_crimes_over_time()
        elif choice == "4":
            plot_crimes_by_area()
        elif choice == "5":
            plot_hourly_distribution()
        elif choice == "6":
            plot_heatmap_area_vs_crime()
        elif choice == "7":
            run_all_plots()
        elif choice == "0":
            print("\nThank you for using the LAPD Crime Data Analysis Tool!")
            break
        else:
            print("Invalid choice. Try again!")

# ---------------------- MAIN ----------------------------

if __name__ == "__main__":
    run_interactive = input("\nDo you want to run interactive analysis? (y/n): ")
    if run_interactive.strip().lower() == "y":
        run_interactive_analysis()
    else:
        print("\nNon-interactive mode: displaying all visualizations...")
        run_all_plots()
        print("\nAll analysis complete!")
--------------------------------------------------------------------------------