├── README.md
└── merged.pdf

/README.md:
--------------------------------------------------------------------------------

# 🌍 Air Pollution Data Analysis

This repository contains a data analysis and visualization notebook built using Python libraries like **Pandas**, **NumPy**, **Seaborn**, and **Matplotlib**. The dataset used contains information about pollution levels across various cities, stations, and countries.

## 📁 Dataset

The dataset used in this project is a `.csv` file (example: `3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69.csv`) that includes fields such as:
- `station`
- `city`
- `country`
- `pollutant_avg`
- `latitude`
- `longitude`

## 🔧 Technologies Used

- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn

## 📊 Visualizations Included

This project generates multiple types of visualizations to understand the dataset:

1. **Bar Plot** – Average pollutant by station
2. **Pie Chart** – Distribution of entries by city
3. **Histogram** – Distribution of `pollutant_avg` values
4. **Scatter Plot** – Pollutant Average vs Latitude with country-wise hue
5. **Line Plot** – Pollutant Average across Longitudes
6. **Correlation Heatmap** – Relationship among numeric columns
7. **Box Plot** – Pollutant Average distribution per country
8. **Pair Plot** – Country-wise scatter relationships
9. **Outlier-Free Box Plot** – Refined pollutant averages by country

## 🧼 Data Cleaning

- Missing numeric values are filled with column means.
- Categorical missing values are filled with the mode.
- Outliers are handled using the IQR (Interquartile Range) method.

## 📈 Summary Statistics

The notebook also prints descriptive statistics using `df.describe()` to get insights into the dataset's distribution, central tendency, and spread.
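
The steps listed under Data Cleaning could look roughly like the sketch below. It is an illustration under stated assumptions, not the notebook's actual code: the helper names `fill_missing` and `drop_iqr_outliers` are hypothetical, the `1.5` IQR multiplier is the conventional default rather than a value confirmed by this project, and the toy DataFrame is made up for demonstration.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the column mean and other gaps with the column mode.

    Hypothetical helper; the notebook may implement this differently.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:  # an all-missing column has no mode; leave it alone
                out[col] = out[col].fillna(modes.iloc[0])
    return out


def drop_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is an assumed, conventional multiplier.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


# Made-up toy data using two of the column names listed in the Dataset section.
toy = pd.DataFrame(
    {
        "city": ["A", "A", None, "B", "A", "B"],
        "pollutant_avg": [10.0, 12.0, np.nan, 11.0, 13.0, 500.0],
    }
)

cleaned = drop_iqr_outliers(fill_missing(toy), "pollutant_avg")
print(cleaned.describe())
```

Filling before filtering means an extreme value can pull the column mean upward, so it may be worth checking the filled values before relying on them.
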

## 📂 Usage

To run this project:

1. Clone the repository or download the `.ipynb`/`.py` file.
2. Make sure you have the required libraries installed:

```bash
pip install pandas numpy matplotlib seaborn
```

3. Replace the CSV path in the `read_csv()` call with the path to your dataset.
4. Run the script or notebook.

## 🧪 Sample Output

- Visual plots for easy understanding of trends
- Insights into pollution averages by location
- Detection and removal of outliers
- Heatmaps showing correlation between numerical columns

--------------------------------------------------------------------------------
/merged.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aagaj/pythonproject/e8e917c65f8a3e6c757049a9dfb1943d31046ce9/merged.pdf
--------------------------------------------------------------------------------