└── README.md

/README.md:
--------------------------------------------------------------------------------
# EDA of Ultra-Marathon Running Using Python
This repository contains an exploratory data analysis (EDA) of a comprehensive dataset on ultra-marathon running events. The dataset spans more than two centuries and provides detailed information on ultra-marathon races, athletes, and their performances.

## Dataset
The dataset used in this analysis is sourced from Kaggle: [The Big Dataset of Ultra-Marathon Running](https://www.kaggle.com/datasets/aiaiaidavid/the-big-dataset-of-ultra-marathon-running). It includes information on race events, distances, athlete demographics, and performance metrics.

## Analysis Overview
The Jupyter notebook `EDA_marathon_running.ipynb` covers the following steps:

1. **Data Acquisition**
   - Downloading the dataset using the Kaggle API.
   - Extracting the data from the downloaded zip file.

2. **Data Preprocessing**
   - Loading the dataset into a pandas DataFrame.
   - Initial exploration of the dataset (shape, data types, missing values).
   - Filtering the dataset for specific criteria (e.g., events in 2020, races held in the USA).
   - Cleaning and transforming data columns for analysis.

3. **Exploratory Data Analysis (EDA)**
   - Visualizing the distribution of race distances and athlete genders.
   - Analyzing the relationship between athlete age and performance.
   - Comparing average speeds across different event distances and genders.

## Visualizations
Several visualizations are generated to better understand the data:
- Count plots for event distances and athlete genders.
- Violin plots for athlete speeds across event distances.
- Line plots showing the relationship between athlete age and average speed.
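
The preprocessing and EDA steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows are made up, the column names (`year`, `event_name`, `distance`, `gender`, `birth_year`, `avg_speed`) are placeholders rather than the dataset's real headers, and the sketch assumes the country code appears in parentheses at the end of the event name.

```python
import pandas as pd

# Hypothetical sample rows; column names are placeholders, not the
# dataset's actual headers.
raw = pd.DataFrame(
    {
        "year": [2020, 2020, 2020, 2019, 2020],
        "event_name": [
            "Race A (USA)",
            "Race A (USA)",
            "Race B (USA)",
            "Race C (USA)",
            "Race D (FRA)",
        ],
        "distance": ["50km", "50km", "50mi", "50km", "50km"],
        "gender": ["M", "F", "M", "F", "M"],
        "birth_year": [1980, 1990, 1980, 1985, 1975],
        "avg_speed": ["10.0", "8.0", "9.0", "7.5", "11.0"],  # km/h, stored as text
    }
)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Keep 2020 races held in the USA and derive numeric columns."""
    # Assumption: the country code is the parenthesised suffix of the event name.
    country = df["event_name"].str.extract(r"\(([^)]+)\)\s*$")[0]
    out = df[(df["year"] == 2020) & (country == "USA")].copy()
    # Convert speed strings to floats; unparseable values become NaN.
    out["avg_speed"] = pd.to_numeric(out["avg_speed"], errors="coerce")
    # Approximate athlete age in the event year.
    out["age"] = out["year"] - out["birth_year"]
    return out.dropna(subset=["avg_speed"])


clean = preprocess(raw)

# Average speed per event distance and gender.
speed_by_group = clean.groupby(["distance", "gender"])["avg_speed"].mean()

# Average speed per athlete age.
speed_by_age = clean.groupby("age")["avg_speed"].mean()


def plot_overview(df: pd.DataFrame):
    """Draw a count plot, a violin plot, and a line plot side by side."""
    # Imported here so the data steps above run without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    sns.countplot(data=df, x="distance", hue="gender", ax=axes[0])
    sns.violinplot(data=df, x="distance", y="avg_speed", hue="gender", ax=axes[1])
    sns.lineplot(data=df, x="age", y="avg_speed", ax=axes[2])
    fig.tight_layout()
    return fig
```

Calling `plot_overview(clean)` would produce the three plot types described in this section. With the real dataset, the placeholder column names would need to be replaced with the actual headers, and the filtering criteria adjusted to match the notebook.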

## Dependencies
To run the notebook, you need the following Python libraries:
- pandas
- seaborn
- matplotlib
- kaggle

## Conclusion
This analysis provides insights into ultra-marathon running events, highlighting trends and patterns in athlete performances based on gender, age, and event distances. Further analysis can be conducted to explore additional aspects of the dataset.

--------------------------------------------------------------------------------