├── README.md
├── spotify.ipynb
└── spotify.py

/README.md:
--------------------------------------------------------------------------------

Spotify Data Analysis Python Project🎼🎧


Contents📖


Introduction


The Spotify Data Analysis Project: In today's changing world, data analysis has become crucial in fields such as business, research, and meteorology. This project showcases the role that data plays in making decisions, advancing research initiatives, and even predicting weather patterns.

The potential of data analysis is evident in this project, which focuses on extracting insights from music-related datasets using Python. At its core, Spotify takes center stage as an audio streaming giant with features such as seamless song sharing and synchronized lyrics display.

Throughout this project, I explored the data using Python's libraries and functions. From analyzing to visualizing the data, this project covers the main stages of data processing. The interactive environment provided by Jupyter Notebook enhanced my experience by allowing me to engage with the data and discover patterns.

Through the Spotify Data Analysis Project, I not only sharpened my skills but also gained a deeper understanding of how data intertwines with the world of music. This journey provided insights and equipped me with the confidence to undertake similar projects in the future.

Feel free to reach out with any questions or suggestions about this project. I'm open to discussions and eager to assist.

LinkedIn | Mariya Joseph

Don't forget to follow and star ⭐ the repository if you find it valuable.

Dataset: Spotify Dataset

Import Required Libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

Exploring the Dataset

```python
sp_tracks = pd.read_csv('D:/spotifydata/tracks.csv')
sp_feature = pd.read_csv('D:/spotifydata/SpotifyFeatures.csv')
```

```python
#viewing the tracks data
sp_tracks.head()
```

NOTE: The screenshot shows only part of the table, because a full capture was not possible. For the complete table, refer to the Jupyter notebook in this repository.

```python
#viewing the feature data
sp_feature.head()
```

NOTE: The screenshot shows only part of the table, because a full capture was not possible. For the complete table, refer to the Jupyter notebook in this repository.


Identifying Null Values in the Dataset

```python
#checking null in tracks data
pd.isnull(sp_tracks).sum()
```

```python
#checking null in feature data
pd.isnull(sp_feature).sum()
```
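Beyond raw counts, the share of missing values per column can help decide whether a column is worth keeping. The following is only a minimal sketch: the `toy` frame and its values are hypothetical stand-ins, not rows from the Spotify files.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sp_tracks; the values are illustrative only.
toy = pd.DataFrame({
    'name': ['Song A', None, 'Song C', 'Song D'],
    'popularity': [10, 55, np.nan, 80],
    'duration_ms': [126903, 98200, 181640, 176907],
})

# Count of missing values per column (same result as pd.isnull(toy).sum()).
null_counts = toy.isna().sum()

# Share of missing values per column, expressed as a percentage.
null_share = toy.isna().mean() * 100

summary = pd.DataFrame({'missing': null_counts, 'missing_pct': null_share})
print(summary)
```

The same two lines should work on `sp_tracks` or `sp_feature` in place of `toy`.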

Dataset Overview: Rows, Columns, Data Types, and Memory Usage

```python
#checking info in tracks data
sp_tracks.info()
```

```python
#checking info in feature data
sp_feature.info()
```

----------------------------------------------------------------------------

Extracting Insights from the Dataset through Analysis📊

  1. Exploring the 10 Least Popular Songs in the Spotify Dataset

```python
a=sp_tracks.sort_values('popularity',ascending=True)[0:10]
a[['name','popularity']]
```
  2. Descriptive Statistics

```python
#descriptive statistics of tracks
sp_tracks.describe().transpose()
```

```python
#descriptive of feature
sp_feature.describe().transpose()
```
  3. Discovering the Top 10 Popular Songs in the Spotify Dataset

```python
a=sp_tracks
b=a[a['popularity']>90].sort_values('popularity',ascending=False)[:10]
b[['name','popularity','artists']]
```
  4. Setting Release Date as the Index Column

```python
sp_tracks.set_index('release_date',inplace=True)
sp_tracks.index=pd.to_datetime(sp_tracks.index)
sp_tracks.head()
```
  5. Extracting Artist Name from the 18th Row of the Dataset

```python
# iloc is zero-based, so position 18 selects the 19th row of the table
sp_tracks[['artists']].iloc[18]
```
  6. Converting Song Duration from Milliseconds to Seconds

```python
sp_tracks['duration'] = sp_tracks['duration_ms'].apply(lambda x: round(x/1000))
sp_tracks.drop('duration_ms', inplace=True, axis=1)
sp_tracks.duration.head()
```
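The millisecond-to-second conversion can also be written as a single column operation instead of a row-by-row `apply`. This is a sketch on made-up durations (the `toy` frame is hypothetical), comparing both forms:

```python
import pandas as pd

# Hypothetical durations in milliseconds (illustrative values).
toy = pd.DataFrame({'duration_ms': [126903, 98200, 181640]})

# Row-by-row version, mirroring the apply/lambda form.
via_apply = toy['duration_ms'].apply(lambda x: round(x / 1000))

# Vectorized version: divide the whole column, round, then cast to int.
via_vector = (toy['duration_ms'] / 1000).round().astype(int)

print(via_vector.tolist())  # [127, 98, 182]
```

On large tables the vectorized form usually runs faster, since the arithmetic is applied to the whole column at once.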
  7. Visualization: Pearson Correlation Heatmap for Two Variables

```python
td = sp_tracks.drop(['key','mode','explicit'], axis=1).corr(method = 'pearson')
plt.figure(figsize=(9,5))
hmap = sns.heatmap(td, annot = True, fmt = '.1g', vmin=-1, vmax=1, center=0, cmap='Greens', linewidths=0.1, linecolor='black')
hmap.set_title('Correlation HeatMap')
hmap.set_xticklabels(hmap.get_xticklabels(), rotation=90)
```
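In pandas 2.0 and later, `DataFrame.corr()` no longer drops text columns silently, so a frame that still contains columns such as track names may raise an error. One possible workaround is to select the numeric columns first. The sketch below uses a hypothetical `toy` frame with made-up values:

```python
import pandas as pd

# Hypothetical mixed-type frame standing in for sp_tracks.
toy = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'energy': [0.1, 0.4, 0.6, 0.9],
    'loudness': [-20.0, -14.0, -10.0, -4.0],
    'acousticness': [0.9, 0.6, 0.4, 0.1],
})

# Keep only numeric columns so text columns such as 'name' cannot break corr().
numeric = toy.select_dtypes(include='number')
td = numeric.corr(method='pearson')
print(td.round(2))
```

The resulting `td` matrix can be passed to `sns.heatmap` in the same way as before.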
  8. Creating a 4% Sample of the Entire Dataset

```python
# 0.04 corresponds to 4% of the rows
sample_sp=sp_tracks.sample(int(0.04*len(sp_tracks)))
print(len(sample_sp))
```
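`DataFrame.sample` also accepts a fraction directly, and a fixed `random_state` makes the draw repeatable between runs. A minimal sketch, assuming a hypothetical 1,000-row frame in place of `sp_tracks` (the seed value 42 is arbitrary):

```python
import pandas as pd

# Hypothetical frame with 1,000 rows standing in for sp_tracks.
toy = pd.DataFrame({'popularity': range(1000)})

# frac=0.04 draws 4% of the rows; random_state makes the draw repeatable.
sample_toy = toy.sample(frac=0.04, random_state=42)
print(len(sample_toy))  # 40
```

Without a seed, the regression plots built on the sample will look slightly different on every run.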
  9. Regression Plot of Loudness vs. Energy with Regression Line

```python
plt.figure(figsize=(8,4))
sns.regplot(data=sample_sp, y='loudness', x='energy', color='#054907').set(title='Regression Plot - Loudness vs Energy Correlation')
```
  10. Regression Plot of Popularity vs. Acousticness with Regression Line

```python
plt.figure(figsize=(8,4))
sns.regplot(data=sample_sp, y='popularity', x='acousticness', color='#008000').set(title='Regression Plot - Popularity vs Acousticness Correlation')
```
  11. Adding a New Column to the Tracks Table

```python
sp_tracks['dates']=sp_tracks.index.get_level_values('release_date')
sp_tracks.dates=pd.to_datetime(sp_tracks.dates)
years=sp_tracks.dates.dt.year
sp_tracks.head()
```
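If the raw `release_date` strings come with varying precision (a bare year next to a full date, which is an assumption about the data rather than something this README states), the year can also be read from the first four characters before any datetime conversion. A minimal sketch on made-up values:

```python
import pandas as pd

# Hypothetical release dates with mixed precision (illustrative values).
toy = pd.DataFrame({'release_date': ['1922-02-22', '1922', '2008-06']})

# Assumes every string starts with a four-digit year.
toy['year'] = toy['release_date'].str[:4].astype(int)
print(toy['year'].tolist())  # [1922, 1922, 2008]
```

This avoids depending on how a particular pandas version parses mixed date formats.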
  12. Graph: Number of Songs per Year

```python
sns.displot(years, discrete=True, aspect=2, height=4, kind='hist',color='g').set(title='No of songs - per year')
```
  13. Bar Plot: Duration of Songs Over Each Year

```python
total_dr = sp_tracks.duration
fig_dims = (15,5)
fig, ax = plt.subplots(figsize=fig_dims)
fig = sns.barplot(x = years, y = total_dr, ax = ax, errwidth = False).set(title='Years vs Duration')
plt.xticks(rotation=90)
```
  14. Horizontal Bar Plot: Song Duration Across Different Genres

```python
plt.title('Duration of songs in different Genres')
sns.color_palette('crest', as_cmap=True)
sns.barplot(y='genre', x='duration_ms', data=sp_feature)
plt.xlabel('Duration in ms')
plt.ylabel('Genres')
```
  15. Bar Plot: Top Five Genres by Popularity

```python
sns.set_style(style='darkgrid')
plt.figure(figsize=(8,4))
Top = sp_feature.sort_values('popularity', ascending=False)[:10]
sns.barplot(y = 'genre', x = 'popularity', data = Top).set(title='Genres by Popularity-Top 5')
```
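Sorting individual tracks and slicing the first rows ranks tracks rather than genres. One possible way to rank the genres themselves is to average popularity per genre and keep the five highest averages. This is a sketch under that assumption, not the method used in the notebook, and the `toy` frame is hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for sp_feature (illustrative values only).
toy = pd.DataFrame({
    'genre': ['Pop', 'Pop', 'Rock', 'Rock', 'Jazz', 'Rap', 'Folk', 'Opera'],
    'popularity': [90, 80, 70, 60, 40, 75, 50, 20],
})

# Average popularity per genre, then keep the five highest averages.
top5 = toy.groupby('genre')['popularity'].mean().nlargest(5)
print(top5)
```

The resulting Series could then be drawn with `sns.barplot(x=top5.values, y=top5.index)`.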
--------------------------------------------------------------------------------
/spotify.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding: utf-8

# In[1]:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# In[2]:


sp_tracks = pd.read_csv('D:/spotifydata/tracks.csv')
sp_feature = pd.read_csv('D:/spotifydata/SpotifyFeatures.csv')


# In[3]:


#viewing the tracks data
sp_tracks.head()


# In[4]:


#viewing the feature data
sp_feature.head()


# In[5]:


#checking null
pd.isnull(sp_tracks).sum()


# In[6]:


pd.isnull(sp_feature).sum()


# In[7]:


#checking info
sp_tracks.info()


# In[8]:


#checking info
sp_feature.info()


# In[9]:


#finding 10 least popular songs in the spotify dataset
a=sp_tracks.sort_values('popularity',ascending=True)[0:10]
a[['name','popularity']]


# In[10]:


#descriptive statistics of tracks
sp_tracks.describe().transpose()


# In[11]:


#descriptive of feature
sp_feature.describe().transpose()


# In[12]:


#finding top 10 popular songs in the spotify dataset
a=sp_tracks
b=a[a['popularity']>90].sort_values('popularity',ascending=False)[:10]
b[['name','popularity','artists']]


# In[13]:


#Make the Release Date Column as the Index Column.
sp_tracks.set_index('release_date',inplace=True)
sp_tracks.index=pd.to_datetime(sp_tracks.index)
sp_tracks.head()


# In[14]:


#Find the Name of the Artist Present in the 18th Row of the Dataset.
sp_tracks[['artists']].iloc[18]


# In[15]:


#Convert the Duration of the Songs From Milliseconds to Seconds.
sp_tracks['duration'] = sp_tracks['duration_ms'].apply(lambda x: round(x/1000))
sp_tracks.drop('duration_ms', inplace=True, axis=1)
sp_tracks.duration.head()


# In[16]:


#Correlation HeatMap using Pearson Correlation method between two variables
td = sp_tracks.drop(['key','mode','explicit'], axis=1).corr(method = 'pearson')
plt.figure(figsize=(9,5))
hmap = sns.heatmap(td, annot = True, fmt = '.1g', vmin=-1, vmax=1, center=0, cmap='Greens', linewidths=0.1, linecolor='black')
hmap.set_title('Correlation HeatMap')
hmap.set_xticklabels(hmap.get_xticklabels(), rotation=90)


# In[17]:


#Sample Only 4 Percent of the Whole Dataset (0.04 corresponds to 4%).
sample_sp=sp_tracks.sample(int(0.04*len(sp_tracks)))
print(len(sample_sp))


# In[18]:


#Create a Regression Plot Between Loudness and Energy. Let’s Plot It in the Form of a Regression Line.
plt.figure(figsize=(8,4))
sns.regplot(data=sample_sp, y='loudness', x='energy', color='#054907').set(title='Regression Plot - Loudness vs Energy Correlation')


# In[19]:


#Create a Regression Plot Between Popularity and Acousticness in the Form of a Regression Line.
plt.figure(figsize=(8,4))
sns.regplot(data=sample_sp, y='popularity', x='acousticness', color='#008000').set(title='Regression Plot - Popularity vs Acousticness Correlation')


# In[20]:


#creating new column in tracks table
sp_tracks['dates']=sp_tracks.index.get_level_values('release_date')
sp_tracks.dates=pd.to_datetime(sp_tracks.dates)
years=sp_tracks.dates.dt.year
sp_tracks.head()


# In[21]:


sns.displot(years, discrete=True, aspect=2, height=4, kind='hist',color='g').set(title='No of songs - per year')


# In[22]:


#Plot a Bar Plot to Show the Duration of the Songs for Each Year.
total_dr = sp_tracks.duration
fig_dims = (15,5)
fig, ax = plt.subplots(figsize=fig_dims)
fig = sns.barplot(x = years, y = total_dr, ax = ax, errwidth = False).set(title='Years vs Duration')
plt.xticks(rotation=90)


# In[27]:


#spotify feature analysis

#Plot Duration of the Songs w.r.t. different Genres using a horizontal barplot.
plt.title('Duration of songs in different Genres')
sns.color_palette('crest', as_cmap=True)
sns.barplot(y='genre', x='duration_ms', data=sp_feature)
plt.xlabel('Duration in ms')
plt.ylabel('Genres')


# In[33]:


#Find top five genres by Popularity and plot a barplot for the same.
sns.set_style(style='darkgrid')
plt.figure(figsize=(8,4))
Top = sp_feature.sort_values('popularity', ascending=False)[:10]
sns.barplot(y = 'genre', x = 'popularity', data = Top).set(title='Genres by Popularity-Top 5')


# In[ ]:


--------------------------------------------------------------------------------