├── .github └── workflows │ └── superlinter.yml ├── .gitignore ├── README.md ├── dags ├── spotify_dag.py └── sql │ ├── create_spotify_genres.sql │ └── create_spotify_songs.sql ├── docker-compose.yaml ├── images ├── metabase_graphs.png ├── metabase_heatmap.png ├── metabase_recently_played.png ├── metabase_summary.png ├── metabase_top_songs.png └── spotify.drawio.svg ├── operators ├── __init__.py ├── config.yml ├── copy_to_postgres.py ├── dbt │ ├── dbt_project.yml │ ├── macros │ │ └── generate_schema_name.sql │ └── models │ │ ├── marts │ │ ├── analytical │ │ │ ├── dim_genres.sql │ │ │ ├── dim_songs.sql │ │ │ ├── fct_listening_activity.sql │ │ │ └── schema.yml │ │ └── reporting │ │ │ ├── rpt_hour_of_week.sql │ │ │ ├── rpt_most_listened.sql │ │ │ ├── rpt_recently_listened.sql │ │ │ ├── rpt_weekly_discovers.sql │ │ │ └── schema.yml │ │ └── staging │ │ └── spotify │ │ ├── schema.yml │ │ ├── stg_genres.sql │ │ └── stg_songs.sql ├── main.py ├── postgres_connect.py ├── refresh.py └── yaml_load.py ├── requirements.txt ├── setup ├── airflow_docker.md ├── dbt.md ├── metabase.md ├── postgres.md ├── slack_notifications.md └── spotify_api_access.md ├── spotify_data ├── spotify.json ├── spotify_genres.csv └── spotify_songs.csv └── tests └── __init__.py /.github/workflows/superlinter.yml: -------------------------------------------------------------------------------- 1 | name: Super-Linter 2 | 3 | on: push 4 | 5 | jobs: 6 | super-lint: 7 | name: Lint code base 8 | runs-on: ubuntu-latest 9 | steps: 10 | - name: Checkout code 11 | uses: actions/checkout@v2 12 | 13 | - name: Run Super-Linter 14 | uses: github/super-linter@v4 15 | env: 16 | DEFAULT_BRANCH: main 17 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ 2 | venv/ 3 | steps.txt 4 | secrets.py 5 | .DS_Store 6 | **/*.env 7 | dev_script.py 8 | test* 9 | .user.yml 10 | references.md 11 | spotify.drawio 12 | **/extended_streaming_history/ 13 | backfill_extended_streaming_history.py 14 | tests.ipynb 15 | 16 | # Airflow 17 | **/logs/* 18 | **/dags/spotify_data/ 19 | test_dag.py 20 | 21 | # dbt 22 | **target/ 23 | **dbt_packages/ 24 | **logs/ 25 | **.log 26 | **profiles.yml -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Spotify Data Pipeline 2 | 3 | Data pipeline that extracts a user's song listening history from the Spotify API using Python, PostgreSQL, dbt, Metabase, Airflow, and Docker 4 | 5 | ## Objective 6 | 7 | Deep dive into a user's song listening history to retrieve information about top artists, top tracks, top genres, and more. This is a personal side project that recreates Spotify Wrapped at a more frequent cadence for quicker and more detailed insights. The pipeline calls the Spotify API every hour from hours 0-6 and 14-23 UTC (basically whenever I'm awake) to extract a user's song listening history, loads the responses into a database, applies transformations, and visualizes the metrics in a dashboard. Since the dataset is small and the pipeline doesn't need to run 24/7, everything is built with open source tools and hosted locally to avoid any cost.
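For reference, the "every hour from hours 0-6 and 14-23 UTC" cadence above is implemented by the cron expression `0 0-6,14-23 * * *` in dags/spotify_dag.py; a tiny illustrative snippet (not part of the repo) that lists the UTC hours it fires:

```python
# Hours at which the cron "0 0-6,14-23 * * *" fires (UTC): 17 runs per day
scheduled_hours = list(range(0, 7)) + list(range(14, 24))
print(scheduled_hours)
```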
8 | 9 | ## Tools & Technologies 10 | 11 | - Containerization - [**Docker**](https://www.docker.com), [**Docker Compose**](https://docs.docker.com/compose/) 12 | - Orchestration - [**Airflow**](https://airflow.apache.org) 13 | - Database - [**PostgreSQL**](https://www.postgresql.org/) 14 | - Transformation - [**dbt**](https://www.getdbt.com) 15 | - Data Visualization - [**Metabase**](https://www.metabase.com/) 16 | - Language - [**Python**](https://www.python.org) 17 | 18 | ## Architecture 19 | 20 | ![spotify drawio](https://user-images.githubusercontent.com/60953643/210160621-c7213f9d-2b9f-42ad-b8b1-697403bf6497.svg) 21 | 22 | #### Data Flow 23 | 1. The main.py script is triggered every hour (from hours 0-6 and 14-23 UTC) via Airflow to refresh the access token, connect to the Postgres database to check for the latest listened time, and call the Spotify API to retrieve the most recently played songs and corresponding genres. 24 | 2. Responses are saved as CSV files in 'YYYY-MM-DD.csv' format. These are saved on the local file system and act as our replayable source, since the Spotify API only allows requesting the 50 most recently played songs and not any historical data. These files are continually appended with the most recently played songs for the respective date. 25 | 3. Data is copied into the respective Postgres tables, spotify_songs and spotify_genres. 26 | 4. A dbt run task is triggered to run transformations on top of the staging data to produce analytical and reporting tables/views. 27 | 5. dbt test runs after successful completion of dbt run to ensure all tests pass. 28 | 6. Tables/views are fed into Metabase and the metrics are visualized through a dashboard. 29 | 7. A Slack subscription is set up in Metabase to send a weekly summary every Monday. 30 | 31 | If any Airflow task fails at any point in this process, an automatic alert is sent to a custom Slack channel. 32 | 33 | #### DAG 34 | Screenshot 2023-01-05 at 9 32 42 PM 35 | 36 | #### Sample Slack Alert 37 | Screenshot 2023-01-05 at 9 33 09 PM 38 | 39 | 40 | ## Dashboard 41 | Screenshot 2023-01-31 at 12 02 56 PM 42 | Screenshot 2023-01-31 at 1 20 51 PM 43 | Screenshot 2023-01-24 at 10 18 42 PM 44 | Screenshot 2023-01-31 at 12 03 24 PM 45 | Screenshot 2023-01-31 at 12 03 36 PM 46 | 47 | 48 | ## Setup 49 | 50 | 1. [Get Spotify API Access](https://github.com/calbergs/spotify-api/blob/master/setup/spotify_api_access.md) 51 | 2. [Build Docker Containers for Airflow](https://github.com/calbergs/spotify-api/blob/master/setup/airflow_docker.md) 52 | 3. [Set Up Airflow Connection to Postgres](https://github.com/calbergs/spotify-api/blob/master/setup/postgres.md) 53 | 4. [Install dbt Core](https://github.com/calbergs/spotify-api/blob/master/setup/dbt.md) 54 | 5. [Enable Airflow Slack Notifications](https://github.com/calbergs/spotify-api/blob/master/setup/slack_notifications.md) 55 | 6. [Install Metabase](https://github.com/calbergs/spotify-api/blob/master/setup/metabase.md) 56 | 57 | ## Further Improvements (Work In Progress) 58 | 59 | - Create a BranchPythonOperator to first check if the API payload is empty. If it is empty, proceed directly to the end task; otherwise, continue to the downstream tasks.
60 | - Implement data quality checks to catch any potential errors in the dataset 61 | - Create unit tests to ensure the pipeline is running as intended 62 | - Include CI/CD 63 | - Create more visualizations to uncover further insights once Spotify sends back my entire song listening history from 10+ years back to the current date (this needed to be requested separately since the current API only allows requesting the 50 most recently played tracks) 64 | - If and whenever Spotify allows requesting historical data, implement backfill capability 65 | -------------------------------------------------------------------------------- /dags/spotify_dag.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | sys.path.append("/opt/airflow/operators") 4 | from datetime import datetime, timedelta 5 | 6 | import copy_to_postgres 7 | from airflow import DAG 8 | from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator 9 | from airflow.hooks.base_hook import BaseHook 10 | from airflow.operators.bash import BashOperator 11 | from airflow.operators.dummy_operator import DummyOperator 12 | from airflow.operators.python import PythonOperator 13 | from airflow.providers.postgres.operators.postgres import PostgresOperator 14 | from airflow_dbt.operators.dbt_operator import DbtRunOperator, DbtTestOperator 15 | 16 | 17 | def task_fail_slack_alert(context): 18 | slack_webhook_token = BaseHook.get_connection("slack").password 19 | slack_msg = """ 20 | :x: Task Failed 21 | *Task*: {task} 22 | *Dag*: {dag} 23 | *Execution Time*: {exec_date} 24 | *Log URL*: {log_url} 25 | """.format( 26 | task=context.get("task_instance").task_id, 27 | dag=context.get("task_instance").dag_id, 28 | ti=context.get("task_instance"), 29 | exec_date=context.get("execution_date"), 30 | log_url=context.get("task_instance").log_url, 31 | ) 32 | failed_alert = SlackWebhookOperator( 33 | task_id="slack_alert", 34 | http_conn_id="slack", 35 | webhook_token=slack_webhook_token, 36 | message=slack_msg, 37 | username="airflow", 38 | dag=dag, 39 | ) 40 | return failed_alert.execute(context=context) 41 | 42 | 43 | args = { 44 | "owner": "airflow", 45 | "depends_on_past": False, 46 | "start_date": datetime(2022, 12, 21), 47 | "retries": 1, 48 | "retry_delay": timedelta(minutes=1), 49 | "on_success_callback": None, 50 | "on_failure_callback": task_fail_slack_alert, 51 | } 52 | 53 | with DAG( 54 | dag_id="spotify_dag", 55 | schedule_interval="0 0-6,14-23 * * *", 56 | max_active_runs=1, 57 | catchup=False, 58 | default_args=args, 59 | ) as dag: 60 | 61 | TASK_DEFS = { 62 | "songs": {"path": "sql/create_spotify_songs.sql"}, 63 | "genres": {"path": "sql/create_spotify_genres.sql"}, 64 | } 65 | 66 | create_tables_if_not_exists = { 67 | k: PostgresOperator( 68 | task_id=f"create_if_not_exists_spotify_{k}_table", 69 | postgres_conn_id="postgres_localhost", 70 | sql=v["path"], 71 | ) 72 | for k, v in TASK_DEFS.items() 73 | } 74 | 75 | extract_spotify_data = BashOperator( 76 | task_id="extract_spotify_data", 77 | bash_command="python3 /opt/airflow/operators/main.py", 78 | ) 79 | 80 | load_tables = { 81 | k: PythonOperator( 82 | task_id=f"load_{k}", 83 | python_callable=copy_to_postgres.copy_expert_csv, 84 | op_kwargs={"file": f"spotify_{k}"}, 85 | ) 86 | for k, v in TASK_DEFS.items() 87 | } 88 | 89 | dbt_run = DbtRunOperator( 90 | task_id="dbt_run", 91 | dir="/opt/airflow/operators/dbt/", 92 | profiles_dir="/opt/airflow/operators/dbt/", 93 | ) 94 | 95 | dbt_test = DbtTestOperator( 96 |
task_id="dbt_test", 97 | dir="/opt/airflow/operators/dbt/", 98 | profiles_dir="/opt/airflow/operators/dbt/", 99 | ) 100 | 101 | continue_task = DummyOperator(task_id="continue") 102 | 103 | start_task = DummyOperator(task_id="start") 104 | 105 | end_task = DummyOperator(task_id="end") 106 | 107 | ( 108 | start_task 109 | >> extract_spotify_data 110 | >> list(create_tables_if_not_exists.values()) 111 | >> continue_task 112 | >> list(load_tables.values()) 113 | >> dbt_run 114 | >> dbt_test 115 | >> end_task 116 | ) 117 | -------------------------------------------------------------------------------- /dags/sql/create_spotify_genres.sql: -------------------------------------------------------------------------------- 1 | --hacky way to do a create or replace 2 | --this is in case we are rebuilding the project from scratch whenever the spotify genres table doesn't exist yet since it's doing a full refresh each run 3 | create table if not exists spotify_genres ( 4 | artist_id text, 5 | artist_name text, 6 | artist_genre text, 7 | last_updated_datetime_utc timestamp, 8 | primary key (artist_id) 9 | ); 10 | drop table spotify_genres; 11 | create table if not exists spotify_genres ( 12 | artist_id text, 13 | artist_name text, 14 | artist_genre text, 15 | last_updated_datetime_utc timestamp, 16 | primary key (artist_id) 17 | ); -------------------------------------------------------------------------------- /dags/sql/create_spotify_songs.sql: -------------------------------------------------------------------------------- 1 | create table if not exists spotify_songs ( 2 | played_at_utc timestamp, 3 | played_date_utc date, 4 | song_name text, 5 | artist_name text, 6 | song_duration_ms integer, 7 | song_link text, 8 | album_art_link text, 9 | album_name text, 10 | album_id text, 11 | artist_id text, 12 | track_id text, 13 | last_updated_datetime_utc timestamp, 14 | primary key (played_at_utc) 15 | ); -------------------------------------------------------------------------------- /docker-compose.yaml: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one 2 | # or more contributor license agreements. See the NOTICE file 3 | # distributed with this work for additional information 4 | # regarding copyright ownership. The ASF licenses this file 5 | # to you under the Apache License, Version 2.0 (the 6 | # "License"); you may not use this file except in compliance 7 | # with the License. You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, 12 | # software distributed under the License is distributed on an 13 | # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 14 | # KIND, either express or implied. See the License for the 15 | # specific language governing permissions and limitations 16 | # under the License. 17 | # 18 | 19 | # Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL. 20 | # 21 | # WARNING: This configuration is for local development. Do not use it in a production deployment. 22 | # 23 | # This configuration supports basic configuration using environment variables or an .env file 24 | # The following variables are supported: 25 | # 26 | # AIRFLOW_IMAGE_NAME - Docker image name used to run Airflow. 
27 | # Default: apache/airflow:2.5.0 28 | # AIRFLOW_UID - User ID in Airflow containers 29 | # Default: 50000 30 | # Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode 31 | # 32 | # _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account (if requested). 33 | # Default: airflow 34 | # _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account (if requested). 35 | # Default: airflow 36 | # _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers. 37 | # Default: '' 38 | # 39 | # Feel free to modify this file to suit your needs. 40 | --- 41 | version: '3' 42 | x-airflow-common: 43 | &airflow-common 44 | # In order to add custom dependencies or upgrade provider packages you can use your extended image. 45 | # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml 46 | # and uncomment the "build" line below, Then run `docker-compose build` to build the images. 47 | image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.5.0} 48 | # build: . 49 | environment: 50 | &airflow-common-env 51 | AIRFLOW__CORE__EXECUTOR: CeleryExecutor 52 | AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow 53 | # For backward compatibility, with Airflow <2.3 54 | AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow 55 | AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow 56 | AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0 57 | AIRFLOW__CORE__FERNET_KEY: '' 58 | AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true' 59 | AIRFLOW__CORE__LOAD_EXAMPLES: 'false' 60 | AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth' 61 | _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- dbt-core dbt-postgres airflow-dbt} 62 | volumes: 63 | - ./dags:/opt/airflow/dags 64 | - ./logs:/opt/airflow/logs 65 | - ./plugins:/opt/airflow/plugins 66 | - ./operators:/opt/airflow/operators 67 | - ./operators/dbt:/opt/airflow/operators/dbt 68 | user: "${AIRFLOW_UID:-50000}:0" 69 | depends_on: 70 | &airflow-common-depends-on 71 | redis: 72 | condition: service_healthy 73 | postgres: 74 | condition: service_healthy 75 | 76 | services: 77 | postgres: 78 | image: postgres:13 79 | environment: 80 | POSTGRES_USER: airflow 81 | POSTGRES_PASSWORD: airflow 82 | POSTGRES_DB: airflow 83 | volumes: 84 | - postgres-db-volume:/var/lib/postgresql/data 85 | ports: 86 | - 5432:5432 87 | healthcheck: 88 | test: ["CMD", "pg_isready", "-U", "airflow"] 89 | interval: 5s 90 | retries: 5 91 | restart: always 92 | 93 | redis: 94 | image: redis:latest 95 | expose: 96 | - 6379 97 | healthcheck: 98 | test: ["CMD", "redis-cli", "ping"] 99 | interval: 5s 100 | timeout: 30s 101 | retries: 50 102 | restart: always 103 | 104 | airflow-webserver: 105 | <<: *airflow-common 106 | command: webserver 107 | ports: 108 | - 8080:8080 109 | healthcheck: 110 | test: ["CMD", "curl", "--fail", "http://localhost:8080/health"] 111 | interval: 10s 112 | timeout: 10s 113 | retries: 5 114 | restart: always 115 | depends_on: 116 | <<: *airflow-common-depends-on 117 | airflow-init: 118 | condition: service_completed_successfully 119 | 120 | airflow-scheduler: 121 | <<: *airflow-common 122 | command: scheduler 123 | healthcheck: 124 | test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"'] 125 | interval: 10s 126 | timeout: 10s 127 | retries: 5 128 | restart: always 129 | depends_on: 
130 | <<: *airflow-common-depends-on 131 | airflow-init: 132 | condition: service_completed_successfully 133 | 134 | airflow-worker: 135 | <<: *airflow-common 136 | command: celery worker 137 | healthcheck: 138 | test: 139 | - "CMD-SHELL" 140 | - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"' 141 | interval: 10s 142 | timeout: 10s 143 | retries: 5 144 | environment: 145 | <<: *airflow-common-env 146 | # Required to handle warm shutdown of the celery workers properly 147 | # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation 148 | DUMB_INIT_SETSID: "0" 149 | restart: always 150 | depends_on: 151 | <<: *airflow-common-depends-on 152 | airflow-init: 153 | condition: service_completed_successfully 154 | 155 | airflow-triggerer: 156 | <<: *airflow-common 157 | command: triggerer 158 | healthcheck: 159 | test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"'] 160 | interval: 10s 161 | timeout: 10s 162 | retries: 5 163 | restart: always 164 | depends_on: 165 | <<: *airflow-common-depends-on 166 | airflow-init: 167 | condition: service_completed_successfully 168 | 169 | airflow-init: 170 | <<: *airflow-common 171 | entrypoint: /bin/bash 172 | # yamllint disable rule:line-length 173 | command: 174 | - -c 175 | - | 176 | function ver() { 177 | printf "%04d%04d%04d%04d" $${1//./ } 178 | } 179 | airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version) 180 | airflow_version_comparable=$$(ver $${airflow_version}) 181 | min_airflow_version=2.2.0 182 | min_airflow_version_comparable=$$(ver $${min_airflow_version}) 183 | if (( airflow_version_comparable < min_airflow_version_comparable )); then 184 | echo 185 | echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m" 186 | echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!" 187 | echo 188 | exit 1 189 | fi 190 | if [[ -z "${AIRFLOW_UID}" ]]; then 191 | echo 192 | echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m" 193 | echo "If you are on Linux, you SHOULD follow the instructions below to set " 194 | echo "AIRFLOW_UID environment variable, otherwise files will be owned by root." 195 | echo "For other operating systems you can get rid of the warning with manually created .env file:" 196 | echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user" 197 | echo 198 | fi 199 | one_meg=1048576 200 | mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg)) 201 | cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat) 202 | disk_available=$$(df / | tail -1 | awk '{print $$4}') 203 | warning_resources="false" 204 | if (( mem_available < 4000 )) ; then 205 | echo 206 | echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m" 207 | echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))" 208 | echo 209 | warning_resources="true" 210 | fi 211 | if (( cpus_available < 2 )); then 212 | echo 213 | echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m" 214 | echo "At least 2 CPUs recommended. You have $${cpus_available}" 215 | echo 216 | warning_resources="true" 217 | fi 218 | if (( disk_available < one_meg * 10 )); then 219 | echo 220 | echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m" 221 | echo "At least 10 GBs recommended. 
You have $$(numfmt --to iec $$((disk_available * 1024 )))" 222 | echo 223 | warning_resources="true" 224 | fi 225 | if [[ $${warning_resources} == "true" ]]; then 226 | echo 227 | echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m" 228 | echo "Please follow the instructions to increase amount of resources available:" 229 | echo " https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin" 230 | echo 231 | fi 232 | mkdir -p /sources/logs /sources/dags /sources/plugins 233 | chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins} 234 | exec /entrypoint airflow version 235 | # yamllint enable rule:line-length 236 | environment: 237 | <<: *airflow-common-env 238 | _AIRFLOW_DB_UPGRADE: 'true' 239 | _AIRFLOW_WWW_USER_CREATE: 'true' 240 | _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow} 241 | _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow} 242 | _PIP_ADDITIONAL_REQUIREMENTS: '' 243 | user: "0:0" 244 | volumes: 245 | - .:/sources 246 | 247 | airflow-cli: 248 | <<: *airflow-common 249 | profiles: 250 | - debug 251 | environment: 252 | <<: *airflow-common-env 253 | CONNECTION_CHECK_MAX_COUNT: "0" 254 | # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252 255 | command: 256 | - bash 257 | - -c 258 | - airflow 259 | 260 | # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up 261 | # or by explicitly targeted on the command line e.g. docker-compose up flower. 262 | # See: https://docs.docker.com/compose/profiles/ 263 | flower: 264 | <<: *airflow-common 265 | command: celery flower 266 | profiles: 267 | - flower 268 | ports: 269 | - 5555:5555 270 | healthcheck: 271 | test: ["CMD", "curl", "--fail", "http://localhost:5555/"] 272 | interval: 10s 273 | timeout: 10s 274 | retries: 5 275 | restart: always 276 | depends_on: 277 | <<: *airflow-common-depends-on 278 | airflow-init: 279 | condition: service_completed_successfully 280 | 281 | volumes: 282 | postgres-db-volume: 283 | -------------------------------------------------------------------------------- /images/metabase_graphs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_graphs.png -------------------------------------------------------------------------------- /images/metabase_heatmap.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_heatmap.png -------------------------------------------------------------------------------- /images/metabase_recently_played.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_recently_played.png -------------------------------------------------------------------------------- /images/metabase_summary.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_summary.png -------------------------------------------------------------------------------- /images/metabase_top_songs.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_top_songs.png -------------------------------------------------------------------------------- /operators/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/operators/__init__.py -------------------------------------------------------------------------------- /operators/config.yml: -------------------------------------------------------------------------------- 1 | files: 2 | songs: '/opt/airflow/dags/spotify_data/spotify_songs' 3 | genres: '/opt/airflow/dags/spotify_data/spotify_genres' 4 | genres_tmp: '/opt/airflow/dags/spotify_data/spotify_genres_tmp' -------------------------------------------------------------------------------- /operators/copy_to_postgres.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copies CSV into Postgres 3 | """ 4 | 5 | from airflow.hooks.postgres_hook import PostgresHook 6 | 7 | 8 | def copy_expert_csv(file): 9 | hook = PostgresHook("postgres_localhost") 10 | # PostgresHook.copy_expert opens its own connection and commits, so no 11 | # separate connection handling is needed here 12 | hook.copy_expert( 13 | f""" 14 | COPY {file} FROM stdin WITH CSV HEADER DELIMITER as ',' 15 | """, 16 | f"/opt/airflow/dags/spotify_data/{file}.csv", 17 | ) 18 | -------------------------------------------------------------------------------- /operators/dbt/dbt_project.yml: -------------------------------------------------------------------------------- 1 | 2 | # Name your project! Project names should contain only lowercase characters 3 | # and underscores. A good package name should reflect your organization's 4 | # name or the intended use of these models 5 | name: 'dbt_spotify' 6 | version: '1.0.0' 7 | config-version: 2 8 | 9 | # This setting configures which "profile" dbt uses for this project. 10 | profile: 'postgres' 11 | 12 | # These configurations specify where dbt should look for different types of files. 13 | # The `model-paths` config, for example, states that models in this project can be 14 | # found in the "models/" directory. You probably won't need to change these! 15 | model-paths: ["models"] 16 | analysis-paths: ["analyses"] 17 | test-paths: ["tests"] 18 | seed-paths: ["seeds"] 19 | macro-paths: ["macros"] 20 | snapshot-paths: ["snapshots"] 21 | 22 | target-path: "target"  # directory which will store compiled SQL files 23 | clean-targets:         # directories to be removed by `dbt clean` 24 | - "target" 25 | - "dbt_packages" 26 | 27 | 28 | # Configuring models 29 | # Full documentation: https://docs.getdbt.com/docs/configuring-models 30 | 31 | # In this config, all models under marts/ and staging/ are built as tables, with 32 | # custom schemas per subfolder. These settings can be overridden in the individual model files 33 | # using the `{{ config(...) }}` macro.
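# Note: the generate_schema_name macro override in macros/generate_schema_name.sql (shown below)
# makes dbt use the +schema values that follow (staging, analytical, reporting) verbatim,
# instead of dbt's default behavior of appending them to the target schema as <target_schema>_<custom_schema>.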
34 | models: 35 | dbt_spotify: 36 | # Config indicated by + and applies to all files under the corresponding subdirectory 37 | marts: 38 | +materialized: table 39 | reporting: 40 | +schema: reporting 41 | analytical: 42 | +schema: analytical 43 | staging: 44 | +materialized: table 45 | spotify: 46 | +schema: staging 47 | -------------------------------------------------------------------------------- /operators/dbt/macros/generate_schema_name.sql: -------------------------------------------------------------------------------- 1 | {% macro generate_schema_name(custom_schema_name, node) -%} 2 | 3 | {%- set default_schema = target.schema -%} 4 | {%- if custom_schema_name is none -%} 5 | 6 | {{ default_schema }} 7 | 8 | {%- else -%} 9 | 10 | {{ custom_schema_name | trim }} 11 | 12 | {%- endif -%} 13 | 14 | {%- endmacro %} -------------------------------------------------------------------------------- /operators/dbt/models/marts/analytical/dim_genres.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with genres as ( 8 | 9 | select 10 | artist_id, 11 | artist_name, 12 | artist_genre, 13 | last_updated_datetime_utc at time zone 'utc' at time zone 'America/Chicago' as last_updated_datetime_ct 14 | 15 | from {{ ref('stg_genres') }} 16 | 17 | ) 18 | 19 | select * from genres -------------------------------------------------------------------------------- /operators/dbt/models/marts/analytical/dim_songs.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with songs as ( 8 | 9 | select 10 | played_at_utc at time zone 'utc' at time zone 'America/Chicago' as played_at_ct, 11 | cast(played_at_utc at time zone 'utc' at time zone 'America/Chicago' as date) as played_date_ct, 12 | song_name, 13 | artist_name, 14 | song_duration_ms, 15 | song_link, 16 | album_art_link, 17 | album_name, 18 | album_id, 19 | artist_id, 20 | track_id, 21 | last_updated_datetime_utc at time zone 'utc' at time zone 'America/Chicago' as last_updated_datetime_ct 22 | 23 | from {{ ref('stg_songs') }} 24 | 25 | ) 26 | 27 | select * from songs -------------------------------------------------------------------------------- /operators/dbt/models/marts/analytical/fct_listening_activity.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with songs as ( 8 | 9 | select * from {{ ref('dim_songs') }} 10 | 11 | ), 12 | 13 | genres as ( 14 | 15 | select * from {{ ref('dim_genres') }} 16 | 17 | ), 18 | 19 | final as ( 20 | 21 | select 22 | songs.played_at_ct as played_at, 23 | songs.played_date_ct as played_date, 24 | to_char(songs.played_at_ct, 'Day') as played_at_day_of_week, 25 | extract(year from songs.played_at_ct) as played_at_year, 26 | extract(month from songs.played_at_ct) as played_at_month, 27 | extract(day from songs.played_at_ct) as played_at_day, 28 | date_part('week', cast(date_trunc('week', songs.played_at_ct + interval '1 day') - interval '1 day' as date)) as played_at_week_number, --Sunday-start week number (date_trunc starts weeks on Monday, hence the one-day shift) 29 | extract(hour from songs.played_at_ct) as played_at_hour, 30 | songs.song_name, 31 | songs.artist_name, 32 | genres.artist_genre, 33 | songs.song_duration_ms, 34 | cast(songs.song_duration_ms as decimal)/60000 as song_duration_mins, 35 | songs.song_link, 36 | songs.album_art_link, 37 | songs.album_name, 38 | songs.album_id, 39 | songs.artist_id, 40 | songs.track_id,
41 | songs.last_updated_datetime_ct as last_updated_datetime 42 | 43 | from songs 44 | 45 | left join genres 46 | on songs.artist_id = genres.artist_id 47 | 48 | ) 49 | 50 | select * from final -------------------------------------------------------------------------------- /operators/dbt/models/marts/analytical/schema.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | models: 4 | - name: dim_songs 5 | description: "Songs table" 6 | columns: 7 | - name: played_at_ct 8 | description: "The primary key for this table" 9 | tests: 10 | - unique 11 | - not_null 12 | 13 | - name: dim_genres 14 | description: "Genres table" 15 | columns: 16 | - name: artist_id 17 | description: "The primary key for this table" 18 | tests: 19 | - unique 20 | - not_null 21 | 22 | - name: fct_listening_activity 23 | description: "User's listening activity" 24 | columns: 25 | - name: played_at 26 | description: "The primary key for this table" 27 | tests: 28 | - unique 29 | - not_null -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/rpt_hour_of_week.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='view' 4 | ) 5 | }} 6 | 7 | select 8 | played_at_day_of_week, 9 | case when played_at_day_of_week like '%Sunday%' then '0_Sun' 10 | when played_at_day_of_week like '%Monday%' then '1_Mon' 11 | when played_at_day_of_week like '%Tuesday%' then '2_Tue' 12 | when played_at_day_of_week like '%Wednesday%' then '3_Wed' 13 | when played_at_day_of_week like '%Thursday%' then '4_Thu' 14 | when played_at_day_of_week like '%Friday%' then '5_Fri' 15 | when played_at_day_of_week like '%Saturday%' then '6_Sat' 16 | end as day_num, 17 | played_at_hour, 18 | sum(song_duration_mins) as song_duration_mins 19 | 20 | from {{ ref('fct_listening_activity') }} 21 | 22 | group by 23 | played_at_day_of_week, 24 | played_at_hour -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/rpt_most_listened.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='view' 4 | ) 5 | }} 6 | 7 | select distinct 8 | song_name, 9 | artist_name, 10 | album_name, 11 | artist_genre, 12 | song_link, 13 | count(track_id) over (partition by artist_name, song_name) as times_song_listened, 14 | count(artist_id) over (partition by artist_id) as times_artist_listened, 15 | max(played_date) over (partition by track_id) as song_last_listened_date 16 | 17 | from {{ ref('fct_listening_activity') }} -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/rpt_recently_listened.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='view' 4 | ) 5 | }} 6 | 7 | with listening_activity as ( 8 | 9 | select * from {{ ref('fct_listening_activity') }} 10 | 11 | ), 12 | 13 | final as ( 14 | 15 | select 16 | played_at, 17 | song_name, 18 | artist_name, 19 | album_name, 20 | artist_genre, 21 | song_duration_mins, 22 | song_link, 23 | played_date, 24 | last_updated_datetime 25 | 26 | from listening_activity 27 | ) 28 | 29 | select * from final -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/rpt_weekly_discovers.sql: 
-------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='view' 4 | ) 5 | }} 6 | 7 | with listening_activity as ( 8 | 9 | select * from {{ ref('fct_listening_activity') }} 10 | 11 | ), 12 | 13 | curr as ( 14 | 15 | select distinct 16 | artist_name, 17 | artist_id, 18 | count(artist_id) over(partition by artist_id) as times_listened, 19 | max(played_at) over (partition by artist_id) as last_listened_time 20 | 21 | from listening_activity 22 | 23 | where cast(date_trunc('week', played_date + interval '1 day') - interval '1 day' as date) = cast(date_trunc('week', current_date + interval '1 day') - interval '1 day' as date) 24 | ) 25 | 26 | ,prev as ( 27 | 28 | select distinct 29 | artist_name, 30 | artist_id 31 | 32 | from listening_activity 33 | 34 | where cast(date_trunc('week', played_date + interval '1 day') - interval '1 day' as date) < cast(date_trunc('week', current_date + interval '1 day') - interval '1 day' as date) 35 | ) 36 | 37 | select 38 | curr.artist_name, 39 | curr.artist_id, 40 | times_listened 41 | 42 | from curr 43 | 44 | left join prev 45 | on curr.artist_id = prev.artist_id 46 | 47 | where prev.artist_id is null 48 | -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/schema.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | models: 4 | - name: rpt_most_listened 5 | description: "Most listened artists and tracks" 6 | columns: 7 | - name: (artist_name || song_name || song_link) 8 | description: "The primary key for this table" 9 | tests: 10 | - unique 11 | - not_null 12 | - name: rpt_recently_listened 13 | description: "Recently listened artists and tracks" 14 | columns: 15 | - name: played_at 16 | description: "The primary key for this table" 17 | tests: 18 | - unique 19 | - not_null 20 | - name: rpt_weekly_discovers 21 | description: "New artists discovered in the current week" 22 | columns: 23 | - name: artist_id 24 | description: "The primary key for this table" 25 | tests: 26 | - unique 27 | - not_null 28 | - name: rpt_hour_of_week 29 | description: "Hour of week heatmap on listening activity" 30 | columns: 31 | - name: (played_at_day_of_week || played_at_hour) 32 | description: "The primary key for this table" 33 | tests: 34 | - unique 35 | - not_null -------------------------------------------------------------------------------- /operators/dbt/models/staging/spotify/schema.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | sources: 4 | - name: spotify 5 | description: 'Spotify raw data landing zone' 6 | database: spotify 7 | schema: public 8 | tables: 9 | - name: spotify_songs 10 | description: 'Details about the songs the user listened to' 11 | - name: spotify_genres 12 | description: 'Details about the corresponding genres of the songs the user listened to' -------------------------------------------------------------------------------- /operators/dbt/models/staging/spotify/stg_genres.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with source_spotify_genres as ( 8 | select * from {{ source('spotify', 'spotify_genres') }} 9 | ), 10 | 11 | final as ( 12 | select * from source_spotify_genres 13 | ) 14 | 15 | select * from final -------------------------------------------------------------------------------- 
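A note on the staging models (stg_genres above, stg_songs below): given the source definition in the staging schema.yml (`database: spotify`, `schema: public`), dbt compiles a reference like `{{ source('spotify', 'spotify_genres') }}` to `"spotify"."public"."spotify_genres"`, so the staging layer simply rematerializes the raw landing tables as dbt-managed tables for the marts to build on.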
/operators/dbt/models/staging/spotify/stg_songs.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with source_spotify_songs as ( 8 | select * from {{ source('spotify', 'spotify_songs') }} 9 | ), 10 | 11 | final as ( 12 | select * from source_spotify_songs 13 | ) 14 | 15 | select * from final -------------------------------------------------------------------------------- /operators/main.py: -------------------------------------------------------------------------------- 1 | """ 2 | Makes requests to the Spotify API to retrieve recently played songs and the corresponding genres 3 | """ 4 | 5 | import datetime as dt 6 | import os.path 7 | from datetime import datetime 8 | from pathlib import Path 9 | from secrets import spotify_user_id 10 | 11 | import pandas as pd 12 | import requests 13 | from postgres_connect import ConnectPostgres 14 | from refresh import RefreshToken 15 | from yaml_load import yaml_loader 16 | 17 | 18 | class RetrieveSongs: 19 | def __init__(self): 20 | self.user_id = spotify_user_id  # Spotify username 21 | self.spotify_token = ""  # Spotify access token 22 | 23 | # Query the postgres database to get the latest played timestamp 24 | def get_latest_listened_timestamp(self): 25 | conn = ConnectPostgres().postgres_connector() 26 | cur = conn.cursor() 27 | 28 | query = "SELECT MAX(played_at_utc) FROM public.spotify_songs" 29 | 30 | cur.execute(query) 31 | 32 | max_played_at_utc = cur.fetchall()[0][0] 33 | 34 | # If the spotify_songs table is empty, grab the earliest data we can. Set at t - 90 days for now. 35 | if max_played_at_utc is None: 36 | today = dt.datetime.now() 37 | previous_date = today - dt.timedelta(days=90) 38 | previous_date_unix_timestamp = int(previous_date.timestamp()) * 1000 39 | latest_timestamp = previous_date_unix_timestamp 40 | else: 41 | latest_timestamp = int(max_played_at_utc.timestamp()) * 1000 42 | return latest_timestamp 43 | 44 | # Extract recently played songs from Spotify API 45 | def get_songs(self): 46 | headers = { 47 | "Accept": "application/json", 48 | "Content-Type": "application/json", 49 | "Authorization": "Bearer {}".format(self.spotify_token), 50 | } 51 | 52 | latest_timestamp = self.get_latest_listened_timestamp() 53 | config = yaml_loader() 54 | songs = config["files"]["songs"] 55 | genres = config["files"]["genres"] 56 | genres_tmp = config["files"]["genres_tmp"] 57 | 58 | # Download all songs listened to since the last run or since the earliest listen date defined in the get_latest_listened_timestamp function 59 | song_response = requests.get( 60 | "https://api.spotify.com/v1/me/player/recently-played?limit=50&after={time}".format( 61 | time=latest_timestamp 62 | ), 63 | headers=headers, 64 | ) 65 | 66 | song_data = song_response.json() 67 | 68 | played_at_utc = [] 69 | played_date_utc = [] 70 | song_names = [] 71 | artist_names = [] 72 | song_durations_ms = [] 73 | song_links = [] 74 | album_art_links = [] 75 | album_names = [] 76 | album_ids = [] 77 | artist_ids = [] 78 | track_ids = [] 79 | 80 | # Extract only the necessary data from the json object 81 | for song in song_data["items"]: 82 | played_at_utc.append(song["played_at"]) 83 | played_date_utc.append(song["played_at"][0:10]) 84 | song_names.append(song["track"]["name"]) 85 | artist_names.append(song["track"]["album"]["artists"][0]["name"]) 86 | song_durations_ms.append(song["track"]["duration_ms"]) 87 |
song_links.append(song["track"]["external_urls"]["spotify"]) 88 | album_art_links.append(song["track"]["album"]["images"][1]["url"]) 89 | album_names.append(song["track"]["album"]["name"]) 90 | album_ids.append(song["track"]["album"]["id"]) 91 | artist_ids.append(song["track"]["artists"][0]["id"]) 92 | track_ids.append(song["track"]["id"]) 93 | 94 | # Prepare a dictionary in order to turn it into a pandas dataframe 95 | song_dict = { 96 | "played_at_utc": played_at_utc, 97 | "played_date_utc": played_date_utc, 98 | "song_name": song_names, 99 | "artist_name": artist_names, 100 | "song_duration_ms": song_durations_ms, 101 | "song_link": song_links, 102 | "album_art_link": album_art_links, 103 | "album_name": album_names, 104 | "album_id": album_ids, 105 | "artist_id": artist_ids, 106 | "track_id": track_ids, 107 | } 108 | 109 | song_df = pd.DataFrame( 110 | song_dict, 111 | columns=[ 112 | "played_at_utc", 113 | "played_date_utc", 114 | "song_name", 115 | "artist_name", 116 | "song_duration_ms", 117 | "song_link", 118 | "album_art_link", 119 | "album_name", 120 | "album_id", 121 | "artist_id", 122 | "track_id", 123 | ], 124 | ) 125 | 126 | last_updated_datetime_utc = dt.datetime.utcnow() 127 | song_df["last_updated_datetime_utc"] = last_updated_datetime_utc 128 | song_df = song_df.sort_values("played_at_utc", ascending=True) 129 | 130 | # Drop the first row (the latest song from the last run, which would be a duplicate), then write to csv 131 | song_df = song_df.iloc[1:, :] 132 | song_df.to_csv(f"{songs}.csv", index=False) 133 | 134 | for date in set(song_df["played_date_utc"]): 135 | played_dt = datetime.strptime(date, "%Y-%m-%d") 136 | date_year = played_dt.year 137 | date_month = played_dt.month 138 | output_song_dir = Path(f"{songs}/{date_year}/{date_month}") 139 | output_song_file = f"{date}.csv" 140 | path_to_songs_file = f"{output_song_dir}/{output_song_file}" 141 | songs_file_exists = os.path.exists(path_to_songs_file) 142 | 143 | # Check to see if the file exists. If not, create a new file; else, append to the existing file.
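# Daily files are partitioned as {songs}/{year}/{month}/{date}.csv, so each hourly
# run appends only that date's newly played tracks to the matching daily file.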
144 | if songs_file_exists: 145 | curr_song_df = pd.read_csv(path_to_songs_file) 146 | curr_song_df = curr_song_df.append(song_df.loc[song_df["played_date_utc"] == date])  # append only this date's rows 147 | curr_song_df.to_csv(path_to_songs_file, index=False) 148 | else: 149 | output_song_dir.mkdir(parents=True, exist_ok=True) 150 | song_df.loc[song_df["played_date_utc"] == date].to_csv( 151 | f"{output_song_dir}/{date}.csv", index=False 152 | ) 153 | 154 | # Retrieve the corresponding genres for the artists in the artist_ids list 155 | artist_ids_genres = [] 156 | artist_names = [] 157 | artist_genres = [] 158 | 159 | artist_ids_dedup = set(artist_ids) 160 | 161 | for id in artist_ids_dedup: 162 | artist_response = requests.get( 163 | "https://api.spotify.com/v1/artists/{id}".format(id=id), headers=headers 164 | ) 165 | 166 | artist_data = artist_response.json() 167 | 168 | artist_ids_genres.append(artist_data["id"]) 169 | artist_names.append(artist_data["name"]) 170 | 171 | if len(artist_data["genres"]) == 0: 172 | artist_genres.append(None) 173 | else: 174 | artist_genres.append(artist_data["genres"][0]) 175 | 176 | artist_dict = { 177 | "artist_id": artist_ids_genres, 178 | "artist_name": artist_names, 179 | "artist_genre": artist_genres, 180 | } 181 | 182 | artist_genre_df = pd.DataFrame( 183 | artist_dict, columns=["artist_id", "artist_name", "artist_genre"] 184 | ) 185 | 186 | artist_genre_df.to_csv(f"{genres_tmp}.csv", index=False) 187 | artist_genre_df_nh = pd.read_csv(f"{genres_tmp}.csv", sep=",") 188 | try: 189 | curr_artist_genre_df = pd.read_csv(f"{genres}.csv", sep=",") 190 | curr_artist_genre_df = curr_artist_genre_df.append(artist_genre_df_nh) 191 | curr_artist_genre_df.drop_duplicates( 192 | subset="artist_id", keep="first", inplace=True 193 | ) 194 | curr_artist_genre_df[ 195 | "last_updated_datetime_utc" 196 | ] = last_updated_datetime_utc 197 | curr_artist_genre_df.to_csv(f"{genres}.csv", index=False) 198 | except FileNotFoundError:  # no genres file exists yet (first run) 199 | artist_genre_df_nh["last_updated_datetime_utc"] = last_updated_datetime_utc 200 | artist_genre_df_nh.to_csv(f"{genres}.csv", index=False) 201 | os.remove(f"{genres_tmp}.csv") 202 | 203 | def call_refresh(self): 204 | print("Refreshing token...") 205 | refresher = RefreshToken() 206 | self.spotify_token = refresher.refresh() 207 | print("Getting songs...") 208 | self.get_songs() 209 | 210 | 211 | if __name__ == "__main__": 212 | tracks = RetrieveSongs() 213 | tracks.call_refresh() 214 | -------------------------------------------------------------------------------- /operators/postgres_connect.py: -------------------------------------------------------------------------------- 1 | """ 2 | Connects to the Postgres database 3 | """ 4 | 5 | from secrets import dbname, host, pg_password, pg_user, port 6 | 7 | import psycopg2 8 | 9 | 10 | class ConnectPostgres: 11 | def __init__(self): 12 | self.host = host 13 | self.port = port 14 | self.dbname = dbname 15 | self.pg_user = pg_user 16 | self.pg_password = pg_password 17 | 18 | def postgres_connector(self): 19 | conn = psycopg2.connect( 20 | f"host='{self.host}' port='{self.port}' dbname='{self.dbname}' user='{self.pg_user}' password='{self.pg_password}'" 21 | ) 22 | return conn 23 | 24 | 25 | if __name__ == "__main__": 26 | conn = ConnectPostgres() 27 | conn.postgres_connector() 28 | -------------------------------------------------------------------------------- /operators/refresh.py: -------------------------------------------------------------------------------- 1 | """ 2 | Generates a new access token on each run 3 | """ 4 | 5 | from secrets import base_64, refresh_token 6
| 7 | import requests 8 | 9 | 10 | class RefreshToken: 11 | def __init__(self): 12 | self.refresh_token = refresh_token 13 | self.base_64 = base_64 14 | 15 | def refresh(self): 16 | query = "https://accounts.spotify.com/api/token" 17 | response = requests.post( 18 | query, 19 | data={"grant_type": "refresh_token", "refresh_token": self.refresh_token}, 20 | headers={"Authorization": "Basic " + self.base_64}, 21 | ) 22 | 23 | response_json = response.json() 24 | return response_json["access_token"] 25 | 26 | 27 | if __name__ == "__main__": 28 | new_token = RefreshToken() 29 | new_token.refresh() 30 | -------------------------------------------------------------------------------- /operators/yaml_load.py: -------------------------------------------------------------------------------- 1 | """ 2 | Reads in a config.yml file 3 | """ 4 | 5 | import yaml 6 | 7 | 8 | def yaml_loader(): 9 | with open("/opt/airflow/operators/config.yml") as config_file: 10 | config = yaml.safe_load(config_file) 11 | return config 12 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | airflow-dbt==0.4.0 2 | pandas==1.5.2 3 | psycopg2==2.9.5 4 | pyyaml==6.0 5 | requests==2.28.1 -------------------------------------------------------------------------------- /setup/airflow_docker.md: -------------------------------------------------------------------------------- 1 | # Build Docker Containers for Airflow 2 | 3 | - Check if you have enough memory (need at least 4GB) 4 | ``` 5 | docker run --rm "debian:bullseye-slim" bash -c 'numfmt --to iec $(echo $(($(getconf _PHYS_PAGES) * $(getconf PAGE_SIZE))))' 6 | ``` 7 | - Fetch docker-compose.yaml 8 | ``` 9 | curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.0/docker-compose.yaml' 10 | ``` 11 | - Make the directories and set the user 12 | ``` 13 | mkdir -p ./dags ./logs ./plugins 14 | echo -e "AIRFLOW_UID=$(id -u)" > .env 15 | ``` 16 | - Initialize the database 17 | ``` 18 | docker compose up airflow-init 19 | ``` 20 | - Start all services 21 | ``` 22 | docker-compose up 23 | ``` 24 | - Airflow is now available on http://localhost:8080/home 25 | - Depending on where your dbt project is installed, a new volume will need to be added in the docker-compose.yaml file in order for dbt to run in Airflow 26 | ``` 27 | - ./operators/dbt:/opt/airflow/operators/dbt 28 | ``` -------------------------------------------------------------------------------- /setup/dbt.md: -------------------------------------------------------------------------------- 1 | # Install dbt Core with Homebrew (or your method of choice) 2 | 3 | - Run the below commands: 4 | ``` 5 | brew update 6 | brew install git 7 | brew tap dbt-labs/dbt 8 | ``` 9 | - Identify your [**adapter**](https://docs.getdbt.com/docs/supported-data-platforms) (in this case Postgres is used) and install: 10 | ``` 11 | brew install dbt-postgres 12 | ``` 13 | - cd to the directory where you want to have dbt installed and initialize the project 14 | ``` 15 | dbt init 16 | ``` 17 | - Update the profiles.yml file found in Users/<username>/.dbt/ 18 | - Update all the appropriate configurations based on the [**dbt setup guide**](https://docs.getdbt.com/reference/warehouse-setups/postgres-setup) 19 | - Go to the dbt_project.yml file and make sure the profile configuration matches with the one in the profiles.yml file 20 | - Ensure the database setup was done correctly 21 | ``` 22 | dbt debug 23 | ``` 24 | - Test that dbt is building the models correctly.
If successful you can verify the new tables/views in the database. 25 | ``` 26 | dbt run 27 | ``` 28 | - Generate the docs for the dbt project 29 | ``` 30 | dbt docs generate 31 | ``` 32 | - Serve the docs on a webserver using port 8001 33 | ``` 34 | dbt docs serve --port 8001 35 | ``` 36 | -------------------------------------------------------------------------------- /setup/metabase.md: -------------------------------------------------------------------------------- 1 | # Install Metabase 2 | 3 | - Download the Metabase [**JAR file**](https://www.metabase.com/start/oss/) (or your method of choice, JAR file was used as Metabase through Docker wasn't working on M1 Macs at the time of this writing) 4 | - Create a new directory and move the Metabase JAR file into it 5 | - Ensure the [**latest Java version**](https://www.oracle.com/java/technologies/downloads/#jdk19-mac) is downloaded 6 | - cd into the new Metabase directory and run the JAR 7 | ``` 8 | java -jar metabase.jar 9 | ``` 10 | - Metabase is now available on http://localhost:3000/setup 11 | - Set up the connection and use host: localhost 12 | -------------------------------------------------------------------------------- /setup/postgres.md: -------------------------------------------------------------------------------- 1 | # Set up Airflow connection to Postgres 2 | 3 | - Add ports to the section under services and Postgres in the docker-compose.yaml file like below: 4 | ``` 5 | ports: 6 | - 5432:5432 7 | ``` 8 | - Download DBeaver (or your tool of choice) 9 | - Create a new Postgres connection and add the username and password 10 | - Test the connection; it may ask you to download the Postgres JDBC driver if you don't have it. Download and test again. 11 | - Once the connection is successful, create a new database named 'spotify' 12 | - Go to the Airflow UI and click on Admin>Connections then click on the + sign 13 | - Fill in the connection with the below details and click save: 14 | - Conn Id: postgres_localhost 15 | - Conn Type: Postgres 16 | - Host: host.docker.internal 17 | - Schema: spotify 18 | - Login: <username> 19 | - Password: <password> 20 | - Port: 5432 -------------------------------------------------------------------------------- /setup/slack_notifications.md: -------------------------------------------------------------------------------- 1 | # Enable Slack notifications for any Airflow task failures 2 | 3 | - Create a channel in your workspace where the alerts will be sent 4 | - Go to api.slack.com/apps and click on "Create New App" then click on "From scratch" 5 | - Give your app a name and select your workspace where the alerts will be sent then click "Create App" 6 | - Enable incoming webhooks for your Slack workspace app 7 | - You can test your webhook from the command-line by running the code below (replace the URL with your own webhook key): 8 | ``` 9 | curl -X POST -H 'Content-type: application/json' --data '{"text":"Hello, World!"}' https://hooks.slack.com/services/XXXXXXXXXXX/XXXXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXX 10 | ``` 11 | - Go to the Airflow UI and click on Admin>Connections then click on the + sign 12 | - Fill in the connection with the below details and click save (replace password with your credentials): 13 | - Connection Id: slack 14 | - Connection Type: HTTP 15 | - Host: https://hooks.slack.com/services/ 16 | - Password: /T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX 17 | - Implement the code into the DAG script to enable alerts (see the sketch below) --------------------------------------------------------------------------------
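The wiring that the last step refers to is already in dags/spotify_dag.py; stripped to its essentials, it is just a callback passed through default_args. A minimal sketch (the full message-building body lives in the DAG file):

```python
# Minimal sketch of the alert hook-up used in dags/spotify_dag.py:
# task_fail_slack_alert posts to the "slack" Airflow connection via SlackWebhookOperator.
def task_fail_slack_alert(context):
    ...  # build the Slack message and send it (see dags/spotify_dag.py)

default_args = {
    "on_failure_callback": task_fail_slack_alert,  # fires whenever a task fails
}
```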
/setup/spotify_api_access.md: -------------------------------------------------------------------------------- 1 | # Spotify API Access 2 | 3 | - Ensure you have a Spotify account created 4 | - Register Your Application 5 | - Go to the [**Dashboard**](https://developer.spotify.com/dashboard/applications) page on the Spotify Developer site 6 | - Click on **CREATE AN APP**. Provide your app name and app description and then click create. 7 | - Click on **EDIT SETTINGS** and provide a redirect URI and then click save 8 | - Copy and save your Client ID and Client Secret 9 | - Define the query parameters in your custom link 10 | - Link: https://accounts.spotify.com/authorize?client_id=<client_id>&response_type=code&redirect_uri=<redirect_uri>&scope=<scope> 11 | - <client_id> = The Client ID saved from the step above 12 | - <redirect_uri> = The redirect URI you provided in the step above. This needs to be the ENCODED redirect URI. You can encode the redirect URI by going to [**urlencoder.org**](https://www.urlencoder.org/), pasting in the redirect URI, and then clicking encode. Ensure encode is selected and not decode. 13 | - <scope> = Scope(s) needed for your requests. In this case we are using user-read-recently-played. 14 | - Go to the link created in the step above to obtain your authorization code 15 | - Paste the link from the step above into a browser and hit enter 16 | - Click Agree 17 | - Copy the new URL and save the authorization code (value after 'code=' parameter) 18 | - Define your curl command 19 | - Ensure you have curl by opening up command prompt/terminal and typing curl 20 | - Curl command: 21 | ``` 22 | curl -d client_id=<client_id> -d client_secret=<client_secret> -d grant_type=authorization_code -d code=<authorization_code> -d redirect_uri=<redirect_uri> https://accounts.spotify.com/api/token 23 | ``` 24 | - Run curl command to obtain access token and refresh token 25 | - Paste in the curl command from the step above into command prompt/terminal and run 26 | - Save your access token and refresh token 27 | - Access token is what we define as spotify_token in our code 28 | - Refresh token will be used to generate a new access token on each run as the access token expires after one hour (see the sketch after this list) 29 | - Convert Client ID and Client Secret to a base 64 encoded string 30 | - <client_id>:<client_secret> 31 | - Using the format above convert to a base 64 encoded string by going to [**base64encode.org**](https://www.base64encode.org/), pasting in the string, and then clicking encode. Ensure encode is selected and not decode. 32 | - This will be defined as base_64 in our code and will be used when we generate a new access token on each run
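For reference, the refresh step above is exactly what operators/refresh.py automates on every pipeline run. A minimal sketch of the same call, assuming base_64 and refresh_token hold the values saved in the steps above:

```python
import requests

# Exchange the long-lived refresh token for a fresh access token
# (mirrors operators/refresh.py; the two values below are the ones saved above)
base_64 = "<base 64 encoded client_id:client_secret>"  # placeholder
refresh_token = "<your refresh token>"  # placeholder

response = requests.post(
    "https://accounts.spotify.com/api/token",
    data={"grant_type": "refresh_token", "refresh_token": refresh_token},
    headers={"Authorization": "Basic " + base_64},
)
access_token = response.json()["access_token"]
print(access_token)
```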
35 | 
36 | References:
37 | https://developer.spotify.com/console/get-recently-played/?limit=&after=&before=
--------------------------------------------------------------------------------
/spotify_data/spotify_genres.csv:
--------------------------------------------------------------------------------
1 | artist_id,artist_name,artist_genre
2 | 2lZFlNiQMLa2fuX3pkXcan,YOUHA,k-indie
3 | 4SpbR6yFEvexJuaBpgAU5p,LE SSERAFIM,
4 | 3YUtIXyGE3p0Y4UPc3hyrf,AFTERSCHOOL RED,
5 | 5V1qsQHdXNm4ZEZHWvFnqQ,Dreamcatcher,k-pop
6 | 0ghlgldX5Dd6720Q3qFyQB,TOMORROW X TOGETHER,k-pop
7 | 4hozqATxbpy9TwKWRT8QVO,Rocket Punch,k-pop
8 | 6IW91qUpcrhbGuZxubrG70,QUEENDOM,
9 | 7IlRNXHjoOCgEAWN5qYksg,Aya Nakamura,basshall
10 | 7rpKUJ0AnklJ8q9nIPVSpZ,Reol,anime rock
11 | 2AfmfGFbe0A0WsTYm0SDTx,(G)I-DLE,k-pop
12 | 7k73EtZwoPs516ZxE72KsO,ONE OK ROCK,j-pop
13 | 6GwM5CHqhWXzG3l5kzRSAS,Younha,k-pop
14 | 3HqSLMAZ3g3d5poNaI7GOU,IU,k-pop
15 | 3ZZzT0naD25RhY2uZvIKkJ,EVERGLOW,k-pop
16 | 2TMRvcwsmvVhvuEbKVEbZe,YURI,korean r&b
17 | 0FnDCrmcQT8qz5TEsZIYw5,Awich,j-rap
18 | 0Sadg1vgvaPqGTOjxu0N6c,Girls' Generation,k-pop
19 | 0UntV1Bw2hk3fbRrm9eMP6,B.I,k-pop
20 | 2KC9Qb60EaY0kW4eH68vr3,ITZY,k-pop girl group
21 | 2vjeuQwzSP5ErC1S41gONX,CHANMINA,j-pop
22 | 3OBkZ9NG8F0Fn4oNpg0yuU,HA SUNG WOON,k-pop
23 | 43pS0OMyfQBVqILYQJ8fpu,Dj Andrés,
24 | 2uWcrwgWmZcQc3IPBs3tfU,Apink,k-pop
25 | 1EowJ1WwkMzkCkRomFhui7,RADWIMPS,j-pop
26 | 3699Hh55qWXd0kWWMWRR2o,mimiirose,
27 | 6HvZYsbFfjnjFrWF950C9d,NewJeans,k-pop
28 | 7n2Ycct7Beij7Dj7meI4X0,TWICE,k-pop
29 | 3Nrfpe0tUJi4K4DXYWgMUX,BTS,k-pop
30 | 54gWVQFHf8IIqbjxAoOarN,AOA,k-pop
31 | 3C13AlJZ4QWHSruAWe9VPI,CRAXY,
32 | 5GwQwY63I9hrUUFlQB8FYU,H1-KEY,
33 | 
--------------------------------------------------------------------------------
/spotify_data/spotify_songs.csv:
--------------------------------------------------------------------------------
1 | played_at_utc,played_date_utc,song_name,artist_name,song_duration_ms,song_link,album_art_link,album_name,album_release_date,album_id,artist_id
2 | 2022-12-22T01:40:05.724Z,2022-12-22,Into You,YURI,186531,https://open.spotify.com/track/6L8wVNs6kuQ7sRjHowbrLp,https://i.scdn.co/image/ab67616d00001e02fe9914fad928a7cbf93b5175,The First Scene - The 1st Mini Album,2018-10-04,1vRQP001rGl7zI3W6ghGSR,2TMRvcwsmvVhvuEbKVEbZe
3 | 2022-12-22T01:36:58.790Z,2022-12-22,第六感,Reol,191624,https://open.spotify.com/track/22sQUmLhT8umlEhQzDrzfJ,https://i.scdn.co/image/ab67616d00001e0253172aaf6f532e9e45cbb841,第六感,2020-07-27,6CTOnVKQhpsL1NeJQ3XyXF,7rpKUJ0AnklJ8q9nIPVSpZ
4 | 2022-12-22T01:33:46.407Z,2022-12-22,LUVORATORRRRRY!,Reol,204187,https://open.spotify.com/track/1AMzUKhF0vCFHZTx8H7OS4,https://i.scdn.co/image/ab67616d00001e024b3d52f255be4bab0f200cb2,エンドレスEP,2017-10-11,1G7u7llrqq988dLueDPV7J,7rpKUJ0AnklJ8q9nIPVSpZ
5 | 2022-12-22T01:30:21.336Z,2022-12-22,煽げや尊し,Reol,217106,https://open.spotify.com/track/2LeHTvjUmiR1A6L9txGDfm,https://i.scdn.co/image/ab67616d00001e026968e389f956d5f84ce66d85,煽げや尊し,2022-07-27,54Jl0Cvkj6wS8oJnLyCsVB,7rpKUJ0AnklJ8q9nIPVSpZ
6 | 2022-12-22T01:26:44.114Z,2022-12-22,Angel,CHANMINA,191566,https://open.spotify.com/track/4iShk7X3Q8vVIIiFENs9Yz,https://i.scdn.co/image/ab67616d00001e02b6274d484c9ea468882ada0d,Harenchi,2021-10-13,1q9RWaiqhyFz8tYrl57w98,2vjeuQwzSP5ErC1S41gONX
7 | 2022-12-22T01:23:32.057Z,2022-12-22,Blueming,IU,217053,https://open.spotify.com/track/4Dr2hJ3EnVh2Aaot6fRwDO,https://i.scdn.co/image/ab67616d00001e02b658276cd9884ef6fae69033,Love poem,2019-11-18,2xEH7SRzJq7LgA0fCtTlxH,3HqSLMAZ3g3d5poNaI7GOU
8 | 2022-12-22T01:20:09.227Z,2022-12-22,GILA GILA,Awich,253333,https://open.spotify.com/track/7AsWduh1mkB8uKOox5NH0X,https://i.scdn.co/image/ab67616d00001e024e16f3fb9b9996cfa91d12f4,Queendom,2022-03-04,4jj5K8UuV6fBOHj4nOCOON,0FnDCrmcQT8qz5TEsZIYw5
9 | 2022-12-21T16:45:10.873Z,2022-12-21,Answer is near,ONE OK ROCK,219826,https://open.spotify.com/track/3N29lMZHMKTVGXUN5aqzl5,https://i.scdn.co/image/ab67616d00001e021ccfaf5b5a356e8379a1e1e7,Zankyō Reference,2011-10-05,0cb55nHUbG3tLjVVQYPdRj,7k73EtZwoPs516ZxE72KsO
10 | 2022-12-21T16:40:17.878Z,2022-12-21,Into You,YURI,186531,https://open.spotify.com/track/6L8wVNs6kuQ7sRjHowbrLp,https://i.scdn.co/image/ab67616d00001e02fe9914fad928a7cbf93b5175,The First Scene - The 1st Mini Album,2018-10-04,1vRQP001rGl7zI3W6ghGSR,2TMRvcwsmvVhvuEbKVEbZe
11 | 2022-12-21T16:37:10.970Z,2022-12-21,第六感,Reol,191624,https://open.spotify.com/track/22sQUmLhT8umlEhQzDrzfJ,https://i.scdn.co/image/ab67616d00001e0253172aaf6f532e9e45cbb841,第六感,2020-07-27,6CTOnVKQhpsL1NeJQ3XyXF,7rpKUJ0AnklJ8q9nIPVSpZ
12 | 2022-12-21T16:33:58.540Z,2022-12-21,LUVORATORRRRRY!,Reol,204187,https://open.spotify.com/track/1AMzUKhF0vCFHZTx8H7OS4,https://i.scdn.co/image/ab67616d00001e024b3d52f255be4bab0f200cb2,エンドレスEP,2017-10-11,1G7u7llrqq988dLueDPV7J,7rpKUJ0AnklJ8q9nIPVSpZ
13 | 2022-12-21T16:30:33.497Z,2022-12-21,煽げや尊し,Reol,217106,https://open.spotify.com/track/2LeHTvjUmiR1A6L9txGDfm,https://i.scdn.co/image/ab67616d00001e026968e389f956d5f84ce66d85,煽げや尊し,2022-07-27,54Jl0Cvkj6wS8oJnLyCsVB,7rpKUJ0AnklJ8q9nIPVSpZ
14 | 2022-12-21T16:26:55.919Z,2022-12-21,Angel,CHANMINA,191566,https://open.spotify.com/track/4iShk7X3Q8vVIIiFENs9Yz,https://i.scdn.co/image/ab67616d00001e02b6274d484c9ea468882ada0d,Harenchi,2021-10-13,1q9RWaiqhyFz8tYrl57w98,2vjeuQwzSP5ErC1S41gONX
15 | 2022-12-21T16:23:44.435Z,2022-12-21,Blueming,IU,217053,https://open.spotify.com/track/4Dr2hJ3EnVh2Aaot6fRwDO,https://i.scdn.co/image/ab67616d00001e02b658276cd9884ef6fae69033,Love poem,2019-11-18,2xEH7SRzJq7LgA0fCtTlxH,3HqSLMAZ3g3d5poNaI7GOU
16 | 2022-12-21T16:20:06.729Z,2022-12-21,What,Dreamcatcher,205345,https://open.spotify.com/track/4BdpM8bR0nifsaqSz3qphQ,https://i.scdn.co/image/ab67616d00001e028c0b09a8965bb16ff3f7d889,Alone In The City,2018-09-20,68esmTNXocYLPEbLD1c7si,5V1qsQHdXNm4ZEZHWvFnqQ
17 | 2022-12-21T16:16:40.980Z,2022-12-21,Dilemma,Apink,209207,https://open.spotify.com/track/3j0x2BUUtm2obQXS1lZuN3,https://i.scdn.co/image/ab67616d00001e020582560196a977f50cc8411b,HORN,2022-02-14,6GeYzOIumBxJ4iF41J3KXM,2uWcrwgWmZcQc3IPBs3tfU
18 | 2022-12-21T15:10:42.288Z,2022-12-21,Stay with me,AOA,199800,https://open.spotify.com/track/3gesvtxiKvuXj63ZRdHUEY,https://i.scdn.co/image/ab67616d00001e027b05ff669e7218d1feb70c06,Ace of Angels,2015-10-14,4kyXI4CACtU6TzTyhTa4dL,54gWVQFHf8IIqbjxAoOarN
19 | 2022-12-21T15:02:57.112Z,2022-12-21,In and out of love - Beach Club,Dj Andrés,312779,https://open.spotify.com/track/4TMlTyUoYMrJsYEI61RCUG,https://i.scdn.co/image/ab67616d00001e0263e638aa85722ce27327a937,In and out of love (Beach Club),2022-06-26,081Q2WsyY89PTlyJ4vBOwu,43pS0OMyfQBVqILYQJ8fpu
20 | 2022-12-21T14:57:43Z,2022-12-21,Nirvana - Steeve West Remix,Aya Nakamura,142857,https://open.spotify.com/track/7ckdWwRicoQ9zxflneYOjp,https://i.scdn.co/image/ab67616d00001e0211c0ee558211e527cd5c0a06,Nirvana (Steeve West Remix),2022-09-20,2TWqqpTOEBvdiL5IPlJUAj,7IlRNXHjoOCgEAWN5qYksg
21 | 2022-12-21T14:55:09.246Z,2022-12-21,Good Boy Gone Bad,TOMORROW X TOGETHER,191038,https://open.spotify.com/track/1HsSIPLTQT354yJcQGfEY3,https://i.scdn.co/image/ab67616d00001e0213ac5d67675999ba7b9c4f21,minisode 2: Thursday's Child,2022-05-09,1o8jYrnyZueTPIdhlHuTc8,0ghlgldX5Dd6720Q3qFyQB
22 | 2022-12-21T14:54:28.211Z,2022-12-21,Good Boy Gone Bad,TOMORROW X TOGETHER,191038,https://open.spotify.com/track/1HsSIPLTQT354yJcQGfEY3,https://i.scdn.co/image/ab67616d00001e0213ac5d67675999ba7b9c4f21,minisode 2: Thursday's Child,2022-05-09,1o8jYrnyZueTPIdhlHuTc8,0ghlgldX5Dd6720Q3qFyQB
23 | 2022-12-21T14:54:21.275Z,2022-12-21,Kimi to Hitsuji to Ao,RADWIMPS,162280,https://open.spotify.com/track/2qz2iNaqKDJesD2ombEIsi,https://i.scdn.co/image/ab67616d00001e02c0c53dcf5bb71d127b8eab7f,Zettaizetsumei,2011-03-09,3b3tyPWcSOYy5SFC0bCUWP,1EowJ1WwkMzkCkRomFhui7
24 | 2022-12-21T14:32:03.401Z,2022-12-21,Night into the sky,AFTERSCHOOL RED,198243,https://open.spotify.com/track/6UqWoZqgsfbzLkYbn9vL2E,https://i.scdn.co/image/ab67616d00001e0201fd5a3b6968b446df481915,THE 4TH SINGLE ALBUM-RED,2011-07-20,3HTY4W12nkuLA06l2w3b9e,3YUtIXyGE3p0Y4UPc3hyrf
25 | 2022-12-21T14:28:19.956Z,2022-12-21,SUN,CHANMINA,192600,https://open.spotify.com/track/5lmIUH7tyTogDEgEIv6n0w,https://i.scdn.co/image/ab67616d00001e02b6274d484c9ea468882ada0d,Harenchi,2021-10-13,1q9RWaiqhyFz8tYrl57w98,2vjeuQwzSP5ErC1S41gONX
26 | 2022-12-21T14:25:07.442Z,2022-12-21,ATHLETIC GIRL,H1-KEY,212613,https://open.spotify.com/track/0qu54GVbhmBFjpsgiG32PL,https://i.scdn.co/image/ab67616d00001e02ef05154fa2b15bac2e806dde,ATHLETIC GIRL,2022-01-05,3Weg79SFmoXNRUSn08QSPZ,5GwQwY63I9hrUUFlQB8FYU
27 | 2022-12-21T14:20:58.401Z,2022-12-21,Firework,CHANMINA,196163,https://open.spotify.com/track/7quF3X731mQQ67KDVNnYqX,https://i.scdn.co/image/ab67616d00001e02b6274d484c9ea468882ada0d,Harenchi,2021-10-13,1q9RWaiqhyFz8tYrl57w98,2vjeuQwzSP5ErC1S41gONX
28 | 2022-12-21T14:17:42.257Z,2022-12-21,Parade,Younha,193818,https://open.spotify.com/track/0u2RxRjpM4rMvojoNqZHH0,https://i.scdn.co/image/ab67616d00001e02d87083079a0cbd66390f4a92,RescuE,2017-12-27,32n91KG3YeLMLJ9e64EfXy,6GwM5CHqhWXzG3l5kzRSAS
29 | 2022-12-21T14:14:18.833Z,2022-12-21,Save Me,BTS,196505,https://open.spotify.com/track/7bxGcILuAjkZzaveU28ZJS,https://i.scdn.co/image/ab67616d00001e02c6dbc63cf145b4ff6bee3322,The Most Beautiful Moment in Life: Young Forever,2016-05-02,1k5bJ8l5oL5xxVBVHjil09,3Nrfpe0tUJi4K4DXYWgMUX
30 | 2022-12-21T14:11:01.741Z,2022-12-21,GILA GILA,Awich,251818,https://open.spotify.com/track/3LSALxSMhVUQoGN2zwxy1n,https://i.scdn.co/image/ab67616d00001e02c09618552aed5e659f9622ff,GILA GILA,2021-07-30,5v5FfoofCu2Ouflu1GusIN,0FnDCrmcQT8qz5TEsZIYw5
31 | 2022-12-21T14:06:18.333Z,2022-12-21,Lululu,mimiirose,234840,https://open.spotify.com/track/3OziCgBLUNZvg6VwLSjcn2,https://i.scdn.co/image/ab67616d00001e02c1aca2da8f935ac35c224515,AWESOME,2022-09-16,2Y7ZSdUNntl5exBCzcBLvz,3699Hh55qWXd0kWWMWRR2o
32 | 2022-12-21T13:57:35.450Z,2022-12-21,ANTIFRAGILE,LE SSERAFIM,184444,https://open.spotify.com/track/4fsQ0K37TOXa3hEQfjEic1,https://i.scdn.co/image/ab67616d00001e02a991995542d50a691b9ae5be,ANTIFRAGILE,2022-10-17,3u0ggfmK0vjuHMNdUbtaa9,4SpbR6yFEvexJuaBpgAU5p
33 | 2022-12-21T13:50:20.533Z,2022-12-21,Ditto,NewJeans,185506,https://open.spotify.com/track/3r8RuvgbX9s7ammBn07D3W,https://i.scdn.co/image/ab67616d00001e02edf5b257be1d6593e81bb45f,Ditto,2022-12-19,7bnqo1fdJU9nSfXQd3bSMe,6HvZYsbFfjnjFrWF950C9d
34 | 2022-12-21T13:46:22.841Z,2022-12-21,LA DI DA,EVERGLOW,210893,https://open.spotify.com/track/6mIjJONoUMvGPT9Kzrab3L,https://i.scdn.co/image/ab67616d00001e02c32633331e11b4fe108237f8,-77.82x-78.29,2020-09-21,4kMID9cggWEko9mOb1zisI,3ZZzT0naD25RhY2uZvIKkJ
35 | 2022-12-21T13:43:42.221Z,2022-12-21,Last Dance,YOUHA,192226,https://open.spotify.com/track/1bOS0JdXxmTWwlUxXX7gRG,https://i.scdn.co/image/ab67616d00001e0257a6f5928952c277c4407f98,"love you more,",2022-08-25,3g2OiEeQKfggUe6ViYeLSC,2lZFlNiQMLa2fuX3pkXcan
36 | 2022-12-21T13:42:51.069Z,2022-12-21,Ladi Dadi,AOA,202703,https://open.spotify.com/track/00Jrzl6fENQd3XEWGDtowy,https://i.scdn.co/image/ab67616d00001e02da645df1f66725e1a066c807,BINGLE BANGLE,2018-05-28,3dCzh2hx0xzXFBUmzQQeoJ,54gWVQFHf8IIqbjxAoOarN
37 | 2022-12-21T06:15:02.999Z,2022-12-21,CHIQUITA,Rocket Punch,186328,https://open.spotify.com/track/0dap7YUNPlUiEPYSA1bRg3,https://i.scdn.co/image/ab67616d00001e02067a8277ede2feece51ba2ec,YELLOW PUNCH,2022-02-28,6L2VwLPHfm5cCdTF1erFrN,4hozqATxbpy9TwKWRT8QVO
38 | 2022-12-21T06:07:32.098Z,2022-12-21,Brave,TWICE,189200,https://open.spotify.com/track/2peoFPokM6eYAIwLm9IQ8E,https://i.scdn.co/image/ab67616d00001e02c3040848e6ef0e132c5c8340,BETWEEN 1&2,2022-08-26,3NZ94nQbqimcu2i71qhc4f,7n2Ycct7Beij7Dj7meI4X0
39 | 2022-12-21T06:02:25.850Z,2022-12-21,FOREVER 1,Girls' Generation,202533,https://open.spotify.com/track/1oen3GpTcA486fTHaT7neg,https://i.scdn.co/image/ab67616d00001e02aea29200523b1ee4d5b2c035,FOREVER 1 - The 7th Album,2022-08-05,3CcgnUkTrUaPTt4Ms1MkoP,0Sadg1vgvaPqGTOjxu0N6c
40 | 2022-12-21T05:59:02.644Z,2022-12-21,WANNABE,ITZY,191242,https://open.spotify.com/track/4pspYVQGFHLPEFgQPD1J7e,https://i.scdn.co/image/ab67616d00001e02fc620c06721e90a534cc5dab,IT'z ME,2020-03-09,7ynKAohxfwPUZzvU8f1p1U,2KC9Qb60EaY0kW4eH68vr3
41 | 2022-12-21T05:55:51.501Z,2022-12-21,FOCUS,HA SUNG WOON,183146,https://open.spotify.com/track/7nj5G4aPUJD0TnNF6SqcrX,https://i.scdn.co/image/ab67616d00001e024e6531b1f309c00dcaaf2f88,Strange World,2022-08-24,2eE6EDzzdWYQH6TfwGjz87,3OBkZ9NG8F0Fn4oNpg0yuU
42 | 2022-12-21T05:52:47.169Z,2022-12-21,Undercover,CRAXY,208826,https://open.spotify.com/track/6Hqh1ybErzzlHLuDqgL3Pd,https://i.scdn.co/image/ab67616d00001e02e57b32c4af6468231a3bb6db,Who Am I,2022-08-16,5S2tHbdTC1zn7BjLsc0Bg0,3C13AlJZ4QWHSruAWe9VPI
43 | 2022-12-21T05:49:18.497Z,2022-12-21,Last Dance,YOUHA,192226,https://open.spotify.com/track/1bOS0JdXxmTWwlUxXX7gRG,https://i.scdn.co/image/ab67616d00001e0257a6f5928952c277c4407f98,"love you more,",2022-08-25,3g2OiEeQKfggUe6ViYeLSC,2lZFlNiQMLa2fuX3pkXcan
44 | 2022-12-21T05:46:05.686Z,2022-12-21,CHIQUITA,Rocket Punch,186328,https://open.spotify.com/track/0dap7YUNPlUiEPYSA1bRg3,https://i.scdn.co/image/ab67616d00001e02067a8277ede2feece51ba2ec,YELLOW PUNCH,2022-02-28,6L2VwLPHfm5cCdTF1erFrN,4hozqATxbpy9TwKWRT8QVO
45 | 2022-12-21T05:42:58.252Z,2022-12-21,Brave,TWICE,189200,https://open.spotify.com/track/2peoFPokM6eYAIwLm9IQ8E,https://i.scdn.co/image/ab67616d00001e02c3040848e6ef0e132c5c8340,BETWEEN 1&2,2022-08-26,3NZ94nQbqimcu2i71qhc4f,7n2Ycct7Beij7Dj7meI4X0
46 | 2022-12-21T05:39:28.519Z,2022-12-21,MY BAG,(G)I-DLE,160520,https://open.spotify.com/track/1t8sqIScEIP0B4bQzBuI2P,https://i.scdn.co/image/ab67616d00001e02c7b6b2976e38a802eebff046,I NEVER DIE,2022-03-14,1T2W9vDajFreUuycPDjUXk,2AfmfGFbe0A0WsTYm0SDTx
47 | 2022-12-21T05:36:49.092Z,2022-12-21,チキチキバンバン,QUEENDOM,202500,https://open.spotify.com/track/7xm0KJMfeaJQmQdDxAipiY,https://i.scdn.co/image/ab67616d00001e02a873ce4c458e732a422d342f,チキチキバンバン,2022-05-20,1BWf1vYaM0zNWE6uuFFcvF,6IW91qUpcrhbGuZxubrG70
48 | 2022-12-21T05:33:28.585Z,2022-12-21,煽げや尊し,Reol,217106,https://open.spotify.com/track/2LeHTvjUmiR1A6L9txGDfm,https://i.scdn.co/image/ab67616d00001e026968e389f956d5f84ce66d85,煽げや尊し,2022-07-27,54Jl0Cvkj6wS8oJnLyCsVB,7rpKUJ0AnklJ8q9nIPVSpZ
49 | 2022-12-21T05:29:50.880Z,2022-12-21,Dilemma,Apink,209207,https://open.spotify.com/track/3j0x2BUUtm2obQXS1lZuN3,https://i.scdn.co/image/ab67616d00001e020582560196a977f50cc8411b,HORN,2022-02-14,6GeYzOIumBxJ4iF41J3KXM,2uWcrwgWmZcQc3IPBs3tfU
50 | 2022-12-21T04:55:54.923Z,2022-12-21,BTBT,B.I,219549,https://open.spotify.com/track/4XcxgZSriCYamtIA7BgT7V,https://i.scdn.co/image/ab67616d00001e022f485bcd37d1d6733d324f7d,BTBT,2022-05-13,6z2Ij8op0iB16BnmrCy0vH,0UntV1Bw2hk3fbRrm9eMP6
51 | 2022-12-21T04:55:53.239Z,2022-12-21,BTBT,B.I,219549,https://open.spotify.com/track/4XcxgZSriCYamtIA7BgT7V,https://i.scdn.co/image/ab67616d00001e022f485bcd37d1d6733d324f7d,BTBT,2022-05-13,6z2Ij8op0iB16BnmrCy0vH,0UntV1Bw2hk3fbRrm9eMP6
52 | 
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/tests/__init__.py
--------------------------------------------------------------------------------