├── .github └── workflows │ └── superlinter.yml ├── .gitignore ├── README.md ├── dags ├── spotify_dag.py └── sql │ ├── create_spotify_genres.sql │ └── create_spotify_songs.sql ├── docker-compose.yaml ├── images ├── metabase_graphs.png ├── metabase_heatmap.png ├── metabase_recently_played.png ├── metabase_summary.png ├── metabase_top_songs.png └── spotify.drawio.svg ├── operators ├── __init__.py ├── config.yml ├── copy_to_postgres.py ├── dbt │ ├── dbt_project.yml │ ├── macros │ │ └── generate_schema_name.sql │ └── models │ │ ├── marts │ │ ├── analytical │ │ │ ├── dim_genres.sql │ │ │ ├── dim_songs.sql │ │ │ ├── fct_listening_activity.sql │ │ │ └── schema.yml │ │ └── reporting │ │ │ ├── rpt_hour_of_week.sql │ │ │ ├── rpt_most_listened.sql │ │ │ ├── rpt_recently_listened.sql │ │ │ ├── rpt_weekly_discovers.sql │ │ │ └── schema.yml │ │ └── staging │ │ └── spotify │ │ ├── schema.yml │ │ ├── stg_genres.sql │ │ └── stg_songs.sql ├── main.py ├── postgres_connect.py ├── refresh.py └── yaml_load.py ├── requirements.txt ├── setup ├── airflow_docker.md ├── dbt.md ├── metabase.md ├── postgres.md ├── slack_notifications.md └── spotify_api_access.md ├── spotify_data ├── spotify.json ├── spotify_genres.csv └── spotify_songs.csv └── tests └── __init__.py /.github/workflows/superlinter.yml: -------------------------------------------------------------------------------- 1 | name: Super-Linter 2 | 3 | on: push 4 | 5 | jobs: 6 | super-lint: 7 | name: Lint code base 8 | runs-on: ubuntu-latest 9 | steps: 10 | - name: Checkout code 11 | uses: actions/checkout@v2 12 | 13 | - name: Run Super-Linter 14 | uses: github/super-linter@v4 15 | env: 16 | DEFAULT_BRANCH: main 17 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ 2 | venv/ 3 | steps.txt 4 | secrets.py 5 | .DS_Store 6 | **/*.env 7 | dev_script.py 8 | test* 9 | .user.yml 10 | references.md 11 | spotify.drawio 12 | **/extended_streaming_history/ 13 | backfill_extended_streaming_history.py 14 | tests.ipynb 15 | 16 | # Airflow 17 | **/logs/* 18 | **/dags/spotify_data/ 19 | test_dag.py 20 | 21 | # dbt 22 | **target/ 23 | **dbt_packages/ 24 | **logs/ 25 | **.log 26 | **profiles.yml -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Spotify Data Pipeline 2 | 3 | Data pipeline that extracts a user's song listening history from the Spotify API using Python, PostgreSQL, dbt, Metabase, Airflow, and Docker 4 | 5 | ## Objective 6 | 7 | Deep dive into a user's song listening history to retrieve information about top artists, top tracks, top genres, and more. This is a personal side project that recreates Spotify Wrapped at a more frequent cadence for quicker and more detailed insights. The pipeline calls the Spotify API every hour from hours 0-6 and 14-23 UTC (basically whenever I'm awake) to extract a user's song listening history, loads the responses into a database, applies transformations, and visualizes the metrics in a dashboard. Since the dataset is small and the pipeline doesn't need to run 24/7, everything is built with open source tools and hosted locally to avoid any cost.
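For reference, the "every hour from hours 0-6 and 14-23 UTC" cadence above is implemented by the cron expression `0 0-6,14-23 * * *` in dags/spotify_dag.py; a tiny illustrative snippet (not part of the repo) that lists the UTC hours it fires:

```python
# Hours at which the cron "0 0-6,14-23 * * *" fires (UTC): 17 runs per day
scheduled_hours = list(range(0, 7)) + list(range(14, 24))
print(scheduled_hours)
```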
8 | 9 | ## Tools & Technologies 10 | 11 | - Containerization - [**Docker**](https://www.docker.com), [**Docker Compose**](https://docs.docker.com/compose/) 12 | - Orchestration - [**Airflow**](https://airflow.apache.org) 13 | - Database - [**PostgreSQL**](https://www.postgresql.org/) 14 | - Transformation - [**dbt**](https://www.getdbt.com) 15 | - Data Visualization - [**Metabase**](https://www.metabase.com/) 16 | - Language - [**Python**](https://www.python.org) 17 | 18 | ## Architecture 19 | 20 | ![spotify drawio](https://user-images.githubusercontent.com/60953643/210160621-c7213f9d-2b9f-42ad-b8b1-697403bf6497.svg) 21 | 22 | #### Data Flow 23 | 1. The main.py script is triggered every hour (from hours 0-6 and 14-23 UTC) via Airflow to refresh the access token, connect to the Postgres database to check for the latest listened time, and call the Spotify API to retrieve the most recently played songs and corresponding genres. 24 | 2. Responses are saved as CSV files in 'YYYY-MM-DD.csv' format. These are saved on the local file system and act as our replayable source, since the Spotify API only allows requesting the 50 most recently played songs and not any historical data. These files are continually appended with the most recently played songs for the respective date. 25 | 3. Data is copied into the respective Postgres tables, spotify_songs and spotify_genres. 26 | 4. A dbt run task is triggered to run transformations on top of the staging data to produce analytical and reporting tables/views. 27 | 5. dbt test runs after successful completion of dbt run to ensure all tests pass. 28 | 6. Tables/views are fed into Metabase and the metrics are visualized through a dashboard. 29 | 7. A Slack subscription is set up in Metabase to send a weekly summary every Monday. 30 | 31 | If any Airflow task fails at any point in this process, an automatic alert is sent to a custom Slack channel. 32 | 33 | #### DAG 34 | Screenshot 2023-01-05 at 9 32 42 PM 35 | 36 | #### Sample Slack Alert 37 | Screenshot 2023-01-05 at 9 33 09 PM 38 | 39 | 40 | ## Dashboard 41 | Screenshot 2023-01-31 at 12 02 56 PM 42 | Screenshot 2023-01-31 at 1 20 51 PM 43 | Screenshot 2023-01-24 at 10 18 42 PM 44 | Screenshot 2023-01-31 at 12 03 24 PM 45 | Screenshot 2023-01-31 at 12 03 36 PM 46 | 47 | 48 | ## Setup 49 | 50 | 1. [Get Spotify API Access](https://github.com/calbergs/spotify-api/blob/master/setup/spotify_api_access.md) 51 | 2. [Build Docker Containers for Airflow](https://github.com/calbergs/spotify-api/blob/master/setup/airflow_docker.md) 52 | 3. [Set Up Airflow Connection to Postgres](https://github.com/calbergs/spotify-api/blob/master/setup/postgres.md) 53 | 4. [Install dbt Core](https://github.com/calbergs/spotify-api/blob/master/setup/dbt.md) 54 | 5. [Enable Airflow Slack Notifications](https://github.com/calbergs/spotify-api/blob/master/setup/slack_notifications.md) 55 | 6. [Install Metabase](https://github.com/calbergs/spotify-api/blob/master/setup/metabase.md) 56 | 57 | ## Further Improvements (Work In Progress) 58 | 59 | - Create a BranchPythonOperator to first check if the API payload is empty. If it is empty, proceed directly to the end task; otherwise, continue to the downstream tasks.
60 | - Implement data quality checks to catch any potential errors in the dataset 61 | - Create unit tests to ensure the pipeline is running as intended 62 | - Include CI/CD 63 | - Create more visualizations to uncover further insights once Spotify sends back my entire song listening history from 10+ years back to the current date (this needed to be requested separately since the current API only allows requesting the 50 most recently played tracks) 64 | - If and whenever Spotify allows requesting historical data, implement backfill capability 65 | -------------------------------------------------------------------------------- /dags/spotify_dag.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | sys.path.append("/opt/airflow/operators") 4 | from datetime import datetime, timedelta 5 | 6 | import copy_to_postgres 7 | from airflow import DAG 8 | from airflow.contrib.operators.slack_webhook_operator import SlackWebhookOperator 9 | from airflow.hooks.base_hook import BaseHook 10 | from airflow.operators.bash import BashOperator 11 | from airflow.operators.dummy_operator import DummyOperator 12 | from airflow.operators.python import PythonOperator 13 | from airflow.providers.postgres.operators.postgres import PostgresOperator 14 | from airflow_dbt.operators.dbt_operator import DbtRunOperator, DbtTestOperator 15 | 16 | 17 | def task_fail_slack_alert(context): 18 | slack_webhook_token = BaseHook.get_connection("slack").password 19 | slack_msg = """ 20 | :x: Task Failed 21 | *Task*: {task} 22 | *Dag*: {dag} 23 | *Execution Time*: {exec_date} 24 | *Log URL*: {log_url} 25 | """.format( 26 | task=context.get("task_instance").task_id, 27 | dag=context.get("task_instance").dag_id, 28 | ti=context.get("task_instance"), 29 | exec_date=context.get("execution_date"), 30 | log_url=context.get("task_instance").log_url, 31 | ) 32 | failed_alert = SlackWebhookOperator( 33 | task_id="slack_alert", 34 | http_conn_id="slack", 35 | webhook_token=slack_webhook_token, 36 | message=slack_msg, 37 | username="airflow", 38 | dag=dag, 39 | ) 40 | return failed_alert.execute(context=context) 41 | 42 | 43 | args = { 44 | "owner": "airflow", 45 | "depends_on_past": False, 46 | "start_date": datetime(2022, 12, 21), 47 | "retries": 1, 48 | "retry_delay": timedelta(minutes=1), 49 | "on_success_callback": None, 50 | "on_failure_callback": task_fail_slack_alert, 51 | } 52 | 53 | with DAG( 54 | dag_id="spotify_dag", 55 | schedule_interval="0 0-6,14-23 * * *", 56 | max_active_runs=1, 57 | catchup=False, 58 | default_args=args, 59 | ) as dag: 60 | 61 | TASK_DEFS = { 62 | "songs": {"path": "sql/create_spotify_songs.sql"}, 63 | "genres": {"path": "sql/create_spotify_genres.sql"}, 64 | } 65 | 66 | create_tables_if_not_exists = { 67 | k: PostgresOperator( 68 | task_id=f"create_if_not_exists_spotify_{k}_table", 69 | postgres_conn_id="postgres_localhost", 70 | sql=v["path"], 71 | ) 72 | for k, v in TASK_DEFS.items() 73 | } 74 | 75 | extract_spotify_data = BashOperator( 76 | task_id="extract_spotify_data", 77 | bash_command="python3 /opt/airflow/operators/main.py", 78 | ) 79 | 80 | load_tables = { 81 | k: PythonOperator( 82 | task_id=f"load_{k}", 83 | python_callable=copy_to_postgres.copy_expert_csv, 84 | op_kwargs={"file": f"spotify_{k}"}, 85 | ) 86 | for k, v in TASK_DEFS.items() 87 | } 88 | 89 | dbt_run = DbtRunOperator( 90 | task_id="dbt_run", 91 | dir="/opt/airflow/operators/dbt/", 92 | profiles_dir="/opt/airflow/operators/dbt/", 93 | ) 94 | 95 | dbt_test = DbtTestOperator( 96 |
task_id="dbt_test", 97 | dir="/opt/airflow/operators/dbt/", 98 | profiles_dir="/opt/airflow/operators/dbt/", 99 | ) 100 | 101 | continue_task = DummyOperator(task_id="continue") 102 | 103 | start_task = DummyOperator(task_id="start") 104 | 105 | end_task = DummyOperator(task_id="end") 106 | 107 | ( 108 | start_task 109 | >> extract_spotify_data 110 | >> list(create_tables_if_not_exists.values()) 111 | >> continue_task 112 | >> list(load_tables.values()) 113 | >> dbt_run 114 | >> dbt_test 115 | >> end_task 116 | ) 117 | -------------------------------------------------------------------------------- /dags/sql/create_spotify_genres.sql: -------------------------------------------------------------------------------- 1 | --hacky way to do a create or replace 2 | --this is in case we are rebuilding the project from scratch whenever the spotify genres table doesn't exist yet since it's doing a full refresh each run 3 | create table if not exists spotify_genres ( 4 | artist_id text, 5 | artist_name text, 6 | artist_genre text, 7 | last_updated_datetime_utc timestamp, 8 | primary key (artist_id) 9 | ); 10 | drop table spotify_genres; 11 | create table if not exists spotify_genres ( 12 | artist_id text, 13 | artist_name text, 14 | artist_genre text, 15 | last_updated_datetime_utc timestamp, 16 | primary key (artist_id) 17 | ); -------------------------------------------------------------------------------- /dags/sql/create_spotify_songs.sql: -------------------------------------------------------------------------------- 1 | create table if not exists spotify_songs ( 2 | played_at_utc timestamp, 3 | played_date_utc date, 4 | song_name text, 5 | artist_name text, 6 | song_duration_ms integer, 7 | song_link text, 8 | album_art_link text, 9 | album_name text, 10 | album_id text, 11 | artist_id text, 12 | track_id text, 13 | last_updated_datetime_utc timestamp, 14 | primary key (played_at_utc) 15 | ); -------------------------------------------------------------------------------- /docker-compose.yaml: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one 2 | # or more contributor license agreements. See the NOTICE file 3 | # distributed with this work for additional information 4 | # regarding copyright ownership. The ASF licenses this file 5 | # to you under the Apache License, Version 2.0 (the 6 | # "License"); you may not use this file except in compliance 7 | # with the License. You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, 12 | # software distributed under the License is distributed on an 13 | # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 14 | # KIND, either express or implied. See the License for the 15 | # specific language governing permissions and limitations 16 | # under the License. 17 | # 18 | 19 | # Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL. 20 | # 21 | # WARNING: This configuration is for local development. Do not use it in a production deployment. 22 | # 23 | # This configuration supports basic configuration using environment variables or an .env file 24 | # The following variables are supported: 25 | # 26 | # AIRFLOW_IMAGE_NAME - Docker image name used to run Airflow. 
27 | # Default: apache/airflow:2.5.0 28 | # AIRFLOW_UID - User ID in Airflow containers 29 | # Default: 50000 30 | # Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode 31 | # 32 | # _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account (if requested). 33 | # Default: airflow 34 | # _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account (if requested). 35 | # Default: airflow 36 | # _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers. 37 | # Default: '' 38 | # 39 | # Feel free to modify this file to suit your needs. 40 | --- 41 | version: '3' 42 | x-airflow-common: 43 | &airflow-common 44 | # In order to add custom dependencies or upgrade provider packages you can use your extended image. 45 | # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml 46 | # and uncomment the "build" line below, Then run `docker-compose build` to build the images. 47 | image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.5.0} 48 | # build: . 49 | environment: 50 | &airflow-common-env 51 | AIRFLOW__CORE__EXECUTOR: CeleryExecutor 52 | AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow 53 | # For backward compatibility, with Airflow <2.3 54 | AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow 55 | AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow 56 | AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0 57 | AIRFLOW__CORE__FERNET_KEY: '' 58 | AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true' 59 | AIRFLOW__CORE__LOAD_EXAMPLES: 'false' 60 | AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth' 61 | _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- dbt-core dbt-postgres airflow-dbt} 62 | volumes: 63 | - ./dags:/opt/airflow/dags 64 | - ./logs:/opt/airflow/logs 65 | - ./plugins:/opt/airflow/plugins 66 | - ./operators:/opt/airflow/operators 67 | - ./operators/dbt:/opt/airflow/operators/dbt 68 | user: "${AIRFLOW_UID:-50000}:0" 69 | depends_on: 70 | &airflow-common-depends-on 71 | redis: 72 | condition: service_healthy 73 | postgres: 74 | condition: service_healthy 75 | 76 | services: 77 | postgres: 78 | image: postgres:13 79 | environment: 80 | POSTGRES_USER: airflow 81 | POSTGRES_PASSWORD: airflow 82 | POSTGRES_DB: airflow 83 | volumes: 84 | - postgres-db-volume:/var/lib/postgresql/data 85 | ports: 86 | - 5432:5432 87 | healthcheck: 88 | test: ["CMD", "pg_isready", "-U", "airflow"] 89 | interval: 5s 90 | retries: 5 91 | restart: always 92 | 93 | redis: 94 | image: redis:latest 95 | expose: 96 | - 6379 97 | healthcheck: 98 | test: ["CMD", "redis-cli", "ping"] 99 | interval: 5s 100 | timeout: 30s 101 | retries: 50 102 | restart: always 103 | 104 | airflow-webserver: 105 | <<: *airflow-common 106 | command: webserver 107 | ports: 108 | - 8080:8080 109 | healthcheck: 110 | test: ["CMD", "curl", "--fail", "http://localhost:8080/health"] 111 | interval: 10s 112 | timeout: 10s 113 | retries: 5 114 | restart: always 115 | depends_on: 116 | <<: *airflow-common-depends-on 117 | airflow-init: 118 | condition: service_completed_successfully 119 | 120 | airflow-scheduler: 121 | <<: *airflow-common 122 | command: scheduler 123 | healthcheck: 124 | test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"'] 125 | interval: 10s 126 | timeout: 10s 127 | retries: 5 128 | restart: always 129 | depends_on: 
130 | <<: *airflow-common-depends-on 131 | airflow-init: 132 | condition: service_completed_successfully 133 | 134 | airflow-worker: 135 | <<: *airflow-common 136 | command: celery worker 137 | healthcheck: 138 | test: 139 | - "CMD-SHELL" 140 | - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"' 141 | interval: 10s 142 | timeout: 10s 143 | retries: 5 144 | environment: 145 | <<: *airflow-common-env 146 | # Required to handle warm shutdown of the celery workers properly 147 | # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation 148 | DUMB_INIT_SETSID: "0" 149 | restart: always 150 | depends_on: 151 | <<: *airflow-common-depends-on 152 | airflow-init: 153 | condition: service_completed_successfully 154 | 155 | airflow-triggerer: 156 | <<: *airflow-common 157 | command: triggerer 158 | healthcheck: 159 | test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"'] 160 | interval: 10s 161 | timeout: 10s 162 | retries: 5 163 | restart: always 164 | depends_on: 165 | <<: *airflow-common-depends-on 166 | airflow-init: 167 | condition: service_completed_successfully 168 | 169 | airflow-init: 170 | <<: *airflow-common 171 | entrypoint: /bin/bash 172 | # yamllint disable rule:line-length 173 | command: 174 | - -c 175 | - | 176 | function ver() { 177 | printf "%04d%04d%04d%04d" $${1//./ } 178 | } 179 | airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version) 180 | airflow_version_comparable=$$(ver $${airflow_version}) 181 | min_airflow_version=2.2.0 182 | min_airflow_version_comparable=$$(ver $${min_airflow_version}) 183 | if (( airflow_version_comparable < min_airflow_version_comparable )); then 184 | echo 185 | echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m" 186 | echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!" 187 | echo 188 | exit 1 189 | fi 190 | if [[ -z "${AIRFLOW_UID}" ]]; then 191 | echo 192 | echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m" 193 | echo "If you are on Linux, you SHOULD follow the instructions below to set " 194 | echo "AIRFLOW_UID environment variable, otherwise files will be owned by root." 195 | echo "For other operating systems you can get rid of the warning with manually created .env file:" 196 | echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user" 197 | echo 198 | fi 199 | one_meg=1048576 200 | mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg)) 201 | cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat) 202 | disk_available=$$(df / | tail -1 | awk '{print $$4}') 203 | warning_resources="false" 204 | if (( mem_available < 4000 )) ; then 205 | echo 206 | echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m" 207 | echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))" 208 | echo 209 | warning_resources="true" 210 | fi 211 | if (( cpus_available < 2 )); then 212 | echo 213 | echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m" 214 | echo "At least 2 CPUs recommended. You have $${cpus_available}" 215 | echo 216 | warning_resources="true" 217 | fi 218 | if (( disk_available < one_meg * 10 )); then 219 | echo 220 | echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m" 221 | echo "At least 10 GBs recommended. 
You have $$(numfmt --to iec $$((disk_available * 1024 )))" 222 | echo 223 | warning_resources="true" 224 | fi 225 | if [[ $${warning_resources} == "true" ]]; then 226 | echo 227 | echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m" 228 | echo "Please follow the instructions to increase amount of resources available:" 229 | echo " https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin" 230 | echo 231 | fi 232 | mkdir -p /sources/logs /sources/dags /sources/plugins 233 | chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins} 234 | exec /entrypoint airflow version 235 | # yamllint enable rule:line-length 236 | environment: 237 | <<: *airflow-common-env 238 | _AIRFLOW_DB_UPGRADE: 'true' 239 | _AIRFLOW_WWW_USER_CREATE: 'true' 240 | _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow} 241 | _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow} 242 | _PIP_ADDITIONAL_REQUIREMENTS: '' 243 | user: "0:0" 244 | volumes: 245 | - .:/sources 246 | 247 | airflow-cli: 248 | <<: *airflow-common 249 | profiles: 250 | - debug 251 | environment: 252 | <<: *airflow-common-env 253 | CONNECTION_CHECK_MAX_COUNT: "0" 254 | # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252 255 | command: 256 | - bash 257 | - -c 258 | - airflow 259 | 260 | # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up 261 | # or by explicitly targeted on the command line e.g. docker-compose up flower. 262 | # See: https://docs.docker.com/compose/profiles/ 263 | flower: 264 | <<: *airflow-common 265 | command: celery flower 266 | profiles: 267 | - flower 268 | ports: 269 | - 5555:5555 270 | healthcheck: 271 | test: ["CMD", "curl", "--fail", "http://localhost:5555/"] 272 | interval: 10s 273 | timeout: 10s 274 | retries: 5 275 | restart: always 276 | depends_on: 277 | <<: *airflow-common-depends-on 278 | airflow-init: 279 | condition: service_completed_successfully 280 | 281 | volumes: 282 | postgres-db-volume: 283 | -------------------------------------------------------------------------------- /images/metabase_graphs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_graphs.png -------------------------------------------------------------------------------- /images/metabase_heatmap.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_heatmap.png -------------------------------------------------------------------------------- /images/metabase_recently_played.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_recently_played.png -------------------------------------------------------------------------------- /images/metabase_summary.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_summary.png -------------------------------------------------------------------------------- /images/metabase_top_songs.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/images/metabase_top_songs.png -------------------------------------------------------------------------------- /operators/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/operators/__init__.py -------------------------------------------------------------------------------- /operators/config.yml: -------------------------------------------------------------------------------- 1 | files: 2 | songs: '/opt/airflow/dags/spotify_data/spotify_songs' 3 | genres: '/opt/airflow/dags/spotify_data/spotify_genres' 4 | genres_tmp: '/opt/airflow/dags/spotify_data/spotify_genres_tmp' -------------------------------------------------------------------------------- /operators/copy_to_postgres.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copies CSV into Postgres 3 | """ 4 | 5 | from airflow.hooks.postgres_hook import PostgresHook 6 | 7 | 8 | def copy_expert_csv(file): 9 | hook = PostgresHook("postgres_localhost") 10 | # PostgresHook.copy_expert opens its own connection and commits, so no 11 | # separate connection handling is needed here 12 | hook.copy_expert( 13 | f""" 14 | COPY {file} FROM stdin WITH CSV HEADER DELIMITER as ',' 15 | """, 16 | f"/opt/airflow/dags/spotify_data/{file}.csv", 17 | ) 18 | -------------------------------------------------------------------------------- /operators/dbt/dbt_project.yml: -------------------------------------------------------------------------------- 1 | 2 | # Name your project! Project names should contain only lowercase characters 3 | # and underscores. A good package name should reflect your organization's 4 | # name or the intended use of these models 5 | name: 'dbt_spotify' 6 | version: '1.0.0' 7 | config-version: 2 8 | 9 | # This setting configures which "profile" dbt uses for this project. 10 | profile: 'postgres' 11 | 12 | # These configurations specify where dbt should look for different types of files. 13 | # The `model-paths` config, for example, states that models in this project can be 14 | # found in the "models/" directory. You probably won't need to change these! 15 | model-paths: ["models"] 16 | analysis-paths: ["analyses"] 17 | test-paths: ["tests"] 18 | seed-paths: ["seeds"] 19 | macro-paths: ["macros"] 20 | snapshot-paths: ["snapshots"] 21 | 22 | target-path: "target"  # directory which will store compiled SQL files 23 | clean-targets:         # directories to be removed by `dbt clean` 24 | - "target" 25 | - "dbt_packages" 26 | 27 | 28 | # Configuring models 29 | # Full documentation: https://docs.getdbt.com/docs/configuring-models 30 | 31 | # In this config, all models under marts/ and staging/ are built as tables, with 32 | # custom schemas per subfolder. These settings can be overridden in the individual model files 33 | # using the `{{ config(...) }}` macro.
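# Note: the generate_schema_name macro override in macros/generate_schema_name.sql (shown below)
# makes dbt use the +schema values that follow (staging, analytical, reporting) verbatim,
# instead of dbt's default behavior of appending them to the target schema as <target_schema>_<custom_schema>.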
34 | models: 35 | dbt_spotify: 36 | # Config indicated by + and applies to all files under the corresponding subdirectory 37 | marts: 38 | +materialized: table 39 | reporting: 40 | +schema: reporting 41 | analytical: 42 | +schema: analytical 43 | staging: 44 | +materialized: table 45 | spotify: 46 | +schema: staging 47 | -------------------------------------------------------------------------------- /operators/dbt/macros/generate_schema_name.sql: -------------------------------------------------------------------------------- 1 | {% macro generate_schema_name(custom_schema_name, node) -%} 2 | 3 | {%- set default_schema = target.schema -%} 4 | {%- if custom_schema_name is none -%} 5 | 6 | {{ default_schema }} 7 | 8 | {%- else -%} 9 | 10 | {{ custom_schema_name | trim }} 11 | 12 | {%- endif -%} 13 | 14 | {%- endmacro %} -------------------------------------------------------------------------------- /operators/dbt/models/marts/analytical/dim_genres.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with genres as ( 8 | 9 | select 10 | artist_id, 11 | artist_name, 12 | artist_genre, 13 | last_updated_datetime_utc at time zone 'utc' at time zone 'America/Chicago' as last_updated_datetime_ct 14 | 15 | from {{ ref('stg_genres') }} 16 | 17 | ) 18 | 19 | select * from genres -------------------------------------------------------------------------------- /operators/dbt/models/marts/analytical/dim_songs.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with songs as ( 8 | 9 | select 10 | played_at_utc at time zone 'utc' at time zone 'America/Chicago' as played_at_ct, 11 | cast(played_at_utc at time zone 'utc' at time zone 'America/Chicago' as date) as played_date_ct, 12 | song_name, 13 | artist_name, 14 | song_duration_ms, 15 | song_link, 16 | album_art_link, 17 | album_name, 18 | album_id, 19 | artist_id, 20 | track_id, 21 | last_updated_datetime_utc at time zone 'utc' at time zone 'America/Chicago' as last_updated_datetime_ct 22 | 23 | from {{ ref('stg_songs') }} 24 | 25 | ) 26 | 27 | select * from songs -------------------------------------------------------------------------------- /operators/dbt/models/marts/analytical/fct_listening_activity.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with songs as ( 8 | 9 | select * from {{ ref('dim_songs') }} 10 | 11 | ), 12 | 13 | genres as ( 14 | 15 | select * from {{ ref('dim_genres') }} 16 | 17 | ), 18 | 19 | final as ( 20 | 21 | select 22 | songs.played_at_ct as played_at, 23 | songs.played_date_ct as played_date, 24 | to_char(songs.played_at_ct, 'Day') as played_at_day_of_week, 25 | extract(year from songs.played_at_ct) as played_at_year, 26 | extract(month from songs.played_at_ct) as played_at_month, 27 | extract(day from songs.played_at_ct) as played_at_day, 28 | date_part('week', cast(date_trunc('week', songs.played_at_ct + interval '1 day') - interval '1 day' as date)) as played_at_week_number, --Sunday-start week number (date_trunc starts weeks on Monday, hence the one-day shift) 29 | extract(hour from songs.played_at_ct) as played_at_hour, 30 | songs.song_name, 31 | songs.artist_name, 32 | genres.artist_genre, 33 | songs.song_duration_ms, 34 | cast(songs.song_duration_ms as decimal)/60000 as song_duration_mins, 35 | songs.song_link, 36 | songs.album_art_link, 37 | songs.album_name, 38 | songs.album_id, 39 | songs.artist_id, 40 | songs.track_id,
41 | songs.last_updated_datetime_ct as last_updated_datetime 42 | 43 | from songs 44 | 45 | left join genres 46 | on songs.artist_id = genres.artist_id 47 | 48 | ) 49 | 50 | select * from final -------------------------------------------------------------------------------- /operators/dbt/models/marts/analytical/schema.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | models: 4 | - name: dim_songs 5 | description: "Songs table" 6 | columns: 7 | - name: played_at_ct 8 | description: "The primary key for this table" 9 | tests: 10 | - unique 11 | - not_null 12 | 13 | - name: dim_genres 14 | description: "Genres table" 15 | columns: 16 | - name: artist_id 17 | description: "The primary key for this table" 18 | tests: 19 | - unique 20 | - not_null 21 | 22 | - name: fct_listening_activity 23 | description: "User's listening activity" 24 | columns: 25 | - name: played_at 26 | description: "The primary key for this table" 27 | tests: 28 | - unique 29 | - not_null -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/rpt_hour_of_week.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='view' 4 | ) 5 | }} 6 | 7 | select 8 | played_at_day_of_week, 9 | case when played_at_day_of_week like '%Sunday%' then '0_Sun' 10 | when played_at_day_of_week like '%Monday%' then '1_Mon' 11 | when played_at_day_of_week like '%Tuesday%' then '2_Tue' 12 | when played_at_day_of_week like '%Wednesday%' then '3_Wed' 13 | when played_at_day_of_week like '%Thursday%' then '4_Thu' 14 | when played_at_day_of_week like '%Friday%' then '5_Fri' 15 | when played_at_day_of_week like '%Saturday%' then '6_Sat' 16 | end as day_num, 17 | played_at_hour, 18 | sum(song_duration_mins) as song_duration_mins 19 | 20 | from {{ ref('fct_listening_activity') }} 21 | 22 | group by 23 | played_at_day_of_week, 24 | played_at_hour -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/rpt_most_listened.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='view' 4 | ) 5 | }} 6 | 7 | select distinct 8 | song_name, 9 | artist_name, 10 | album_name, 11 | artist_genre, 12 | song_link, 13 | count(track_id) over (partition by artist_name, song_name) as times_song_listened, 14 | count(artist_id) over (partition by artist_id) as times_artist_listened, 15 | max(played_date) over (partition by track_id) as song_last_listened_date 16 | 17 | from {{ ref('fct_listening_activity') }} -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/rpt_recently_listened.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='view' 4 | ) 5 | }} 6 | 7 | with listening_activity as ( 8 | 9 | select * from {{ ref('fct_listening_activity') }} 10 | 11 | ), 12 | 13 | final as ( 14 | 15 | select 16 | played_at, 17 | song_name, 18 | artist_name, 19 | album_name, 20 | artist_genre, 21 | song_duration_mins, 22 | song_link, 23 | played_date, 24 | last_updated_datetime 25 | 26 | from listening_activity 27 | ) 28 | 29 | select * from final -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/rpt_weekly_discovers.sql: 
-------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='view' 4 | ) 5 | }} 6 | 7 | with listening_activity as ( 8 | 9 | select * from {{ ref('fct_listening_activity') }} 10 | 11 | ), 12 | 13 | curr as ( 14 | 15 | select distinct 16 | artist_name, 17 | artist_id, 18 | count(artist_id) over(partition by artist_id) as times_listened, 19 | max(played_at) over (partition by artist_id) as last_listened_time 20 | 21 | from listening_activity 22 | 23 | where cast(date_trunc('week', played_date + interval '1 day') - interval '1 day' as date) = cast(date_trunc('week', current_date + interval '1 day') - interval '1 day' as date) 24 | ) 25 | 26 | ,prev as ( 27 | 28 | select distinct 29 | artist_name, 30 | artist_id 31 | 32 | from listening_activity 33 | 34 | where cast(date_trunc('week', played_date + interval '1 day') - interval '1 day' as date) < cast(date_trunc('week', current_date + interval '1 day') - interval '1 day' as date) 35 | ) 36 | 37 | select 38 | curr.artist_name, 39 | curr.artist_id, 40 | times_listened 41 | 42 | from curr 43 | 44 | left join prev 45 | on curr.artist_id = prev.artist_id 46 | 47 | where prev.artist_id is null 48 | -------------------------------------------------------------------------------- /operators/dbt/models/marts/reporting/schema.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | models: 4 | - name: rpt_most_listened 5 | description: "Most listened artists and tracks" 6 | columns: 7 | - name: (artist_name || song_name || song_link) 8 | description: "The primary key for this table" 9 | tests: 10 | - unique 11 | - not_null 12 | - name: rpt_recently_listened 13 | description: "Recently listened artists and tracks" 14 | columns: 15 | - name: played_at 16 | description: "The primary key for this table" 17 | tests: 18 | - unique 19 | - not_null 20 | - name: rpt_weekly_discovers 21 | description: "New artists discovered in the current week" 22 | columns: 23 | - name: artist_id 24 | description: "The primary key for this table" 25 | tests: 26 | - unique 27 | - not_null 28 | - name: rpt_hour_of_week 29 | description: "Hour of week heatmap on listening activity" 30 | columns: 31 | - name: (played_at_day_of_week || played_at_hour) 32 | description: "The primary key for this table" 33 | tests: 34 | - unique 35 | - not_null -------------------------------------------------------------------------------- /operators/dbt/models/staging/spotify/schema.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | sources: 4 | - name: spotify 5 | description: 'Spotify raw data landing zone' 6 | database: spotify 7 | schema: public 8 | tables: 9 | - name: spotify_songs 10 | description: 'Details about the songs the user listened to' 11 | - name: spotify_genres 12 | description: 'Details about the corresponding genres of the songs the user listened to' -------------------------------------------------------------------------------- /operators/dbt/models/staging/spotify/stg_genres.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with source_spotify_genres as ( 8 | select * from {{ source('spotify', 'spotify_genres') }} 9 | ), 10 | 11 | final as ( 12 | select * from source_spotify_genres 13 | ) 14 | 15 | select * from final -------------------------------------------------------------------------------- 
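A note on the staging models (stg_genres above, stg_songs below): given the source definition in the staging schema.yml (`database: spotify`, `schema: public`), dbt compiles a reference like `{{ source('spotify', 'spotify_genres') }}` to `"spotify"."public"."spotify_genres"`, so the staging layer simply rematerializes the raw landing tables as dbt-managed tables for the marts to build on.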
/operators/dbt/models/staging/spotify/stg_songs.sql: -------------------------------------------------------------------------------- 1 | {{ 2 | config( 3 | materialized='table' 4 | ) 5 | }} 6 | 7 | with source_spotify_songs as ( 8 | select * from {{ source('spotify', 'spotify_songs') }} 9 | ), 10 | 11 | final as ( 12 | select * from source_spotify_songs 13 | ) 14 | 15 | select * from final -------------------------------------------------------------------------------- /operators/main.py: -------------------------------------------------------------------------------- 1 | """ 2 | Makes requests to the Spotify API to retrieve recently played songs and the corresponding genres 3 | """ 4 | 5 | import datetime as dt 6 | import os.path 7 | from datetime import datetime 8 | from pathlib import Path 9 | from secrets import spotify_user_id 10 | 11 | import pandas as pd 12 | import requests 13 | from postgres_connect import ConnectPostgres 14 | from refresh import RefreshToken 15 | from yaml_load import yaml_loader 16 | 17 | 18 | class RetrieveSongs: 19 | def __init__(self): 20 | self.user_id = spotify_user_id  # Spotify username 21 | self.spotify_token = ""  # Spotify access token 22 | 23 | # Query the postgres database to get the latest played timestamp 24 | def get_latest_listened_timestamp(self): 25 | conn = ConnectPostgres().postgres_connector() 26 | cur = conn.cursor() 27 | 28 | query = "SELECT MAX(played_at_utc) FROM public.spotify_songs" 29 | 30 | cur.execute(query) 31 | 32 | max_played_at_utc = cur.fetchall()[0][0] 33 | 34 | # If the spotify_songs table is empty, grab the earliest data we can. Set at t - 90 days for now. 35 | if max_played_at_utc is None: 36 | today = dt.datetime.now() 37 | previous_date = today - dt.timedelta(days=90) 38 | previous_date_unix_timestamp = int(previous_date.timestamp()) * 1000 39 | latest_timestamp = previous_date_unix_timestamp 40 | else: 41 | latest_timestamp = int(max_played_at_utc.timestamp()) * 1000 42 | return latest_timestamp 43 | 44 | # Extract recently played songs from Spotify API 45 | def get_songs(self): 46 | headers = { 47 | "Accept": "application/json", 48 | "Content-Type": "application/json", 49 | "Authorization": "Bearer {}".format(self.spotify_token), 50 | } 51 | 52 | latest_timestamp = self.get_latest_listened_timestamp() 53 | config = yaml_loader() 54 | songs = config["files"]["songs"] 55 | genres = config["files"]["genres"] 56 | genres_tmp = config["files"]["genres_tmp"] 57 | 58 | # Download all songs listened to since the last run or since the earliest listen date defined in the get_latest_listened_timestamp function 59 | song_response = requests.get( 60 | "https://api.spotify.com/v1/me/player/recently-played?limit=50&after={time}".format( 61 | time=latest_timestamp 62 | ), 63 | headers=headers, 64 | ) 65 | 66 | song_data = song_response.json() 67 | 68 | played_at_utc = [] 69 | played_date_utc = [] 70 | song_names = [] 71 | artist_names = [] 72 | song_durations_ms = [] 73 | song_links = [] 74 | album_art_links = [] 75 | album_names = [] 76 | album_ids = [] 77 | artist_ids = [] 78 | track_ids = [] 79 | 80 | # Extract only the necessary data from the json object 81 | for song in song_data["items"]: 82 | played_at_utc.append(song["played_at"]) 83 | played_date_utc.append(song["played_at"][0:10]) 84 | song_names.append(song["track"]["name"]) 85 | artist_names.append(song["track"]["album"]["artists"][0]["name"]) 86 | song_durations_ms.append(song["track"]["duration_ms"]) 87 |
song_links.append(song["track"]["external_urls"]["spotify"]) 88 | album_art_links.append(song["track"]["album"]["images"][1]["url"]) 89 | album_names.append(song["track"]["album"]["name"]) 90 | album_ids.append(song["track"]["album"]["id"]) 91 | artist_ids.append(song["track"]["artists"][0]["id"]) 92 | track_ids.append(song["track"]["id"]) 93 | 94 | # Prepare a dictionary in order to turn it into a pandas dataframe 95 | song_dict = { 96 | "played_at_utc": played_at_utc, 97 | "played_date_utc": played_date_utc, 98 | "song_name": song_names, 99 | "artist_name": artist_names, 100 | "song_duration_ms": song_durations_ms, 101 | "song_link": song_links, 102 | "album_art_link": album_art_links, 103 | "album_name": album_names, 104 | "album_id": album_ids, 105 | "artist_id": artist_ids, 106 | "track_id": track_ids, 107 | } 108 | 109 | song_df = pd.DataFrame( 110 | song_dict, 111 | columns=[ 112 | "played_at_utc", 113 | "played_date_utc", 114 | "song_name", 115 | "artist_name", 116 | "song_duration_ms", 117 | "song_link", 118 | "album_art_link", 119 | "album_name", 120 | "album_id", 121 | "artist_id", 122 | "track_id", 123 | ], 124 | ) 125 | 126 | last_updated_datetime_utc = dt.datetime.utcnow() 127 | song_df["last_updated_datetime_utc"] = last_updated_datetime_utc 128 | song_df = song_df.sort_values("played_at_utc", ascending=True) 129 | 130 | # Drop the first row (the latest song from the last run, which would be a duplicate), then write to csv 131 | song_df = song_df.iloc[1:, :] 132 | song_df.to_csv(f"{songs}.csv", index=False) 133 | 134 | for date in set(song_df["played_date_utc"]): 135 | played_dt = datetime.strptime(date, "%Y-%m-%d") 136 | date_year = played_dt.year 137 | date_month = played_dt.month 138 | output_song_dir = Path(f"{songs}/{date_year}/{date_month}") 139 | output_song_file = f"{date}.csv" 140 | path_to_songs_file = f"{output_song_dir}/{output_song_file}" 141 | songs_file_exists = os.path.exists(path_to_songs_file) 142 | 143 | # Check to see if the file exists. If not, create a new file; else, append to the existing file.
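# Daily files are partitioned as {songs}/{year}/{month}/{date}.csv, so each hourly
# run appends only that date's newly played tracks to the matching daily file.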
144 | if songs_file_exists: 145 | curr_song_df = pd.read_csv(path_to_songs_file) 146 | curr_song_df = curr_song_df.append(song_df.loc[song_df["played_date_utc"] == date])  # append only this date's rows 147 | curr_song_df.to_csv(path_to_songs_file, index=False) 148 | else: 149 | output_song_dir.mkdir(parents=True, exist_ok=True) 150 | song_df.loc[song_df["played_date_utc"] == date].to_csv( 151 | f"{output_song_dir}/{date}.csv", index=False 152 | ) 153 | 154 | # Retrieve the corresponding genres for the artists in the artist_ids list 155 | artist_ids_genres = [] 156 | artist_names = [] 157 | artist_genres = [] 158 | 159 | artist_ids_dedup = set(artist_ids) 160 | 161 | for id in artist_ids_dedup: 162 | artist_response = requests.get( 163 | "https://api.spotify.com/v1/artists/{id}".format(id=id), headers=headers 164 | ) 165 | 166 | artist_data = artist_response.json() 167 | 168 | artist_ids_genres.append(artist_data["id"]) 169 | artist_names.append(artist_data["name"]) 170 | 171 | if len(artist_data["genres"]) == 0: 172 | artist_genres.append(None) 173 | else: 174 | artist_genres.append(artist_data["genres"][0]) 175 | 176 | artist_dict = { 177 | "artist_id": artist_ids_genres, 178 | "artist_name": artist_names, 179 | "artist_genre": artist_genres, 180 | } 181 | 182 | artist_genre_df = pd.DataFrame( 183 | artist_dict, columns=["artist_id", "artist_name", "artist_genre"] 184 | ) 185 | 186 | artist_genre_df.to_csv(f"{genres_tmp}.csv", index=False) 187 | artist_genre_df_nh = pd.read_csv(f"{genres_tmp}.csv", sep=",") 188 | try: 189 | curr_artist_genre_df = pd.read_csv(f"{genres}.csv", sep=",") 190 | curr_artist_genre_df = curr_artist_genre_df.append(artist_genre_df_nh) 191 | curr_artist_genre_df.drop_duplicates( 192 | subset="artist_id", keep="first", inplace=True 193 | ) 194 | curr_artist_genre_df[ 195 | "last_updated_datetime_utc" 196 | ] = last_updated_datetime_utc 197 | curr_artist_genre_df.to_csv(f"{genres}.csv", index=False) 198 | except FileNotFoundError:  # no genres file exists yet (first run) 199 | artist_genre_df_nh["last_updated_datetime_utc"] = last_updated_datetime_utc 200 | artist_genre_df_nh.to_csv(f"{genres}.csv", index=False) 201 | os.remove(f"{genres_tmp}.csv") 202 | 203 | def call_refresh(self): 204 | print("Refreshing token...") 205 | refresher = RefreshToken() 206 | self.spotify_token = refresher.refresh() 207 | print("Getting songs...") 208 | self.get_songs() 209 | 210 | 211 | if __name__ == "__main__": 212 | tracks = RetrieveSongs() 213 | tracks.call_refresh() 214 | -------------------------------------------------------------------------------- /operators/postgres_connect.py: -------------------------------------------------------------------------------- 1 | """ 2 | Connects to the Postgres database 3 | """ 4 | 5 | from secrets import dbname, host, pg_password, pg_user, port 6 | 7 | import psycopg2 8 | 9 | 10 | class ConnectPostgres: 11 | def __init__(self): 12 | self.host = host 13 | self.port = port 14 | self.dbname = dbname 15 | self.pg_user = pg_user 16 | self.pg_password = pg_password 17 | 18 | def postgres_connector(self): 19 | conn = psycopg2.connect( 20 | f"host='{self.host}' port='{self.port}' dbname='{self.dbname}' user='{self.pg_user}' password='{self.pg_password}'" 21 | ) 22 | return conn 23 | 24 | 25 | if __name__ == "__main__": 26 | conn = ConnectPostgres() 27 | conn.postgres_connector() 28 | -------------------------------------------------------------------------------- /operators/refresh.py: -------------------------------------------------------------------------------- 1 | """ 2 | Generates a new access token on each run 3 | """ 4 | 5 | from secrets import base_64, refresh_token 6
| 7 | import requests 8 | 9 | 10 | class RefreshToken: 11 | def __init__(self): 12 | self.refresh_token = refresh_token 13 | self.base_64 = base_64 14 | 15 | def refresh(self): 16 | query = "https://accounts.spotify.com/api/token" 17 | response = requests.post( 18 | query, 19 | data={"grant_type": "refresh_token", "refresh_token": self.refresh_token}, 20 | headers={"Authorization": "Basic " + self.base_64}, 21 | ) 22 | 23 | response_json = response.json() 24 | return response_json["access_token"] 25 | 26 | 27 | if __name__ == "__main__": 28 | new_token = RefreshToken() 29 | new_token.refresh() 30 | -------------------------------------------------------------------------------- /operators/yaml_load.py: -------------------------------------------------------------------------------- 1 | """ 2 | Reads in a config.yml file 3 | """ 4 | 5 | import yaml 6 | 7 | 8 | def yaml_loader(): 9 | with open("/opt/airflow/operators/config.yml") as config_file: 10 | config = yaml.safe_load(config_file) 11 | return config 12 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | airflow-dbt==0.4.0 2 | pandas==1.5.2 3 | psycopg2==2.9.5 4 | pyyaml==6.0 5 | requests==2.28.1 -------------------------------------------------------------------------------- /setup/airflow_docker.md: -------------------------------------------------------------------------------- 1 | # Build Docker Containers for Airflow 2 | 3 | - Check if you have enough memory (need at least 4GB) 4 | ``` 5 | docker run --rm "debian:bullseye-slim" bash -c 'numfmt --to iec $(echo $(($(getconf _PHYS_PAGES) * $(getconf PAGE_SIZE))))' 6 | ``` 7 | - Fetch docker-compose.yaml 8 | ``` 9 | curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.0/docker-compose.yaml' 10 | ``` 11 | - Make the directories and set the user 12 | ``` 13 | mkdir -p ./dags ./logs ./plugins 14 | echo -e "AIRFLOW_UID=$(id -u)" > .env 15 | ``` 16 | - Initialize the database 17 | ``` 18 | docker compose up airflow-init 19 | ``` 20 | - Start all services 21 | ``` 22 | docker-compose up 23 | ``` 24 | - Airflow is now available on http://localhost:8080/home 25 | - Depending on where your dbt project is installed, a new volume will need to be added in the docker-compose.yaml file in order for dbt to run in Airflow 26 | ``` 27 | - ./operators/dbt:/opt/airflow/operators/dbt 28 | ``` -------------------------------------------------------------------------------- /setup/dbt.md: -------------------------------------------------------------------------------- 1 | # Install dbt Core with Homebrew (or your method of choice) 2 | 3 | - Run the below commands: 4 | ``` 5 | brew update 6 | brew install git 7 | brew tap dbt-labs/dbt 8 | ``` 9 | - Identify your [**adapter**](https://docs.getdbt.com/docs/supported-data-platforms) (in this case Postgres is used) and install: 10 | ``` 11 | brew install dbt-postgres 12 | ``` 13 | - cd to the directory where you want to have dbt installed and initialize the project 14 | ``` 15 | dbt init 16 | ``` 17 | - Update the profiles.yml file found in Users/<username>/.dbt/ 18 | - Update all the appropriate configurations based on the [**dbt setup guide**](https://docs.getdbt.com/reference/warehouse-setups/postgres-setup) 19 | - Go to the dbt_project.yml file and make sure the profile configuration matches with the one in the profiles.yml file 20 | - Ensure the database setup was done correctly 21 | ``` 22 | dbt debug 23 | ``` 24 | - Test that dbt is building the models correctly.
If successful you can verify the new tables/views in the database. 25 | ``` 26 | dbt run 27 | ``` 28 | - Generate the docs for the dbt project 29 | ``` 30 | dbt docs generate 31 | ``` 32 | - Serve the docs on a webserver using port 8001 33 | ``` 34 | dbt docs serve --port 8001 35 | ``` 36 | -------------------------------------------------------------------------------- /setup/metabase.md: -------------------------------------------------------------------------------- 1 | # Install Metabase 2 | 3 | - Download the Metabase [**JAR file**](https://www.metabase.com/start/oss/) (or your method of choice, JAR file was used as Metabase through Docker wasn't working on M1 Macs at the time of this writing) 4 | - Create a new directory and move the Metabase JAR file into it 5 | - Ensure the [**latest Java version**](https://www.oracle.com/java/technologies/downloads/#jdk19-mac) is downloaded 6 | - cd into the new Metabase directory and run the JAR 7 | ``` 8 | java -jar metabase.jar 9 | ``` 10 | - Metabase is now available on http://localhost:3000/setup 11 | - Set up the connection and use host: localhost 12 | -------------------------------------------------------------------------------- /setup/postgres.md: -------------------------------------------------------------------------------- 1 | # Set up Airflow connection to Postgres 2 | 3 | - Add ports to the section under services and Postgres in the docker-compose.yaml file like below: 4 | ``` 5 | ports: 6 | - 5432:5432 7 | ``` 8 | - Download DBeaver (or your tool of choice) 9 | - Create a new Postgres connection and add the username and password 10 | - Test the connection; it may ask you to download the Postgres JDBC driver if you don't have it. Download and test again. 11 | - Once the connection is successful, create a new database named 'spotify' 12 | - Go to the Airflow UI and click on Admin>Connections then click on the + sign 13 | - Fill in the connection with the below details and click save: 14 | - Conn Id: postgres_localhost 15 | - Conn Type: Postgres 16 | - Host: host.docker.internal 17 | - Schema: spotify 18 | - Login: <username> 19 | - Password: <password> 20 | - Port: 5432 -------------------------------------------------------------------------------- /setup/slack_notifications.md: -------------------------------------------------------------------------------- 1 | # Enable Slack notifications for any Airflow task failures 2 | 3 | - Create a channel in your workspace where the alerts will be sent 4 | - Go to api.slack.com/apps and click on "Create New App" then click on "From scratch" 5 | - Give your app a name and select your workspace where the alerts will be sent then click "Create App" 6 | - Enable incoming webhooks for your Slack workspace app 7 | - You can test your webhook from the command-line by running the code below (replace the URL with your own webhook key): 8 | ``` 9 | curl -X POST -H 'Content-type: application/json' --data '{"text":"Hello, World!"}' https://hooks.slack.com/services/XXXXXXXXXXX/XXXXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXX 10 | ``` 11 | - Go to the Airflow UI and click on Admin>Connections then click on the + sign 12 | - Fill in the connection with the below details and click save (replace password with your credentials): 13 | - Connection Id: slack 14 | - Connection Type: HTTP 15 | - Host: https://hooks.slack.com/services/ 16 | - Password: /T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX 17 | - Implement the code into the DAG script to enable alerts (see the sketch below) --------------------------------------------------------------------------------
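The wiring that the last step refers to is already in dags/spotify_dag.py; stripped to its essentials, it is just a callback passed through default_args. A minimal sketch (the full message-building body lives in the DAG file):

```python
# Minimal sketch of the alert hook-up used in dags/spotify_dag.py:
# task_fail_slack_alert posts to the "slack" Airflow connection via SlackWebhookOperator.
def task_fail_slack_alert(context):
    ...  # build the Slack message and send it (see dags/spotify_dag.py)

default_args = {
    "on_failure_callback": task_fail_slack_alert,  # fires whenever a task fails
}
```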
/setup/spotify_api_access.md: -------------------------------------------------------------------------------- 1 | # Spotify API Access 2 | 3 | - Ensure you have a Spotify account created 4 | - Register Your Application 5 | - Go to the [**Dashboard**](https://developer.spotify.com/dashboard/applications) page on the Spotify Developer site 6 | - Click on **CREATE AN APP**. Provide your app name and app description and then click create. 7 | - Click on **EDIT SETTINGS** and provide a redirect URI and then click save 8 | - Copy and save your Client ID and Client Secret 9 | - Define the query parameters in your custom link 10 | - Link: https://accounts.spotify.com/authorize?client_id=<client_id>&response_type=code&redirect_uri=<redirect_uri>&scope=<scope> 11 | - <client_id> = The Client ID saved from the step above 12 | - <redirect_uri> = The redirect URI you provided in the step above. This needs to be the ENCODED redirect URI. You can encode the redirect URI by going to [**urlencoder.org**](https://www.urlencoder.org/), pasting in the redirect URI, and then clicking encode. Ensure encode is selected and not decode. 13 | - <scope> = Scope(s) needed for your requests. In this case we are using user-read-recently-played. 14 | - Go to the link created in the step above to obtain your authorization code 15 | - Paste the link from the step above into a browser and hit enter 16 | - Click Agree 17 | - Copy the new URL and save the authorization code (value after 'code=' parameter) 18 | - Define your curl command 19 | - Ensure you have curl by opening up command prompt/terminal and typing curl 20 | - Curl command: 21 | ``` 22 | curl -d client_id=<client_id> -d client_secret=<client_secret> -d grant_type=authorization_code -d code=<authorization_code> -d redirect_uri=<redirect_uri> https://accounts.spotify.com/api/token 23 | ``` 24 | - Run curl command to obtain access token and refresh token 25 | - Paste in the curl command from the step above into command prompt/terminal and run 26 | - Save your access token and refresh token 27 | - Access token is what we define as spotify_token in our code 28 | - Refresh token will be used to generate a new access token on each run as the access token expires after one hour (see the sketch after this list) 29 | - Convert Client ID and Client Secret to a base 64 encoded string 30 | - <client_id>:<client_secret> 31 | - Using the format above convert to a base 64 encoded string by going to [**base64encode.org**](https://www.base64encode.org/), pasting in the string, and then clicking encode. Ensure encode is selected and not decode. 32 | - This will be defined as base_64 in our code and will be used when we generate a new access token on each run
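For reference, the refresh step above is exactly what operators/refresh.py automates on every pipeline run. A minimal sketch of the same call, assuming base_64 and refresh_token hold the values saved in the steps above:

```python
import requests

# Exchange the long-lived refresh token for a fresh access token
# (mirrors operators/refresh.py; the two values below are the ones saved above)
base_64 = "<base 64 encoded client_id:client_secret>"  # placeholder
refresh_token = "<your refresh token>"  # placeholder

response = requests.post(
    "https://accounts.spotify.com/api/token",
    data={"grant_type": "refresh_token", "refresh_token": refresh_token},
    headers={"Authorization": "Basic " + base_64},
)
access_token = response.json()["access_token"]
print(access_token)
```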
35 | 
36 | References:
37 | https://developer.spotify.com/console/get-recently-played/?limit=&after=&before=
--------------------------------------------------------------------------------
/spotify_data/spotify_genres.csv:
--------------------------------------------------------------------------------
1 | artist_id,artist_name,artist_genre
2 | 2lZFlNiQMLa2fuX3pkXcan,YOUHA,k-indie
3 | 4SpbR6yFEvexJuaBpgAU5p,LE SSERAFIM,
4 | 3YUtIXyGE3p0Y4UPc3hyrf,AFTERSCHOOL RED,
5 | 5V1qsQHdXNm4ZEZHWvFnqQ,Dreamcatcher,k-pop
6 | 0ghlgldX5Dd6720Q3qFyQB,TOMORROW X TOGETHER,k-pop
7 | 4hozqATxbpy9TwKWRT8QVO,Rocket Punch,k-pop
8 | 6IW91qUpcrhbGuZxubrG70,QUEENDOM,
9 | 7IlRNXHjoOCgEAWN5qYksg,Aya Nakamura,basshall
10 | 7rpKUJ0AnklJ8q9nIPVSpZ,Reol,anime rock
11 | 2AfmfGFbe0A0WsTYm0SDTx,(G)I-DLE,k-pop
12 | 7k73EtZwoPs516ZxE72KsO,ONE OK ROCK,j-pop
13 | 6GwM5CHqhWXzG3l5kzRSAS,Younha,k-pop
14 | 3HqSLMAZ3g3d5poNaI7GOU,IU,k-pop
15 | 3ZZzT0naD25RhY2uZvIKkJ,EVERGLOW,k-pop
16 | 2TMRvcwsmvVhvuEbKVEbZe,YURI,korean r&b
17 | 0FnDCrmcQT8qz5TEsZIYw5,Awich,j-rap
18 | 0Sadg1vgvaPqGTOjxu0N6c,Girls' Generation,k-pop
19 | 0UntV1Bw2hk3fbRrm9eMP6,B.I,k-pop
20 | 2KC9Qb60EaY0kW4eH68vr3,ITZY,k-pop girl group
21 | 2vjeuQwzSP5ErC1S41gONX,CHANMINA,j-pop
22 | 3OBkZ9NG8F0Fn4oNpg0yuU,HA SUNG WOON,k-pop
23 | 43pS0OMyfQBVqILYQJ8fpu,Dj Andrés,
24 | 2uWcrwgWmZcQc3IPBs3tfU,Apink,k-pop
25 | 1EowJ1WwkMzkCkRomFhui7,RADWIMPS,j-pop
26 | 3699Hh55qWXd0kWWMWRR2o,mimiirose,
27 | 6HvZYsbFfjnjFrWF950C9d,NewJeans,k-pop
28 | 7n2Ycct7Beij7Dj7meI4X0,TWICE,k-pop
29 | 3Nrfpe0tUJi4K4DXYWgMUX,BTS,k-pop
30 | 54gWVQFHf8IIqbjxAoOarN,AOA,k-pop
31 | 3C13AlJZ4QWHSruAWe9VPI,CRAXY,
32 | 5GwQwY63I9hrUUFlQB8FYU,H1-KEY,
33 | 
--------------------------------------------------------------------------------
/spotify_data/spotify_songs.csv:
--------------------------------------------------------------------------------
1 | played_at_utc,played_date_utc,song_name,artist_name,song_duration_ms,song_link,album_art_link,album_name,album_release_date,album_id,artist_id
2 | 2022-12-22T01:40:05.724Z,2022-12-22,Into You,YURI,186531,https://open.spotify.com/track/6L8wVNs6kuQ7sRjHowbrLp,https://i.scdn.co/image/ab67616d00001e02fe9914fad928a7cbf93b5175,The First Scene - The 1st Mini Album,2018-10-04,1vRQP001rGl7zI3W6ghGSR,2TMRvcwsmvVhvuEbKVEbZe
3 | 2022-12-22T01:36:58.790Z,2022-12-22,第六感,Reol,191624,https://open.spotify.com/track/22sQUmLhT8umlEhQzDrzfJ,https://i.scdn.co/image/ab67616d00001e0253172aaf6f532e9e45cbb841,第六感,2020-07-27,6CTOnVKQhpsL1NeJQ3XyXF,7rpKUJ0AnklJ8q9nIPVSpZ
4 | 2022-12-22T01:33:46.407Z,2022-12-22,LUVORATORRRRRY!,Reol,204187,https://open.spotify.com/track/1AMzUKhF0vCFHZTx8H7OS4,https://i.scdn.co/image/ab67616d00001e024b3d52f255be4bab0f200cb2,エンドレスEP,2017-10-11,1G7u7llrqq988dLueDPV7J,7rpKUJ0AnklJ8q9nIPVSpZ
5 | 2022-12-22T01:30:21.336Z,2022-12-22,煽げや尊し,Reol,217106,https://open.spotify.com/track/2LeHTvjUmiR1A6L9txGDfm,https://i.scdn.co/image/ab67616d00001e026968e389f956d5f84ce66d85,煽げや尊し,2022-07-27,54Jl0Cvkj6wS8oJnLyCsVB,7rpKUJ0AnklJ8q9nIPVSpZ
6 | 2022-12-22T01:26:44.114Z,2022-12-22,Angel,CHANMINA,191566,https://open.spotify.com/track/4iShk7X3Q8vVIIiFENs9Yz,https://i.scdn.co/image/ab67616d00001e02b6274d484c9ea468882ada0d,Harenchi,2021-10-13,1q9RWaiqhyFz8tYrl57w98,2vjeuQwzSP5ErC1S41gONX
7 | 2022-12-22T01:23:32.057Z,2022-12-22,Blueming,IU,217053,https://open.spotify.com/track/4Dr2hJ3EnVh2Aaot6fRwDO,https://i.scdn.co/image/ab67616d00001e02b658276cd9884ef6fae69033,Love poem,2019-11-18,2xEH7SRzJq7LgA0fCtTlxH,3HqSLMAZ3g3d5poNaI7GOU
8 | 2022-12-22T01:20:09.227Z,2022-12-22,GILA GILA,Awich,253333,https://open.spotify.com/track/7AsWduh1mkB8uKOox5NH0X,https://i.scdn.co/image/ab67616d00001e024e16f3fb9b9996cfa91d12f4,Queendom,2022-03-04,4jj5K8UuV6fBOHj4nOCOON,0FnDCrmcQT8qz5TEsZIYw5
9 | 2022-12-21T16:45:10.873Z,2022-12-21,Answer is near,ONE OK ROCK,219826,https://open.spotify.com/track/3N29lMZHMKTVGXUN5aqzl5,https://i.scdn.co/image/ab67616d00001e021ccfaf5b5a356e8379a1e1e7,Zankyō Reference,2011-10-05,0cb55nHUbG3tLjVVQYPdRj,7k73EtZwoPs516ZxE72KsO
10 | 2022-12-21T16:40:17.878Z,2022-12-21,Into You,YURI,186531,https://open.spotify.com/track/6L8wVNs6kuQ7sRjHowbrLp,https://i.scdn.co/image/ab67616d00001e02fe9914fad928a7cbf93b5175,The First Scene - The 1st Mini Album,2018-10-04,1vRQP001rGl7zI3W6ghGSR,2TMRvcwsmvVhvuEbKVEbZe
11 | 2022-12-21T16:37:10.970Z,2022-12-21,第六感,Reol,191624,https://open.spotify.com/track/22sQUmLhT8umlEhQzDrzfJ,https://i.scdn.co/image/ab67616d00001e0253172aaf6f532e9e45cbb841,第六感,2020-07-27,6CTOnVKQhpsL1NeJQ3XyXF,7rpKUJ0AnklJ8q9nIPVSpZ
12 | 2022-12-21T16:33:58.540Z,2022-12-21,LUVORATORRRRRY!,Reol,204187,https://open.spotify.com/track/1AMzUKhF0vCFHZTx8H7OS4,https://i.scdn.co/image/ab67616d00001e024b3d52f255be4bab0f200cb2,エンドレスEP,2017-10-11,1G7u7llrqq988dLueDPV7J,7rpKUJ0AnklJ8q9nIPVSpZ
13 | 2022-12-21T16:30:33.497Z,2022-12-21,煽げや尊し,Reol,217106,https://open.spotify.com/track/2LeHTvjUmiR1A6L9txGDfm,https://i.scdn.co/image/ab67616d00001e026968e389f956d5f84ce66d85,煽げや尊し,2022-07-27,54Jl0Cvkj6wS8oJnLyCsVB,7rpKUJ0AnklJ8q9nIPVSpZ
14 | 2022-12-21T16:26:55.919Z,2022-12-21,Angel,CHANMINA,191566,https://open.spotify.com/track/4iShk7X3Q8vVIIiFENs9Yz,https://i.scdn.co/image/ab67616d00001e02b6274d484c9ea468882ada0d,Harenchi,2021-10-13,1q9RWaiqhyFz8tYrl57w98,2vjeuQwzSP5ErC1S41gONX
15 | 2022-12-21T16:23:44.435Z,2022-12-21,Blueming,IU,217053,https://open.spotify.com/track/4Dr2hJ3EnVh2Aaot6fRwDO,https://i.scdn.co/image/ab67616d00001e02b658276cd9884ef6fae69033,Love poem,2019-11-18,2xEH7SRzJq7LgA0fCtTlxH,3HqSLMAZ3g3d5poNaI7GOU
16 | 2022-12-21T16:20:06.729Z,2022-12-21,What,Dreamcatcher,205345,https://open.spotify.com/track/4BdpM8bR0nifsaqSz3qphQ,https://i.scdn.co/image/ab67616d00001e028c0b09a8965bb16ff3f7d889,Alone In The City,2018-09-20,68esmTNXocYLPEbLD1c7si,5V1qsQHdXNm4ZEZHWvFnqQ
17 | 2022-12-21T16:16:40.980Z,2022-12-21,Dilemma,Apink,209207,https://open.spotify.com/track/3j0x2BUUtm2obQXS1lZuN3,https://i.scdn.co/image/ab67616d00001e020582560196a977f50cc8411b,HORN,2022-02-14,6GeYzOIumBxJ4iF41J3KXM,2uWcrwgWmZcQc3IPBs3tfU
18 | 2022-12-21T15:10:42.288Z,2022-12-21,Stay with me,AOA,199800,https://open.spotify.com/track/3gesvtxiKvuXj63ZRdHUEY,https://i.scdn.co/image/ab67616d00001e027b05ff669e7218d1feb70c06,Ace of Angels,2015-10-14,4kyXI4CACtU6TzTyhTa4dL,54gWVQFHf8IIqbjxAoOarN
19 | 2022-12-21T15:02:57.112Z,2022-12-21,In and out of love - Beach Club,Dj Andrés,312779,https://open.spotify.com/track/4TMlTyUoYMrJsYEI61RCUG,https://i.scdn.co/image/ab67616d00001e0263e638aa85722ce27327a937,In and out of love (Beach Club),2022-06-26,081Q2WsyY89PTlyJ4vBOwu,43pS0OMyfQBVqILYQJ8fpu
20 | 2022-12-21T14:57:43Z,2022-12-21,Nirvana - Steeve West Remix,Aya Nakamura,142857,https://open.spotify.com/track/7ckdWwRicoQ9zxflneYOjp,https://i.scdn.co/image/ab67616d00001e0211c0ee558211e527cd5c0a06,Nirvana (Steeve West Remix),2022-09-20,2TWqqpTOEBvdiL5IPlJUAj,7IlRNXHjoOCgEAWN5qYksg
21 | 2022-12-21T14:55:09.246Z,2022-12-21,Good Boy Gone Bad,TOMORROW X TOGETHER,191038,https://open.spotify.com/track/1HsSIPLTQT354yJcQGfEY3,https://i.scdn.co/image/ab67616d00001e0213ac5d67675999ba7b9c4f21,minisode 2: Thursday's Child,2022-05-09,1o8jYrnyZueTPIdhlHuTc8,0ghlgldX5Dd6720Q3qFyQB
22 | 2022-12-21T14:54:28.211Z,2022-12-21,Good Boy Gone Bad,TOMORROW X TOGETHER,191038,https://open.spotify.com/track/1HsSIPLTQT354yJcQGfEY3,https://i.scdn.co/image/ab67616d00001e0213ac5d67675999ba7b9c4f21,minisode 2: Thursday's Child,2022-05-09,1o8jYrnyZueTPIdhlHuTc8,0ghlgldX5Dd6720Q3qFyQB
23 | 2022-12-21T14:54:21.275Z,2022-12-21,Kimi to Hitsuji to Ao,RADWIMPS,162280,https://open.spotify.com/track/2qz2iNaqKDJesD2ombEIsi,https://i.scdn.co/image/ab67616d00001e02c0c53dcf5bb71d127b8eab7f,Zettaizetsumei,2011-03-09,3b3tyPWcSOYy5SFC0bCUWP,1EowJ1WwkMzkCkRomFhui7
24 | 2022-12-21T14:32:03.401Z,2022-12-21,Night into the sky,AFTERSCHOOL RED,198243,https://open.spotify.com/track/6UqWoZqgsfbzLkYbn9vL2E,https://i.scdn.co/image/ab67616d00001e0201fd5a3b6968b446df481915,THE 4TH SINGLE ALBUM-RED,2011-07-20,3HTY4W12nkuLA06l2w3b9e,3YUtIXyGE3p0Y4UPc3hyrf
25 | 2022-12-21T14:28:19.956Z,2022-12-21,SUN,CHANMINA,192600,https://open.spotify.com/track/5lmIUH7tyTogDEgEIv6n0w,https://i.scdn.co/image/ab67616d00001e02b6274d484c9ea468882ada0d,Harenchi,2021-10-13,1q9RWaiqhyFz8tYrl57w98,2vjeuQwzSP5ErC1S41gONX
26 | 2022-12-21T14:25:07.442Z,2022-12-21,ATHLETIC GIRL,H1-KEY,212613,https://open.spotify.com/track/0qu54GVbhmBFjpsgiG32PL,https://i.scdn.co/image/ab67616d00001e02ef05154fa2b15bac2e806dde,ATHLETIC GIRL,2022-01-05,3Weg79SFmoXNRUSn08QSPZ,5GwQwY63I9hrUUFlQB8FYU
27 | 2022-12-21T14:20:58.401Z,2022-12-21,Firework,CHANMINA,196163,https://open.spotify.com/track/7quF3X731mQQ67KDVNnYqX,https://i.scdn.co/image/ab67616d00001e02b6274d484c9ea468882ada0d,Harenchi,2021-10-13,1q9RWaiqhyFz8tYrl57w98,2vjeuQwzSP5ErC1S41gONX
28 | 2022-12-21T14:17:42.257Z,2022-12-21,Parade,Younha,193818,https://open.spotify.com/track/0u2RxRjpM4rMvojoNqZHH0,https://i.scdn.co/image/ab67616d00001e02d87083079a0cbd66390f4a92,RescuE,2017-12-27,32n91KG3YeLMLJ9e64EfXy,6GwM5CHqhWXzG3l5kzRSAS
29 | 2022-12-21T14:14:18.833Z,2022-12-21,Save Me,BTS,196505,https://open.spotify.com/track/7bxGcILuAjkZzaveU28ZJS,https://i.scdn.co/image/ab67616d00001e02c6dbc63cf145b4ff6bee3322,The Most Beautiful Moment in Life: Young Forever,2016-05-02,1k5bJ8l5oL5xxVBVHjil09,3Nrfpe0tUJi4K4DXYWgMUX
30 | 2022-12-21T14:11:01.741Z,2022-12-21,GILA GILA,Awich,251818,https://open.spotify.com/track/3LSALxSMhVUQoGN2zwxy1n,https://i.scdn.co/image/ab67616d00001e02c09618552aed5e659f9622ff,GILA GILA,2021-07-30,5v5FfoofCu2Ouflu1GusIN,0FnDCrmcQT8qz5TEsZIYw5
31 | 2022-12-21T14:06:18.333Z,2022-12-21,Lululu,mimiirose,234840,https://open.spotify.com/track/3OziCgBLUNZvg6VwLSjcn2,https://i.scdn.co/image/ab67616d00001e02c1aca2da8f935ac35c224515,AWESOME,2022-09-16,2Y7ZSdUNntl5exBCzcBLvz,3699Hh55qWXd0kWWMWRR2o
32 | 2022-12-21T13:57:35.450Z,2022-12-21,ANTIFRAGILE,LE SSERAFIM,184444,https://open.spotify.com/track/4fsQ0K37TOXa3hEQfjEic1,https://i.scdn.co/image/ab67616d00001e02a991995542d50a691b9ae5be,ANTIFRAGILE,2022-10-17,3u0ggfmK0vjuHMNdUbtaa9,4SpbR6yFEvexJuaBpgAU5p
33 | 2022-12-21T13:50:20.533Z,2022-12-21,Ditto,NewJeans,185506,https://open.spotify.com/track/3r8RuvgbX9s7ammBn07D3W,https://i.scdn.co/image/ab67616d00001e02edf5b257be1d6593e81bb45f,Ditto,2022-12-19,7bnqo1fdJU9nSfXQd3bSMe,6HvZYsbFfjnjFrWF950C9d
34 | 2022-12-21T13:46:22.841Z,2022-12-21,LA DI DA,EVERGLOW,210893,https://open.spotify.com/track/6mIjJONoUMvGPT9Kzrab3L,https://i.scdn.co/image/ab67616d00001e02c32633331e11b4fe108237f8,-77.82x-78.29,2020-09-21,4kMID9cggWEko9mOb1zisI,3ZZzT0naD25RhY2uZvIKkJ
35 | 2022-12-21T13:43:42.221Z,2022-12-21,Last Dance,YOUHA,192226,https://open.spotify.com/track/1bOS0JdXxmTWwlUxXX7gRG,https://i.scdn.co/image/ab67616d00001e0257a6f5928952c277c4407f98,"love you more,",2022-08-25,3g2OiEeQKfggUe6ViYeLSC,2lZFlNiQMLa2fuX3pkXcan
36 | 2022-12-21T13:42:51.069Z,2022-12-21,Ladi Dadi,AOA,202703,https://open.spotify.com/track/00Jrzl6fENQd3XEWGDtowy,https://i.scdn.co/image/ab67616d00001e02da645df1f66725e1a066c807,BINGLE BANGLE,2018-05-28,3dCzh2hx0xzXFBUmzQQeoJ,54gWVQFHf8IIqbjxAoOarN
37 | 2022-12-21T06:15:02.999Z,2022-12-21,CHIQUITA,Rocket Punch,186328,https://open.spotify.com/track/0dap7YUNPlUiEPYSA1bRg3,https://i.scdn.co/image/ab67616d00001e02067a8277ede2feece51ba2ec,YELLOW PUNCH,2022-02-28,6L2VwLPHfm5cCdTF1erFrN,4hozqATxbpy9TwKWRT8QVO
38 | 2022-12-21T06:07:32.098Z,2022-12-21,Brave,TWICE,189200,https://open.spotify.com/track/2peoFPokM6eYAIwLm9IQ8E,https://i.scdn.co/image/ab67616d00001e02c3040848e6ef0e132c5c8340,BETWEEN 1&2,2022-08-26,3NZ94nQbqimcu2i71qhc4f,7n2Ycct7Beij7Dj7meI4X0
39 | 2022-12-21T06:02:25.850Z,2022-12-21,FOREVER 1,Girls' Generation,202533,https://open.spotify.com/track/1oen3GpTcA486fTHaT7neg,https://i.scdn.co/image/ab67616d00001e02aea29200523b1ee4d5b2c035,FOREVER 1 - The 7th Album,2022-08-05,3CcgnUkTrUaPTt4Ms1MkoP,0Sadg1vgvaPqGTOjxu0N6c
40 | 2022-12-21T05:59:02.644Z,2022-12-21,WANNABE,ITZY,191242,https://open.spotify.com/track/4pspYVQGFHLPEFgQPD1J7e,https://i.scdn.co/image/ab67616d00001e02fc620c06721e90a534cc5dab,IT'z ME,2020-03-09,7ynKAohxfwPUZzvU8f1p1U,2KC9Qb60EaY0kW4eH68vr3
41 | 2022-12-21T05:55:51.501Z,2022-12-21,FOCUS,HA SUNG WOON,183146,https://open.spotify.com/track/7nj5G4aPUJD0TnNF6SqcrX,https://i.scdn.co/image/ab67616d00001e024e6531b1f309c00dcaaf2f88,Strange World,2022-08-24,2eE6EDzzdWYQH6TfwGjz87,3OBkZ9NG8F0Fn4oNpg0yuU
42 | 2022-12-21T05:52:47.169Z,2022-12-21,Undercover,CRAXY,208826,https://open.spotify.com/track/6Hqh1ybErzzlHLuDqgL3Pd,https://i.scdn.co/image/ab67616d00001e02e57b32c4af6468231a3bb6db,Who Am I,2022-08-16,5S2tHbdTC1zn7BjLsc0Bg0,3C13AlJZ4QWHSruAWe9VPI
43 | 2022-12-21T05:49:18.497Z,2022-12-21,Last Dance,YOUHA,192226,https://open.spotify.com/track/1bOS0JdXxmTWwlUxXX7gRG,https://i.scdn.co/image/ab67616d00001e0257a6f5928952c277c4407f98,"love you more,",2022-08-25,3g2OiEeQKfggUe6ViYeLSC,2lZFlNiQMLa2fuX3pkXcan
44 | 2022-12-21T05:46:05.686Z,2022-12-21,CHIQUITA,Rocket Punch,186328,https://open.spotify.com/track/0dap7YUNPlUiEPYSA1bRg3,https://i.scdn.co/image/ab67616d00001e02067a8277ede2feece51ba2ec,YELLOW PUNCH,2022-02-28,6L2VwLPHfm5cCdTF1erFrN,4hozqATxbpy9TwKWRT8QVO
45 | 2022-12-21T05:42:58.252Z,2022-12-21,Brave,TWICE,189200,https://open.spotify.com/track/2peoFPokM6eYAIwLm9IQ8E,https://i.scdn.co/image/ab67616d00001e02c3040848e6ef0e132c5c8340,BETWEEN 1&2,2022-08-26,3NZ94nQbqimcu2i71qhc4f,7n2Ycct7Beij7Dj7meI4X0
46 | 2022-12-21T05:39:28.519Z,2022-12-21,MY BAG,(G)I-DLE,160520,https://open.spotify.com/track/1t8sqIScEIP0B4bQzBuI2P,https://i.scdn.co/image/ab67616d00001e02c7b6b2976e38a802eebff046,I NEVER DIE,2022-03-14,1T2W9vDajFreUuycPDjUXk,2AfmfGFbe0A0WsTYm0SDTx
47 | 2022-12-21T05:36:49.092Z,2022-12-21,チキチキバンバン,QUEENDOM,202500,https://open.spotify.com/track/7xm0KJMfeaJQmQdDxAipiY,https://i.scdn.co/image/ab67616d00001e02a873ce4c458e732a422d342f,チキチキバンバン,2022-05-20,1BWf1vYaM0zNWE6uuFFcvF,6IW91qUpcrhbGuZxubrG70
48 | 2022-12-21T05:33:28.585Z,2022-12-21,煽げや尊し,Reol,217106,https://open.spotify.com/track/2LeHTvjUmiR1A6L9txGDfm,https://i.scdn.co/image/ab67616d00001e026968e389f956d5f84ce66d85,煽げや尊し,2022-07-27,54Jl0Cvkj6wS8oJnLyCsVB,7rpKUJ0AnklJ8q9nIPVSpZ
49 | 2022-12-21T05:29:50.880Z,2022-12-21,Dilemma,Apink,209207,https://open.spotify.com/track/3j0x2BUUtm2obQXS1lZuN3,https://i.scdn.co/image/ab67616d00001e020582560196a977f50cc8411b,HORN,2022-02-14,6GeYzOIumBxJ4iF41J3KXM,2uWcrwgWmZcQc3IPBs3tfU
50 | 2022-12-21T04:55:54.923Z,2022-12-21,BTBT,B.I,219549,https://open.spotify.com/track/4XcxgZSriCYamtIA7BgT7V,https://i.scdn.co/image/ab67616d00001e022f485bcd37d1d6733d324f7d,BTBT,2022-05-13,6z2Ij8op0iB16BnmrCy0vH,0UntV1Bw2hk3fbRrm9eMP6
51 | 2022-12-21T04:55:53.239Z,2022-12-21,BTBT,B.I,219549,https://open.spotify.com/track/4XcxgZSriCYamtIA7BgT7V,https://i.scdn.co/image/ab67616d00001e022f485bcd37d1d6733d324f7d,BTBT,2022-05-13,6z2Ij8op0iB16BnmrCy0vH,0UntV1Bw2hk3fbRrm9eMP6
52 | 
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/calbergs/spotify-api/1fa07e7beb51226e212ab3ac9bd998bf64b16e80/tests/__init__.py
--------------------------------------------------------------------------------