├── .env
├── .github
│   └── workflows
│       └── ci.yml
├── .gitignore
├── Dockerfile
├── Makefile
├── README.md
├── assets
│   ├── architecture.PNG
│   ├── data_model.png
│   ├── metabase_connection.PNG
│   ├── metabase_dashboard_1.PNG
│   └── metabase_dashboard_2.PNG
├── dataset
│   └── Warehouse_and_Retail_Sales.csv
├── docker-compose.yaml
├── pipeline
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-312.pyc
│   │   ├── connection.cpython-312.pyc
│   │   └── etl.cpython-312.pyc
│   ├── connection.py
│   ├── ingestion
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   ├── __init__.cpython-312.pyc
│   │   │   └── to_landing.cpython-312.pyc
│   │   └── to_landing.py
│   ├── main_pipeline.py
│   ├── transformation
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   ├── __init__.cpython-312.pyc
│   │   │   └── etl.cpython-312.pyc
│   │   └── etl.py
│   └── utils
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── __init__.cpython-312.pyc
│       │   ├── db.cpython-312.pyc
│       │   └── sde_config.cpython-312.pyc
│       ├── db.py
│       └── sde_config.py
├── requirements.txt
├── sql
│   └── init_db.sql
└── test
    ├── __pycache__
    │   ├── conftest.cpython-312-pytest-8.0.0.pyc
    │   ├── test_etl.cpython-312-pytest-8.0.0.pyc
    │   ├── test_etl_unit.cpython-312-pytest-8.0.0.pyc
    │   └── test_pipeline.cpython-312-pytest-8.0.0.pyc
    ├── conftest.py
    └── test_pipeline.py

/.env:
--------------------------------------------------------------------------------
1 | POSTGRES_USER=dorian
2 | POSTGRES_PASSWORD=1412
3 | POSTGRES_DB=retail_sales
4 | POSTGRES_HOST=warehouse
5 | POSTGRES_PORT=5432
6 | TABLE_NAME=Retail_sales
7 | FILE_PATH=dataset/Warehouse_and_Retail_Sales.csv
8 | 
9 | 
10 | 
11 | 
--------------------------------------------------------------------------------
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
1 | name: ci
2 | on:
3 |   push:
4 |     branches:
5 |       - master
6 |   pull_request:
7 |     branches:
8 |       - master
9 | jobs:
10 |   run-ci-tests:
11 |     runs-on: ubuntu-latest
12 |     steps:
13 |       - name: checkout repo
14 |         uses: actions/checkout@v2
15 |       - name: Spin up containers
16 |         run: make up
17 |       - name: Run CI test
18 |         run: make ci
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | 
4 | 
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3
2 | 
3 | WORKDIR /app
4 | 
5 | COPY requirements.txt ./
6 | 
7 | RUN pip install --no-cache-dir -r requirements.txt
8 | 
9 | COPY . .
10 | 
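11 | # Idle indefinitely so `make etl`, `make ci`, etc. can `docker exec` into
12 | # the running container; the pipeline itself is launched from the Makefile.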
13 | CMD ["tail", "-f", "/dev/null"]
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | up:
2 | 	docker compose --env-file .env up --build -d
3 | 
4 | etl:
5 | 	docker exec etl python pipeline/main_pipeline.py
6 | 
7 | 
8 | pytest:
9 | 	docker exec etl python -m pytest -p no:warnings -v
10 | 
11 | 
12 | format:
13 | 	docker exec etl python -m black -S --line-length 79 .
14 | 
15 | 
16 | isort:
17 | 	docker exec etl isort .
18 | 
19 | 
20 | type:
21 | 	docker exec etl mypy --ignore-missing-imports .
22 | 
23 | 
24 | lint:
25 | 	docker exec etl flake8 .
26 | 
27 | 
28 | ci: isort format type lint pytest
29 | 
30 | 
31 | warehouse:
32 | 	winpty docker exec -ti warehouse psql postgres://dorian:1412@localhost:5432/retail_sales
33 | 
34 | down:
35 | 	docker compose down
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Overview
2 | ![Architecture](assets/architecture.PNG)
3 | 
4 | ## Tech Stack
5 | * Docker
6 | * Python
7 | * Metabase
8 | * PostgreSQL
9 | * GitHub Actions (CI/CD)
10 | 
11 | 
12 | ## Project Overview
13 | In this project, we first use Python and SQLAlchemy to load a CSV file of sales and movement data by item and month into a Postgres schema called "landing_area" (see pipeline/ingestion/to_landing.py). We then apply transformation logic to that table with Pandas to build a star schema, and load the result into the "staging_area" schema for visualization.
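14 | 
15 | The ingestion step boils down to the following (a simplified sketch of what `pipeline/main_pipeline.py` wires together, with the connection details from `.env` inlined; not a drop-in replacement):
16 | 
17 | ```python
18 | import pandas as pd
19 | from sqlalchemy import create_engine
20 | 
21 | # "localhost" works from the host machine; inside the compose network
22 | # the Postgres host is "warehouse"
23 | engine = create_engine("postgresql://dorian:1412@localhost:5432/retail_sales")
24 | 
25 | df = pd.read_csv("dataset/Warehouse_and_Retail_Sales.csv")
26 | df.to_sql(
27 |     "Retail_sales",
28 |     engine,
29 |     if_exists="replace",
30 |     index=False,
31 |     schema="landing_area",
32 | )
33 | ```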
34 | 
35 | ## Run the pipeline
36 | Here are the commands to set up the environment:
37 | * `make up`: Create and run all the containers.
38 | * `make ci`: Sort imports, format, type-check, lint, and run the tests.
39 | * `make etl`: Run the pipeline.
40 | * `make warehouse`: Connect to the Postgres database and check the data.
41 | * Go to localhost:3000 to open Metabase.
42 | * `make down`: Stop the containers.
43 | 
44 | ## Data model
45 | ![data_model.png](assets/data_model.png)
46 | 
47 | ## Dashboard
48 | ![dashboard1.PNG](assets/metabase_dashboard_1.PNG)
49 | ![dashboard2.PNG](assets/metabase_dashboard_2.PNG)
--------------------------------------------------------------------------------
/assets/architecture.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/assets/architecture.PNG
--------------------------------------------------------------------------------
/assets/data_model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/assets/data_model.png
--------------------------------------------------------------------------------
/assets/metabase_connection.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/assets/metabase_connection.PNG
--------------------------------------------------------------------------------
/assets/metabase_dashboard_1.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/assets/metabase_dashboard_1.PNG
--------------------------------------------------------------------------------
/assets/metabase_dashboard_2.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/assets/metabase_dashboard_2.PNG
--------------------------------------------------------------------------------
/docker-compose.yaml:
--------------------------------------------------------------------------------
1 | version: "3"
2 | 
3 | services:
4 |   etl:
5 |     image: etl
6 |     container_name: etl
7 |     build:
8 |       context: .
9 |     volumes:
10 |       - ./:/app
11 |     environment:
12 |       POSTGRES_USER: ${POSTGRES_USER}
13 |       POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
14 |       POSTGRES_DB: ${POSTGRES_DB}
15 |       POSTGRES_HOST: ${POSTGRES_HOST}
16 |       POSTGRES_PORT: ${POSTGRES_PORT}
17 |       TABLE_NAME: ${TABLE_NAME}
18 |       FILE_PATH: ${FILE_PATH}
19 | 
20 |   warehouse:
21 |     image: postgres:13
22 |     container_name: warehouse
23 |     environment:
24 |       POSTGRES_USER: ${POSTGRES_USER}
25 |       POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
26 |       POSTGRES_DB: ${POSTGRES_DB}
27 |     volumes:
28 |       - postgres-volume:/var/lib/postgresql/data
29 |       - ./sql:/docker-entrypoint-initdb.d
30 |     restart: always
31 |     ports:
32 |       - "5432:5432"
33 | 
34 |   metabase:
35 |     image: metabase/metabase:latest
36 |     container_name: metabase
37 |     hostname: metabase
38 |     volumes:
39 |       - urandom:/dev/random:ro
40 |     ports:
41 |       - 3000:3000
42 |     restart: always
43 | 
44 | volumes:
45 |   postgres-volume:
46 |   urandom:
47 | 
48 | 
49 | 
--------------------------------------------------------------------------------
/pipeline/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/__init__.py
--------------------------------------------------------------------------------
/pipeline/__pycache__/__init__.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/__pycache__/__init__.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/__pycache__/connection.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/__pycache__/connection.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/__pycache__/etl.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/__pycache__/etl.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/connection.py:
--------------------------------------------------------------------------------
1 | import logging
2 | 
3 | from sqlalchemy import create_engine
4 | from utils.db import WarehouseConnection
5 | from utils.sde_config import get_warehouse_creds
6 | 
7 | logging.basicConfig(
8 |     level=logging.INFO,
9 |     format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
10 | )
11 | 
12 | logger = logging.getLogger(__name__)
13 | 
14 | 
15 | def create_conn(
16 |     connection_string=WarehouseConnection(
17 |         get_warehouse_creds()
18 |     ).connection_string(),
19 | ):
20 |     # connect to the postgres database
21 |     try:
22 |         engine = create_engine(connection_string)
23 |         logger.info("Connected to the Postgres database!")
24 |         return engine
25 |     except Exception as e:
26 |         logger.error("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
27 |         logger.error(f"Unable to connect to Postgres: {e}")
28 | 
29 | 
30 | def close_conn(engine):
31 |     # close the connection
32 |     engine.dispose()
33 | 
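34 | # Note: the connection_string default above is evaluated once, at import
35 | # time. A lazier variant (a sketch, not the project's current behaviour)
36 | # would build the URL inside the function instead:
37 | #
38 | #     def create_conn(connection_string=None):
39 | #         if connection_string is None:
40 | #             connection_string = WarehouseConnection(
41 | #                 get_warehouse_creds()
42 | #             ).connection_string()
43 | #         return create_engine(connection_string)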
--------------------------------------------------------------------------------
/pipeline/ingestion/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/ingestion/__init__.py
--------------------------------------------------------------------------------
/pipeline/ingestion/__pycache__/__init__.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/ingestion/__pycache__/__init__.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/ingestion/__pycache__/to_landing.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/ingestion/__pycache__/to_landing.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/ingestion/to_landing.py:
--------------------------------------------------------------------------------
1 | import logging
2 | 
3 | logging.basicConfig(
4 |     level=logging.INFO,
5 |     format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
6 | )
7 | 
8 | logger = logging.getLogger(__name__)
9 | 
10 | 
11 | def load_table_to_landing(df, engine, table_name):
12 |     # load the csv file to the landing schema
13 |     try:
14 |         df.to_sql(
15 |             table_name,
16 |             engine,
17 |             if_exists='replace',
18 |             index=False,
19 |             schema='landing_area',
20 |         )
21 |         logger.info("Table loaded to the landing area!")
22 |     except Exception as e:
23 |         logger.error("!!!!!!!!!!!!!!!!!!!!!!")
24 |         logger.error(f"Unable to load the data to the landing area: {e}")
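25 | 
26 | 
27 | # Note: if_exists='replace' drops and recreates the landing table on every
28 | # run, so each execution is a full refresh from the latest CSV.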
--------------------------------------------------------------------------------
/pipeline/main_pipeline.py:
--------------------------------------------------------------------------------
1 | import os
2 | 
3 | import pandas as pd
4 | from connection import close_conn, create_conn
5 | from ingestion.to_landing import load_table_to_landing
6 | from transformation.etl import (
7 |     clean_data,
8 |     create_schema,
9 |     load_tables_staging,
10 |     read_table,
11 | )
12 | 
13 | 
14 | def main():
15 | 
16 |     engine = create_conn()
17 | 
18 |     # Landing area
19 |     file_path = os.getenv('FILE_PATH')
20 |     table_name = os.getenv('TABLE_NAME')
21 | 
22 |     df = pd.read_csv(file_path)
23 |     load_table_to_landing(df, engine, table_name)
24 | 
25 |     # Staging area
26 |     df = read_table(engine, table_name)
27 |     df_clean = clean_data(df)
28 |     dict_tables = create_schema(df_clean)
29 |     load_tables_staging(dict_tables, engine)
30 | 
31 |     close_conn(engine)
32 | 
33 | 
34 | if __name__ == "__main__":
35 |     main()
36 | 
--------------------------------------------------------------------------------
/pipeline/transformation/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/transformation/__init__.py
--------------------------------------------------------------------------------
/pipeline/transformation/__pycache__/__init__.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/transformation/__pycache__/__init__.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/transformation/__pycache__/etl.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/transformation/__pycache__/etl.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/transformation/etl.py:
--------------------------------------------------------------------------------
1 | import logging
2 | 
3 | import pandas as pd
4 | 
5 | logging.basicConfig(
6 |     level=logging.INFO,
7 |     format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
8 | )
9 | 
10 | logger = logging.getLogger(__name__)
11 | 
12 | 
13 | def read_table(engine, table_name):
14 |     """read data from the landing schema"""
15 |     try:
16 |         df = pd.read_sql_query(
17 |             f'SELECT * FROM landing_area."{table_name}"', engine
18 |         )
19 |         logger.info('Table read from the landing_area!')
20 |         return df
21 |     except Exception as e:
22 |         logger.error('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
23 |         logger.error(f'Unable to read data from landing_area: {e}')
24 | 
25 | 
26 | def clean_data(df):
27 |     # data cleaning
28 |     df['SUPPLIER'] = df['SUPPLIER'].fillna("NO SUPPLIER")
29 |     df['ITEM TYPE'] = df['ITEM TYPE'].fillna("NO ITEM TYPE")
30 |     df['RETAIL SALES'] = df['RETAIL SALES'].fillna(-1)
31 | 
32 |     return df
33 | 
34 | 
35 | def create_schema(df):
36 |     """Build a star schema"""
37 | 
38 |     supplier_df = df[['SUPPLIER']]
39 |     supplier_df = supplier_df.drop_duplicates()
40 |     supplier_df = supplier_df.reset_index(drop=True)
41 |     supplier_df = supplier_df.reset_index(names="SUPPLIER_ID")
42 |     supplier_df["SUPPLIER_ID"] += 1
43 | 
44 |     item_df = df[['ITEM CODE', 'ITEM TYPE', 'ITEM DESCRIPTION']]
45 |     item_df = item_df.rename(
46 |         columns={
47 |             'ITEM CODE': 'ITEM_CODE',
48 |             'ITEM TYPE': 'ITEM_TYPE',
49 |             'ITEM DESCRIPTION': 'ITEM_DESCRIPTION',
50 |         }
51 |     )
52 |     item_df = item_df.drop_duplicates()
53 | 
54 |     date_df = df[['YEAR', 'MONTH']]
55 |     date_df = date_df.drop_duplicates()
56 |     date_df = date_df.reset_index(drop=True)
57 |     date_df = date_df.reset_index(names="DATE_ID")
58 |     date_df["DATE_ID"] += 1
59 | 
60 |     fact_table = (
61 |         df.merge(supplier_df, on='SUPPLIER')
62 |         .merge(item_df, left_on="ITEM CODE", right_on="ITEM_CODE")
63 |         .merge(date_df, on=["YEAR", "MONTH"])[
64 |             [
65 |                 'ITEM_CODE',
66 |                 'SUPPLIER_ID',
67 |                 'DATE_ID',
68 |                 'RETAIL SALES',
69 |                 'RETAIL TRANSFERS',
70 |                 'WAREHOUSE SALES',
71 |             ]
72 |         ]
73 |     )
74 | 
75 |     fact_table = fact_table.drop_duplicates()
76 | 
77 |     return {
78 |         "Supplier": supplier_df.to_dict(orient="dict"),
79 |         "Item": item_df.to_dict(orient="dict"),
80 |         "Date": date_df.to_dict(orient="dict"),
81 |         "Fact_table": fact_table.to_dict(orient="dict"),
82 |     }
83 | 
84 | 
85 | def load_tables_staging(dict_tables, engine):
86 |     """load the tables to the staging schema for visualization"""
87 |     try:
88 |         for df_name, value_dict in dict_tables.items():
89 |             value_df = pd.DataFrame(value_dict)
90 |             logger.info(
91 |                 f'Importing {len(value_df)} rows from '
92 |                 f'landing_area to staging_area.{df_name}'
93 |             )
94 |             value_df.to_sql(
95 |                 df_name,
96 |                 engine,
97 |                 if_exists='replace',
98 |                 index=False,
99 |                 schema='staging_area',
100 |             )
101 | 
102 |             logger.info(f'Table {df_name} loaded successfully')
103 | 
104 |     except Exception as e:
105 |         logger.error("!!!!!!!!!!!!!!!!!!!!!!")
106 |         logger.error(f"Unable to load the data to the staging area: {e}")
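107 | 
108 | 
109 | # Design note: create_schema returns plain dicts (via DataFrame.to_dict)
110 | # rather than DataFrames, which keeps its output easy to assert against in
111 | # tests; load_tables_staging rebuilds each DataFrame before writing it to
112 | # the staging_area schema.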
--------------------------------------------------------------------------------
/pipeline/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/utils/__init__.py
--------------------------------------------------------------------------------
/pipeline/utils/__pycache__/__init__.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/utils/__pycache__/__init__.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/utils/__pycache__/db.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/utils/__pycache__/db.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/utils/__pycache__/sde_config.cpython-312.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/pipeline/utils/__pycache__/sde_config.cpython-312.pyc
--------------------------------------------------------------------------------
/pipeline/utils/db.py:
--------------------------------------------------------------------------------
1 | from dataclasses import dataclass
2 | 
3 | 
4 | @dataclass
5 | class DBConnection:
6 |     database: str
7 |     user: str
8 |     pwd: str
9 |     host: str
10 |     port: int
11 | 
12 | 
13 | class WarehouseConnection:
14 |     def __init__(self, db_conn: DBConnection):
15 |         self.conn_url = (
16 |             f'postgresql://{db_conn.user}:{db_conn.pwd}@'
17 |             f'{db_conn.host}:{db_conn.port}/{db_conn.database}'
18 |         )
19 | 
20 |     def connection_string(self):
21 |         return self.conn_url
22 | 
--------------------------------------------------------------------------------
/pipeline/utils/sde_config.py:
--------------------------------------------------------------------------------
1 | import os
2 | 
3 | from utils.db import DBConnection
4 | 
5 | 
6 | def get_warehouse_creds() -> DBConnection:
7 |     return DBConnection(
8 |         user=os.getenv('POSTGRES_USER', ''),
9 |         pwd=os.getenv('POSTGRES_PASSWORD', ''),
10 |         database=os.getenv('POSTGRES_DB', ''),
11 |         host=os.getenv('POSTGRES_HOST', ''),
12 |         port=int(os.getenv('POSTGRES_PORT', 5432)),
13 |     )
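14 | 
15 | 
16 | # Example (a sketch): with the values shipped in .env, this resolves to
17 | #
18 | #     creds = get_warehouse_creds()
19 | #     WarehouseConnection(creds).connection_string()
20 | #     # -> 'postgresql://dorian:1412@warehouse:5432/retail_sales'
21 | #
22 | # (WarehouseConnection is defined in utils/db.py.)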
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas
2 | configparser
3 | sqlalchemy
4 | psycopg2
5 | pytest
6 | black
7 | flake8
8 | mypy
9 | isort
10 | 
11 | 
12 | 
--------------------------------------------------------------------------------
/sql/init_db.sql:
--------------------------------------------------------------------------------
1 | CREATE SCHEMA landing_area;
2 | CREATE SCHEMA staging_area;
--------------------------------------------------------------------------------
/test/__pycache__/conftest.cpython-312-pytest-8.0.0.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/test/__pycache__/conftest.cpython-312-pytest-8.0.0.pyc
--------------------------------------------------------------------------------
/test/__pycache__/test_etl.cpython-312-pytest-8.0.0.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/test/__pycache__/test_etl.cpython-312-pytest-8.0.0.pyc
--------------------------------------------------------------------------------
/test/__pycache__/test_etl_unit.cpython-312-pytest-8.0.0.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/test/__pycache__/test_etl_unit.cpython-312-pytest-8.0.0.pyc
--------------------------------------------------------------------------------
/test/__pycache__/test_pipeline.cpython-312-pytest-8.0.0.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Dorianteffo/etl_pipeline_docker_metabase/4e686eea5500bafc85c3f323aef0b2110355a245/test/__pycache__/test_pipeline.cpython-312-pytest-8.0.0.pyc
--------------------------------------------------------------------------------
/test/conftest.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 | import pytest
4 | 
5 | 
6 | @pytest.fixture()
7 | def mock_df():
8 |     data = pd.DataFrame(
9 |         {
10 |             'YEAR': [2022, 2023, 2010],
11 |             'MONTH': ["Jan", "Feb", "March"],
12 |             'SUPPLIER': [np.nan, "Sup1", np.nan],
13 |             'ITEM CODE': [12, 13, 24],
14 |             'ITEM DESCRIPTION': ["first", "second", "third"],
15 |             'ITEM TYPE': ["Wine", "Liquor", np.nan],
16 |             'RETAIL SALES': [100, 130, np.nan],
17 |             'RETAIL TRANSFERS': [0, 12, 0],
18 |             'WAREHOUSE SALES': [0, 12, 0],
19 |         }
20 |     )
21 | 
22 |     return data
23 | 
--------------------------------------------------------------------------------
/test/test_pipeline.py:
--------------------------------------------------------------------------------
1 | from pipeline.transformation.etl import clean_data, create_schema
2 | 
3 | 
4 | def test_clean_data(mock_df):
5 |     clean_df = clean_data(mock_df)
6 | 
7 |     assert clean_df['SUPPLIER'].isna().sum() == 0
8 |     assert clean_df['RETAIL SALES'].isna().sum() == 0
9 |     assert clean_df['ITEM TYPE'].isna().sum() == 0
10 |     assert clean_df.loc[2, 'RETAIL SALES'] == -1
11 |     assert len(clean_df) == 3
12 | 
13 | 
14 | def test_create_schema(mock_df):
15 |     clean_df = clean_data(mock_df)
16 | 
17 |     dict_table = create_schema(clean_df)
18 | 
19 |     assert len(dict_table) == 4
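20 | 
21 | 
22 | def test_create_schema_dimensions(mock_df):
23 |     # A sketch of finer-grained checks on the star schema (assumes the
24 |     # surrogate keys produced by create_schema start at 1).
25 |     import pandas as pd
26 | 
27 |     dict_table = create_schema(clean_data(mock_df))
28 | 
29 |     supplier = pd.DataFrame(dict_table["Supplier"])
30 |     assert supplier["SUPPLIER_ID"].min() == 1
31 |     assert len(supplier) == 2  # "Sup1" plus the "NO SUPPLIER" placeholder
32 | 
33 |     fact = pd.DataFrame(dict_table["Fact_table"])
34 |     assert len(fact) == 3  # one fact row per row of the mock data
--------------------------------------------------------------------------------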