├── .gitignore ├── .streamlit └── secrets-template.toml ├── LICENSE ├── README.md ├── image.png ├── main.py ├── modules ├── code_editor.py ├── duckdb_result.py └── sidebar.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | .venv 2 | .env 3 | **/__pycache__ 4 | .streamlit/secrets.toml -------------------------------------------------------------------------------- /.streamlit/secrets-template.toml: -------------------------------------------------------------------------------- 1 | STORAGE_ACCOUNT_NAME = "your-storage-account-name" 2 | DELTA_LAKE_ROOT_PATH = "" 3 | # AZURE_TENANT_ID = "your-tenant-id" 4 | # AZURE_CLIENT_ID = "your-service-principal-clientid" 5 | # AZURE_CLIENT_SECRET = "your-service-principal-secret" -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Delta Lake Explorer 2 | 3 | Delta Lake Explorer is a Streamlit application that allows users to explore Delta Lake tables on Azure Data Lake Storage using DuckDB. The application provides a code editor for writing SQL queries, a sidebar for configuring settings, and a result viewer for displaying query results. 4 | 5 | ## Screenshots 6 | 7 | ![Screenshot 1](image.png) 8 | 9 | ## Features 10 | 11 | - **Code Editor**: Write and execute SQL queries. 12 | - **Query Parsing**: Automatically parse and transform queries to use `delta_scan`. 13 | - **Query Timing**: Display the time taken to execute queries. 14 | 15 | ## Installation 16 | 17 | 1. Clone the repository: 18 | 19 | 20 | ```bash 21 | git clone https://github.com/mrjsj/delta-lake-explorer.git 22 | cd delta-lake-explorer 23 | ``` 24 | 25 | 2. Create a virtual environment: 26 | 27 | ```bash 28 | python -m venv .venv 29 | source .venv/bin/activate # On Windows, use .venv\Scripts\activate 30 | ``` 31 | 32 | 3. Install the required packages: 33 | 34 | ```bash 35 | pip install -r requirements.txt 36 | ``` 37 | 38 | ## Configuration 39 | 40 | 1. Rename the `.streamlit/secrets-template.toml` to `.streamlit/secrets.toml`: 41 | 42 | 2. Fill in the following values in `.streamlit/secrets.toml`: 43 | - `STORAGE_ACCOUNT_NAME`: The name of your Azure storage account. 44 | - `DELTA_LAKE_ROOT_PATH`: The root path up until the delta lake catalog. This includes the container name and the path to the delta lake catalog. 
E.g., if the full delta table path is `abfss://container/path/to/catalog/layer/table`, then the root path is `container/path/to`. If the delta lake catalog is at the root of the storage account, then the root path is an empty string. 45 | 46 | 3. Choose a way to authenticate to Azure. You can use a service principal or an Azure CLI login. In either case, make sure that at least the **Storage Blob Data Reader** role is assigned to your service principal or your personal user on the storage account. 47 | - If you choose a service principal, fill in the following values in `.streamlit/secrets.toml`: 48 | - `AZURE_TENANT_ID`: The ID of your Azure AD tenant. 49 | - `AZURE_CLIENT_ID`: The client ID of your service principal. 50 | - `AZURE_CLIENT_SECRET`: The client secret of your service principal. 51 | - If you choose Azure CLI login, run `az login` before running the application. 52 | 53 | ## Usage 54 | 55 | Run the Streamlit application: 56 | 57 | ```bash 58 | streamlit run main.py 59 | ``` 60 | 61 | Query using DuckDB syntax. Tables must be referenced as `catalog.schema.table`, e.g.: 62 | ```sql 63 | SELECT * FROM catalog.schema.table; 64 | ``` 65 | 66 | For more information on DuckDB syntax, see the [DuckDB documentation](https://duckdb.org/docs/sql/introduction). 67 | 68 | ## License 69 | 70 | This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details. 
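To make the `catalog.schema.table` convention from the Usage section concrete, here is a minimal sketch of how such references could be rewritten into `delta_scan(...)` calls. It is not the repository's own `_parse_query`: the names `TABLE_REF` and `rewrite_table_refs` are hypothetical, and the path layout `abfss://{root_path}/{catalog}/{schema}/{table}` is assumed from the help text of the "Enable query parsing" checkbox.

```python
import re

# Hypothetical pattern (an assumption, not the project's regex): three
# dot-separated identifiers, optionally double-quoted, after FROM or JOIN.
TABLE_REF = re.compile(
    r'\b(FROM|JOIN)\s+"?(\w+)"?\."?(\w+)"?\."?(\w+)"?',
    re.IGNORECASE,
)


def rewrite_table_refs(query: str, root_path: str) -> str:
    """Replace catalog.schema.table references with delta_scan(...) calls."""
    # An empty root path means the catalog sits at the root of the storage account.
    prefix = f"abfss://{root_path}/" if root_path else "abfss://"

    def to_delta_scan(match: re.Match) -> str:
        keyword, catalog, schema, table = match.groups()
        return f"{keyword} delta_scan('{prefix}{catalog}/{schema}/{table}')"

    # Drop surrounding whitespace and trailing semicolons before rewriting.
    return TABLE_REF.sub(to_delta_scan, query.strip().rstrip(";"))


print(rewrite_table_refs("SELECT * FROM catalog.schema.table;", "container/path/to"))
# SELECT * FROM delta_scan('abfss://container/path/to/catalog/schema/table')
```

A plain regex like this also rewrites matches inside string literals and comments, so it should be read as an illustration of the idea only.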
-------------------------------------------------------------------------------- /image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mrjsj/delta-lake-explorer/e955accd87dd6823a6db9e20abb7848b569d83ee/image.png -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | from modules.code_editor import load_code_editor 3 | from modules.sidebar import load_sidebar 4 | from modules.duckdb_result import show_duckdb_result 5 | 6 | 7 | def main(): 8 | st.set_page_config(page_title="Delta Lake Explorer", page_icon="📊", layout="wide") 9 | st.title("Delta Lake Explorer") 10 | load_sidebar() 11 | load_code_editor() 12 | show_duckdb_result() 13 | 14 | if __name__ == "__main__": 15 | main() -------------------------------------------------------------------------------- /modules/code_editor.py: -------------------------------------------------------------------------------- 1 | from code_editor import code_editor 2 | import sys 3 | 4 | def load_code_editor(): 5 | 6 | cmd_button_text = "Cmd-Enter" if sys.platform == "darwin" else "Ctrl-Enter" 7 | 8 | custom_btns = [ 9 | { 10 | "name": f"{cmd_button_text} to execute", 11 | "hasText": True, 12 | "alwaysOn": True, 13 | "commands": [ 14 | "submit" 15 | ], 16 | "style": { 17 | "fontSize": "0.8rem", 18 | "position": "fixed", 19 | "bottom": "0.1rem", 20 | "right": "0.1rem", 21 | "zIndex": "1000" 22 | } 23 | } 24 | ] 25 | 26 | code_editor( 27 | key="query", 28 | code="", 29 | lang="sql", 30 | height="300px", 31 | theme="dracula", 32 | buttons=custom_btns 33 | ) -------------------------------------------------------------------------------- /modules/duckdb_result.py: -------------------------------------------------------------------------------- 1 | import re 2 | import streamlit as st 3 | import duckdb 4 | from datetime 
import datetime, timezone 5 | 6 | def _parse_query(query: str) -> str: 7 | 8 | query = query.strip(";") 9 | patterns = [re.compile(r'(? duckdb.DuckDBPyConnection: 27 | if "db" not in st.session_state: 28 | 29 | db = duckdb.connect(database=':memory:') 30 | 31 | if not ("AZURE_TENANT_ID" in st.secrets and "AZURE_CLIENT_ID" in st.secrets and "AZURE_CLIENT_SECRET" in st.secrets): 32 | db.sql(f""" 33 | INSTALL delta; LOAD delta; 34 | INSTALL azure; LOAD azure; 35 | CREATE SECRET azure_secret ( 36 | TYPE AZURE, 37 | PROVIDER CREDENTIAL_CHAIN, 38 | CHAIN 'cli', 39 | ACCOUNT_NAME '{st.secrets["STORAGE_ACCOUNT_NAME"]}' 40 | ); 41 | """) 42 | 43 | 44 | else: 45 | db.sql( 46 | f""" 47 | INSTALL delta; LOAD delta; 48 | INSTALL azure; LOAD azure; 49 | CREATE SECRET azure_secret ( 50 | TYPE AZURE, 51 | PROVIDER SERVICE_PRINCIPAL, 52 | ACCOUNT_NAME '{st.secrets["STORAGE_ACCOUNT_NAME"]}', 53 | TENANT_ID '{st.secrets["AZURE_TENANT_ID"]}', 54 | CLIENT_ID '{st.secrets["AZURE_CLIENT_ID"]}', 55 | CLIENT_SECRET '{st.secrets["AZURE_CLIENT_SECRET"]}' 56 | ); 57 | """) 58 | 59 | st.session_state["db"] = db 60 | 61 | return st.session_state["db"] 62 | 63 | def get_duckdb_result(query: str): 64 | 65 | if st.session_state["enable_query_parsing"]: 66 | query = _parse_query(query) 67 | 68 | if st.session_state["show_parsed_query"]: 69 | st.write("Parsed Query:") 70 | st.code(query, language="sql") 71 | 72 | db = _setup_database() 73 | 74 | return db.sql(query).df() 75 | 76 | 77 | def show_duckdb_result(): 78 | if "query" in st.session_state and st.session_state["query"] is not None: 79 | 80 | if st.session_state["query"]["text"] == "": 81 | return 82 | 83 | query = st.session_state["query"]["text"] 84 | with st.spinner("Loading..."): 85 | start_time = datetime.now(timezone.utc) 86 | result = get_duckdb_result(query) 87 | end_time = datetime.now(timezone.utc) 88 | 89 | st.dataframe( 90 | result, 91 | use_container_width=True, 92 | hide_index=False 93 | ) 94 | if 
st.session_state["show_query_time"]: 95 | st.status(f"Query completed in {end_time - start_time}", state="complete", expanded=False) -------------------------------------------------------------------------------- /modules/sidebar.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | 3 | def load_sidebar(): 4 | with st.sidebar: 5 | st.write("Settings") 6 | 7 | st.checkbox( 8 | "Enable query parsing", 9 | value=True, 10 | key="enable_query_parsing", 11 | help="""Enables parsing `"catalog"."schema"."table"` to `delta_scan('abfss://{root_path}/{catalog}/{schema}/{table}')` for all `FROM` and `JOIN` statements.""" 12 | ) 13 | st.checkbox("Show parsed query", value=st.session_state["enable_query_parsing"], key="show_parsed_query", help="Shows the parsed query when running a query.", disabled=not st.session_state["enable_query_parsing"]) 14 | 15 | st.checkbox( 16 | label="Show query time", 17 | key="show_query_time", 18 | value=True, 19 | ) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | duckdb==1.0.0 2 | streamlit==1.36.0 3 | streamlit-code-editor==0.1.21 --------------------------------------------------------------------------------