├── .gitignore ├── README.md ├── dbt ├── .gitignore ├── requirments.txt └── stackoverflowsurvey │ ├── .gitignore │ ├── .user.yml │ ├── README.md │ ├── analyses │ └── .gitkeep │ ├── dbt_project.yml │ ├── macros │ └── .gitkeep │ ├── models │ ├── schema.yml │ ├── source.yml │ └── survey_results.sql │ ├── profiles.yml │ ├── seeds │ └── .gitkeep │ ├── snapshots │ └── .gitkeep │ └── tests │ └── .gitkeep ├── duckdb └── README.md ├── images ├── architecture.png ├── superset_dashboard.png ├── superset_duckdb_connection.png └── superset_duckdb_connection_advanced_config.png └── superset ├── docker-compose.yml └── docker ├── .env ├── README.md ├── docker-bootstrap.sh ├── docker-ci.sh ├── docker-entrypoint-initdb.d └── examples-init.sh ├── docker-frontend.sh ├── docker-init.sh ├── frontend-mem-nag.sh ├── pythonpath_dev ├── .gitignore ├── superset_config.py └── superset_config_local.example ├── requirements-local.txt └── run-server.sh /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | data -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Building a robust yet simple data analytics platform with DuckDB, dbt, Iceberg, and Superset 2 | Modern analytics platforms require robust data storage, transformation, and management tools. DuckDB provides a simple, high-performance, columnar analytical database. DBT simplifies data transformation and modeling, and Iceberg offers scalable data lake management capabilities. Combining these tools can create a powerful and flexible analytics platform. 3 | 4 | ![architecture.png](images%2Farchitecture.png) 5 | 6 | # Understanding the tools 7 | ## DuckDB 8 | DuckDB is an in-memory, columnar analytical database that stands out for its speed, efficiency, and compatibility with SQL standard. 
Here is a more in-depth look at its features:
9 | - **High-performance Analytics**: DuckDB is optimized for analytical queries, making it an ideal choice for data warehousing and analytics workloads. Its in-memory storage and columnar data layout significantly boost query performance.
10 | - **SQL Compatibility**: DuckDB supports SQL, making it accessible to analysts and data professionals who are already familiar with SQL syntax. This compatibility allows you to leverage your existing SQL knowledge and tools.
11 | - **Integration with BI Tools**: DuckDB integrates seamlessly with popular business intelligence (BI) tools like Tableau, Power BI, and Looker. This compatibility ensures that you can visualize and report on your data effectively.
12 | 
13 | ## dbt
14 | dbt, which stands for Data Build Tool, is a command-line tool that revolutionizes the way data transformations and modeling are done. Here's a deeper dive into dbt's capabilities:
15 | - **Modular Data Transformations**: dbt uses SQL and YAML files to define data transformations and models. This modular approach allows you to break down complex transformations into smaller, more manageable pieces, enhancing maintainability and version control.
16 | - **Data Testing**: dbt facilitates data testing by allowing you to define expectations about your data. It helps ensure data quality by automatically running tests against your transformed data.
17 | - **Version Control**: dbt projects can be version controlled with tools like Git, enabling collaboration among data professionals while keeping a history of changes.
18 | - **Incremental Builds**: dbt supports incremental builds, meaning it only processes data that has changed since the last run. This feature saves time and resources when working with large datasets.
19 | - **Orchestration**: While dbt focuses on data transformations and modeling, it can be integrated with orchestration tools like Apache Airflow or dbt Cloud to create automated data pipelines.
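As a small illustration of how modular models and incremental builds look in practice, here is a generic sketch of an incremental dbt model. It is not part of this project's code: the `app.events` source and its columns are hypothetical.

```sql
-- models/events_incremental.sql: a sketch of an incremental dbt model
{{ config(materialized='incremental', unique_key='event_id') }}

SELECT event_id, event_type, created_at
FROM {{ source('app', 'events') }}
{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what is already built
  WHERE created_at > (SELECT max(created_at) FROM {{ this }})
{% endif %}
```

On the first run dbt builds the full table; on subsequent runs only the filtered rows are processed and merged on the `unique_key`.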
20 | 
21 | ## Iceberg
22 | Iceberg is a table format designed for managing data lakes, offering several key features to ensure data quality and scalability:
23 | - **Schema Evolution**: One of Iceberg's standout features is its support for schema evolution. You can add, delete, or modify columns in your datasets without breaking existing queries or data integrity. This makes it suitable for rapidly evolving data lakes.
24 | - **ACID Transactions**: Iceberg provides ACID (Atomicity, Consistency, Isolation, Durability) transactions, ensuring data consistency and reliability in multi-user and multi-write environments.
25 | - **Time-Travel Capabilities**: Iceberg allows you to query historical versions of your data, making it possible to recover from data errors or analyze changes over time.
26 | - **Optimized File Storage**: Iceberg optimizes file storage by using techniques like metadata management, partitioning, and file pruning. This results in efficient data storage and retrieval.
27 | - **Connectivity**: Iceberg supports various storage connectors, including Apache Hadoop HDFS, Amazon S3, and Azure Data Lake Storage, making it versatile and compatible with different data lake platforms.
28 | 
29 | > NOTE: *Iceberg is not currently utilized in this showcase, but it will be added soon.*
30 | ## Apache Superset
31 | Apache Superset is a modern, open-source BI tool that enables data exploration, visualization, and interactive dashboards. It connects to various data sources and is designed to empower users to explore data and create dynamic reports.
32 | - **Data Visualization**: Apache Superset allows users to create interactive visualizations, including charts, graphs, and geographic maps, to explore and understand data.
33 | - **Dashboard Creation**: Users can build dynamic dashboards by combining multiple visualizations and applying filters for real-time data exploration.
34 | - **Connectivity**: Apache Superset can connect to various data sources, including SQL databases, data lakes, and cloud storage, making it adaptable to diverse data ecosystems.
35 | - **Security**: It offers robust security features, including role-based access control and integration with authentication providers, ensuring data is accessed securely.
36 | - **Community and Extensibility**: As an open-source project, Apache Superset benefits from a vibrant community that contributes plugins, connectors, and additional features, enhancing its capabilities.
37 | - **SQL Support**: Superset supports SQL queries, allowing users to execute custom queries and create complex calculated fields.
38 | 
39 | # Setting up DuckDB, dbt, and Superset with Docker Compose
40 | ## Setting up DuckDB
41 | DuckDB will be installed as a library with dbt and Superset in the next section.
42 | 
43 | ## Setting up dbt
44 | First, we need to install the *dbt-core* and *dbt-duckdb* libraries, then initialize a dbt project.
45 | ```bash
46 | # create a virtual environment
47 | cd dbt
48 | python -m venv .env
49 | source .env/bin/activate
50 | 
51 | # install libraries: dbt-core and dbt-duckdb
52 | pip install -r requirements.txt
53 | 
54 | # check version
55 | dbt --version
56 | ```
57 | 
58 | Then we initialize a dbt project named *stackoverflowsurvey* (via `dbt init stackoverflowsurvey`) and create a *profiles.yml* inside it with the following content:
59 | ```yaml
60 | stackoverflow:
61 |   target: dev
62 |   outputs:
63 |     dev:
64 |       type: duckdb
65 |       path: '/data/duckdb/stackoverflow.duckdb' # path to the local DuckDB database file
66 | ```
67 | 
68 | Run the following command to verify the configuration; once the checks pass, the models can be built with `dbt run --profiles-dir .`:
69 | ```bash
70 | # We must specify the directory of the 'profiles.yml' file since we are not using the default location.
71 | dbt debug --profiles-dir .
72 | ```
73 | 
74 | ## Setting up Superset
75 | Run the following commands to set up the Superset services. Note that the compose file expects an external Docker network named *osmds_internal*, so create it first with `docker network create osmds_internal` if it does not already exist:
76 | ```bash
77 | cd superset
78 | # run docker compose to start the Superset services;
79 | # the libraries declared in the 'requirements-local.txt' file (including duckdb-engine) will also be installed
80 | docker-compose up --detach
81 | ```
82 | 
83 | Visit *http://localhost:8088* to access the Superset UI and enter **admin** as both the username and password. Choose **DuckDB** from the supported databases drop-down, then set up a connection to the DuckDB database.
84 | 
85 | 
86 | ![superset_duckdb_connection.png](images%2Fsuperset_duckdb_connection.png)
87 | 
88 | ![superset_duckdb_connection_advanced_config.png](images%2Fsuperset_duckdb_connection_advanced_config.png)
89 | 
90 | 
91 | 
92 |
93 | 
94 | > **NOTE**: Provide the path to a DuckDB database file on disk in the URL, e.g., *duckdb:////Users/whoever/path/to/duck.db*.
95 | 
96 | We combine the DuckDB volume mapping exposed in the *superset/docker-compose.yml* file
97 | ```yaml
98 | x-superset-volumes:
99 |   &superset-volumes
100 |   - /data/duckdb:/app/duckdb
101 | ```
102 | with the DuckDB database path defined in *dbt/stackoverflowsurvey/profiles.yml*.
103 | ```yaml
104 | path: '/data/duckdb/stackoverflow.duckdb'
105 | ```
106 | This gives us the final URI to establish a connection between Superset and DuckDB:
107 | ```
108 | duckdb:///duckdb/stackoverflow.duckdb
109 | ```
110 | 
111 | In Superset, the engine should be configured to open DuckDB in "read-only" mode. Otherwise, only one connection can use the database file at a time (simultaneous queries will cause lock errors), and the Superset dashboard cannot be refreshed while the dbt pipeline is running.
112 | 
113 | # Loading source
114 | In this showcase, we are using the [Stack Overflow Annual Developer Survey](https://insights.stackoverflow.com/survey) dataset. To simplify matters, we will focus solely on the [2023](https://cdn.stackoverflow.co/files/jo7n4k8s/production/49915bfd46d0902c3564fd9a06b509d08a20488c.zip/stack-overflow-developer-survey-2023.zip) dataset, which needs to be manually downloaded and extracted into the *PROJECT_HOME/data* directory.
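Before building the models, it is worth double-checking the connection string derived in the previous section. The mapping can be sketched in a few lines of shell; the values are taken from this project's compose file and dbt profile, and it assumes relative `duckdb:///` paths resolve against the container's */app* working directory, which matches the URI shown above.

```shell
# Host path where dbt writes the database (from profiles.yml)
HOST_PATH="/data/duckdb/stackoverflow.duckdb"
# The compose volume mounts /data/duckdb at /app/duckdb inside the container
CONTAINER_PATH="/app/duckdb/$(basename "$HOST_PATH")"
# Superset resolves duckdb:/// paths relative to its /app working directory
SUPERSET_URI="duckdb:///${CONTAINER_PATH#/app/}"
echo "$SUPERSET_URI"   # duckdb:///duckdb/stackoverflow.duckdb
```

If the volume mapping or the profile path changes, the same two substitutions give the new URI to paste into the Superset connection dialog.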
115 | 
116 | # Building models with dbt
117 | ## Defining data source
118 | We declare the data source in the *stackoverflowsurvey/models/source.yml* file with the following content:
119 | ```yaml
120 | sources:
121 |   - name: stackoverflow_survey_source
122 |     tables:
123 |       - name: surveys
124 |         meta:
125 |           external_location: "read_csv('../../data/survey_results_public.csv', AUTO_DETECT=TRUE)" # automatically parse and detect the schema
126 |           formatter: oldstyle
127 | ```
128 | ## Building models
129 | For demonstration purposes only, we have created a very simple model with the following content:
130 | ```sql
131 | {{ config(materialized='table') }}
132 | 
133 | SELECT *
134 | FROM {{ source('stackoverflow_survey_source', 'surveys') }}
135 | ```
136 | # Connecting Superset
137 | Once the dbt models are built, the data visualization can begin. Log in to Superset with the admin account mentioned above to start building charts and dashboards.
138 | 
139 | ![superset_dashboard.png](images%2Fsuperset_dashboard.png)
140 | 
141 | # Conclusion
142 | In this comprehensive guide, we've demonstrated how to construct a sophisticated analytics platform that leverages the combined power of DuckDB, dbt, Iceberg, and Apache Superset. This platform empowers organizations to seamlessly ingest, transform, manage, visualize, and analyze data to extract actionable insights.
143 | Key Components:
144 | - **DuckDB**: Our high-performance, SQL-compatible, in-memory database serves as the foundation for efficient data storage and retrieval, enabling lightning-fast analytical queries.
145 | - **dbt**: dbt simplifies data transformation and modeling, allowing for the creation of modular, version-controlled data pipelines that enhance data quality and maintainability.
146 | - **Iceberg**: Iceberg manages data lakes with ease, offering schema evolution, ACID transactions, and time-travel capabilities, ensuring data integrity and scalability in large-scale analytics environments.
147 | - **Apache Superset**: Apache Superset enhances the platform by providing a modern, open-source BI tool for data exploration, visualization, and interactive dashboard creation. Its connectivity options, security features, and SQL support empower users to gain insights from data with ease. 148 | 149 | Together, these tools create a powerful and flexible analytics platform, enabling organizations to navigate the data landscape with confidence, derive valuable insights, and make informed decisions. Whether you're dealing with structured or unstructured data, this platform equips you with the tools needed to turn raw data into actionable intelligence, driving business success and innovation. 150 | 151 | ## Supporting Links 152 | * Stack Overflow Annual Developer Survey 153 | * Modern Data Stack in a Box with DuckDB 154 | * dbt adapter for DuckDB 155 | 156 | 157 | 158 | 159 | 160 | 161 | -------------------------------------------------------------------------------- /dbt/.gitignore: -------------------------------------------------------------------------------- 1 | logs 2 | .env -------------------------------------------------------------------------------- /dbt/requirments.txt: -------------------------------------------------------------------------------- 1 | dbt-core==1.6.0 2 | dbt-duckdb==1.6.0 -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/.gitignore: -------------------------------------------------------------------------------- 1 | 2 | target/ 3 | dbt_packages/ 4 | logs/ 5 | stackoverflow.* 6 | -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/.user.yml: -------------------------------------------------------------------------------- 1 | id: 1ea6d26b-1f7f-44a4-897d-43d71b80b184 2 | -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/README.md: 
-------------------------------------------------------------------------------- 1 | Welcome to your new dbt project! 2 | 3 | ### Using the starter project 4 | 5 | Try running the following commands: 6 | - dbt run 7 | - dbt test 8 | 9 | 10 | ### Resources: 11 | - Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction) 12 | - Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers 13 | - Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support 14 | - Find [dbt events](https://events.getdbt.com) near you 15 | - Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices 16 | -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/analyses/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luatnc87/robust-data-analytics-platform-with-duckdb-dbt-iceberg/9e1ec87291ac3b7193bdc5b8193c01491e14bfdf/dbt/stackoverflowsurvey/analyses/.gitkeep -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/dbt_project.yml: -------------------------------------------------------------------------------- 1 | 2 | # Name your project! Project names should contain only lowercase characters 3 | # and underscores. A good package name should reflect your organization's 4 | # name or the intended use of these models 5 | name: 'stackoverflow' 6 | version: '1.0.0' 7 | config-version: 2 8 | 9 | # This setting configures which "profile" dbt uses for this project. 10 | profile: 'stackoverflow' 11 | 12 | # These configurations specify where dbt should look for different types of files. 13 | # The `model-paths` config, for example, states that models in this project can be 14 | # found in the "models/" directory. You probably won't need to change these! 
15 | model-paths: ["models"] 16 | analysis-paths: ["analyses"] 17 | test-paths: ["tests"] 18 | seed-paths: ["seeds"] 19 | macro-paths: ["macros"] 20 | snapshot-paths: ["snapshots"] 21 | 22 | clean-targets: # directories to be removed by `dbt clean` 23 | - "target" 24 | - "dbt_packages" 25 | 26 | 27 | # Configuring models 28 | # Full documentation: https://docs.getdbt.com/docs/configuring-models 29 | 30 | # In this example config, we tell dbt to build all models in the example/ 31 | # directory as views. These settings can be overridden in the individual model 32 | # files using the `{{ config(...) }}` macro. 33 | models: 34 | stackoverflow: 35 | # Config indicated by + and applies to all files under models/example/ 36 | example: 37 | +materialized: view 38 | -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/macros/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luatnc87/robust-data-analytics-platform-with-duckdb-dbt-iceberg/9e1ec87291ac3b7193bdc5b8193c01491e14bfdf/dbt/stackoverflowsurvey/macros/.gitkeep -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/models/schema.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | models: 4 | - name: survey_results 5 | description: "The result of StackOverflow's survey in 2023." 
6 | columns: 7 | - name: ResponseId 8 | description: "The response Id" 9 | tests: 10 | - unique 11 | - not_null 12 | - name: Q120 13 | - name: MainBranch 14 | - name: Age 15 | - name: Employment 16 | - name: RemoteWork 17 | - name: CodingActivities 18 | - name: EdLevel 19 | - name: LearnCode 20 | - name: LearnCodeOnline 21 | - name: LearnCodeCoursesCert 22 | - name: YearsCode 23 | - name: YearsCodePro 24 | - name: DevType 25 | - name: OrgSize 26 | - name: PurchaseInfluence 27 | - name: TechList 28 | - name: BuyNewTool 29 | - name: Country 30 | - name: Currency 31 | - name: CompTotal 32 | - name: LanguageHaveWorkedWith 33 | - name: LanguageWantToWorkWith 34 | - name: DatabaseHaveWorkedWith 35 | - name: DatabaseWantToWorkWith 36 | - name: PlatformHaveWorkedWith 37 | - name: PlatformWantToWorkWith 38 | - name: WebframeHaveWorkedWith 39 | - name: WebframeWantToWorkWith 40 | - name: MiscTechHaveWorkedWith 41 | - name: MiscTechWantToWorkWith 42 | - name: ToolsTechHaveWorkedWith 43 | - name: ToolsTechWantToWorkWith 44 | - name: NEWCollabToolsHaveWorkedWith 45 | - name: NEWCollabToolsWantToWorkWith 46 | - name: OpSysPersonal use 47 | - name: OpSysProfessional use 48 | - name: OfficeStackAsyncHaveWorkedWith 49 | - name: OfficeStackAsyncWantToWorkWith 50 | - name: OfficeStackSyncHaveWorkedWith 51 | - name: OfficeStackSyncWantToWorkWith 52 | - name: AISearchHaveWorkedWith 53 | - name: AISearchWantToWorkWith 54 | - name: AIDevHaveWorkedWith 55 | - name: AIDevWantToWorkWith 56 | - name: NEWSOSites 57 | - name: SOVisitFreq 58 | - name: SOAccount 59 | - name: SOPartFreq 60 | - name: SOComm 61 | - name: SOAI 62 | - name: AISelect 63 | - name: AISent 64 | - name: AIAcc 65 | - name: AIBen 66 | - name: AIToolInterested in Using 67 | - name: AIToolCurrently Using 68 | - name: AIToolNot interested in Using 69 | - name: AINextVery different 70 | - name: AINextNeither different nor similar 71 | - name: AINextSomewhat similar 72 | - name: AINextVery similar 73 | - name: AINextSomewhat 
different 74 | - name: TBranch 75 | - name: ICorPM 76 | - name: WorkExp 77 | - name: Knowledge_1 78 | - name: Knowledge_2 79 | - name: Knowledge_3 80 | - name: Knowledge_4 81 | - name: Knowledge_5 82 | - name: Knowledge_6 83 | - name: Knowledge_7 84 | - name: Knowledge_8 85 | - name: Frequency_1 86 | - name: Frequency_2 87 | - name: Frequency_3 88 | - name: TimeSearching 89 | - name: TimeAnswering 90 | - name: ProfessionalTech 91 | - name: Industry 92 | - name: SurveyLength 93 | - name: SurveyEase 94 | - name: ConvertedCompYearly 95 | -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/models/source.yml: -------------------------------------------------------------------------------- 1 | sources: 2 | - name: stackoverflow_survey_source 3 | tables: 4 | - name: surveys 5 | meta: 6 | external_location: "read_csv('../../data/survey_results_public.csv', AUTO_DETECT=TRUE)" # automatically parser and detect schema 7 | formatter: oldstyle -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/models/survey_results.sql: -------------------------------------------------------------------------------- 1 | {{ config(materialized='table') }} 2 | 3 | SELECT * 4 | FROM {{ source('stackoverflow_survey_source', 'surveys')}} -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/profiles.yml: -------------------------------------------------------------------------------- 1 | stackoverflow: 2 | target: dev 3 | outputs: 4 | dev: 5 | type: duckdb 6 | path: '/data/duckdb/stackoverflow.duckdb' # path to local DuckDB database file -------------------------------------------------------------------------------- /dbt/stackoverflowsurvey/seeds/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/luatnc87/robust-data-analytics-platform-with-duckdb-dbt-iceberg/9e1ec87291ac3b7193bdc5b8193c01491e14bfdf/dbt/stackoverflowsurvey/seeds/.gitkeep
--------------------------------------------------------------------------------
/dbt/stackoverflowsurvey/snapshots/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luatnc87/robust-data-analytics-platform-with-duckdb-dbt-iceberg/9e1ec87291ac3b7193bdc5b8193c01491e14bfdf/dbt/stackoverflowsurvey/snapshots/.gitkeep
--------------------------------------------------------------------------------
/dbt/stackoverflowsurvey/tests/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luatnc87/robust-data-analytics-platform-with-duckdb-dbt-iceberg/9e1ec87291ac3b7193bdc5b8193c01491e14bfdf/dbt/stackoverflowsurvey/tests/.gitkeep
--------------------------------------------------------------------------------
/duckdb/README.md:
--------------------------------------------------------------------------------
1 | > **NOTE**: DuckDB will be used as an embedded library installed with the dbt project and Superset.
-------------------------------------------------------------------------------- /images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luatnc87/robust-data-analytics-platform-with-duckdb-dbt-iceberg/9e1ec87291ac3b7193bdc5b8193c01491e14bfdf/images/architecture.png -------------------------------------------------------------------------------- /images/superset_dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luatnc87/robust-data-analytics-platform-with-duckdb-dbt-iceberg/9e1ec87291ac3b7193bdc5b8193c01491e14bfdf/images/superset_dashboard.png -------------------------------------------------------------------------------- /images/superset_duckdb_connection.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luatnc87/robust-data-analytics-platform-with-duckdb-dbt-iceberg/9e1ec87291ac3b7193bdc5b8193c01491e14bfdf/images/superset_duckdb_connection.png -------------------------------------------------------------------------------- /images/superset_duckdb_connection_advanced_config.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luatnc87/robust-data-analytics-platform-with-duckdb-dbt-iceberg/9e1ec87291ac3b7193bdc5b8193c01491e14bfdf/images/superset_duckdb_connection_advanced_config.png -------------------------------------------------------------------------------- /superset/docker-compose.yml: -------------------------------------------------------------------------------- 1 | # 2 | # Licensed to the Apache Software Foundation (ASF) under one or more 3 | # contributor license agreements. See the NOTICE file distributed with 4 | # this work for additional information regarding copyright ownership. 
5 | # The ASF licenses this file to You under the Apache License, Version 2.0 6 | # (the "License"); you may not use this file except in compliance with 7 | # the License. You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # 17 | x-superset-image: &superset-image apachesuperset.docker.scarf.sh/apache/superset:latest 18 | x-superset-depends-on: &superset-depends-on 19 | - db 20 | - redis 21 | x-superset-volumes: 22 | &superset-volumes # /app/pythonpath_docker will be appended to the PYTHONPATH in the final container 23 | - ./docker:/app/docker 24 | - superset_home:/app/superset_home 25 | - /data/duckdb:/app/duckdb 26 | 27 | version: "3.7" 28 | services: 29 | redis: 30 | image: redis:7 31 | container_name: superset_cache 32 | restart: unless-stopped 33 | volumes: 34 | - redis:/data 35 | networks: 36 | - osmds_internal 37 | 38 | db: 39 | env_file: docker/.env 40 | image: postgres:15 41 | container_name: superset_db 42 | restart: unless-stopped 43 | volumes: 44 | - db_home:/var/lib/postgresql/data 45 | - ./docker/docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d 46 | networks: 47 | - osmds_internal 48 | 49 | superset: 50 | env_file: docker/.env 51 | image: *superset-image 52 | container_name: superset_app 53 | command: ["/app/docker/docker-bootstrap.sh", "app-gunicorn"] 54 | user: "root" 55 | restart: unless-stopped 56 | ports: 57 | - 8088:8088 58 | depends_on: *superset-depends-on 59 | volumes: *superset-volumes 60 | networks: 61 | - osmds_internal 62 | 63 | superset-init: 64 | image: *superset-image 65 | container_name: superset_init 66 | command: 
["/app/docker/docker-init.sh"] 67 | env_file: docker/.env 68 | depends_on: *superset-depends-on 69 | user: "root" 70 | volumes: *superset-volumes 71 | healthcheck: 72 | disable: true 73 | networks: 74 | - osmds_internal 75 | 76 | superset-worker: 77 | image: *superset-image 78 | container_name: superset_worker 79 | command: ["/app/docker/docker-bootstrap.sh", "worker"] 80 | env_file: docker/.env 81 | restart: unless-stopped 82 | depends_on: *superset-depends-on 83 | user: "root" 84 | volumes: *superset-volumes 85 | healthcheck: 86 | test: 87 | [ 88 | "CMD-SHELL", 89 | "celery -A superset.tasks.celery_app:app inspect ping -d celery@$$HOSTNAME", 90 | ] 91 | networks: 92 | - osmds_internal 93 | 94 | superset-worker-beat: 95 | image: *superset-image 96 | container_name: superset_worker_beat 97 | command: ["/app/docker/docker-bootstrap.sh", "beat"] 98 | env_file: docker/.env 99 | restart: unless-stopped 100 | depends_on: *superset-depends-on 101 | user: "root" 102 | volumes: *superset-volumes 103 | healthcheck: 104 | disable: true 105 | networks: 106 | - osmds_internal 107 | 108 | volumes: 109 | superset_home: 110 | external: false 111 | db_home: 112 | external: false 113 | redis: 114 | external: false 115 | 116 | networks: 117 | osmds_internal: 118 | external: true -------------------------------------------------------------------------------- /superset/docker/.env: -------------------------------------------------------------------------------- 1 | # 2 | # Licensed to the Apache Software Foundation (ASF) under one or more 3 | # contributor license agreements. See the NOTICE file distributed with 4 | # this work for additional information regarding copyright ownership. 5 | # The ASF licenses this file to You under the Apache License, Version 2.0 6 | # (the "License"); you may not use this file except in compliance with 7 | # the License. 
You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | # 17 | COMPOSE_PROJECT_NAME=superset 18 | 19 | # database configurations (do not modify) 20 | DATABASE_DB=superset 21 | DATABASE_HOST=db 22 | DATABASE_PASSWORD=superset 23 | DATABASE_USER=superset 24 | DATABASE_PORT=5432 25 | DATABASE_DIALECT=postgresql 26 | 27 | EXAMPLES_DB=examples 28 | EXAMPLES_HOST=db 29 | EXAMPLES_USER=examples 30 | EXAMPLES_PASSWORD=examples 31 | EXAMPLES_PORT=5432 32 | 33 | # database engine specific environment variables 34 | # change the below if you prefer another database engine 35 | POSTGRES_DB=superset 36 | POSTGRES_USER=superset 37 | POSTGRES_PASSWORD=superset 38 | #MYSQL_DATABASE=superset 39 | #MYSQL_USER=superset 40 | #MYSQL_PASSWORD=superset 41 | #MYSQL_RANDOM_ROOT_PASSWORD=yes 42 | 43 | # Add the mapped in /app/pythonpath_docker which allows devs to override stuff 44 | PYTHONPATH=/app/pythonpath:/app/docker/pythonpath_dev 45 | REDIS_HOST=redis 46 | REDIS_PORT=6379 47 | 48 | SUPERSET_ENV=production 49 | SUPERSET_LOAD_EXAMPLES=yes 50 | SUPERSET_SECRET_KEY=TEST_NON_DEV_SECRET 51 | CYPRESS_CONFIG=false 52 | SUPERSET_PORT=8088 53 | MAPBOX_API_KEY='' 54 | -------------------------------------------------------------------------------- /superset/docker/README.md: -------------------------------------------------------------------------------- 1 | 19 | 20 | # Getting Started with Superset using Docker 21 | 22 | Docker is an easy way to get started with Superset. 23 | 24 | ## Prerequisites 25 | 26 | 1. [Docker](https://www.docker.com/get-started) 27 | 2. 
[Docker Compose](https://docs.docker.com/compose/install/) 28 | 29 | ## Configuration 30 | 31 | The `/app/pythonpath` folder is mounted from [`./docker/pythonpath_dev`](./pythonpath_dev) 32 | which contains a base configuration [`./docker/pythonpath_dev/superset_config.py`](./pythonpath_dev/superset_config.py) 33 | intended for use with local development. 34 | 35 | ### Local overrides 36 | 37 | In order to override configuration settings locally, simply make a copy of [`./docker/pythonpath_dev/superset_config_local.example`](./pythonpath_dev/superset_config_local.example) 38 | into `./docker/pythonpath_dev/superset_config_docker.py` (git ignored) and fill in your overrides. 39 | 40 | ### Local packages 41 | 42 | If you want to add Python packages in order to test things like databases locally, you can simply add a local requirements.txt (`./docker/requirements-local.txt`) 43 | and rebuild your Docker stack. 44 | 45 | Steps: 46 | 47 | 1. Create `./docker/requirements-local.txt` 48 | 2. Add your new packages 49 | 3. Rebuild docker-compose 50 | 1. `docker-compose down -v` 51 | 2. `docker-compose up` 52 | 53 | ## Initializing Database 54 | 55 | The database will initialize itself upon startup via the init container ([`superset-init`](./docker-init.sh)). This may take a minute. 56 | 57 | ## Normal Operation 58 | 59 | To run the container, simply run: `docker-compose up` 60 | 61 | After waiting several minutes for Superset initialization to finish, you can open a browser and view [`http://localhost:8088`](http://localhost:8088) 62 | to start your journey. 63 | 64 | ## Developing 65 | 66 | While running, the container server will reload on modification of the Superset Python and JavaScript source code. 67 | Don't forget to reload the page to take the new frontend into account though. 68 | 69 | ## Production 70 | 71 | It is possible to run Superset in non-development mode by using [`docker-compose-non-dev.yml`](../docker-compose-non-dev.yml). 
This file excludes the volumes needed for development and uses [`./docker/.env-non-dev`](./.env-non-dev) which sets the variable `SUPERSET_ENV` to `production`. 72 | 73 | ## Resource Constraints 74 | 75 | If you are attempting to build on macOS and it exits with 137 you need to increase your Docker resources. See instructions [here](https://docs.docker.com/docker-for-mac/#advanced) (search for memory) 76 | -------------------------------------------------------------------------------- /superset/docker/docker-bootstrap.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | # 3 | # Licensed to the Apache Software Foundation (ASF) under one or more 4 | # contributor license agreements. See the NOTICE file distributed with 5 | # this work for additional information regarding copyright ownership. 6 | # The ASF licenses this file to You under the Apache License, Version 2.0 7 | # (the "License"); you may not use this file except in compliance with 8 | # the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
17 | # 18 | 19 | set -eo pipefail 20 | 21 | REQUIREMENTS_LOCAL="/app/docker/requirements-local.txt" 22 | # If running under Cypress, use the test config and export test env variables 23 | if [ "$CYPRESS_CONFIG" == "true" ]; then 24 | export SUPERSET_CONFIG=tests.integration_tests.superset_test_config 25 | export SUPERSET_TESTENV=true 26 | export SUPERSET__SQLALCHEMY_DATABASE_URI=postgresql+psycopg2://superset:superset@db:5432/superset 27 | fi 28 | # 29 | # Make sure we have dev requirements installed 30 | # 31 | if [ -f "${REQUIREMENTS_LOCAL}" ]; then 32 | echo "Installing local overrides at ${REQUIREMENTS_LOCAL}" 33 | pip install --no-cache-dir -r "${REQUIREMENTS_LOCAL}" 34 | else 35 | echo "Skipping local overrides" 36 | fi 37 | 38 | case "${1}" in 39 | worker) 40 | echo "Starting Celery worker..." 41 | celery --app=superset.tasks.celery_app:app worker -O fair -l INFO 42 | ;; 43 | beat) 44 | echo "Starting Celery beat..." 45 | rm -f /tmp/celerybeat.pid 46 | celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid -l INFO -s "${SUPERSET_HOME}"/celerybeat-schedule 47 | ;; 48 | app) 49 | echo "Starting web app (using development server)..." 50 | flask run -p 8088 --with-threads --reload --debugger --host=0.0.0.0 51 | ;; 52 | app-gunicorn) 53 | echo "Starting web app..." 54 | /usr/bin/run-server.sh 55 | ;; 56 | *) 57 | echo "Unknown operation: ${1}" 58 | ;; 59 | esac 60 | -------------------------------------------------------------------------------- /superset/docker/docker-ci.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | # 3 | # Licensed to the Apache Software Foundation (ASF) under one or more 4 | # contributor license agreements. See the NOTICE file distributed with 5 | # this work for additional information regarding copyright ownership. 
6 | # The ASF licenses this file to You under the Apache License, Version 2.0 7 | # (the "License"); you may not use this file except in compliance with 8 | # the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | # 18 | /app/docker/docker-init.sh 19 | 20 | # TODO: copy config overrides from ENV vars 21 | 22 | # TODO: run celery in detached state 23 | export SERVER_THREADS_AMOUNT=8 24 | # start up the web server 25 | 26 | /usr/bin/run-server.sh 27 | -------------------------------------------------------------------------------- /superset/docker/docker-entrypoint-initdb.d/examples-init.sh: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------ 2 | # Creates the examples database and respective user. 
The database location 3 | # and access credentials are defined in environment variables 4 | # ------------------------------------------------------------------------ 5 | set -e 6 | 7 | psql -v ON_ERROR_STOP=1 --username "${POSTGRES_USER}" <<-EOSQL 8 |     CREATE USER ${EXAMPLES_USER} WITH PASSWORD '${EXAMPLES_PASSWORD}'; 9 |    CREATE DATABASE ${EXAMPLES_DB}; 10 |    GRANT ALL PRIVILEGES ON DATABASE ${EXAMPLES_DB} TO ${EXAMPLES_USER}; 11 | EOSQL 12 | 13 | psql -v ON_ERROR_STOP=1 --username "${POSTGRES_USER}" -d "${EXAMPLES_DB}" <<-EOSQL 14 |    GRANT ALL ON SCHEMA public TO ${EXAMPLES_USER}; 15 | EOSQL 16 | -------------------------------------------------------------------------------- /superset/docker/docker-frontend.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | # 3 | # Licensed to the Apache Software Foundation (ASF) under one or more 4 | # contributor license agreements. See the NOTICE file distributed with 5 | # this work for additional information regarding copyright ownership. 6 | # The ASF licenses this file to You under the Apache License, Version 2.0 7 | # (the "License"); you may not use this file except in compliance with 8 | # the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
17 | # 18 | set -e 19 | 20 | # Packages needed for puppeteer: 21 | apt update 22 | apt install -y chromium 23 | 24 | cd /app/superset-frontend 25 | npm install -f --no-optional --global webpack webpack-cli 26 | npm install -f --no-optional 27 | 28 | echo "Running frontend" 29 | npm run dev 30 | -------------------------------------------------------------------------------- /superset/docker/docker-init.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | # 3 | # Licensed to the Apache Software Foundation (ASF) under one or more 4 | # contributor license agreements. See the NOTICE file distributed with 5 | # this work for additional information regarding copyright ownership. 6 | # The ASF licenses this file to You under the Apache License, Version 2.0 7 | # (the "License"); you may not use this file except in compliance with 8 | # the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | # 18 | set -e 19 | 20 | # 21 | # Always install local overrides first 22 | # 23 | /app/docker/docker-bootstrap.sh 24 | 25 | STEP_CNT=4 26 | 27 | echo_step() { 28 | cat <