14 |
15 | # Altinity Project Antalya Examples
16 |
17 | Project Antalya is a new branch of ClickHouse® code designed to
18 | integrate real-time analytic query with data lakes. This project
19 | provides documentation as well as working code examples to help you use
20 | and contribute to Antalya.
21 |
22 | *Important Note!* Altinity maintains and supports Project Antalya. Altinity
23 | is not affiliated or associated with ClickHouse, Inc. in any way. ClickHouse® is
24 | a registered trademark of ClickHouse, Inc.
25 |
26 | See the
27 | [Community Support Section](#community-support) if you want to ask
28 | questions or log ideas and issues.
29 |
30 | ## Project Antalya Goals
31 |
32 | Analytic data size has increased to a point where traditional designs
33 | based on shared-nothing architectures and block storage are prohibitively
34 | expensive to operate. Project Antalya introduces a fully open source
35 | architecture for cost-efficient, real-time systems using cheap object
36 | storage and scalable compute, specifically:
37 |
38 | * Enable real-time analytics to work off a single copy of
39 | data that is shared with AI and data science applications.
40 | * Provide a single SQL endpoint for native ClickHouse® and data lake data.
41 | * Use open table formats to enable easy access from any application type.
42 | * Separate compute and storage, and allow users to scale compute
43 | for ingest, merge, transformation, and query independently.
44 |
45 | Antalya will implement these goals through the following concrete features:
46 |
47 | 1. Optimize query performance of ClickHouse® on Parquet files stored in
48 |    S3-compatible object storage.
49 | 2. Enable ClickHouse® clusters to add pools of stateless servers, aka swarm
50 |    clusters, that handle query and insert operations on shared object storage
51 | files with linear scaling.
52 | 3. Adapt ClickHouse® to use Iceberg tables as shared storage.
53 | 4. Enable ClickHouse® clusters to extend existing tables onto unlimited
54 | Iceberg storage with transparent query across both native MergeTree and
55 | Parquet data.
56 | 5. Simplify backup and DR by leveraging Iceberg features like snapshots.
57 | 6. Maintain full compatibility with upstream ClickHouse® features and
58 | bug fixes.
59 |
60 | ## Roadmap
61 |
62 | [Project Antalya Roadmap 2025 - Real-Time Data Lakes](https://github.com/Altinity/ClickHouse/issues/804)
63 |
64 | ## Licensing
65 |
66 | Project Antalya code is licensed under the Apache 2.0 license. There are no feature
67 | hold-backs.
68 |
69 | ## Quick Start
70 |
71 | See the [Docker Quick Start](./docker/README.md) to try out Antalya in
72 | a few minutes using Docker Compose on a laptop.
73 |
74 | ## Scalable Swarm Example
75 |
76 | For a fully functional swarm cluster implementation, look at the
77 | [kubernetes](kubernetes/README.md) example. It demonstrates use of swarm
78 | clusters on a large blockchain dataset stored in Parquet.
79 |
80 | ## Project Antalya Binaries
81 |
82 | ### Packages
83 |
84 | Project Antalya ClickHouse® server and keeper packages are available on the
85 | [builds.altinity.cloud](https://builds.altinity.cloud/) page. Scroll to the last
86 | section to find them.
87 |
88 | ### Containers
89 |
90 | Project Antalya ClickHouse® server and ClickHouse® keeper containers
91 | are available on Docker Hub.
92 |
93 | Check for the latest build on
94 | [Docker Hub](https://hub.docker.com/r/altinity/clickhouse-server/tags).
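
Once you have picked a tag from that list, pulling the image is a standard
Docker command. The tag below is a placeholder; substitute the latest Antalya
build.

```
docker pull altinity/clickhouse-server:<antalya-tag>
```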
95 |
96 | ## Documentation
97 |
98 | Look in the docs directory for current documentation. More is on the way.
99 |
100 | * [Project Antalya Concepts Guide](docs/concepts.md)
101 | * [Command and Configuration Reference](docs/reference.md)
102 |
103 | See also the [Project Antalya Launch Video](https://altinity.com/events/scale-clickhouse-queries-infinitely-with-10x-cheaper-storage-introducing-project-antalya)
104 | for an introduction to Project Antalya and a demo of performance.
105 |
106 | The [Altinity Blog](https://altinity.com/blog/) has regular articles
107 | on Project Antalya features and performance.
108 |
109 | ## Code
110 |
111 | To access Project Antalya code, run the following commands.
112 |
113 | ```
114 | git clone git@github.com:Altinity/ClickHouse.git Altinity-ClickHouse
115 | cd Altinity-ClickHouse
116 | git branch
117 | ```
118 |
119 | The `antalya` branch is checked out by default.
120 |
121 | ## Building
122 |
123 | Build instructions are located [here](https://github.com/Altinity/ClickHouse/blob/antalya/docs/en/development/developer-instruction.md)
124 | in the Altinity ClickHouse code tree. Project Antalya code does not
125 | introduce new libraries or build procedures.
126 |
127 | ## Contributing
128 |
129 | We welcome contributions and are currently setting up procedures for
130 | community contributors. For now, please contact us in Slack to find out how to
131 | join the project.
132 |
133 | ## Community Support
134 |
135 | * Join the [AltinityDB Slack Workspace](https://altinity.com/slack) to ask questions.
136 | * [Log an issue on this documentation](https://github.com/Altinity/antalya-examples/issues).
137 | * [Log an issue on Antalya code](https://github.com/Altinity/ClickHouse/issues).
138 |
139 | ## Commercial Support
140 |
141 | Altinity is the primary maintainer of Project Antalya. It is the
142 | basis of our data lake-enabled Altinity.Cloud and is also used in
143 | self-managed installations. Altinity offers a range of services related
144 | to ClickHouse® and data lakes.
145 |
146 | - [Official website](https://altinity.com/) - Get a high level overview of Altinity and our offerings.
147 | - [Altinity.Cloud](https://altinity.com/cloud-database/) - Run Antalya in your cloud or ours.
148 | - [Altinity Support](https://altinity.com/support/) - Get Enterprise-class support for ClickHouse®.
149 | - [Slack](https://altinity.com/slack) - Talk directly with ClickHouse® users and Altinity devs.
150 | - [Contact us](https://hubs.la/Q020sH3Z0) - Contact Altinity with your questions or issues.
151 | - [Free consultation](https://hubs.la/Q020sHkv0) - Get a free consultation with a ClickHouse® expert today.
152 |
--------------------------------------------------------------------------------
/kubernetes/manifests/nvme/README.md:
--------------------------------------------------------------------------------
1 | # Antalya Swarms using NVMe Backed Workers (Experimental)
2 |
3 | This directory shows how to set up an Antalya swarm cluster using
4 | workers with local NVMe. It is still a work in progress.
5 |
6 | The current examples show prototype configuration using the following
7 | options:
8 | * hostPath volumes
9 | * local storage volumes
10 |
11 | Examples apply to AWS EKS only.
12 |
13 | ## NVMe SSD provisioning
14 |
15 | This is a prerequisite for either swarm type. It creates
16 | a daemonset that automatically formats NVMe drives
17 | on new EKS workers. The source code is currently located
18 | [here](https://github.com/hodgesrm/eks-nvme-ssd-provisioner). It will
19 | be transferred to the Altinity org shortly.
20 |
21 | ```
22 | kubectl apply -f eks-nvme-ssd-provisioner.yaml
23 | ```
24 |
25 | The daemonset has the tolerations necessary to operate on swarm nodes.
26 | Confirm that it is working by checking that a daemon appears on each new
27 | worker node. You should also see log messages when the daemon formats
28 | the file system on a new worker, as shown by the following example.
29 |
30 | ```
31 | $ kubectl logs eks-nvme-ssd-provisioner-zfmv7 -n kube-system
32 | mke2fs 1.47.0 (5-Feb-2023)
33 | Discarding device blocks: done
34 | Creating filesystem with 228759765 4k blocks and 57196544 inodes
35 | Filesystem UUID: fa6a9dd7-322b-456b-bc86-176c5dee2470
36 | Superblock backups stored on blocks:
37 | 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
38 | 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
39 | 102400000, 214990848
40 |
41 | Allocating group tables: done
42 | Writing inode tables: done
43 | Creating journal (262144 blocks): done
44 | Writing superblocks and filesystem accounting information: done
45 |
46 | Device /dev/nvme1n1 has been mounted to /pv-disks/fa6a9dd7-322b-456b-bc86-176c5dee2470
47 | NVMe SSD provisioning is done and I will go to sleep now
48 | ```
49 |
50 | The log messages are helpful if you need to debug problems with mount locations.
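
If you need more than the logs, a couple of quick checks confirm that the
daemonset is healthy and that a provisioner pod landed on every swarm worker
(names and namespace match the log example above):

```
kubectl get daemonset eks-nvme-ssd-provisioner -n kube-system
kubectl get pods -n kube-system -o wide | grep eks-nvme-ssd-provisioner
```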
51 |
52 | ## Swarm using NVMe SSD with hostPath volumes (Experimental)
53 |
54 | The swarm-hostpath.yaml is configured to use
55 | [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath)
56 | volumes.
57 |
58 | ```
59 | kubectl apply -f swarm-hostpath.yaml
60 | ```
61 |
62 | The local path on the worker is /nvme/disk/clickhouse. /nvme/disk is a
63 | soft link on the worker that points to the NVMe SSD file system mount
64 | point. Swarm pods must use a path below this link.
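
For orientation, a hostPath volume declaration in a pod template looks roughly
like the sketch below. This is a hand-written outline using standard Kubernetes
pod spec fields, not an excerpt from swarm-hostpath.yaml; consult that file for
the authoritative settings.

```
# Sketch only -- consult swarm-hostpath.yaml for the real manifest.
volumes:
  - name: clickhouse-data
    hostPath:
      path: /nvme/disk/clickhouse
      type: DirectoryOrCreate
containers:
  - name: clickhouse
    volumeMounts:
      - name: clickhouse-data
        mountPath: /var/lib/clickhouse
```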
65 |
66 | ### Issues
67 |
68 | * During autoscaling the swarm pod may get scheduled onto the worker
69 | before the eks-nvme-ssd-provisioner can format the disk and create
70 | the mount point. In this case the pod will not correctly mount the
71 | hostPath volume and will use the worker root storage instead.
72 |
73 | ## Swarm using NVMe with local persistent volumes (Experimental)
74 |
75 | This is based on the [Local Persistence Volume Static Provisioner](https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner) project.
76 |
77 | It attempts to create PVs from the worker's local NVMe file system
78 | provisioned by the eks-nvme-ssd-provisioner.
79 |
80 | Local storage provisioning currently does not mesh
81 | well with cluster autoscaling. The issue is summarized in
82 | https://github.com/kubernetes/autoscaler/issues/1658#issuecomment-1036205889,
83 | which also provides a draft workaround that lets the Kubernetes cluster
84 | autoscaler complete pod deployments when the pod depends on storage that
85 | is only allocated locally on the VM after it scales up.
86 |
87 | ### Prerequisites
88 |
89 | * The daemonset pod spec must match the
90 | nodeSelector and have matching tolerations so that it can operate on
91 | worker nodes.
92 |
93 | * The storage class must use a fake provisioner name or scale-up will
94 |   not start.
95 |
96 | * The EKS nodegroup must have the following EKS tag set to let the autoscaler
97 | infer the settings on worker nodes:
98 | `k8s.io/cluster-autoscaler/node-template/label/aws.amazon.com/eks-local-ssd=true`
99 |
100 | * All mount paths in the sig-storage-local-static-provisioner must
101 |   match the mount paths used by the eks-nvme-ssd-provisioner. The correct
102 |   mount path is `/pv-disks`. (The default path in generated YAML files is
103 |   /dev/disk/kubernetes. This does not work.)
104 |
105 | The included [local-storage-eks-nvme-ssd.yaml file](./local-storage-eks-nvme-ssd.yaml)
106 | is adjusted to meet the above requirements other than the EKS tag on the swarm node
107 | group, which must be set manually for now.
108 |
109 | Install the [Altinity Kubernetes Operator for ClickHouse](https://github.com/Altinity/clickhouse-operator).
110 | (Must be 0.24.4 or above to support the CHK resource that manages ClickHouse Keeper.)
111 |
112 | ### Installation
113 |
114 | Install as follows:
115 | ```
116 | kubectl apply -f local-storage-eks-nvme-ssd.yaml
117 | ```
118 |
119 | Confirm that it is working by checking that a daemon appears on
120 | each new worker node. You should also see log messages when the daemon
121 | formats the file system, as shown by the following example.
122 |
123 | ```
124 | $ kubectl logs local-static-provisioner-k9rqt|more
125 | I0330 05:10:19.357347 1 main.go:69] Loaded configuration: {StorageClass
126 | Config:map[nvme-ssd:{HostDir:/pv-disks MountDir:/pv-disks BlockCleanerCommand
127 | :[/scripts/quick_reset.sh] VolumeMode:Filesystem FsType: NamePattern:*}] Node
128 | LabelsForPV:[] UseAlphaAPI:false UseJobForCleaning:false MinResyncPeriod:{Dur
129 | ation:5m0s} UseNodeNameOnly:false LabelsForPV:map[] SetPVOwnerRef:false}
130 | I0330 05:10:19.357403 1 main.go:70] Ready to run...
131 | I0330 05:10:19.357464 1 common.go:444] Creating client using in-cluster
132 | config
133 | I0330 05:10:19.370946 1 main.go:95] Starting config watcher
134 | I0330 05:10:19.370964 1 main.go:98] Starting controller
135 | I0330 05:10:19.370971 1 main.go:102] Starting metrics server at :8080
136 | I0330 05:10:19.371060 1 controller.go:91] Initializing volume cache
137 | I0330 05:10:19.471437 1 controller.go:163] Controller started
138 | I0330 05:10:19.471697 1 discovery.go:423] Found new volume at host path
139 | "/pv-disks/fa6a9dd7-322b-456b-bc86-176c5dee2470" with capacity 921138413568,
140 | creating Local PV "local-pv-8312a141", required volumeMode "Filesystem"
141 | I0330 05:10:19.481257 1 cache.go:55] Added pv "local-pv-8312a141" to ca
142 | che
143 | I0330 05:10:19.481326 1 discovery.go:457] Created PV "local-pv-8312a141
144 | ```
145 |
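You can also confirm that local PVs were created from the NVMe file systems.
The PV names are generated, so yours will differ from the example in the log.

```
kubectl get pv | grep local-pv
kubectl get storageclass
```
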
146 | ### Start swarm
147 |
148 | The swarm-local-storage.yaml is configured to use local storage PVs.
149 |
150 | ```
151 | kubectl apply -f swarm-local-storage.yaml
152 | ```
153 |
154 | For this to work you must currently scale up the node group manually to the number
155 | of requested nodes.
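
One way to do that with the AWS CLI, assuming an EKS managed node group (the
node group name below is a placeholder):

```
aws eks update-nodegroup-config --cluster-name my-eks-cluster \
  --nodegroup-name <swarm-nodegroup> \
  --scaling-config minSize=0,maxSize=8,desiredSize=4
```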
156 |
157 | ### Issues
158 |
159 | * Autoscaling does not work, because the cluster autoscaler will hang
160 | waiting for acknowledgement of PVs on new workers.
161 |
162 | * The local storage provisioner does not properly clean up PVs left
163 | behind when workers are deleted. This is probably a consequence of
164 | using a mock
165 | provisioner.
166 |
167 | * Behavior may also be flaky even when workers are
168 | preprovisioned. Scaling up one-by-one seems OK.
169 |
--------------------------------------------------------------------------------
/kubernetes/README.md:
--------------------------------------------------------------------------------
1 | # Antalya Kubernetes Example
2 |
3 | This directory contains samples for querying a Parquet-based data lake
4 | using AWS EKS, AWS S3, and Project Antalya.
5 |
6 | ## Quickstart
7 |
8 | ### Prerequisites
9 |
10 | Install:
11 | * [aws-cli](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
12 | * [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl)
13 | * [terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli)
14 |
15 | ### Start Kubernetes
16 |
17 | Cd to the terraform directory and follow the installation directions in the
18 | README.md file to set up a Kubernetes cluster on EKS. Here's the short form.
19 |
20 | ```
21 | cd terraform
22 | terraform init
23 | terraform apply
24 | aws eks update-kubeconfig --name my-eks-cluster # Default cluster name
25 | ```
26 |
27 | Create a namespace named antalya and make it the default. (You don't
28 | have to do this but the examples assume it.)
29 |
30 | ```
31 | kubectl create ns antalya
32 | kubectl config set-context --current --namespace=antalya
33 | ```
34 |
35 | ### Install the Altinity Operator for Kubernetes
36 |
37 | Install the latest production version of the [Altinity Kubernetes Operator
38 | for ClickHouse](https://github.com/Altinity/clickhouse-operator).
39 |
40 | ```
41 | kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml
42 | ```
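
The bundle normally installs the operator into the kube-system namespace. A
quick way to confirm it is running (the exact pod name will differ):

```
kubectl get pods -n kube-system | grep clickhouse-operator
```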
43 |
44 | ### Install an Iceberg REST catalog
45 |
46 | This step installs an Iceberg REST catalog using the
47 | Altinity [Ice Toolset](https://github.com/Altinity/ice).
48 |
49 | Follow instructions in the [ice directory README.md](ice/README.md).
50 |
51 | ### Install ClickHouse server with Antalya swarm cluster
52 |
53 | This step installs a ClickHouse "vector" server that applications can connect
54 | to, an Antalya swarm cluster, and a Keeper ensemble to allow the swarm servers
55 | to register themselves dynamically.
56 |
57 | #### Using plain manifest files
58 |
59 | Cd to the manifests directory and install the manifests in the default
60 | namespace.
61 |
62 | ```
63 | cd manifests
64 | kubectl apply -f gp3-encrypted-fast-storage-class.yaml
65 | kubectl apply -f keeper.yaml
66 | kubectl apply -f swarm.yaml
67 | kubectl apply -f vector.yaml
68 | ```
69 |
70 | #### Using helm
71 |
72 | The helm script is in the helm directory. It's under development.
73 |
74 | ## Running
75 |
76 | ### Querying Parquet files on AWS S3 and Apache Iceberg
77 |
78 | AWS kindly provides
79 | [AWS Public Blockchain Data](https://registry.opendata.aws/aws-public-blockchain/),
80 | which we will use as example data for Parquet on S3.
81 |
82 | Start by logging into the vector server.
83 | ```
84 | kubectl exec -it chi-vector-example-0-0-0 -- clickhouse-client
85 | ```
86 |
87 | Try running a query using only the vector server.
88 | ```
89 | SELECT date, sum(output_count)
90 | FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/**.parquet', NOSIGN)
91 | WHERE date >= '2025-01-01' GROUP BY date ORDER BY date ASC
92 | SETTINGS use_hive_partitioning = 1
93 | ```
94 |
95 | This query sets the baseline for execution without assistance from the swarm.
96 | Depending on the date range you use, it is likely to be slow. You can cancel
97 | using ^C.
98 |
99 | Next, let's try a query using the swarm. The object_storage_cluster
100 | setting points to the swarm cluster name.
101 |
102 | ```
103 | SELECT date, sum(output_count)
104 | FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/**.parquet', NOSIGN)
105 | WHERE date >= '2025-02-01' GROUP BY date ORDER BY date ASC
106 | SETTINGS use_hive_partitioning = 1, object_storage_cluster = 'swarm';
107 | ```
108 |
109 | The next query shows results when caches are turned on.
110 | ```
111 | SELECT date, sum(output_count)
112 | FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/**.parquet', NOSIGN)
113 | WHERE date >= '2025-02-01' GROUP BY date ORDER BY date ASC
114 | SETTINGS use_hive_partitioning = 1, object_storage_cluster = 'swarm',
115 | input_format_parquet_use_metadata_cache = 1, enable_filesystem_cache = 1;
116 | ```
117 |
118 | Successive queries will complete faster as caches load.
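
If you want to verify that the caches are filling, a query like the following
(see docs/reference.md for more cache-related queries) summarizes filesystem
cache contents across the swarm nodes:

```
SELECT hostName() AS host, cache_name, count() AS segments, sum(size) AS size
FROM clusterAllReplicas('swarm', system.filesystem_cache)
GROUP BY host, cache_name
ORDER BY host, cache_name
```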
119 |
120 | ### Improving performance by scaling up the swarm
121 |
122 | You can at any time increase the size of the swarm cluster by directly
123 | editing the swarm CHI resource, changing the number of shards to 8,
124 | and submitting the changes. (Example using manifest files.)
125 |
126 | ```
127 | kubectl edit chi swarm
128 | ...
129 | podTemplate: replica
130 | volumeClaimTemplate: storage
131 | shardsCount: 4 <-- Change to 8 and save.
132 | templates:
133 | ...
134 | ```
135 |
136 | Run the query again after scale-up completes. You should see the response
137 | time drop by roughly 50%. Try running it again. You should see a further drop
138 | as swarm caches pick up additional files. You can scale up further to see
139 | additional drops. This setup has been tested to 16 nodes.
140 |
141 | To scale down the swarm, just edit the shardsCount again and set it to
142 | a smaller number.
143 |
144 | Important note: You may see failed queries as the swarm scales down. This
145 | is [a known issue](https://github.com/Altinity/ClickHouse/issues/759)
146 | and will be corrected soon.
147 |
148 | ### Querying Parquet files in Iceberg
149 |
150 | You can load the public data set into Iceberg, which makes the queries
151 | much easier to construct. Here are examples of the same queries once the
152 | public data are available in Iceberg via the ice REST catalog
153 | installation.
154 |
155 | ```
156 | SET allow_experimental_database_iceberg=true;
157 |
158 | -- Use this for Antalya 25.3 or above.
159 | CREATE DATABASE ice
160 | ENGINE = DataLakeCatalog('http://ice-rest-catalog:5000')
161 | SETTINGS catalog_type = 'rest',
162 | auth_header = 'Authorization: Bearer foo',
163 |         warehouse = 's3://rhodges-ice-rest-catalog-demo';
164 |
165 | -- Use this for Antalya 25.2 or below.
166 | CREATE DATABASE ice
167 | ENGINE = Iceberg('https://rest-catalog.dev.altinity.cloud')
168 | SETTINGS catalog_type = 'rest',
169 | auth_header = 'Authorization: Bearer jj...2j',
170 | warehouse = 's3://aws...iceberg';
171 | ```
172 |
173 | Show the tables available in the database.
174 |
175 | ```
176 | SHOW TABLES FROM ice
177 |
178 | ┌─name─────────────┐
179 | 1. │ btc.transactions │
180 | 2. │ nyc.taxis │
181 | └──────────────────┘
182 | ```
183 |
184 | Try counting rows. This goes faster if you enable caching of Iceberg metadata.
185 |
186 | ```
187 | SELECT count()
188 | FROM ice.`btc.transactions`
189 | SETTINGS use_hive_partitioning = 1, object_storage_cluster = 'swarm',
190 | input_format_parquet_use_metadata_cache = 1, enable_filesystem_cache = 1,
191 | use_iceberg_metadata_files_cache=1;
192 | ```
193 |
194 | Now try the same query that we ran earlier directly against the public S3
195 | dataset files. Caches are not enabled.
196 |
197 | ```
198 | SELECT date, sum(output_count)
199 | FROM ice.`btc.transactions`
200 | WHERE date >= '2025-02-01' GROUP BY date ORDER BY date ASC
201 | SETTINGS use_hive_partitioning = 1, object_storage_cluster = 'swarm';
202 | ```
203 |
204 | Try the same query with all caches enabled. It should be faster.
205 |
206 | ```
207 | SELECT date, sum(output_count)
208 | FROM ice.`btc.transactions`
209 | WHERE date >= '2025-02-01' GROUP BY date ORDER BY date ASC
210 | SETTINGS use_hive_partitioning = 1, object_storage_cluster = 'swarm',
211 | input_format_parquet_use_metadata_cache = 1, enable_filesystem_cache = 1,
212 | use_iceberg_metadata_files_cache = 1;
213 | ```
214 |
--------------------------------------------------------------------------------
/docker/README.md:
--------------------------------------------------------------------------------
1 | # Antalya Docker Example
2 |
3 | This directory contains samples for construction of an Iceberg-based data
4 | lake using Docker Compose and Altinity Antalya.
5 |
6 | The docker compose structure and the Python scripts took early inspiration from
7 | [ClickHouse integration tests for Iceberg](https://github.com/ClickHouse/ClickHouse/tree/master/tests/integration/test_database_iceberg) but have deviated substantially since then.
8 |
9 | ## Quickstart
10 |
11 | Examples are for Ubuntu. Adjust commands for other distros.
12 |
13 | ### Install prerequisite software
14 |
15 | Install [Docker Desktop](https://docs.docker.com/engine/install/) and
16 | [Docker Compose](https://docs.docker.com/compose/install/).
17 |
18 | Install the Altinity ice catalog client. (Requires a JDK.)
19 |
20 | ```
21 | sudo apt install openjdk-21-jdk
22 | curl -sSL https://github.com/altinity/ice/releases/download/v0.8.1/ice-0.8.1 \
23 | -o ice && chmod a+x ice && sudo mv ice /usr/local/bin/
24 | ```
25 |
26 | ### Bring up the data lake
27 |
28 | ```
29 | docker compose up -d
30 | ```
31 |
32 | ### Load data
33 |
34 | Create a table by loading data with the ice catalog client. This creates the
35 | table automatically from the schema in the Parquet file.
36 |
37 | ```
38 | ice insert nyc.taxis -p \
39 | https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet
40 | ```
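
To confirm the table landed in the catalog, you can query the REST catalog
directly. This uses the same curl pattern as the troubleshooting section at
the end of this README:

```
curl -H "Authorization: bearer foo" http://localhost:5000/v1/namespaces/nyc/tables | jq -s
```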
41 |
42 | ### Compute aggregates on Parquet data
43 |
44 | Connect to the Antalya server container and start clickhouse-client.
45 | ```
46 | docker exec -it vector clickhouse-client
47 | ```
48 |
49 | Set up a database pointing to the Ice[berg] REST catalog.
50 | ```
51 | SET allow_experimental_database_iceberg = 1;
52 |
53 | DROP DATABASE IF EXISTS ice;
54 |
55 | CREATE DATABASE ice ENGINE = DataLakeCatalog('http://ice-rest-catalog:5000')
56 | SETTINGS catalog_type = 'rest',
57 | auth_header = 'Authorization: Bearer foo',
58 | storage_endpoint = 'http://minio:9000',
59 | warehouse = 's3://warehouse';
60 | ```
61 |
62 | Query data on vector only, followed by vector plus swarm servers.
63 | ```
64 | -- Run query only on initiator.
65 | SELECT
66 | toDate(tpep_pickup_datetime) AS date,
67 | avg(passenger_count) AS passengers,
68 | avg(fare_amount) AS fare
69 | FROM ice.`nyc.taxis` GROUP BY date ORDER BY date
70 |
71 | -- Delegate to swarm servers.
72 | SELECT
73 | toDate(tpep_pickup_datetime) AS date,
74 | avg(passenger_count) AS passengers,
75 | avg(fare_amount) AS fare
76 | FROM ice.`nyc.taxis` GROUP BY date ORDER BY date
77 | SETTINGS object_storage_cluster='swarm'
78 | ```
79 |
80 | ### Bring down cluster and delete data
81 |
82 | ```
83 | docker compose down
84 | sudo rm -rf data
85 | ```
86 |
87 | ## Load data with Python and view with ClickHouse
88 |
89 | ### Enable Python
90 |
91 | Install the Python virtual environment module for your Python version. The example
92 | below is for Python 3.12.
93 |
94 | ```
95 | sudo apt install python3.12-venv
96 | ```
97 |
98 | Create and activate the venv, then install the required modules.
99 | ```
100 | python3.12 -m venv venv
101 | . ./venv/bin/activate
102 | pip install --upgrade pip
103 | pip install -r requirements.txt
104 | ```
105 |
106 | ### Load and read data with pyiceberg library
107 | ```
108 | python iceberg_setup.py
109 | python iceberg_read.py
110 | ```
111 |
112 | ### Demonstrate Antalya queries against data from Python
113 |
114 | Connect to the Antalya server container and start clickhouse-client.
115 | ```
116 | docker exec -it vector clickhouse-client
117 | ```
118 |
119 | Query data on vector only, followed by vector plus swarm servers.
120 | ```
121 | -- Run query only on initiator.
122 | SELECT * FROM ice.`iceberg.bids`
123 |
124 | -- Delegate to swarm servers.
125 | SELECT symbol, avg(bid)
126 | FROM ice.`iceberg.bids` GROUP BY symbol
127 | SETTINGS object_storage_cluster = 'swarm'
128 | ```
129 |
130 | ## Using Spark with ClickHouse and Ice
131 |
132 | Connect to the spark-iceberg container command line.
133 | ```
134 | docker exec -it spark-iceberg /bin/bash
135 | ```
136 |
137 | Start the Spark scala shell.
138 | ```
139 | spark-shell
140 | ```
141 |
142 | Read data and prove you can change it as well by running the commands below.
143 | ```
144 | spark.sql("SHOW NAMESPACES").show()
145 | spark.sql("SHOW TABLES FROM iceberg").show()
146 | spark.sql("SHOW CREATE TABLE iceberg.bids").show(truncate=false)
147 | spark.sql("SELECT * FROM iceberg.bids").show()
148 | spark.sql("DELETE FROM iceberg.bids WHERE bid < 198.23").show()
149 | spark.sql("SELECT * FROM iceberg.bids").show()
150 | ```
151 |
152 | Try reading the table from ClickHouse. The deleted rows should be gone.
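
For example, back in clickhouse-client on the vector server (this assumes the
`ice` database created earlier in this README):

```
SELECT count() FROM ice.`iceberg.bids`;

-- After the Spark DELETE above, the minimum bid should be >= 198.23.
SELECT min(bid) FROM ice.`iceberg.bids`;
```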
153 |
154 | ## Additional help and troubleshooting
155 |
156 | ### Logs
157 |
158 | Logs are in the data directory along with service data.
159 |
160 | ### Cleaning up
161 |
162 | This deletes *all* containers and volumes for a fresh start. Do not use it
163 | if you have other Docker applications running.
164 | ```
165 | ./clean-all.sh -f
166 | ```
167 |
168 | ### Find out where your query ran
169 |
170 | If you are curious where your query was actually processed, it is easy to
171 | find out. Take the query_id that clickhouse-client prints
172 | and run a query like the following. You'll see all query log records.
173 |
174 | ```
175 | SELECT hostName() AS host, type, initial_query_id, is_initial_query, query
176 | FROM clusterAllReplicas('all', system.query_log)
177 | WHERE (type = 'QueryFinish')
178 | AND (initial_query_id = '8051eef1-e68b-491a-b63d-fac0c8d6ef27')\G
179 | ```
180 |
181 | ### Setting up Iceberg databases
182 |
183 | Run these commands when the vector server comes up for the first time.
184 |
185 | ```
186 | SET allow_experimental_database_iceberg = 1;
187 |
188 | DROP DATABASE IF EXISTS ice;
189 |
190 | CREATE DATABASE ice
191 | ENGINE = DataLakeCatalog('http://ice-rest-catalog:5000')
192 | SETTINGS catalog_type = 'rest',
193 | auth_header = 'Authorization: Bearer foo',
194 | storage_endpoint = 'http://minio:9000',
195 | warehouse = 's3://warehouse';
196 | ```
197 |
198 | ### Query Iceberg and local data together
199 |
200 | Create a local table and populate it with data from Iceberg, altering
201 | data to make it different.
202 |
203 | ```
204 | CREATE DATABASE IF NOT EXISTS local
205 | ;
206 | CREATE TABLE local.bids AS ice.`iceberg.bids`
207 | ENGINE = MergeTree
208 | PARTITION BY toDate(datetime)
209 | ORDER BY (symbol, datetime)
210 | SETTINGS allow_nullable_key = 1
211 | ;
212 | -- Pull some data into the local table, making it look different.
213 | INSERT INTO local.bids
214 | SELECT datetime + toIntervalDay(4), symbol, bid, ask
215 | FROM ice.`iceberg.bids`
216 | ;
217 | SELECT *
218 | FROM local.bids
219 | UNION ALL
220 | SELECT *
221 | FROM ice.`iceberg.bids`
222 | ;
223 | -- Create a merge table.
224 | CREATE TABLE all_bids AS local.bids
225 | ENGINE = Merge(REGEXP('local|ice'), '.*bids')
226 | ;
227 | SELECT * FROM all_bids
228 | ;
229 | ```
230 |
231 | ### Fetching values from Iceberg catalog using curl
232 |
233 | The Iceberg REST API is simple to query using curl. The documentation is
234 | effectively [the full REST spec in the Iceberg GitHub Repo](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml). Meanwhile here
235 | are a few examples that you can try on this project.
236 |
237 | Find namespaces.
238 | ```
239 | curl -H "Authorization: bearer foo" http://localhost:5000/v1/namespaces | jq -s
240 | ```
241 |
242 | Find tables in namespace.
243 | ```
244 | curl -H "Authorization: bearer foo" http://localhost:5000/v1/namespaces/iceberg/tables | jq -s
245 | ```
246 |
247 | Find table spec in Iceberg.
248 | ```
249 | curl -H "Authorization: bearer foo" http://localhost:5000/v1/namespaces/iceberg/tables/bids | jq -s
250 | ```
251 |
--------------------------------------------------------------------------------
/docker/tests/helpers.py:
--------------------------------------------------------------------------------
1 | # Copyright 2024
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | import base64
16 | import os
17 | import subprocess
18 | import sys
19 | import time
20 |
21 | import requests
22 |
23 |
24 | class DockerHelper:
25 | """Helper class containing functions to manage Docker Compose operations
26 |     including log capture as well as other actions required to test the
27 |     Docker Compose setup"""
28 |
29 | def __init__(self, docker_dir):
30 | """Initialize DockerHelper with the Docker Compose directory"""
31 | self.docker_dir = docker_dir
32 | self.started_services = False
33 | self._create_logs_directory()
34 |
35 | def _create_logs_directory(self):
36 | """Create the test_logs directory if it doesn't exist"""
37 | logs_dir = os.path.join(self.docker_dir, "test_logs")
38 | os.makedirs(logs_dir, exist_ok=True)
39 |
40 | def setup_services(self):
41 | """Start Docker Compose services or verify existing setup based on environment"""
42 | os.chdir(self.docker_dir)
43 |
44 | if os.getenv('SKIP_DOCKER_SETUP') == 'true':
45 | print("Using existing Docker Compose setup...")
46 | self._verify_services_running()
47 | else:
48 | print("Starting Docker Compose services...")
49 | self._start_docker_services()
50 | self.started_services = True
51 |
52 | def cleanup_services(self):
53 | """Capture logs and stop Docker Compose services if we started them"""
54 | # Always capture logs for debugging
55 | self.capture_container_logs()
56 |
57 | # Only stop services if we started them
58 | if self.started_services:
59 | print("Stopping Docker Compose services...")
60 | self._stop_docker_services()
61 |
62 | def _start_docker_services(self):
63 | """Start Docker Compose services"""
64 | try:
65 | subprocess.run(
66 | ["docker", "compose", "up", "-d"],
67 | capture_output=True,
68 | text=True,
69 | check=True,
70 | )
71 | print("Docker Compose started successfully")
72 |
73 | # Wait a bit for services to be ready
74 | time.sleep(5)
75 | except subprocess.CalledProcessError as e:
76 | print(f"Failed to start Docker Compose: {e}")
77 | print(f"stdout: {e.stdout}")
78 | print(f"stderr: {e.stderr}")
79 | sys.exit(1)
80 |
81 | def _stop_docker_services(self):
82 | """Stop Docker Compose services"""
83 | try:
84 | subprocess.run(
85 | ["docker", "compose", "down"],
86 | capture_output=True,
87 | text=True,
88 | check=True,
89 | )
90 | print("Docker Compose stopped successfully")
91 | except subprocess.CalledProcessError as e:
92 | print(f"Failed to stop Docker Compose: {e}")
93 | print(f"stdout: {e.stdout}")
94 | print(f"stderr: {e.stderr}")
95 |
96 | def _verify_services_running(self):
97 | """Verify that required Docker Compose services are running"""
98 | try:
99 | result = subprocess.run(
100 | ["docker", "compose", "ps", "--services", "--filter", "status=running"],
101 | capture_output=True,
102 | text=True,
103 | check=True,
104 | )
105 | running_services = result.stdout.strip().split('\n') if result.stdout.strip() else []
106 | if not running_services:
107 | print("Warning: No running Docker Compose services found")
108 | else:
109 | print(f"Found running services: {', '.join(running_services)}")
110 | except subprocess.CalledProcessError as e:
111 | print(f"Failed to verify services: {e}")
112 | print(f"stdout: {e.stdout}")
113 | print(f"stderr: {e.stderr}")
114 |
115 | def capture_container_logs(self):
116 | """Capture container logs and save them to test_logs directory"""
117 | print("Capturing container logs...")
118 | logs_dir = os.path.join(self.docker_dir, "test_logs")
119 | timestamp = time.strftime("%Y%m%d_%H%M%S")
120 |
121 | try:
122 | # Get list of services
123 | result = subprocess.run(
124 | ["docker", "compose", "config", "--services"],
125 | capture_output=True,
126 | text=True,
127 | check=True,
128 | )
129 | services = result.stdout.strip().split('\n') if result.stdout.strip() else []
130 |
131 | # Capture logs for each service
132 | for service in services:
133 | if service: # Skip empty lines
134 | try:
135 | log_result = subprocess.run(
136 | ["docker", "compose", "logs", "--no-color", service],
137 | capture_output=True,
138 | text=True,
139 | check=True,
140 | )
141 | log_file = os.path.join(logs_dir, f"{service}_{timestamp}.log")
142 | with open(log_file, 'w') as f:
143 | f.write(f"=== Logs for service: {service} ===\n")
144 | f.write(f"=== Captured at: {time.strftime('%Y-%m-%d %H:%M:%S')} ===\n\n")
145 | f.write(log_result.stdout)
146 | print(f"Saved logs for {service} to {log_file}")
147 | except subprocess.CalledProcessError as e:
148 | print(f"Failed to capture logs for service {service}: {e}")
149 |
150 | # Also capture combined logs
151 | try:
152 | combined_result = subprocess.run(
153 | ["docker", "compose", "logs", "--no-color"],
154 | capture_output=True,
155 | text=True,
156 | check=True,
157 | )
158 | combined_log_file = os.path.join(logs_dir, f"combined_{timestamp}.log")
159 | with open(combined_log_file, 'w') as f:
160 | f.write(f"=== Combined Docker Compose Logs ===\n")
161 | f.write(f"=== Captured at: {time.strftime('%Y-%m-%d %H:%M:%S')} ===\n\n")
162 | f.write(combined_result.stdout)
163 | print(f"Saved combined logs to {combined_log_file}")
164 | except subprocess.CalledProcessError as e:
165 | print(f"Failed to capture combined logs: {e}")
166 |
167 | except subprocess.CalledProcessError as e:
168 | print(f"Failed to get service list: {e}")
169 | print(f"stdout: {e.stdout}")
170 | print(f"stderr: {e.stderr}")
171 |
172 |
173 | def generate_basic_auth_header(username, password):
174 | """Generate a properly encoded basic authentication header"""
175 | credentials = f"{username}:{password}"
176 | encoded_credentials = base64.b64encode(credentials.encode("utf-8")).decode(
177 | "utf-8"
178 | )
179 | return f"Basic {encoded_credentials}"
180 |
181 |
182 | def http_get_helper(test_case, url, timeout=10, expected_status_code=200, auth_header=None):
183 | """Helper function to perform HTTP GET request with error handling"""
184 | try:
185 | headers = {}
186 | if auth_header:
187 | headers["Authorization"] = auth_header
188 |
189 | response = requests.get(url, timeout=timeout, headers=headers)
190 | # Check if status code matches expected.
191 | if response.status_code == expected_status_code:
192 | print(f"HTTP GET to {url} successful: {response.status_code}")
193 | return response
194 | else:
195 | test_case.fail(
196 | f"Expected status {expected_status_code}, got {response.status_code}"
197 | )
198 | except requests.exceptions.RequestException as e:
199 | print(f"HTTP request failed for URL: {url}")
200 | print(f"Exception: {e}")
201 | test_case.fail(f"HTTP request failed: {e}")
202 |
203 |
204 | def run_python_script_helper(test_case, script_name):
205 | """Helper function to run a Python script from the parent directory"""
206 | try:
207 | # Get the parent directory (docker) where the Python scripts should be
208 | test_dir = os.path.dirname(os.path.abspath(__file__))
209 | docker_dir = os.path.dirname(test_dir)
210 | script_path = os.path.join(docker_dir, script_name)
211 |
212 | # Run the script and check for success
213 | result = subprocess.run(
214 | ["python", script_path],
215 | capture_output=True,
216 | text=True,
217 | check=True,
218 | cwd=docker_dir
219 | )
220 | print(f"{script_name} executed successfully")
221 | print(f"stdout: {result.stdout}")
222 | return result
223 | except subprocess.CalledProcessError as e:
224 | test_case.fail(f"{script_name} failed with exit code {e.returncode}\n"
225 | f"stdout: {e.stdout}\nstderr: {e.stderr}")
226 | except FileNotFoundError:
227 | test_case.fail(f"{script_name} not found in parent directory")
228 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/docs/reference.md:
--------------------------------------------------------------------------------
1 | # Command and Configuration Reference
2 |
3 | ## SQL Syntax Guide
4 |
5 | This section shows how to use the Iceberg database engine, table engine, and
6 | table function.
7 |
8 | ### Iceberg Database Engine
9 |
10 | The Iceberg database engine connects ClickHouse to an Iceberg REST catalog. The
11 | tables listed in the REST catalog show up as tables in the database. The Iceberg
12 | REST catalog must already exist. Here is an example of the syntax. Note that you
13 | must enable Iceberg database support with the allow_experimental_database_iceberg
14 | setting. The setting can also be placed in a user profile to enable it by default.
15 |
16 | ```
17 | SET allow_experimental_database_iceberg=true;
18 |
19 | CREATE DATABASE datalake
20 | ENGINE = Iceberg('http://rest:8181/v1', 'minio', 'minio123')
21 | SETTINGS catalog_type = 'rest',
22 | storage_endpoint = 'http://minio:9000/warehouse',
23 | warehouse = 'iceberg';
24 | ```
25 |
26 | The Iceberg database engine takes three arguments:
27 |
28 | * url - URL of the Iceberg REST catalog endpoint
29 | * user - Object storage user
30 | * password - Object storage password
31 |
32 | The following settings are supported.
33 |
34 | * auth_header - Authorization header in the format 'Authorization: Bearer <token>'
35 | * auth_scope - Authorization scope for client credentials or token exchange
36 | * oauth_server_uri - OAuth server URI
37 | * vended_credentials - Use vended credentials (storage credentials) from catalog
38 | * warehouse - Warehouse name inside the catalog
39 | * storage_endpoint - Object storage endpoint
40 |
41 | ### Iceberg Table Engine
42 |
43 | Will be documented later.
44 |
45 | ### Iceberg Table Function
46 |
47 | The [Iceberg table function](https://clickhouse.com/docs/en/sql-reference/table-functions/iceberg)
48 | selects from an Iceberg table. It uses the path of the table in object
49 | storage to locate table metadata. Here is an example of the syntax.
50 |
51 | ```
52 | SELECT count()
53 | FROM iceberg('http://minio:9000/warehouse/data')
54 | ```
55 |
56 | You can dispatch queries to the swarm as follows:
57 |
58 | ```
59 | SELECT count()
60 | FROM iceberg('http://minio:9000/warehouse/data')
61 | SETTINGS object_storage_cluster = 'swarm'
62 | ```
63 |
64 | The iceberg() function is an alias for icebergS3(). See the upstream docs for more information.
65 |
66 | It's important to note that the iceberg() table function expects to see data
67 | and metadata directories under the URL provided as an argument. In other words,
68 | the Iceberg table must be arranged in object storage as follows:
69 |
70 | * http://minio:9000/warehouse/data/metadata - Contains Iceberg metadata files for the table
71 | * http://minio:9000/warehouse/data/data - Contains Iceberg data files for the table
72 |
73 | If the files are not laid out as shown above, the iceberg() table function
74 | may not be able to read data.
75 |
76 | ## Swarm Clusters
77 |
78 | Swarm clusters are clusters of stateless ClickHouse servers that may be used for parallel
79 | query on S3 files as well as Iceberg tables (which are just collections of S3 files).
80 |
81 | ### Using Swarm Clusters to speed up query
82 |
83 | Swarm clusters can accelerate queries that use any of the following functions.
84 |
85 | * s3() function
86 | * s3Cluster() function -- specify the swarm cluster name as the first argument
87 | * iceberg() function
88 | * icebergS3Cluster() function -- specify the swarm cluster name as the first argument
89 | * Iceberg table engine, including tables made available via the Iceberg database engine
90 |
91 | To delegate subqueries to a swarm cluster, add the object_storage_cluster
92 | setting as shown below with the swarm cluster name. You can also set
93 | the value in a user profile, which will ensure that the setting applies by default
94 | to all queries for that user.
95 |
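For instance, in Altinity operator format (the same convention used in the
cache section below), a profile-level default might look like this; the
profile name is just an example:

```
spec:
  configuration:
    profiles:
      default/object_storage_cluster: "swarm"
```
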
96 | Here's an example of a query on Parquet files using Hive partitioning.
97 |
98 | ```
99 | SELECT hostName() AS host, count()
100 | FROM s3('http://minio:9000/warehouse/data/data/**/**.parquet')
101 | GROUP BY host
102 | SETTINGS use_hive_partitioning=1, object_storage_cluster='swarm'
103 | ```
104 |
105 | Here is an example of querying the same data via Iceberg using the swarm
106 | cluster.
107 |
108 | ```
109 | SELECT count()
110 | FROM datalake.`iceberg.bids`
111 | SETTINGS object_storage_cluster = 'swarm'
112 | ```
113 |
114 | Here's an example of using the swarm cluster with the icebergS3Cluster()
115 | function.
116 |
117 | ```
118 | SELECT hostName() AS host, count()
119 | FROM icebergS3Cluster('swarm', 'http://minio:9000/warehouse/data')
120 | GROUP BY host
121 | ```
122 |
123 | ### Relevant settings for swarm clusters
124 |
125 | The following table shows the main query settings that affect swarm
126 | cluster processing.
127 |
128 | | Setting Name | Description | Value |
129 | |--------------|-------------|-------|
130 | | `enable_filesystem_cache` | Use filesystem cache for S3 blocks | 0 or 1 |
131 | | `input_format_parquet_use_metadata_cache` | Cache Parquet file metadata | 0 or 1 |
132 | | `input_format_parquet_metadata_cache_max_size` | Parquet metadata cache size (defaults to 500MiB) | Integer |
133 | | `object_storage_cluster` | Swarm cluster name | String |
134 | | `object_storage_max_nodes` | Number of swarm nodes to use (defaults to all nodes) | Integer |
135 | | `use_hive_partitioning` | Files follow Hive partitioning | 0 or 1 |
136 | | `use_iceberg_metadata_files_cache` | Cache parsed Iceberg metadata files in memory | 0 or 1 |
137 | | `use_iceberg_partition_pruning` | Prune files based on Iceberg data | 0 or 1 |
138 |
139 | ### Configuring swarm cluster autodiscovery
140 |
141 | Cluster-autodiscovery uses [Zoo]Keeper as a registry for swarm cluster
142 | members. Swarm cluster servers register themselves on a specific path
143 | at start-up time to join the cluster. Other servers can read the path
144 | to find members of the swarm cluster.
145 |
146 | To use auto-discovery, you must enable Keeper by adding a `<zookeeper>`
147 | tag similar to the following example. This must be done for all servers
148 | including swarm servers as well as ClickHouse servers that invoke them.
149 |
150 | ```
151 | <clickhouse>
152 |     <zookeeper>
153 |         <node>
154 |             <host>keeper</host>
155 |             <port>9181</port>
156 |         </node>
157 |     </zookeeper>
158 | </clickhouse>
159 | ```
160 |
161 | You must also enable automatic cluster discovery.
162 | ```
163 | <clickhouse><allow_experimental_cluster_discovery>1</allow_experimental_cluster_discovery></clickhouse>
164 | ```
165 |
166 | #### Using a single Keeper ensemble
167 |
168 | When using a single Keeper for all servers, add the following remote server
169 | definition to each swarm server configuration. This provides a path on which
170 | the server will register.
171 |
172 | ```
173 | <clickhouse>
174 |     <remote_servers>
175 |         <swarm>
176 |             <discovery>
177 |                 <path>/clickhouse/discovery/swarm</path>
178 |                 <secret>secret_key</secret>
179 |             </discovery>
180 |         </swarm>
181 |     </remote_servers>
182 | </clickhouse>
183 | ```
183 |
184 | Add the following remote server definition to each server that _reads_ the
185 | swarm server list using remote discovery. Note the `<observer>` tag, which
186 | must be set to prevent non-swarm servers from joining the cluster.
187 |
188 | ```
189 | <clickhouse>
190 |     <remote_servers>
191 |         <swarm>
192 |             <discovery>
193 |                 <path>/clickhouse/discovery/swarm</path>
194 |                 <secret>secret_key</secret>
195 |                 <observer>true</observer>
196 |             </discovery>
197 |         </swarm>
198 |     </remote_servers>
199 | </clickhouse>
200 | ```
201 |
202 | #### Using multiple keeper ensembles
203 |
204 | It's common to use separate keeper ensembles to manage intra-cluster
205 | replication and swarm cluster discovery. In this case you can enable
206 | an auxiliary keeper that handles only auto-discovery. Here is the
207 | configuration for such a Keeper ensemble. ClickHouse will
208 | use this Keeper ensemble for auto-discovery.
209 |
210 | ```
211 | <clickhouse>
212 |     <auxiliary_zookeepers>
213 |         <!-- The ensemble name below is a placeholder; use the name your setup expects. -->
214 |         <swarm_discovery>
215 |             <node>
216 |                 <host>keeper</host>
217 |                 <port>9181</port>
218 |             </node>
219 |         </swarm_discovery>
220 |     </auxiliary_zookeepers>
221 | </clickhouse>
222 | ```
223 |
224 | This is in addition to the settings described in previous sections,
225 | which remain the same.
226 |
227 | ## Configuring Caches
228 |
229 | Caches make a major difference in the performance of ClickHouse queries. This
230 | section describes how to configure them in a swarm cluster.
231 |
232 | ### Iceberg Metadata Cache
233 |
234 | The Iceberg metadata cache keeps parsed table definitions in memory. It is
235 | enabled using the `use_iceberg_metadata_files_cache` setting, as shown in the
236 | following example:
237 |
238 | ```
239 | SELECT count()
240 | FROM ice.`aws-public-blockchain.btc`
241 | SETTINGS object_storage_cluster = 'swarm',
242 |          use_iceberg_metadata_files_cache = 1;
243 | ```
244 |
245 | Reading and parsing Iceberg metadata files (including metadata.json,
246 | manifest list, and manifest files) is slow. Enabling this setting can
247 | speed up query planning significantly.
248 |
249 | ### Parquet Metadata Cache
250 |
251 | The Parquet metadata cache keeps metadata from individual Parquet files in memory,
252 | including column metadata, min/max statistics, and Bloom filter indexes.
253 | Swarm nodes use the metadata to avoid fetching unnecessary blocks from
254 | object storage. If no blocks are needed the swarm node skips the file entirely.
255 |
256 | The following example shows how to enable Parquet metadata caching.
257 | ```
258 | SELECT count()
259 | FROM ice.`aws-public-blockchain.btc`
260 | SETTINGS object_storage_cluster = 'swarm',
261 | input_format_parquet_use_metadata_cache = 1;
262 | ```
263 |
264 | The server setting `input_format_parquet_metadata_cache_max_size` controls the
265 | size of the cache. It currently defaults to 500MiB.
266 |
267 | ### S3 Filesystem Cache
268 |
269 | This cache stores blocks read from object storage on local disk. It offers a
270 | considerable speed advantage, especially when the same blocks are read repeatedly.
271 | The S3 filesystem cache requires special configuration on each swarm host.
272 |
273 | #### Define the cache
274 |
275 | Add a definition like the following to config.d/filesystem_cache.xml to set
276 | up a filesystem cache. The example below is in Altinity operator format.
277 |
278 | ```
279 | spec:
280 |   configuration:
281 |     files:
282 |       config.d/filesystem_cache.xml: |
283 |         <clickhouse>
284 |           <filesystem_caches>
285 |             <s3_parquet_cache>
286 |               <path>/var/lib/clickhouse/s3_parquet_cache</path>
287 |               <max_size>50Gi</max_size>
288 |             </s3_parquet_cache>
289 |           </filesystem_caches>
290 |         </clickhouse>
291 | ```
292 |
293 | #### Enable cache use in queries
294 |
295 | The following settings control use of the filesystem cache.
296 |
297 | * enable_filesystem_cache - Enable filesystem cache (1=enabled)
298 | * enable_filesystem_cache_log - Enable logging of cache operations (1=enabled)
299 | * filesystem_cache_name - Name of the cache to use (must be specified)
300 |
301 | You can enable the settings on a query as follows:
302 |
303 | ```
304 | SELECT date, sum(output_count)
305 | FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/**.parquet', NOSIGN)
306 | WHERE date >= '2025-01-01' GROUP BY date ORDER BY date ASC
307 | SETTINGS use_hive_partitioning = 1, object_storage_cluster = 'swarm',
308 | enable_filesystem_cache = 1, filesystem_cache_name = 's3_parquet_cache'
309 | ```
310 |
311 | You can also set cache values in user profiles as shown by the following
312 | settings in Altinity operator format:
313 |
314 | ```
315 | spec:
316 | configuration:
317 | profiles:
318 | use_cache/enable_filesystem_cache: 1
319 | use_cache/enable_filesystem_cache_log: 1
320 | use_cache/filesystem_cache_name: "s3_parquet_cache"
321 | ```
322 |
323 | #### Clear cache
324 |
325 | Issue the following command on any swarm server. (It does not work from
326 | other clusters.)
327 |
328 | ```
329 | SYSTEM DROP FILESYSTEM CACHE ON CLUSTER 'swarm'
330 | ```
331 |
332 | #### Find out how the cache is doing
333 |
334 | Get statistics on file system caches across the swarm.
335 |
336 | ```
337 | SELECT hostName() host, cache_name, count() AS segments, sum(size) AS size,
338 | min(cache_hits) AS min_hits, avg(cache_hits) AS avg_hits,
339 | max(cache_hits) AS max_hits
340 | FROM clusterAllReplicas('swarm', system.filesystem_cache)
341 | GROUP BY host, cache_name
342 | ORDER BY host, cache_name ASC
343 | FORMAT Vertical
344 | ```
345 |
346 | Find out how many S3 calls an individual ClickHouse server is making. When caching
347 | is working properly, you should see the values remain the same between
348 | successive queries.
349 |
350 | ```
351 | SELECT name, value
352 | FROM clusterAllReplicas('swarm', system.events)
353 | WHERE event ILIKE '%s3%'
354 | ORDER BY 1
355 | ```
356 |
357 | To see S3 stats across all servers, use the following.
358 | ```
359 | SELECT hostName() host, name, sum(value) AS value
360 | FROM clusterAllReplicas('all', system.events)
361 | WHERE event ILIKE '%s3%'
362 | GROUP BY 1, 2 ORDER BY 1, 2
363 | ```
364 |
365 | To see S3 stats for a single query spread across multiple hosts, issue the following request.
366 | ```
367 | SELECT hostName() host, k, v
368 | FROM clusterAllReplicas('all', system.query_log)
369 | ARRAY JOIN ProfileEvents.keys AS k, ProfileEvents.values AS v
370 | WHERE initial_query_id = '5737ecca-c066-42f8-9cd1-a910a3d1e0b4' AND type = 2
371 | AND k ilike '%S3%'
372 | ORDER BY host, k
373 | ```
374 |
375 | ### S3 List Objects Cache
376 |
377 | Listing files in object storage using the S3 ListObjectsV2 call is
378 | expensive. The S3 List Objects Cache avoids repeated calls and can
379 | cut significant time from query planning. You can enable it
380 | using the `use_object_storage_list_objects_cache` setting as shown below.
381 |
382 | ```
383 | SELECT date, count()
384 | FROM s3('s3://aws-public-blockchain/v1.0/btc/transactions/*/*.parquet', NOSIGN)
385 | WHERE (date >= '2025-01-01') AND (date <= '2025-01-31')
386 | GROUP BY date
387 | ORDER BY date ASC
388 | SETTINGS use_hive_partitioning = 1, use_object_storage_list_objects_cache = 1
389 | ```
390 |
391 | The setting can speed up performance enormously but has a number of limitations:
392 |
393 | * It does not speed up Iceberg queries, since Iceberg metadata provides lists
394 | of files.
395 | * It is best for datasets that are largely read-only. It may cause queries to miss
396 | newer files, if they arrive while the cache is active.
397 |
--------------------------------------------------------------------------------