--------------------------------------------------------------------------------
/content/en/engines/mergetree-table-engine-family/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "MergeTree table engine family"
3 | linkTitle: "MergeTree table engine family"
4 | description: >
5 | MergeTree table engine family
6 | ---
7 | Internals:
8 |
9 | [https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup41/merge_tree.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup41/merge_tree.pdf)
10 |
11 | [https://youtu.be/1UIl7FpNo2M?t=2467](https://youtu.be/1UIl7FpNo2M?t=2467)
12 |
--------------------------------------------------------------------------------
/static/assets/93978647.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/content/en/engines/mergetree-table-engine-family/index-and-column-files.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "index & column files"
3 | linkTitle: "index & column files"
4 | description: >
5 | index & column files
6 | ---
7 | 
8 |
9 | 
10 |
11 | [https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup27/adaptive_index_granularity.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup27/adaptive_index_granularity.pdf)
12 |
--------------------------------------------------------------------------------
/content/en/upgrade/clickhouse-feature-report.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "ClickHouse® Function/Engines/Settings Report"
3 | linkTitle: "ClickHouse® Function/Engines/Settings Report"
4 | description: >
5 | Report on ClickHouse® functions, table functions, table engines, system and MergeTree settings, with availability information.
6 | ---
7 |
8 | Follow this link for a complete report on ClickHouse® features with their availability: https://github.com/anselmodadams/ChMisc/blob/main/report/report.md. It is frequently updated (at least once a month).
9 |
--------------------------------------------------------------------------------
/static/assets/90964099.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/static/assets/90964099 (1).svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2021 Altinity Inc
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
--------------------------------------------------------------------------------
/content/en/altinity-kb-functions/arrayfold.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "arrayFold"
3 | linkTitle: "arrayFold"
4 | ---
5 |
6 | ## EWMA example
7 |
8 | ```sql
9 | WITH
10 | [40, 45, 43, 31, 20] AS data,
11 | 0.3 AS alpha
12 | SELECT arrayFold((acc, x) -> arrayPushBack(acc, (alpha * x) + ((1 - alpha) * (acc[-1]))), arrayPopFront(data), [CAST(data[1], 'Float64')]) as ewma
13 |
14 | ┌─ewma─────────────────────────────────────────────────────────────┐
15 | │ [40,41.5,41.949999999999996,38.66499999999999,33.06549999999999] │
16 | └──────────────────────────────────────────────────────────────────┘
17 | ```
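
Here `arrayFold` applies the EWMA recurrence `ewma[i] = alpha * data[i] + (1 - alpha) * ewma[i-1]`: the accumulator is seeded with the first element cast to Float64, the remaining elements (`arrayPopFront(data)`) are folded over, and each step appends the new smoothed value with `arrayPushBack`, so `acc[-1]` is always the previous EWMA value.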
18 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/machine-learning-in-clickhouse.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Machine learning in ClickHouse"
3 | linkTitle: "Machine learning in ClickHouse"
4 | description: >
5 | Machine learning in ClickHouse
6 | ---
7 |
8 | Resources
9 |
10 | * [Machine Learning in ClickHouse](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup31/ml.pdf) - Presentation from 2019 (Meetup 31)
11 | * [ML discussion: CatBoost / MindsDB / Fast.ai](../../altinity-kb-integrations/catboost-mindsdb-fast.ai) - Brief article from 2021
12 | * [Machine Learning Forecast (Russian)](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/forecast.pdf) - Presentation from 2019 (Meetup 38)
13 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-dictionaries/example-of-postgresql-dictionary.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Example of PostgreSQL dictionary"
3 | linkTitle: "Example of PostgreSQL dictionary"
4 | description: >
5 | Example of PostgreSQL dictionary
6 | ---
7 |
8 | ```sql
9 | CREATE DICTIONARY postgres_dict
10 | (
11 | id UInt32,
12 | value String
13 | )
14 | PRIMARY KEY id
15 | SOURCE(
16 | POSTGRESQL(
17 | port 5432
18 | host 'postgres1'
19 | user 'postgres'
20 | password 'mysecretpassword'
21 | db 'clickhouse'
22 | table 'test_schema.test_table'
23 | )
24 | )
25 | LIFETIME(MIN 300 MAX 600)
26 | LAYOUT(HASHED());
27 | ```
28 |
29 | and later do
30 |
31 | ```sql
32 | SELECT dictGetString(postgres_dict, 'value', toUInt64(1))
33 | ```
34 |
--------------------------------------------------------------------------------
/content/en/using-this-knowledgebase/mermaid_example.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Mermaid Example"
3 | linkTitle: "Mermaid Example"
4 | description: >
5 | A short example of using the Mermaid library to add charts.
6 | weight: 12
7 | ---
8 | This Knowledge Base now supports [Mermaid](https://mermaid-js.github.io/mermaid/#/), a handy way to create charts from text. The following example shows a very simple chart, and the code to use.
9 |
10 | To add a Mermaid chart, encase the Mermaid code between `{{</* mermaid */>}}` and `{{</* /mermaid */>}}` tags, as follows:
11 |
12 |
13 |
14 | ```text
15 | {{</* mermaid */>}}
16 | graph TD;
17 | A-->B;
18 | A-->C;
19 | B-->D;
20 | C-->D;
21 | {{</* /mermaid */>}}
22 | ```
23 |
24 | And it renders as so:
25 |
26 | {{< mermaid >}}
27 | graph TD;
28 | A-->B;
29 | A-->C;
30 | B-->D;
31 | C-->D;
32 | {{< /mermaid >}}
33 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-selects-from-engine-kafka.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "SELECTs from engine=Kafka"
3 | linkTitle: "SELECTs from engine=Kafka"
4 | description: >
5 | SELECTs from engine=Kafka
6 | ---
7 | ## Question
8 |
9 | What will happen if we run a SELECT query on a working Kafka table with an MV attached? Will the data shown by the SELECT query appear later in the MV destination table?
10 |
11 | ## Answer
12 |
13 | 1. Most likely the SELECT query will show nothing.
14 | 2. If you are lucky enough that something does show up, those rows **won't appear** in the MV destination table.
15 |
16 | So it's not recommended to run SELECT queries on working Kafka tables.
17 |
18 | For debugging, it's possible to use another Kafka table with a different `consumer_group`, so it won't affect your main pipeline. For example:
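
A minimal sketch of such a debug table (broker, topic, and the column list are placeholders; the key point is that `kafka_group_name` differs from the one used by the main pipeline):

```sql
CREATE TABLE kafka_debug
(
    `message` String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'debug_consumer_group',  -- not the production consumer group
         kafka_format = 'JSONAsString';

-- reads advance offsets only for debug_consumer_group, so the main pipeline is unaffected
SELECT * FROM kafka_debug LIMIT 10;
```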
19 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-replication-queue.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Replication queue"
3 | linkTitle: "Replication queue"
4 | description: >
5 | Replication queue
6 | ---
7 | ```sql
8 | SELECT
9 | database,
10 | table,
11 | type,
12 | max(last_exception),
13 | max(postpone_reason),
14 | min(create_time),
15 | max(last_attempt_time),
16 | max(last_postpone_time),
17 | max(num_postponed) AS max_postponed,
18 | max(num_tries) AS max_tries,
19 | min(num_tries) AS min_tries,
20 | countIf(last_exception != '') AS count_err,
21 | countIf(num_postponed > 0) AS count_postponed,
22 | countIf(is_currently_executing) AS count_executing,
23 | count() AS count_all
24 | FROM system.replication_queue
25 | GROUP BY
26 | database,
27 | table,
28 | type
29 | ORDER BY count_all DESC
30 | ```
31 |
--------------------------------------------------------------------------------
/static/assets/93913111.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/layouts/_default/content.html:
--------------------------------------------------------------------------------
1 |
2 |
{{ .Title }}
3 |
{{ if ne (.Title|markdownify) (.Params.description|markdownify) }}{{ with .Params.description }}{{ . | markdownify }}{{ end }}{{ end }}
4 | {{ if .Date }} {{ .Date.Format "January 2, 2006" }} {{ end }}
20 |
--------------------------------------------------------------------------------
/content/en/engines/mergetree-table-engine-family/altinity-kb-nulls-in-order-by.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Nulls in order by"
3 | linkTitle: "Nulls in order by"
4 | description: >
5 | Nulls in order by
6 | ---
7 |
8 | 1) It is NOT RECOMMENDED for general use.
9 | 2) Use at your own risk.
10 | 3) Use the latest ClickHouse® version if you need this.
11 |
12 | ```sql
13 | CREATE TABLE x
14 | (
15 | `a` Nullable(UInt32),
16 | `b` Nullable(UInt32),
17 | `cnt` UInt32
18 | )
19 | ENGINE = SummingMergeTree
20 | ORDER BY (a, b)
21 | SETTINGS allow_nullable_key = 1;
22 | INSERT INTO x VALUES (Null,2,1), (Null,Null,1), (3, Null, 1), (4,4,1);
23 | INSERT INTO x VALUES (Null,2,1), (Null,Null,1), (3, Null, 1), (4,4,1);
24 | SELECT * FROM x;
25 | ┌────a─┬────b─┬─cnt─┐
26 | │ 3 │ null │ 2 │
27 | │ 4 │ 4 │ 2 │
28 | │ null │ 2 │ 2 │
29 | │ null │ null │ 2 │
30 | └──────┴──────┴─────┘
31 | ```
32 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/http_handlers.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "http handler example"
3 | linkTitle: "http_handlers"
4 | weight: 100
5 | description: >-
6 | http handler example
7 | ---
8 |
9 | ## http handler example (how to disable /play)
10 |
11 | ```xml
12 | # cat /etc/clickhouse-server/config.d/play_disable.xml
13 |
14 | <clickhouse>
15 |     <http_handlers>
16 |         <rule>
17 |             <url>/play</url>
18 |             <methods>GET</methods>
19 |             <handler>
20 |                 <type>static</type>
21 |                 <status>403</status>
22 |                 <content_type>text/plain; charset=UTF-8</content_type>
23 |             </handler>
24 |         </rule>
25 |         <defaults/>
26 |     </http_handlers>
27 | </clickhouse>
28 |
29 | ```
30 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-dictionaries/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Dictionaries"
3 | linkTitle: "Dictionaries"
4 | keywords:
5 | - clickhouse dictionaries
6 | - clickhouse arrays
7 | - postgresql dictionary
8 | description: >
9 | All you need to know about creating and using ClickHouse® dictionaries.
10 | weight: 11
11 | ---
12 |
13 | For more information on ClickHouse® Dictionaries, see
14 |
15 | the presentation [https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup34/clickhouse_integration.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup34/clickhouse_integration.pdf), slides 82-95, video https://youtu.be/728Yywcd5ys?t=10642
16 |
17 | We also have a couple of articles about dictionaries in our blog:
18 | https://altinity.com/blog/dictionaries-explained
19 | https://altinity.com/blog/2020/5/19/clickhouse-dictionaries-reloaded
20 |
21 | And some videos:
22 | https://www.youtube.com/watch?v=FsVrFbcyb84
23 |
--------------------------------------------------------------------------------
/layouts/partials/toc.html:
--------------------------------------------------------------------------------
1 | {{ if not .Params.notoc }}
2 | {{ with .TableOfContents }}
3 | {{ if ge (len .) 200 }}
4 | {{ . }}
5 | {{ end }}
6 | {{ end }}
7 | {{ end }}
8 | {{ partial "social-links.html" . }}
9 |
10 |
11 | Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.
12 |
13 |
14 | Project Antalya
15 | Build Real‑Time Data Lakes with ClickHouse® and Apache Iceberg
16 | Learn more
17 |
18 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-google-s3-gcs.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Google S3 (GCS)"
3 | linkTitle: "Google S3 (GCS)"
4 | ---
5 |
6 | GCS with the s3 table function seems to work correctly for simple scenarios.
7 |
8 | Essentially you can follow the steps from the [Migrating from Amazon S3 to Cloud Storage](https://cloud.google.com/storage/docs/aws-simple-migration) guide.
9 |
10 | 1. Set up a GCS bucket.
11 | 2. This bucket must be set as part of the default project for the account. This configuration can be found in settings -> interoperability.
12 | 3. Generate an HMAC key for the account. This can be done in settings -> interoperability, in the section for user account access keys.
13 | 4. In ClickHouse®, replace the S3 bucket endpoint with the GCS bucket endpoint. This must be done with the path-style GCS endpoint: `https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME`.
14 | 5. Replace the AWS access key id and AWS secret access key with the corresponding parts of the HMAC key, as in the sketch below.
15 |
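A minimal sketch of querying a GCS bucket through the s3 table function (the bucket name, object path, and HMAC credentials below are placeholders):

```sql
SELECT count()
FROM s3(
    'https://storage.googleapis.com/my-gcs-bucket/data/*.parquet',  -- path-style GCS endpoint
    'GOOG1EXAMPLEHMACACCESSKEY',                                    -- HMAC access key instead of the AWS access key id
    'example-hmac-secret',                                          -- HMAC secret instead of the AWS secret access key
    'Parquet'
);
```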
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/values-mapping.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Values mapping"
3 | linkTitle: "Values mapping"
4 | description: >
5 | Values mapping
6 | ---
7 | ```sql
8 | SELECT count()
9 | FROM numbers_mt(1000000000)
10 | WHERE NOT ignore(transform(number % 3, [0, 1, 2, 3], ['aa', 'ab', 'ad', 'af'], 'a0'))
11 |
12 | 1 rows in set. Elapsed: 4.668 sec. Processed 1.00 billion rows, 8.00 GB (214.21 million rows/s., 1.71 GB/s.)
13 |
14 | SELECT count()
15 | FROM numbers_mt(1000000000)
16 | WHERE NOT ignore(multiIf((number % 3) = 0, 'aa', (number % 3) = 1, 'ab', (number % 3) = 2, 'ad', (number % 3) = 3, 'af', 'a0'))
17 |
18 | 1 rows in set. Elapsed: 7.333 sec. Processed 1.00 billion rows, 8.00 GB (136.37 million rows/s., 1.09 GB/s.)
19 |
20 | SELECT count()
21 | FROM numbers_mt(1000000000)
22 | WHERE NOT ignore(CAST(number % 3 AS Enum('aa' = 0, 'ab' = 1, 'ad' = 2, 'af' = 3)))
23 |
24 | 1 rows in set. Elapsed: 1.152 sec. Processed 1.00 billion rows, 8.00 GB (867.79 million rows/s., 6.94 GB/s.)
25 | ```
26 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-useful-queries/remove_empty_partitions_from_rq.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Removing tasks in the replication queue related to empty partitions"
3 | linkTitle: "Removing tasks in the replication queue related to empty partitions"
4 | weight: 100
5 | description: >-
6 | Removing tasks in the replication queue related to empty partitions
7 | ---
8 |
9 | ## Removing tasks in the replication queue related to empty partitions
10 |
11 | ```
12 | SELECT 'ALTER TABLE ' || database || '.' || table || ' DROP PARTITION ID \''|| partition_id || '\';' FROM
13 | (SELECT DISTINCT database, table, extract(new_part_name, '^[^_]+') as partition_id FROM clusterAllReplicas('{cluster}', system.replication_queue) ) as rq
14 | LEFT JOIN
15 | (SELECT database, table, partition_id, sum(rows) as rows_count, count() as part_count
16 | FROM clusterAllReplicas('{cluster}', system.parts)
17 | WHERE active GROUP BY database, table, partition_id
18 | ) as p
19 | USING (database, table, partition_id)
20 | WHERE p.rows_count = 0 AND p.part_count = 0
21 | FORMAT TSVRaw;
22 | ```
23 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/cluster-production-configuration-guide/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Production Cluster Configuration Guide"
3 | linkTitle: "Production Cluster Configuration Guide"
4 | description: >
5 | Production Cluster Configuration Guide
6 | ---
7 |
8 |
9 | Moving from a single ClickHouse® server to a clustered format provides several benefits:
10 |
11 | * Replication guarantees data integrity.
12 | * Provides redundancy.
13 | * Provides failover: half of the nodes can be restarted without encountering downtime.
14 |
15 | Moving from an unsharded ClickHouse environment to a sharded cluster requires redesigning the schema and queries. Starting with a sharded cluster from the beginning makes it easier to scale the cluster up in the future.
16 |
17 | Setting up a ClickHouse cluster for a production environment requires the following stages:
18 |
19 | * Hardware Requirements
20 | * Network Configuration
21 | * Create Host Names
22 | * Monitoring Considerations
23 | * Configuration Steps
24 | * Setting Up Backups
25 | * Staging Plans
26 | * Upgrading The Cluster
27 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-useful-queries/debug-hang.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Debug hanging thing"
3 | linkTitle: "Debug hanging thing"
4 | weight: 100
5 | description: >-
6 | Debug hanging / freezing things
7 | ---
8 |
9 | ## Debug hanging / freezing things
10 |
11 | If ClickHouse® is busy with something and you don't know what's happening, you can easily check the stack traces of all the threads which are working:
12 |
13 | ```sql
14 | SELECT
15 | arrayStringConcat(arrayMap(x -> concat('0x', lower(hex(x)), '\t', demangle(addressToSymbol(x))), trace), '\n') as trace_functions,
16 | count()
17 | FROM system.stack_trace
18 | GROUP BY trace_functions
19 | ORDER BY count()
20 | DESC
21 | SETTINGS allow_introspection_functions=1
22 | FORMAT Vertical;
23 | ```
24 |
25 | If you can't start any queries, but you have access to the node, you can send a signal:
26 |
27 | ```
28 | # older versions
29 | for i in $(ls -1 /proc/$(pidof clickhouse-server)/task/); do kill -TSTP $i; done
30 | # even older versions
31 | for i in $(ls -1 /proc/$(pidof clickhouse-server)/task/); do kill -SIGPROF $i; done
32 | ```
33 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/clean-up-orphaned-objects-on-s3.md.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Clean up orphaned objects on s3"
3 | linkTitle: "Clean up orphaned objects on s3"
4 | weight: 100
5 | description: >-
6 | Clean up orphaned objects left in S3-backed ClickHouse tiered storage
7 | ---
8 |
9 | ### Problems
10 |
11 | - TRUNCATE and DROP TABLE remove **metadata only**.
12 | - Long-running queries, merges or other replicas may still reference parts, so ClickHouse delays removal.
13 | - There are bugs in ClickHouse that leave orphaned files, especially after failures.
14 |
15 | ### Solutions
16 |
17 | - use our utility for garbage collection - https://github.com/Altinity/s3gc
18 | - or create a separate path in the bucket for every table and every replica and remove the whole path in AWS console
19 | - you can also use [clickhouse-disk](https://clickhouse.com/docs/operations/utilities/clickhouse-disks) utility to delete s3 data:
20 |
21 | ```
22 | clickhouse-disks --disk s3 --query "remove /cluster/database/table/replica1"
23 | ```
24 |
25 |
26 |
27 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-parallel-consuming.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Kafka parallel consuming"
3 | linkTitle: "Kafka parallel consuming"
4 | description: >
5 | Kafka parallel consuming
6 | ---
7 | For very large topics, when you need more parallelism (especially on the insert side), you may use several tables with the same pipeline (pre ClickHouse® 20.9) or enable `kafka_thread_per_consumer` (after 20.9).
8 |
9 | ```ini
10 | kafka_num_consumers = N,
11 | kafka_thread_per_consumer=1
12 | ```
13 |
14 | Notes:
15 |
16 | * the inserts will happen in parallel (without that setting inserts happen linearly)
17 | * enough partitions are needed.
18 | * `kafka_num_consumers` is limited by the number of physical cores (half of vCPUs). `kafka_disable_num_consumers_limit` can be used to override the limit.
19 | * `background_message_broker_schedule_pool_size` is 16 by default; you may need to increase it if using more than 16 consumers
20 |
21 | Increasing `kafka_num_consumers` while keeping `kafka_thread_per_consumer=0` may improve consumption & parsing speed, but flushing & committing still happens in a single thread there (so inserts are linear).
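
A minimal sketch of a Kafka table with per-consumer threads enabled (broker, topic, format, and the column are placeholders):

```sql
CREATE TABLE kafka_events
(
    `message` String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'events_group',
         kafka_format = 'JSONAsString',
         kafka_num_consumers = 4,          -- limited by the number of physical cores
         kafka_thread_per_consumer = 1;    -- each consumer flushes and inserts independently
```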
22 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-schema-design/lowcardinality.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "LowCardinality"
3 | linkTitle: "LowCardinality"
4 | description: >
5 | LowCardinality
6 | ---
7 | ## Settings
8 |
9 | #### allow_suspicious_low_cardinality_types
10 |
11 | Allows specifying the LowCardinality modifier in a CREATE TABLE statement for types of small fixed size (8 bytes or less). Enabling this may increase merge times and memory consumption.
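
A minimal sketch of the setting in action (table and column names are arbitrary); without it, the CREATE TABLE below is rejected:

```sql
SET allow_suspicious_low_cardinality_types = 1;

CREATE TABLE lc_small_type
(
    `flag` LowCardinality(UInt8)  -- small fixed-size type, prohibited by default under LowCardinality
)
ENGINE = MergeTree
ORDER BY flag;
```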
12 |
13 | #### low_cardinality_max_dictionary_size
14 |
15 | default - 8192
16 |
17 | Maximum size (in rows) of shared global dictionary for LowCardinality type.
18 |
19 | #### low_cardinality_use_single_dictionary_for_part
20 |
21 | LowCardinality type serialization setting. If true, additional keys will be used when the global dictionary overflows. Otherwise, several shared dictionaries will be created.
22 |
23 | #### low_cardinality_allow_in_native_format
24 |
25 | Use LowCardinality type in Native format. Otherwise, LowCardinality columns are converted to ordinary columns for SELECT queries, and ordinary columns are converted to the required LowCardinality type for INSERT queries.
26 |
27 | #### output_format_arrow_low_cardinality_as_dictionary
28 |
29 | Enable output LowCardinality type as Dictionary Arrow type
30 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/altinity-kb-optimize-vs-optimize-final.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "OPTIMIZE vs OPTIMIZE FINAL"
3 | linkTitle: "OPTIMIZE vs OPTIMIZE FINAL"
4 | description: >
5 | OPTIMIZE vs OPTIMIZE FINAL
6 | ---
7 | `OPTIMIZE TABLE xyz` -- this initiates an unscheduled merge.
8 |
9 | ## Example
10 |
11 | You have 40 parts in 3 partitions. This unscheduled merge selects some partition (e.g. February) and selects 3 small parts to merge, then merges them into a single part. You get 38 parts as the result.
12 |
13 | `OPTIMIZE TABLE xyz FINAL` -- initiates a cycle of unscheduled merges.
14 |
15 | ClickHouse® merges parts in this table until only 1 part remains in each partition (if the system has enough free disk space). As a result, you get 3 parts, 1 part per partition. In this case, ClickHouse rewrites parts even if they are already merged into a single part. This creates a huge CPU / disk load if the table (xyz) is huge: ClickHouse reads / uncompresses / merges / compresses / writes all data in the table.
16 |
17 | If this table is 1 TB in size, it could take around 3 hours to complete.
18 |
19 | So we don't recommend running `OPTIMIZE TABLE xyz FINAL` against tables with more than 10 million rows.
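
To see how many active parts (and how much data) each partition of a table currently has, a query like this against `system.parts` can help (the table name is a placeholder):

```sql
SELECT
    partition,
    count() AS part_count,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active AND (table = 'xyz')
GROUP BY partition
ORDER BY partition;
```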
20 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-schema-design/how-to-store-ips.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "IPs/masks"
3 | linkTitle: "IPs/masks"
4 | description: >
5 | IPs/masks
6 | ---
7 | ### How do I Store IPv4 and IPv6 Address In One Field?
8 |
9 | There is a clean and simple solution for that. Any IPv4 has its unique IPv6 mapping:
10 |
11 | * IPv4 IP address: 191.239.213.197
12 | * IPv4-mapped IPv6 address: ::ffff:191.239.213.197
13 |
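A minimal sketch of that approach, storing everything in a single `IPv6` column (table and column names are illustrative); `toIPv6` maps IPv4 literals to their `::ffff:...` form automatically:

```sql
CREATE TABLE ips
(
    `ip` IPv6
)
ENGINE = MergeTree
ORDER BY ip;

-- both address families go into the same column
INSERT INTO ips VALUES (toIPv6('191.239.213.197')), (toIPv6('2001:db8::1'));

SELECT ip FROM ips;
-- ::ffff:191.239.213.197
-- 2001:db8::1
```
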
14 | #### Find IPs matching CIDR/network mask (IPv4)
15 |
16 | ```sql
17 | WITH IPv4CIDRToRange( toIPv4('10.0.0.1'), 8 ) as range
18 | SELECT
19 | *
20 | FROM values('ip IPv4',
21 | toIPv4('10.2.3.4'),
22 | toIPv4('192.0.2.1'),
23 | toIPv4('8.8.8.8'))
24 | WHERE
25 | ip BETWEEN range.1 AND range.2;
26 | ```
27 |
28 | #### Find IPs matching CIDR/network mask (IPv6)
29 |
30 | ```sql
31 | WITH IPv6CIDRToRange
32 | (
33 | toIPv6('2001:0db8:0000:85a3:0000:0000:ac1f:8001'),
34 | 32
35 | ) as range
36 | SELECT
37 | *
38 | FROM values('ip IPv6',
39 | toIPv6('2001:db8::8a2e:370:7334'),
40 | toIPv6('::ffff:192.0.2.1'),
41 | toIPv6('::'))
42 | WHERE
43 | ip BETWEEN range.1 AND range.2;
44 | ```
45 |
--------------------------------------------------------------------------------
/layouts/partials/social-links.html:
--------------------------------------------------------------------------------
1 | {{ $twitterurl := printf "https://twitter.com/intent/tweet?text=%s&url=%s" (htmlEscape .Title ) (htmlEscape .Permalink) }}
2 | {{ $facebookurl := printf "https://www.facebook.com/sharer/sharer.php?u=%s" (htmlEscape .Permalink) }}
3 | {{ $linkedinurl := printf "https://www.linkedin.com/shareArticle?mini=true&url=%s&title=%s" (htmlEscape .Permalink) (htmlEscape .Title ) }}
4 |
5 |
6 |
14 |
15 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/altinity-kb-how-to-check-the-list-of-watches.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "How to check the list of watches"
3 | linkTitle: "How to check the list of watches"
4 | description: >
5 | How to check the list of watches
6 | ---
7 | ZooKeeper uses watches to notify a client of znode changes. This article explains how to check watches set on ZooKeeper servers and how they are used.
8 |
9 | **Solution:**
10 |
11 | The `wchc` four-letter command lists all watches set on the ZooKeeper server.
12 |
13 | `# echo wchc | nc zookeeper 2181`
14 |
15 | Reference
16 |
17 | [https://zookeeper.apache.org/doc/r3.4.12/zookeeperAdmin.html](https://zookeeper.apache.org/doc/r3.4.12/zookeeperAdmin.html)
18 |
19 | The `wchp` and `wchc` commands are not enabled by default because of their known DOS vulnerability. For more information, see [ZOOKEEPER-2693](https://issues.apache.org/jira/browse/ZOOKEEPER-2693) and [ZooKeeper 3.5.2 - Denial of Service](https://vulners.com/exploitdb/EDB-ID:41277).
20 |
21 | By default those commands are disabled; they can be enabled via a Java system property:
22 |
23 | `-Dzookeeper.4lw.commands.whitelist=*`
24 |
25 | or in the ZooKeeper config: `4lw.commands.whitelist=*`
26 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/window-functions.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Window functions"
3 | linkTitle: "Window functions"
4 | description: >
5 | Window functions
6 | ---
7 |
8 | #### Resources:
9 |
10 | * [Tutorial: ClickHouse® Window Functions](https://altinity.com/blog/clickhouse-window-functions-current-state-of-the-art)
11 | * [Video: Fun with ClickHouse Window Functions](https://www.youtube.com/watch?v=sm_vUdMQz4s)
12 | * [Blog: Battle of the Views: ClickHouse Window View vs. Live View](https://altinity.com/blog/battle-of-the-views-clickhouse-window-view-vs-live-view)
13 |
14 | #### How Do I Simulate Window Functions Using Arrays on older versions of ClickHouse?
15 |
16 | 1. Group with groupArray.
17 | 2. Calculate the needed metrics.
18 | 3. Ungroup back using arrayJoin.
19 |
20 | ### NTILE
21 |
22 | ```sql
23 | SELECT intDiv((num - 1) - (cnt % 3), 3) AS ntile
24 | FROM
25 | (
26 | SELECT
27 | row_number() OVER (ORDER BY number ASC) AS num,
28 | count() OVER () AS cnt
29 | FROM numbers(11)
30 | )
31 |
32 | ┌─ntile─┐
33 | │ 0 │
34 | │ 0 │
35 | │ 0 │
36 | │ 0 │
37 | │ 0 │
38 | │ 1 │
39 | │ 1 │
40 | │ 1 │
41 | │ 2 │
42 | │ 2 │
43 | │ 2 │
44 | └───────┘
45 | ```
46 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-useful-queries/table-meta-in-zookeeper.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Check table metadata in zookeeper"
3 | linkTitle: "Check table metadata in zookeeper"
4 | weight: 100
5 | description: >-
6 | Check table metadata in zookeeper.
7 | ---
8 |
9 | ## Compare table metadata of different replicas in zookeeper
10 |
11 | > Metadata on replica is not up to date with common metadata in Zookeeper
12 |
13 | ```sql
14 | SELECT *, if( neighbor(name, -1) == name and name != 'is_active', neighbor(value, -1) == value , 1) as looks_good
15 | FROM (
16 | SELECT
17 | name,
18 | path,
19 | ctime,
20 | mtime,
21 | value
22 | FROM system.zookeeper
23 | WHERE (path IN (
24 | SELECT arrayJoin(groupUniqArray(if(path LIKE '%/replicas', concat(path, '/', name), path)))
25 | FROM system.zookeeper
26 | WHERE path IN (
27 | SELECT arrayJoin([zookeeper_path, concat(zookeeper_path, '/replicas')])
28 | FROM system.replicas
29 | WHERE table = 'test_repl'
30 | )
31 | )) AND (name IN ('metadata', 'columns', 'is_active'))
32 | ORDER BY
33 | name = 'is_active',
34 | name ASC,
35 | path ASC
36 | )
37 | ```
38 |
39 | vs.
40 |
41 | ```sql
42 | SELECT metadata_modification_time, create_table_query FROM system.tables WHERE name = 'test_repl'
43 | ```
44 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/bi-tools.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "BI Tools"
3 | linkTitle: "BI Tools"
4 | description: >
5 | Business Intelligence Tools
6 | ---
7 | * Superset: [https://superset.apache.org/docs/databases/clickhouse](https://superset.apache.org/docs/databases/clickhouse)
8 | * Metabase: [https://github.com/enqueue/metabase-clickhouse-driver](https://github.com/enqueue/metabase-clickhouse-driver)
9 | * Querybook: [https://www.querybook.org/docs/setup_guide/connect_to_query_engines/\#all-query-engines](https://www.querybook.org/docs/setup_guide/connect_to_query_engines/#all-query-engines)
10 | * Tableau: [Altinity Tableau Connector for ClickHouse®](https://github.com/Altinity/tableau-connector-for-clickhouse) supports both JDBC & ODBC drivers
11 | * Looker: [https://docs.looker.com/setup-and-management/database-config/clickhouse](https://docs.looker.com/setup-and-management/database-config/clickhouse)
12 | * Apache Zeppelin
13 | * SeekTable
14 | * ReDash
15 | * Mondrian: [https://altinity.com/blog/accessing-clickhouse-from-excel-using-mondrian-rolap-engine](https://altinity.com/blog/accessing-clickhouse-from-excel-using-mondrian-rolap-engine)
16 | * Grafana: [Integrating Grafana with ClickHouse](https://docs.altinity.com/integrations/clickhouse-and-grafana/)
17 | * Cumul.io
18 | * Tablum: https://tablum.io
19 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/cluster-production-configuration-guide/hardening-clickhouse-security.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Backups"
3 | linkTitle: "Backups"
4 | description: >
5 | Backups
6 | ---
7 |
8 |
9 | ClickHouse® is currently at the design stage of creating some universal backup solution. Some custom backup strategies are:
10 |
11 | 1. Each shard is backed up separately.
12 | 2. FREEZE the table/partition (a minimal example is shown after this list). For more information, see [Alter Freeze Partition](https://clickhouse.tech/docs/en/sql-reference/statements/alter/partition/#alter_freeze-partition).
13 | 1. This creates hard links in the shadow subdirectory.
14 | 3. rsync that directory to a backup location, then remove that subfolder from shadow.
15 | 1. Cloud users are recommended to use [Rclone](https://rclone.org/).
16 | 4. Always add the full contents of the metadata subfolder that contains the current DB schema and ClickHouse configs to your backup.
17 | 5. For a second replica, it’s enough to copy metadata and configuration.
18 | 6. Data in ClickHouse is already compressed with lz4; a backup can be compressed a bit better, but avoid CPU-heavy compression algorithms like gzip, use something like zstd instead.
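
A minimal sketch of step 2 (the database and table names are placeholders):

```sql
-- creates hard links for all partitions of the table under /var/lib/clickhouse/shadow/manual_backup/
ALTER TABLE db.my_table FREEZE WITH NAME 'manual_backup';
```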
19 |
20 | The tool automating that process: [Altinity Backup for ClickHouse](https://github.com/Altinity/clickhouse-backup).
21 |
--------------------------------------------------------------------------------
/content/en/upgrade/vulnerabilities.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Vulnerabilities"
3 | linkTitle: "Vulnerabilities"
4 | weight: 100
5 | description: >-
6 | Vulnerabilities
7 | ---
8 |
9 | ## 2022-03-15: 7 vulnerabilities in ClickHouse® were published.
10 |
11 | See the details https://jfrog.com/blog/7-rce-and-dos-vulnerabilities-found-in-clickhouse-dbms/
12 |
13 | Those vulnerabilities were fixed by 2 PRs:
14 |
15 | * https://github.com/ClickHouse/ClickHouse/pull/27136
16 | * https://github.com/ClickHouse/ClickHouse/pull/27743
17 |
18 | All releases starting from v21.10.2.15 have that problem fixed.
19 |
20 | Also, the fix was backported to 21.3 and 21.8 branches - versions v21.8.11.4-lts and v21.3.19.1-lts
21 | accordingly have the problem fixed (and all newer releases in those branches).
22 |
23 | The latest Altinity stable releases also contain the bugfix.
24 |
25 | * [21.8.13](https://docs.altinity.com/releasenotes/altinity-stable-release-notes/21.8/21813/)
26 | * [21.3.20](https://docs.altinity.com/releasenotes/altinity-stable-release-notes/21.3/21320/)
27 |
28 | If you use some older version we recommend upgrading.
29 |
30 | Before the upgrade, please ensure that ports 9000 and 8123 are not exposed to the internet, so external
31 | clients who could try to exploit those vulnerabilities cannot access your ClickHouse node.
32 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-dictionaries/security-named-collections.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Security named collections"
3 | linkTitle: "Security named collections"
4 | description: >
5 | Security named collections
6 | ---
7 |
8 |
9 | ## Dictionary with ClickHouse® table as a source with named collections
10 |
11 | ### Data for connecting to external sources can be stored in named collections
12 |
13 | ```xml
14 | <clickhouse>
15 |     <named_collections>
16 |         <local_host>
17 |             <host>localhost</host>
18 |             <port>9000</port>
19 |             <user>default</user>
20 |             <db>ch_dict</db>
21 |             <password>mypass</password>
22 |         </local_host>
23 |     </named_collections>
24 | </clickhouse>
25 | ```
26 |
27 | ### Dictionary
28 |
29 | ```sql
30 | DROP DICTIONARY IF EXISTS named_coll_dict;
31 | CREATE DICTIONARY named_coll_dict
32 | (
33 | key UInt64,
34 | val String
35 | )
36 | PRIMARY KEY key
37 | SOURCE(CLICKHOUSE(NAME local_host TABLE my_table DB default))
38 | LIFETIME(MIN 1 MAX 2)
39 | LAYOUT(HASHED());
40 |
41 | INSERT INTO my_table(key, val) VALUES(1, 'first row');
42 |
43 | SELECT dictGet('named_coll_dict', 'val', 1);
44 | ┌─dictGet('named_coll_dict', 'val', 1)─┐
45 | │ first row                            │
46 | └──────────────────────────────────────┘
47 | ```
48 |
--------------------------------------------------------------------------------
/go.sum:
--------------------------------------------------------------------------------
1 | github.com/FortAwesome/Font-Awesome v0.0.0-20210804190922-7d3d774145ac/go.mod h1:IUgezN/MFpCDIlFezw3L8j83oeiIuYoj28Miwr/KUYo=
2 | github.com/FortAwesome/Font-Awesome v0.0.0-20230327165841-0698449d50f2/go.mod h1:IUgezN/MFpCDIlFezw3L8j83oeiIuYoj28Miwr/KUYo=
3 | github.com/FortAwesome/Font-Awesome v0.0.0-20240402185447-c0f460dca7f7/go.mod h1:IUgezN/MFpCDIlFezw3L8j83oeiIuYoj28Miwr/KUYo=
4 | github.com/google/docsy v0.2.0 h1:DN6wfyyp2rXsjdV1K3wioxOBTRvG6Gg48wLPDso2lc4=
5 | github.com/google/docsy v0.2.0/go.mod h1:shlabwAQakGX6qpXU6Iv/b/SilpHRd7d+xqtZQd3v+8=
6 | github.com/google/docsy v0.10.0 h1:6tMDacPwAyRWNCfvsn/9qGOZDQ8b0aRzjRZvnZPY5dg=
7 | github.com/google/docsy v0.10.0/go.mod h1:c0nIAqmRTOuJ01F85U/wJPQtc3Zj9N58Kea9bOT2AJc=
8 | github.com/google/docsy/dependencies v0.2.0/go.mod h1:2zZxHF+2qvkyXhLZtsbnqMotxMukJXLaf8fAZER48oo=
9 | github.com/google/docsy/dependencies v0.7.2 h1:+t5ufoADQAj4XneFphz4A+UU0ICAxmNaRHVWtMYXPSI=
10 | github.com/google/docsy/dependencies v0.7.2/go.mod h1:gihhs5gmgeO+wuoay4FwOzob+jYJVyQbNaQOh788lD4=
11 | github.com/twbs/bootstrap v4.6.1+incompatible/go.mod h1:fZTSrkpSf0/HkL0IIJzvVspTt1r9zuf7XlZau8kpcY0=
12 | github.com/twbs/bootstrap v5.2.3+incompatible/go.mod h1:fZTSrkpSf0/HkL0IIJzvVspTt1r9zuf7XlZau8kpcY0=
13 | github.com/twbs/bootstrap v5.3.3+incompatible/go.mod h1:fZTSrkpSf0/HkL0IIJzvVspTt1r9zuf7XlZau8kpcY0=
--------------------------------------------------------------------------------
/layouts/docs/list.html:
--------------------------------------------------------------------------------
1 | {{ define "main" }}
2 |
3 |
{{ .Title }}
4 | {{ if ne (.Params.description|markdownify) (.Title|markdownify) }}{{ with .Params.description }}
32 | {{ end }}
33 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-useful-queries/altinity-kb-number-of-active-parts-in-a-partition.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Number of active parts in a partition"
3 | linkTitle: "Number of active parts in a partition"
4 | description: >
5 | Number of active parts in a partition
6 | ---
7 | ## Q: Why do I have several active parts in a partition? Why ClickHouse® does not merge them immediately?
8 |
9 | ### A: CH does not merge parts by time
10 |
11 | The merge scheduler selects parts by its own algorithm, based on the current node workload / number of parts / size of parts.
12 |
13 | The CH merge scheduler balances between having a big number of parts and wasting resources on merges.
14 |
15 | Merges are CPU / disk IO expensive. If CH merged every new part immediately, all resources would be spent on merges and no resources would remain for queries (SELECTs).
16 |
17 | CH will not merge parts with a combined size greater than 150 GB [max_bytes_to_merge_at_max_space_in_pool](https://clickhouse.com/docs/en/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool).
18 |
19 | ```
20 | SELECT
21 | database,
22 | table,
23 | partition,
24 | sum(rows) AS rows,
25 | count() AS part_count
26 | FROM system.parts
27 | WHERE (active = 1) AND (table LIKE '%') AND (database LIKE '%')
28 | GROUP BY
29 | database,
30 | table,
31 | partition
32 | ORDER BY part_count DESC
33 | limit 20
34 | ```
35 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-dictionaries/mysql8-source-for-dictionaries.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "MySQL8 source for dictionaries"
3 | linkTitle: "MySQL8 source for dictionaries"
4 | description: >
5 | MySQL8 source for dictionaries
6 | ---
7 | #### Authorization
8 |
9 | MySQL8 uses the `caching_sha2_password` authentication plugin by default. Unfortunately, the `libmysql` currently used in ClickHouse® (as of 21.4) does not support it.
10 |
11 | You can fix it by creating a custom user with the `mysql_native_password` authentication plugin.
12 |
13 | ```sql
14 | CREATE USER IF NOT EXISTS 'clickhouse'@'%'
15 | IDENTIFIED WITH mysql_native_password BY 'clickhouse_user_password';
16 |
17 | CREATE DATABASE IF NOT EXISTS test;
18 |
19 | GRANT ALL PRIVILEGES ON test.* TO 'clickhouse'@'%';
20 | ```
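
With that user in place, a dictionary can use MySQL8 as a source; a minimal sketch (the host, table, and attribute names are placeholders):

```sql
CREATE DICTIONARY mysql_dict
(
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(MYSQL(
    host 'mysql8-host'
    port 3306
    user 'clickhouse'
    password 'clickhouse_user_password'
    db 'test'
    table 'test_table'
))
LIFETIME(MIN 300 MAX 600)
LAYOUT(HASHED());
```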
21 |
22 | #### Table schema changes
23 |
24 | As an example, in ClickHouse, run `SHOW TABLE STATUS LIKE 'table_name'` and try to figure out from the MySQL response field `Update_time` whether the table schema was changed or not.
25 |
26 | For proper data loading from a MySQL8 source into dictionaries, turn off the `information_schema` cache (it is enabled by default in MySQL8).
27 |
28 | You can change the default behavior by creating `/etc/mysql/conf.d/information_schema_cache.cnf` with the following content:
29 |
30 | ```ini
31 | [mysqld]
32 | information_schema_stats_expiry=0
33 | ```
34 |
35 | Or set it via SQL query:
36 |
37 | ```sql
38 | SET GLOBAL information_schema_stats_expiry=0;
39 | ```
40 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/ssl-connection-unexpectedly-closed.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "SSL connection unexpectedly closed"
3 | linkTitle: "SSL connection unexpectedly closed"
4 | description: >
5 | SSL connection unexpectedly closed
6 | ---
7 | ClickHouse® doesn't probe the CA path which is the default on CentOS and Amazon Linux.
8 |
9 | ## ClickHouse client
10 |
11 | ```markup
12 | cat /etc/clickhouse-client/conf.d/openssl-ca.xml
13 | <config>
14 |     <openSSL>
15 |         <client>
16 |             <caConfig>/etc/ssl/certs</caConfig>
17 |         </client>
18 |     </openSSL>
19 | </config>
20 | ```
21 |
22 | ## ClickHouse server
23 |
24 | ```markup
25 | cat /etc/clickhouse-server/conf.d/openssl-ca.xml
26 | <clickhouse>
27 |     <openSSL>
28 |         <server>
29 |             <caConfig>/etc/ssl/certs</caConfig>
30 |         </server>
31 |         <client>
32 |             <caConfig>/etc/ssl/certs</caConfig>
33 |         </client>
34 |     </openSSL>
35 | </clickhouse>
36 | ```
37 |
38 | [https://github.com/ClickHouse/ClickHouse/issues/17803](https://github.com/ClickHouse/ClickHouse/issues/17803)
39 |
40 | [https://github.com/ClickHouse/ClickHouse/issues/18869](https://github.com/ClickHouse/ClickHouse/issues/18869)
41 |
--------------------------------------------------------------------------------
/content/en/engines/mergetree-table-engine-family/skip-index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Skip index"
3 | linkTitle: "Skip index"
4 | description: >
5 | Skip index
6 | ---
7 | {{% alert title="Warning" color="warning" %}}
8 | Be careful when creating
9 | [skip indexes](https://altinity.com/blog/clickhouse-black-magic-skipping-indices)
10 | in non-regular (Replicated)MergeTree tables over non-ORDER BY columns: ClickHouse® applies the index condition at the first step of query execution, so it's possible to get outdated rows.
11 | {{% /alert %}}
12 |
13 | ```sql
14 | --(1) create test table
15 | drop table if exists test;
16 | create table test
17 | (
18 | version UInt32
19 | ,id UInt32
20 | ,state UInt8
21 | ,INDEX state_idx (state) type set(0) GRANULARITY 1
22 | ) ENGINE ReplacingMergeTree(version)
23 | ORDER BY (id);
24 |
25 | --(2) insert sample data
26 | INSERT INTO test (version, id, state) VALUES (1,1,1);
27 | INSERT INTO test (version, id, state) VALUES (2,1,0);
28 | INSERT INTO test (version, id, state) VALUES (3,1,1);
29 |
30 | --(3) check the result:
31 | -- expected 3, 1, 1
32 | select version, id, state from test final;
33 | ┌─version─┬─id─┬─state─┐
34 | │ 3 │ 1 │ 1 │
35 | └─────────┴────┴───────┘
36 |
37 | -- expected empty result
38 | select version, id, state from test final where state=0;
39 | ┌─version─┬─id─┬─state─┐
40 | │ 2 │ 1 │ 0 │
41 | └─────────┴────┴───────┘
42 | ```
43 |
--------------------------------------------------------------------------------
/static/assets/93978653.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-dictionaries/partial-updates.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Partial updates"
3 | linkTitle: "Partial updates"
4 | description: >
5 | Partial updates
6 | ---
7 | ClickHouse® is able to fetch only updated rows from a source. You need to define the `update_field` section.
8 |
9 | As an example, we have a table in an external source (MySQL, PG, HTTP, ...) defined with the following code sample:
10 |
11 | ```sql
12 | CREATE TABLE cities
13 | (
14 | `polygon` Array(Tuple(Float64, Float64)),
15 | `city` String,
16 | `updated_at` DateTime DEFAULT now()
17 | )
18 | ENGINE = MergeTree ORDER BY city
19 | ```
20 |
21 | When you add new rows or update some rows in this table, you should update `updated_at` with the new timestamp.
22 |
23 | ```sql
24 | -- fetch updated rows every 30 seconds
25 |
26 | CREATE DICTIONARY cities_dict (
27 | polygon Array(Tuple(Float64, Float64)),
28 | city String
29 | )
30 | PRIMARY KEY polygon
31 | SOURCE(CLICKHOUSE( TABLE cities DB 'default'
32 | update_field 'updated_at'))
33 | LAYOUT(POLYGON())
34 | LIFETIME(MIN 30 MAX 30)
35 | ```
36 |
37 | A dictionary with **update_field** `updated_at` will fetch only updated rows. A dictionary saves the time (now) of the last successful update and queries the source with `where updated_at >= previous_update - 1` (shift = 1 sec.).
38 |
39 | In case of an HTTP source, ClickHouse will send GET requests with **update_field** as a URL parameter: `&updated_at=2020-01-01%2000:01:01`
40 |
--------------------------------------------------------------------------------
/content/en/engines/mergetree-table-engine-family/collapsing-vs-replacing.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "CollapsingMergeTree vs ReplacingMergeTree"
3 | linkTitle: "CollapsingMergeTree vs ReplacingMergeTree"
4 | weight: 100
5 | ---
6 |
7 | ## CollapsingMergeTree vs ReplacingMergeTree
8 |
9 | | ReplacingMergeTree | CollapsingMergeTree |
10 | |:----------------------------------------------------------------------------------------------------|:-|
11 | | + very easy to use (always replace) | - more complex (accounting-alike, put 'rollback' records to fix something) |
12 | | + you don't need to store the previous state of the row | - you need to store (somewhere) the previous state of the row, OR extract it from the table itself (point queries are not nice for ClickHouse®) |
13 | | - no deletes | + supports deletes |
14 | | - w/o FINAL you can always see duplicates, and you always need to 'pay' the FINAL performance penalty | + a properly crafted query can give correct results without FINAL (i.e. `sum(amount * sign)` will be correct, no matter if you have duplicates or not) |
15 | | - only `uniq()`-alike things can be calculated in materialized views | + you can do basic counts & sums in materialized views |
16 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-functions/encrypt.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Encrypt"
3 | linkTitle: "Encrypt"
4 | ---
5 |
6 | ## WHERE over encrypted column
7 |
8 | ```sql
9 | CREATE TABLE encrypt
10 | (
11 | `key` UInt32,
12 | `value` FixedString(4)
13 | )
14 | ENGINE = MergeTree
15 | ORDER BY key;
16 |
17 | INSERT INTO encrypt SELECT
18 | number,
19 | encrypt('aes-256-ctr', reinterpretAsString(number + 0.3), 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 'xxxxxxxxxxxxxxxx')
20 | FROM numbers(100000000);
21 |
22 | SET max_threads = 1;
23 |
24 | SELECT count()
25 | FROM encrypt
26 | WHERE value IN encrypt('aes-256-ctr', reinterpretAsString(toFloat32(1.3)), 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 'xxxxxxxxxxxxxxxx')
27 |
28 | ┌─count()─┐
29 | │ 1 │
30 | └─────────┘
31 |
32 | 1 rows in set. Elapsed: 0.666 sec. Processed 100.00 million rows, 400.01 MB (150.23 million rows/s., 600.93 MB/s.)
33 |
34 |
35 | SELECT count()
36 | FROM encrypt
37 | WHERE reinterpretAsFloat32(encrypt('aes-256-ctr', value, 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 'xxxxxxxxxxxxxxxx')) IN toFloat32(1.3)
38 |
39 | ┌─count()─┐
40 | │ 1 │
41 | └─────────┘
42 |
43 | 1 rows in set. Elapsed: 8.395 sec. Processed 100.00 million rows, 400.01 MB (11.91 million rows/s., 47.65 MB/s.)
44 | ```
45 |
46 | {{% alert title="Info" color="info" %}}
47 | Because encryption and decryption can be expensive due to re-initialization of keys and IV, it usually makes sense to use those functions over literal values instead of a table column.
48 | {{% /alert %}}
49 |
--------------------------------------------------------------------------------
/content/en/engines/mergetree-table-engine-family/replacingmergetree/altinity-kb-replacingmergetree-does-not-collapse-duplicates.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "ReplacingMergeTree does not collapse duplicates"
3 | linkTitle: "ReplacingMergeTree does not collapse duplicates"
4 | description: >
5 | ReplacingMergeTree does not collapse duplicates
6 | ---
7 | **Hi there, I have a question about replacing merge trees. I have set up a
8 | [Materialized View](https://www.youtube.com/watch?v=THDk625DGsQ)
9 | with ReplacingMergeTree table, but even if I call optimize on it, the parts don't get merged. I filled that table yesterday, nothing happened since then. What should I do?**
10 |
11 | Merges are eventual and may never happen. It depends on the number of inserts that happened afterwards, the number of parts in the partition, and the size of parts.
12 | If the total size of the input parts is greater than the maximum part size, then they will never be merged.
13 |
14 | [https://clickhouse.com/docs/en/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool](https://clickhouse.com/docs/en/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool)
15 |
16 | [https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree)
17 | _ReplacingMergeTree is suitable for clearing out duplicate data in the background in order to save space, but it doesn’t guarantee the absence of duplicates._
18 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/joins/join-table-engine.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "JOIN table engine"
3 | linkTitle: "JOIN table engine"
4 | description: >
5 | JOIN table engine
6 | draft: true
7 | ---
8 | The main purpose of JOIN table engine is to avoid building the right table for joining on each query execution. So it's usually used when you have a high amount of fast queries which share the same right table for joining.
9 |
10 | ### Updates
11 |
12 | It's possible to update rows with setting `join_any_take_last_row` enabled.
13 |
14 | ```sql
15 | CREATE TABLE id_val_join
16 | (
17 | `id` UInt32,
18 | `val` UInt8
19 | )
20 | ENGINE = Join(ANY, LEFT, id)
21 | SETTINGS join_any_take_last_row = 1
22 |
23 | Ok.
24 |
25 | INSERT INTO id_val_join VALUES (1,21)(1,22)(3,23);
26 |
27 | Ok.
28 |
29 | SELECT *
30 | FROM
31 | (
32 | SELECT toUInt32(number) AS id
33 | FROM numbers(4)
34 | ) AS n
35 | ANY LEFT JOIN id_val_join USING (id)
36 |
37 | ┌─id─┬─val─┐
38 | │ 0 │ 0 │
39 | │ 1 │ 22 │
40 | │ 2 │ 0 │
41 | │ 3 │ 23 │
42 | └────┴─────┘
43 |
44 | INSERT INTO id_val_join VALUES (1,40)(2,24);
45 |
46 | Ok.
47 |
48 | SELECT *
49 | FROM
50 | (
51 | SELECT toUInt32(number) AS id
52 | FROM numbers(4)
53 | ) AS n
54 | ANY LEFT JOIN id_val_join USING (id)
55 |
56 | ┌─id─┬─val─┐
57 | │ 0 │ 0 │
58 | │ 1 │ 40 │
59 | │ 2 │ 24 │
60 | │ 3 │ 23 │
61 | └────┴─────┘
62 | ```
63 |
64 | [Join table engine documentation](https://clickhouse.com/docs/en/engines/table-engines/special/join/)
65 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/roaring-bitmaps-for-calculating-retention.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Roaring bitmaps for calculating retention"
3 | linkTitle: "Roaring bitmaps for calculating retention"
4 | ---
5 | ```sql
6 | CREATE TABLE test_roaring_bitmap
7 | ENGINE = MergeTree
8 | ORDER BY h AS
9 | SELECT
10 | intDiv(number, 5) AS h,
11 | groupArray(toUInt16(number - (2 * intDiv(number, 5)))) AS vals,
12 | groupBitmapState(toUInt16(number - (2 * intDiv(number, 5)))) AS vals_bitmap
13 | FROM numbers(40)
14 | GROUP BY h
15 |
16 | SELECT
17 | h,
18 | vals,
19 | hex(vals_bitmap)
20 | FROM test_roaring_bitmap
21 |
22 | ┌─h─┬─vals─────────────┬─hex(vals_bitmap)─────────┐
23 | │ 0 │ [0,1,2,3,4] │ 000500000100020003000400 │
24 | │ 1 │ [3,4,5,6,7] │ 000503000400050006000700 │
25 | │ 2 │ [6,7,8,9,10] │ 000506000700080009000A00 │
26 | │ 3 │ [9,10,11,12,13] │ 000509000A000B000C000D00 │
27 | │ 4 │ [12,13,14,15,16] │ 00050C000D000E000F001000 │
28 | │ 5 │ [15,16,17,18,19] │ 00050F001000110012001300 │
29 | │ 6 │ [18,19,20,21,22] │ 000512001300140015001600 │
30 | │ 7 │ [21,22,23,24,25] │ 000515001600170018001900 │
31 | └───┴──────────────────┴──────────────────────────┘
32 |
33 | SELECT
34 | groupBitmapAnd(vals_bitmap) AS uniq,
35 | bitmapToArray(groupBitmapAndState(vals_bitmap)) AS vals
36 | FROM test_roaring_bitmap
37 | WHERE h IN (0, 1)
38 |
39 | ┌─uniq─┬─vals──┐
40 | │ 2 │ [3,4] │
41 | └──────┴───────┘
42 | ```
43 |
44 | See also [A primer on roaring bitmaps](https://vikramoberoi.com/a-primer-on-roaring-bitmaps-what-they-are-and-how-they-work/)
45 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Welcome
2 |
3 | Welcome to the Altinity Knowledgebase Repository! This Knowledgebase was established for Altinity Engineers and ClickHouse community members to work together to find common solutions.
4 |
5 | Submissions and merges to this repository are distributed at https://kb.altinity.com .
6 |
7 | This knowledgebase is licensed under Apache 2.0. Contributors who submit to the Altinity Knowledgebase agree to the Altinity Contribution License Agreement.
8 |
9 | ## How This Site is Rendered
10 |
11 | This site is rendered using [Hugo](https://gohugo.io/) and the [Docsy theme](https://www.docsy.dev/).
12 |
13 | To test out the site on a local system:
14 |
15 | 1. Download the entire repo.
16 | 1. Install `hugo`.
17 | 1. From the command line, run `npm install` to allocate the proper packages locally.
18 | 1. From the command line, run `git submodule update --init --recursive` to populate the Docsy theme.
19 | 1. Edit the contents of the `./content/en` directory. To add images/pdfs/etc , those go into `./static`.
20 | 1. To view the web page locally to verify how it looks, use `hugo server` and the web page will be displayed from `./docs` as a local server on `http://localhost:1313`.
21 |
22 | ## How This Site Is Served
23 |
24 | Merges into the `main` branch are run through a Github workflow, and the results are rendered into the branch `altinity-knowledgebase`. The GitHub pages are served from that branch. Members of the Altinity Knowledge Base team can directly contribute to the Knowledge Base. Other users will submit pull requests and agree to the CLA before their pull request will be accepted.
25 |
--------------------------------------------------------------------------------
/layouts/partials/section-index.html:
--------------------------------------------------------------------------------
1 |
2 | {{ $pages := (where .Site.Pages "Section" .Section).ByWeight }}
3 | {{ $pages = (where $pages "Type" "!=" "search") }}
4 | {{ $parent := .Page }}
5 | {{ if $parent.Params.no_list }}
6 | {{/* If no_list is true we don't show a list of subpages */}}
7 | {{ else if $parent.Params.simple_list }}
8 | {{/* If simple_list is true we show a bulleted list of subpages */}}
9 |
10 | {{ range $pages }}
11 | {{ if eq .Parent $parent }}
12 |
16 | {{ else }}
17 | {{/* Otherwise we show a nice formatted list of subpages with page descriptions */}}
18 | {{ with .Content }}
19 |
20 | {{ end }}
21 | {{ range $pages }}
22 | {{ if eq .Parent $parent }}
23 |
{{ if ne (.Description | markdownify) (.Title | markdownify) }}{{ .Description | markdownify }}{{ end }}
28 | {{ if .Date }}
29 | ({{- .Date.Format "January 2, 2006" -}})
30 | {{ end }}
31 |
32 |
33 |
34 | {{ end }}
35 | {{ end }}
36 | {{ end }}
37 |
38 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/data-types-on-disk-and-in-ram.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Data types on disk and in RAM"
3 | linkTitle: "Data types on disk and in RAM"
4 | description: >
5 | Data types on disk and in RAM
6 | ---
7 |
41 |
42 | See also the presentation [Data processing into ClickHouse®](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup41/data_processing.pdf), especially slides 17-22.
43 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/explain-query.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "EXPLAIN query"
3 | linkTitle: "EXPLAIN query"
4 | description: >
5 | EXPLAIN query
6 | ---
7 |
8 | ### EXPLAIN types
9 |
10 | ```sql
11 | EXPLAIN AST
12 | SYNTAX
13 | PLAN indexes = 0,
14 | header = 0,
15 | description = 1,
16 | actions = 0,
17 | optimize = 1,
18 | json = 0
19 | PIPELINE header = 0,
20 | graph = 0,
21 | compact = 1
22 | ESTIMATE
23 | SELECT ...
24 | ```
25 |
26 | * `AST` - abstract syntax tree
27 | * `SYNTAX` - query text after AST-level optimizations
28 | * `PLAN` - query execution plan
29 | * `PIPELINE` - query execution pipeline
30 | * `ESTIMATE` - See [Estimates for select query](https://github.com/ClickHouse/ClickHouse/pull/26131), available since ClickHouse® 21.9
31 | * `indexes=1` supported starting from 21.6 (https://github.com/ClickHouse/ClickHouse/pull/22352 )
32 | * `json=1` supported starting from 21.6 (https://github.com/ClickHouse/ClickHouse/pull/23082)
33 |
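A minimal usage sketch (`my_table` is a hypothetical MergeTree table; any query can follow the `EXPLAIN` clause):

```sql
-- check whether the primary key / skip indexes are used
EXPLAIN PLAN indexes = 1
SELECT count()
FROM my_table
WHERE id < 1000;

-- see the query text after AST-level rewrites
EXPLAIN SYNTAX
SELECT number
FROM numbers(1000)
WHERE number IN (1, 2, 3);
```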
34 |
35 | References
36 | * https://clickhouse.com/docs/en/sql-reference/statements/explain/
37 | * Nikolai Kochetov from Yandeх. EXPLAIN query in ClickHouse. [slides](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup47/explain.pdf), [video](https://youtu.be/ckChUkC3Pns?t=1387)
38 | * [https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup39/query-profiling.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup39/query-profiling.pdf)
39 | * https://github.com/ClickHouse/ClickHouse/issues/28847
40 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-schema-design/altinity-kb-jsonasstring-and-mat.-view-as-json-parser.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "JSONAsString and Mat. View as JSON parser"
3 | linkTitle: "JSONAsString and Mat. View as JSON parser"
4 | description: >
5 | JSONAsString and Mat. View as JSON parser
6 | ---
7 | Tables with engine Null don’t store data but can be used as a source for materialized views.
8 |
9 | JSONAsString is a special input format which allows ingesting JSON objects into a String column. If the input has several JSON objects (comma separated) they will be interpreted as separate rows. JSON can be multiline.
10 |
11 | ```sql
12 | create table entrypoint(J String) Engine=Null;
13 | create table datastore(a String, i Int64, f Float64) Engine=MergeTree order by a;
14 |
15 | create materialized view jsonConverter to datastore
16 | as select (JSONExtract(J, 'Tuple(String,Tuple(Int64,Float64))') as x),
17 | x.1 as a,
18 | x.2.1 as i,
19 | x.2.2 as f
20 | from entrypoint;
21 |
22 | $ echo '{"s": "val1", "b2": {"i": 42, "f": 0.1}}' | \
23 | clickhouse-client -q "insert into entrypoint format JSONAsString"
24 |
25 | $ echo '{"s": "val1","b2": {"i": 33, "f": 0.2}},{"s": "val1","b2": {"i": 34, "f": 0.2}}' | \
26 | clickhouse-client -q "insert into entrypoint format JSONAsString"
27 |
28 | SELECT * FROM datastore;
29 | ┌─a────┬──i─┬───f─┐
30 | │ val1 │ 42 │ 0.1 │
31 | └──────┴────┴─────┘
32 | ┌─a────┬──i─┬───f─┐
33 | │ val1 │ 33 │ 0.2 │
34 | │ val1 │ 34 │ 0.2 │
35 | └──────┴────┴─────┘
36 | ```
37 |
38 | See also: [JSONExtract to parse many attributes at a time](/altinity-kb-queries-and-syntax/jsonextract-to-parse-many-attributes-at-a-time/)
39 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-dictionaries/dictionary-on-top-tables.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Dictionary on the top of several tables using VIEW"
3 | linkTitle: "Dictionary on the top of several tables using VIEW"
4 | description: >
5 | Dictionary on the top of several tables using VIEW
6 | ---
7 | ```sql
8 |
9 | DROP TABLE IF EXISTS dictionary_source_en;
10 | DROP TABLE IF EXISTS dictionary_source_ru;
11 | DROP TABLE IF EXISTS dictionary_source_view;
12 | DROP DICTIONARY IF EXISTS flat_dictionary;
13 |
14 | CREATE TABLE dictionary_source_en
15 | (
16 | id UInt64,
17 | value String
18 | ) ENGINE = TinyLog;
19 |
20 | INSERT INTO dictionary_source_en VALUES (1, 'One'), (2,'Two'), (3, 'Three');
21 |
22 | CREATE TABLE dictionary_source_ru
23 | (
24 | id UInt64,
25 | value String
26 | ) ENGINE = TinyLog;
27 |
28 | INSERT INTO dictionary_source_ru VALUES (1, 'Один'), (2,'Два'), (3, 'Три');
29 |
30 | CREATE VIEW dictionary_source_view AS SELECT id, dictionary_source_en.value as value_en, dictionary_source_ru.value as value_ru FROM dictionary_source_en LEFT JOIN dictionary_source_ru USING (id);
31 |
32 | select * from dictionary_source_view;
33 |
34 | CREATE DICTIONARY flat_dictionary
35 | (
36 | id UInt64,
37 | value_en String,
38 | value_ru String
39 | )
40 | PRIMARY KEY id
41 | SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' PASSWORD '' TABLE 'dictionary_source_view'))
42 | LIFETIME(MIN 1 MAX 1000)
43 | LAYOUT(FLAT());
44 |
45 | SELECT
46 | dictGet(concat(currentDatabase(), '.flat_dictionary'), 'value_en', number + 1),
47 | dictGet(concat(currentDatabase(), '.flat_dictionary'), 'value_ru', number + 1)
48 | FROM numbers(3);
49 | ```
50 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/who-ate-my-cpu.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Who ate my CPU"
3 | linkTitle: "Who ate my CPU"
4 | weight: 100
5 | description: >-
6 | Queries to find which subsystem of ClickHouse® is using the most CPU.
7 | ---
8 |
9 | ## Merges
10 |
11 | ```sql
12 | SELECT
13 | table,
14 | round((elapsed * (1 / progress)) - elapsed, 2) AS estimate,
15 | elapsed,
16 | progress,
17 | is_mutation,
18 | formatReadableSize(total_size_bytes_compressed) AS size,
19 | formatReadableSize(memory_usage) AS mem
20 | FROM system.merges
21 | ORDER BY elapsed DESC
22 | ```
23 |
24 | ## Mutations
25 |
26 | ```sql
27 | SELECT
28 | database,
29 | table,
30 | substr(command, 1, 30) AS command,
31 | sum(parts_to_do) AS parts_to_do,
32 | anyIf(latest_fail_reason, latest_fail_reason != '')
33 | FROM system.mutations
34 | WHERE NOT is_done
35 | GROUP BY
36 | database,
37 | table,
38 | command
39 | ```
40 |
41 | ## Current Processes
42 |
43 | ```sql
44 | select elapsed, query from system.processes where is_initial_query and elapsed > 2
45 | ```
46 |
47 | ## Processes retrospectively
48 |
49 | ```sql
50 | SELECT
51 | normalizedQueryHash(query) hash,
52 | current_database,
53 | sum(ProfileEvents['UserTimeMicroseconds'] as userCPUq)/1000 AS userCPUms,
54 | count(),
55 | sum(query_duration_ms) query_duration_ms,
56 | userCPUms/query_duration_ms cpu_per_sec,
57 | argMax(query, userCPUq) heaviest_query
58 | FROM system.query_log
59 | WHERE (type = 2) AND (event_date >= today())
60 | GROUP BY
61 | current_database,
62 | hash
63 | ORDER BY userCPUms DESC
64 | LIMIT 10
65 | FORMAT Vertical;
66 | ```
67 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/altinity-kb-zookeeper-cluster-migration.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "ZooKeeper cluster migration"
3 | linkTitle: "ZooKeeper cluster migration"
4 | description: >
5 | ZooKeeper cluster migration
6 | ---
7 | Here is a plan for ZK 3.4.9 (no dynamic reconfiguration):
8 |
9 | 1. Add the 3 new ZK nodes to the old cluster. No changes needed for the 3 old ZK nodes at this time.
10 | 1. Configure one of the new ZK nodes as a cluster of 4 nodes (3 old + 1 new), start it.
11 | 2. Configure the other two new ZK nodes as a cluster of 6 nodes (3 old + 3 new), start them.
12 | 2. Make sure the 3 new ZK nodes connected to the old ZK cluster as followers (run `echo stat | nc localhost 2181` on the 3 new ZK nodes)
13 | 3. Confirm that the leader has 5 synced followers (run `echo mntr | nc localhost 2181` on the leader, look for `zk_synced_followers`)
14 | 4. Stop data ingestion in CH (this is to minimize errors when CH loses ZK).
15 | 5. Change the zookeeper section in the configs on the CH nodes (remove the 3 old ZK servers, add the 3 new ZK servers)
16 | 6. Make sure that there are no connections from CH to the 3 old ZK nodes (run `echo stat | nc localhost 2181` on the 3 old nodes, check their `Clients` section). Restart all CH nodes if necessary (In some cases CH can reconnect to different ZK servers without a restart).
17 | 7. Remove the 3 old ZK nodes from `zoo.cfg` on the 3 new ZK nodes.
18 | 8. Restart the 3 new ZK nodes. They should form a cluster of 3 nodes.
19 | 9. When CH reconnects to ZK, start data loading.
20 | 10. Turn off the 3 old ZK nodes.
21 |
22 | This plan works, but it is not the only way to do this; it can be adapted as needed.
23 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-schema-design/altinity-kb-dictionaries-vs-lowcardinality.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Dictionaries vs LowCardinality"
3 | linkTitle: "Dictionaries vs LowCardinality"
4 | description: >
5 | Dictionaries vs LowCardinality
6 | ---
7 | Q. I think I'm still trying to understand how de-normalized is okay - with my relational mindset, I want to move repeated string fields into their own table, but I'm not sure to what extent this is necessary
8 |
9 | I will look at LowCardinality in more detail - I think it may work well here
10 |
11 | A. If it's a simple repetition, which you don't need to manipulate or change in the future - LowCardinality works great, and you usually don't need to increase the system complexity by introducing dictionaries.
12 |
13 | For example, the team name 'Manchester United' will most likely never change, and even if it does you can keep the historical records with the historical name. So normalization here (with some dictionaries) is very optional, and the de-normalized approach with LowCardinality is a good and simpler alternative.
14 |
15 | On the other hand, if the data can change in the future, and that change should impact the reports, then normalization may be a big advantage.
16 |
17 | For example, if a currency exchange rate changes every day, it would be quite impractical to update all historical records to apply the newest rate. Putting the rate into a dictionary allows doing calculations with the latest exchange rate at SELECT time.
18 |
19 | For a dictionary it's possible to mark some of the attributes as injective. An attribute is called injective if different attribute values correspond to different keys. This allows ClickHouse® to replace the dictGet call in GROUP BY with the cheap dictionary key.
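A minimal sketch of how that can look, assuming hypothetical `teams` / `matches` tables and a `team_dict` dictionary (the `INJECTIVE` flag is part of the attribute declaration):

```sql
CREATE TABLE teams (id UInt64, team_name String) ENGINE = MergeTree ORDER BY id;

CREATE DICTIONARY team_dict
(
    id UInt64,
    team_name String INJECTIVE
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' PASSWORD '' TABLE 'teams'))
LIFETIME(MIN 300 MAX 1000)
LAYOUT(FLAT());

-- Because team_name is marked injective, the GROUP BY over dictGet
-- can be internally rewritten to group by the cheap key instead.
SELECT
    dictGet('team_dict', 'team_name', team_id) AS team,
    count()
FROM matches
GROUP BY team;
```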
20 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-rabbitmq/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "RabbitMQ"
3 | linkTitle: "RabbitMQ"
4 | description: >
5 | RabbitMQ engine in ClickHouse® 24.3+
6 | ---
7 |
8 | ### Settings
9 |
10 | Basic RabbitMQ settings and use cases: https://clickhouse.com/docs/en/engines/table-engines/integrations/rabbitmq
11 |
12 | ### Latest improvements/fixes
13 |
14 | ##### (v23.10+)
15 |
16 | - **Allow saving unparsed records and errors in the RabbitMQ, NATS and FileLog engines**:
17 | Virtual columns `_error` and `_raw_message` (for NATS and RabbitMQ) and `_raw_record` (for FileLog) are filled when ClickHouse fails to parse a new record.
18 | The behaviour is controlled by the storage settings `nats_handle_error_mode` for NATS, `rabbitmq_handle_error_mode` for RabbitMQ, and `handle_error_mode` for FileLog, similar to `kafka_handle_error_mode`.
19 | If it's set to `default`, an exception is thrown when ClickHouse fails to parse a record; if it's set to `stream`, the error and the raw record are saved into the virtual columns.
20 | Closes [#36035](https://github.com/ClickHouse/ClickHouse/issues/36035) and [#55477](https://github.com/ClickHouse/ClickHouse/pull/55477)
21 |
22 |
23 | ##### (v24+)
24 |
25 | - [#45350 RabbitMq Storage Engine should NACK messages if exception is thrown during processing](https://github.com/ClickHouse/ClickHouse/issues/45350)
26 | - [#59775 rabbitmq: fix having neither acked nor nacked messages](https://github.com/ClickHouse/ClickHouse/pull/59775)
27 | - [#60312 Make rabbitmq nack broken messages](https://github.com/ClickHouse/ClickHouse/pull/60312)
28 | - [#61320 Fix logical error in RabbitMQ storage with MATERIALIZED columns](https://github.com/ClickHouse/ClickHouse/pull/61320)
29 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/load-balancers.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Load balancers"
3 | linkTitle: "Load balancers"
4 | description: >
5 | Load balancers
6 | ---
7 | In general, one of the simplest options for load balancing is to implement it on the client side.
8 |
9 | I.e. list several endpoints for ClickHouse® connections and add some logic to pick one of the nodes.
10 |
11 | Many client libraries support that.
12 |
13 | ## ClickHouse native protocol (port 9000)
14 |
15 | Currently there are no protocol-aware proxies for the ClickHouse native protocol, so the proxy / load balancer can work only at the TCP level.
16 |
17 | One of the best options for a TCP load balancer is haproxy; nginx can also work in that mode.
18 |
19 | Haproxy will pick one upstream when the connection is established, and after that it will keep the client connected to the same server until the client or server disconnects (or some timeout happens).
20 |
21 | It can't send different queries coming via a single connection to different servers: it knows nothing about the ClickHouse protocol and doesn't know when one query ends and another starts, it just sees the binary stream.
22 |
23 | So for native protocol, there are only 3 possibilities:
24 |
25 | 1) close the connection after each query client-side
26 | 2) close the connection after each query server-side (currently there is only one setting for that - idle_connection_timeout=0, which is not exactly what you need, but similar).
27 | 3) use a ClickHouse server with a Distributed table as a proxy (see the sketch below).
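A minimal sketch of option 3, assuming a cluster named `my_cluster` in `remote_servers` and a hypothetical `default.events` table on the data nodes:

```sql
-- On the "proxy" node: no local data, only a Distributed table
CREATE TABLE default.events_proxy
(
    `ts` DateTime,
    `user_id` UInt64
)
ENGINE = Distributed('my_cluster', 'default', 'events', rand());

-- Clients connect to the proxy node only; it fans queries out to the cluster
SELECT count() FROM default.events_proxy;
```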
28 |
29 | ## HTTP protocol (port 8123)
30 |
31 | There are many more options and you can use haproxy / nginx / chproxy, etc.
32 | chproxy give some extra ClickHouse-specific features, you can find a list of them at [https://chproxy.org](https://chproxy.org)
33 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/source-pars-size-is-greater-than-maximum.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "source parts size is greater than the current maximum"
3 | linkTitle: "source parts size is greater than the current maximum"
4 | weight: 100
5 | description: >-
6 | source parts size (...) is greater than the current maximum (...)
7 | ---
8 |
9 | ## Symptom
10 |
11 | I see messages like: `source parts size (...) is greater than the current maximum (...)` in the logs and/or inside `system.replication_queue`
12 |
13 |
14 | ## Cause
15 |
16 | Usually that means that there are already a few big merges running.
17 | You can see the running merges using the query:
18 |
19 | ```
20 | SELECT * FROM system.merges
21 | ```
22 |
23 | That logic is needed to prevent picking a lot of huge merges simultaneously
24 | (otherwise they will take all available slots and ClickHouse® will not be
25 | able to do smaller merges, which usually are important for keeping the
26 | number of parts stable).
27 |
28 |
29 | ## Action
30 |
31 | It is normal to see those messages on some stale replicas, and it should be resolved
32 | automatically after some time. So just wait & monitor the system.merges &
33 | system.replication_queue tables; it should be resolved on its own.
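A small sketch to see which replication queue entries are postponed for this reason (the LIKE pattern is an assumption about the exact message text):

```sql
SELECT database, table, type, num_postponed, postpone_reason
FROM system.replication_queue
WHERE postpone_reason ILIKE '%greater than the current maximum%'
ORDER BY num_postponed DESC;
```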
34 |
35 | If it happens often or doesn't resolve on its own over a longer period of time,
36 | it could be caused by:
37 | 1) increased insert pressure
38 | 2) disk issues / high load (it works slow, not enough space etc.)
39 | 3) high CPU load (not enough CPU power to catch up with merges)
40 | 4) issue with table schemas leading to high merges pressure (high / increased number of tables / partitions / etc.)
41 |
42 | Start from checking dmesg / system journals / ClickHouse monitoring to find the anomalies.
43 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-functions/altinity-kb-sequencematch.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "sequenceMatch"
3 | linkTitle: "sequenceMatch"
4 | description: >
5 | sequenceMatch
6 | ---
7 | ## Question
8 |
9 | I expect the sequence here to match only once, as 'a' is directly followed by 'a' only once - but it matches with gaps. Why is that?
10 |
11 | ```sql
12 | SELECT sequenceCount('(?1)(?2)')(sequence, page ILIKE '%a%', page ILIKE '%a%') AS sequences
13 | FROM values('page String, sequence UInt16', ('a', 1), ('a', 2), ('b', 3), ('b', 4), ('a', 5), ('b', 6), ('a', 7))
14 |
15 | 2 # ??
16 | ```
17 |
18 | ## Answer
19 |
20 | `sequenceMatch` just ignores the events which don't match the condition. Check that:
21 |
22 | ```sql
23 | SELECT sequenceMatch('(?1)(?2)')(sequence,page='a',page='b') AS sequences FROM values( 'page String, sequence UInt16' , ('a', 1), ('c',2), ('b', 3));
24 | 1 # ??
25 |
26 | SELECT sequenceMatch('(?1).(?2)')(sequence,page='a',page='b') AS sequences FROM values( 'page String, sequence UInt16' , ('a', 1), ('c',2), ('b', 3));
27 | 0 # ???
28 |
29 | SELECT sequenceMatch('(?1)(?2)')(sequence,page='a',page='b', page NOT IN ('a','b')) AS sequences from values( 'page String, sequence UInt16' , ('a', 1), ('c',2), ('b', 3));
30 | 0 # !
31 |
32 | SELECT sequenceMatch('(?1).(?2)')(sequence,page='a',page='b', page NOT IN ('a','b')) AS sequences from values( 'page String, sequence UInt16' , ('a', 1), ('c',2), ('b', 3));
33 | 1 #
34 | ```
35 |
36 | So for your example - just introduce one more 'nothing matched' condition:
37 |
38 | ```sql
39 | SELECT sequenceCount('(?1)(?2)')(sequence, page ILIKE '%a%', page ILIKE '%a%', NOT (page ILIKE '%a%')) AS sequences
40 | FROM values('page String, sequence UInt16', ('a', 1), ('a', 2), ('b', 3), ('b', 4), ('a', 5), ('b', 6), ('a', 7))
41 | ```
42 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-shutting-down-a-node.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Shutting down a node"
3 | linkTitle: "Shutting down a node"
4 | description: >
5 | Shutting down a node
6 | ---
7 | It's possible to shut down a server on the fly, but that would lead to the failure of some queries.
8 |
9 | A safer way:
10 |
11 | * Remove server (which is going to be disabled) from remote_server section of config.xml on all servers.
12 | * avoid removing the last replica of the shard (that can lead to incorrect data placement if you use non-random distribution)
13 | * Remove server from load balancer, so new queries wouldn’t hit it.
14 | * Detach Kafka / Rabbit / Buffer tables (if used), and Materialized* databases.
15 | * Wait until all already running queries would finish execution on it.
16 | It’s possible to check it via query:
17 |
18 | ```sql
19 | SHOW PROCESSLIST;
20 | ```
21 | * Ensure there is no pending data in distributed tables
22 |
23 | ```sql
24 | SELECT * FROM system.distribution_queue;
25 | SYSTEM FLUSH DISTRIBUTED db.distributed_table; -- for each Distributed table
26 | ```
27 |
28 | * Run sync replica query in related shard replicas (others than the one you remove) via query:
29 |
30 | ```sql
31 | SYSTEM SYNC REPLICA db.table;
32 | ```
33 |
34 |
35 | * Shutdown server.
36 |
37 | The `SYSTEM SHUTDOWN` query by default doesn't wait until query completion and tries to kill all queries immediately after receiving the signal. If you want to change this behavior, you need to enable the setting `shutdown_wait_unfinished_queries`.
38 |
39 | [https://github.com/ClickHouse/ClickHouse/blob/d705f8ead4bdc837b8305131844f558ec002becc/programs/server/Server.cpp#L1682](https://github.com/ClickHouse/ClickHouse/blob/d705f8ead4bdc837b8305131844f558ec002becc/programs/server/Server.cpp#L1682)
40 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-exactly-once-semantics.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Exactly once semantics"
3 | linkTitle: "Exactly once semantics"
4 | description: >
5 | Exactly once semantics
6 | ---
7 | EOS consumer (isolation.level=read_committed) is enabled by default since librdkafka 1.2.0, so for ClickHouse® - since 20.2
8 |
9 | See:
10 |
11 | * [edenhill/librdkafka@6b2a155](https://github.com/edenhill/librdkafka/commit/6b2a1552ac2a4ea09d915015183f268dd2df96e6)
12 | * [9de5dff](https://github.com/ClickHouse/ClickHouse/commit/9de5dffb5c97eb93545ae25eaf87ec195a590148)
13 |
14 | BUT: while EOS semantics guarantee that no duplicates will happen on the Kafka side (i.e. even if you produce the same messages a few times they will be consumed once), ClickHouse as a Kafka client can currently guarantee only at-least-once. And in some corner cases (connection lost, etc.) you can get duplicates.
15 |
16 | We need to have something like transactions on the ClickHouse side to be able to avoid that. Adding something like simple transactions was planned for 2022.
17 |
18 |
19 | ## block-aggregator by eBay
20 |
21 | Block Aggregator is a data loader that subscribes to Kafka topics, aggregates the Kafka messages into blocks that follow the ClickHouse’s table schemas, and then inserts the blocks into ClickHouse. Block Aggregator provides exactly-once delivery guarantee to load data from Kafka to ClickHouse. Block Aggregator utilizes Kafka’s metadata to keep track of blocks that are intended to send to ClickHouse, and later uses this metadata information to deterministically re-produce ClickHouse blocks for re-tries in case of failures. The identical blocks are guaranteed to be deduplicated by ClickHouse.
22 |
23 | [eBay/block-aggregator](https://github.com/eBay/block-aggregator)
24 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-rabbitmq/error-handling.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "RabbitMQ Error handling"
3 | linkTitle: "RabbitMQ Error handling"
4 | description: >
5 | Error handling for RabbitMQ table engine
6 | ---
7 |
8 | Same approach as in Kafka but virtual columns are different. Check https://clickhouse.com/docs/en/engines/table-engines/integrations/rabbitmq#virtual-columns
9 |
10 | ```sql
11 | CREATE TABLE IF NOT EXISTS rabbitmq.broker_errors_queue
12 | (
13 | exchange_name String,
14 | channel_id String,
15 | delivery_tag UInt64,
16 | redelivered UInt8,
17 | message_id String,
18 | timestamp UInt64
19 | )
20 | engine = RabbitMQ
21 | SETTINGS
22 | rabbitmq_host_port = 'localhost:5672',
23 | rabbitmq_exchange_name = 'exchange-test', -- required parameter even though this is done via the rabbitmq config
24 | rabbitmq_queue_consume = true,
25 | rabbitmq_queue_base = 'test-errors',
26 | rabbitmq_format = 'JSONEachRow',
27 | rabbitmq_username = 'guest',
28 | rabbitmq_password = 'guest',
29 | rabbitmq_handle_error_mode = 'stream';
30 |
31 | CREATE MATERIALIZED VIEW IF NOT EXISTS rabbitmq.broker_errors_mv
32 | (
33 | exchange_name String,
34 | channel_id String,
35 | delivery_tag UInt64,
36 | redelivered UInt8,
37 | message_id String,
38 | timestamp UInt64,
39 | raw_message String,
40 | error String
41 | )
42 | ENGINE = MergeTree
43 | ORDER BY (error)
44 | SETTINGS index_granularity = 8192 AS
45 | SELECT
46 | _exchange_name AS exchange_name,
47 | _channel_id AS channel_id,
48 | _delivery_tag AS delivery_tag,
49 | _redelivered AS redelivered,
50 | _message_id AS message_id,
51 | _timestamp AS timestamp,
52 | _raw_message AS raw_message,
53 | _error AS error
54 | FROM rabbitmq.broker_errors_queue
55 | WHERE length(_error) > 0
56 | ```
57 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/ansi-sql-mode.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "ANSI SQL mode"
3 | linkTitle: "ANSI SQL mode"
4 | description: >
5 | ANSI SQL mode
6 | ---
7 | To make ClickHouse® more compatible with ANSI SQL standards (at the expense of some performance), you can adjust several settings. These configurations will bring ClickHouse closer to ANSI SQL behavior but may introduce a slowdown in query performance:
8 |
9 | ```sql
10 | join_use_nulls=1
11 | ```
12 | Introduced in: early versions
13 | Ensures that JOIN operations return NULL for non-matching rows, aligning with standard SQL behavior.
14 |
15 |
16 | ```sql
17 | cast_keep_nullable=1
18 | ```
19 | Introduced in: v20.5
20 | Preserves the NULL flag when casting between data types, which is typical in ANSI SQL.
21 |
22 |
23 | ```sql
24 | union_default_mode='DISTINCT'
25 | ```
26 | Introduced in: v21.1
27 | Makes the UNION operation default to UNION DISTINCT, which removes duplicate rows, following ANSI SQL behavior.
28 |
29 |
30 | ```sql
31 | allow_experimental_window_functions=1
32 | ```
33 | Introduced in: v21.3
34 | Enables support for window functions, which are a standard feature in ANSI SQL.
35 |
36 |
37 | ```sql
38 | prefer_column_name_to_alias=1
39 | ```
40 | Introduced in: v21.4
41 | This setting resolves ambiguities by preferring column names over aliases, following ANSI SQL conventions.
42 |
43 |
44 | ```sql
45 | group_by_use_nulls=1
46 | ```
47 | Introduced in: v22.7
48 | Allows NULL values to appear in the GROUP BY clause, consistent with ANSI SQL behavior.
49 |
50 | By enabling these settings, ClickHouse becomes more ANSI SQL-compliant, although this may come with a trade-off in terms of performance. Each of these options can be enabled as needed, based on the specific SQL compatibility requirements of your application.
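A minimal sketch of applying them per session (pick only the settings your ClickHouse version supports; `allow_experimental_window_functions` may no longer be needed on recent versions):

```sql
SET join_use_nulls = 1,
    cast_keep_nullable = 1,
    union_default_mode = 'DISTINCT',
    prefer_column_name_to_alias = 1,
    group_by_use_nulls = 1;
```

The same settings can also be placed in a user profile to make them the default.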
51 |
52 |
53 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/logging.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Logging"
3 | linkTitle: "Logging"
4 | weight: 100
5 | description: >-
6 | Logging configuration and issues
7 | ---
8 | Q. I get errors:
9 |
10 | ```bash
11 | File not found: /var/log/clickhouse-server/clickhouse-server.log.0.
12 | File not found: /var/log/clickhouse-server/clickhouse-server.log.8.gz.
13 |
14 | ...
15 |
16 | File not found: /var/log/clickhouse-server/clickhouse-server.err.log.0, Stack trace (when copying this message, always include the lines below):
17 | 0. Poco::FileImpl::handleLastErrorImpl(std::__1::basic_string, std::__1::allocator > const&) @ 0x11c2b345 in /usr/bin/clickhouse
18 | 1. Poco::PurgeOneFileStrategy::purge(std::__1::basic_string, std::__1::allocator > const&) @ 0x11c84618 in /usr/bin/clickhouse
19 | 2. Poco::FileChannel::log(Poco::Message const&) @ 0x11c314cc in /usr/bin/clickhouse
20 | 3. DB::OwnFormattingChannel::logExtended(DB::ExtendedLogMessage const&) @ 0x8681402 in /usr/bin/clickhouse
21 | 4. DB::OwnSplitChannel::logSplit(Poco::Message const&) @ 0x8682fa8 in /usr/bin/clickhouse
22 | 5. DB::OwnSplitChannel::log(Poco::Message const&) @ 0x8682e41 in /usr/bin/clickhouse
23 | ```
24 |
25 | A. Check that you have proper permissions on the log files folder, and enough disk space (& free inodes) on the block device used for logging.
26 |
27 | ```bash
28 | ls -la /var/log/clickhouse-server/
29 | df -Th
30 | df -Thi
31 | ```
32 |
33 | Q. How to configure logging in ClickHouse®?
34 |
35 | A. See [https://github.com/ClickHouse/ClickHouse/blob/ceaf6d57b7f00e1925b85754298cf958a278289a/programs/server/config.xml#L9-L62](https://github.com/ClickHouse/ClickHouse/blob/ceaf6d57b7f00e1925b85754298cf958a278289a/programs/server/config.xml#L9-L62)
36 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/altinity-kb-possible-deadlock-avoided.-client-should-retry.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Possible deadlock avoided. Client should retry"
3 | linkTitle: "Possible deadlock avoided. Client should retry"
4 | description: >
5 | Possible deadlock avoided. Client should retry
6 | ---
7 | In ClickHouse® version 19.14 a serious issue was found: a race condition that can lead to server deadlock. The reason for that was quite fundamental, and a temporary workaround for that was added ("possible deadlock avoided").
8 |
9 | Those locks are one of the fundamental things that the core team was actively working on in 2020.
10 |
11 | In 20.3 some of the locks leading to that situation were removed as a part of huge refactoring.
12 |
13 | In 20.4 more locks were removed, the check was made configurable (see `lock_acquire_timeout` ) so you can say how long to wait before returning that exception
14 |
15 | In 20.5 heuristics of that check ("possible deadlock avoided") was improved.
16 |
17 | In 20.6 all table-level locks which were possible to remove were removed, so alters are totally lock-free.
18 |
19 | 20.10 enables `database=Atomic` by default which allows running even DROP commands without locks.
20 |
21 | Typically the issue was happening when doing some concurrent selects on `system.parts` / `system.columns` / `system.tables` with simultaneous table manipulations (some kind of ALTER / TRUNCATE / DROP).
22 |
23 | If that exception happens often in your use case:
24 | - use recent ClickHouse versions
25 | - ensure you use the Atomic engine for the database, not Ordinary (can be checked in system.databases)
26 |
27 | Sometimes you can work around the issue by finding the queries which use that table concurrently (especially queries against system.tables / system.parts and other system tables) and killing them (or avoiding them).
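A small sketch of how to find such queries (and, if really needed, kill one of them):

```sql
SELECT query_id, elapsed, query
FROM system.processes
WHERE (query ILIKE '%system.parts%')
   OR (query ILIKE '%system.tables%')
   OR (query ILIKE '%system.columns%');

-- KILL QUERY WHERE query_id = '<query_id from the output above>';
```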
28 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/cluster-production-configuration-guide/cluster-configuration-faq.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Cluster Configuration FAQ"
3 | linkTitle: "Cluster Configuration FAQ"
4 | description: >
5 | Cluster Configuration FAQ
6 | ---
7 | ## ClickHouse® does not start, some other unexpected behavior happening
8 |
9 | Check ClickHouse logs, they are your friends:
10 |
11 | tail -n 1000 /var/log/clickhouse-server/clickhouse-server.err.log | less
12 | tail -n 10000 /var/log/clickhouse-server/clickhouse-server.log | less
13 |
14 | ## How Do I Restrict Memory Usage?
15 |
16 | See [our knowledge base article]({{}}) and [official documentation](https://clickhouse.tech/docs/en/operations/settings/query-complexity/#settings_max_memory_usage) for more information.
17 |
18 | ## ClickHouse died during big query execution
19 |
20 | Misconfigured ClickHouse can try to allocate more RAM than is available on the system.
21 |
22 | In that case an OS component called the OOM killer can kill the ClickHouse process.
23 |
24 | That event leaves traces inside the system logs (check by running the `dmesg` command).
25 |
26 | ## How Do I make huge ‘Group By’ queries use less RAM?
27 |
28 | Enable on-disk GROUP BY (it is slower, so it is disabled by default).
29 |
30 | Set [max_bytes_before_external_group_by](https://clickhouse.tech/docs/en/operations/settings/query-complexity/#settings-max_bytes_before_external_group_by) to a value about 70-80% of your max_memory_usage value.
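For example, a sketch assuming max_memory_usage of 10 GB (the values are illustrative):

```sql
SET max_memory_usage = 10000000000,
    max_bytes_before_external_group_by = 7500000000; -- ~75% of max_memory_usage
```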
31 |
32 | ## Data returned in chunks by clickhouse-client
33 |
34 | See [altinity-kb-clickhouse-client]({{}})
35 |
36 | ## I Can’t Connect From Other Hosts. What do I do?
37 |
38 | Check the settings in config.xml (in particular `listen_host`). Verify that the server accepts connections on both IPv4 and IPv6.
39 |
--------------------------------------------------------------------------------
/assets/icons/logo.svg:
--------------------------------------------------------------------------------
1 |
47 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/joins/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "JOINs"
3 | linkTitle: "JOINs"
4 | description: >
5 | JOINs
6 | aliases:
7 | - /altinity-kb-queries-and-syntax/joins/join-table-engine/
8 | ---
9 | Resources:
10 |
11 | * [Overview of JOINs (Russian)](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/join.pdf) - Presentation from Meetup 38 in 2019
12 | * [Notes on JOIN options](https://excalidraw.com/#json=xX_heZcCu0whsDmC2Mdvo,ppbUVFpPz-flJu5ZDnwIPw)
13 |
14 | ## Join Table Engine
15 |
16 | The main purpose of the Join table engine is to avoid building the right-hand table for joining on each query execution. So it's usually used when you have a large number of fast queries which share the same right-hand table for joining.
17 |
18 | ### Updates
19 |
20 | It's possible to update rows with setting `join_any_take_last_row` enabled.
21 |
22 | ```sql
23 | CREATE TABLE id_val_join
24 | (
25 | `id` UInt32,
26 | `val` UInt8
27 | )
28 | ENGINE = Join(ANY, LEFT, id)
29 | SETTINGS join_any_take_last_row = 1
30 |
31 | Ok.
32 |
33 | INSERT INTO id_val_join VALUES (1,21)(1,22)(3,23);
34 |
35 | Ok.
36 |
37 | SELECT *
38 | FROM
39 | (
40 | SELECT toUInt32(number) AS id
41 | FROM numbers(4)
42 | ) AS n
43 | ANY LEFT JOIN id_val_join USING (id)
44 |
45 | ┌─id─┬─val─┐
46 | │ 0 │ 0 │
47 | │ 1 │ 22 │
48 | │ 2 │ 0 │
49 | │ 3 │ 23 │
50 | └────┴─────┘
51 |
52 | INSERT INTO id_val_join VALUES (1,40)(2,24);
53 |
54 | Ok.
55 |
56 | SELECT *
57 | FROM
58 | (
59 | SELECT toUInt32(number) AS id
60 | FROM numbers(4)
61 | ) AS n
62 | ANY LEFT JOIN id_val_join USING (id)
63 |
64 | ┌─id─┬─val─┐
65 | │ 0 │ 0 │
66 | │ 1 │ 40 │
67 | │ 2 │ 24 │
68 | │ 3 │ 23 │
69 | └────┴─────┘
70 | ```
71 |
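A Join engine table can also be queried directly with `joinGet`; a small sketch using the table from the example above:

```sql
SELECT joinGet('id_val_join', 'val', toUInt32(1)) AS val;

┌─val─┐
│  40 │
└─────┘
-- 40 is the last inserted value for id = 1, because join_any_take_last_row = 1
```
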
72 | [Join table engine documentation](https://clickhouse.com/docs/en/engines/table-engines/special/join/)
73 |
--------------------------------------------------------------------------------
/.github/workflows/gh-pages.yml:
--------------------------------------------------------------------------------
1 | name: Render Knowledgebase
2 |
3 | on:
4 | push:
5 | branches:
6 | - main # Set a branch to deploy
7 | pull_request:
8 |
9 | jobs:
10 | deploy:
11 | runs-on: ubuntu-22.04
12 | steps:
13 | - name: Git checkout
14 | uses: actions/checkout@v4
15 | with:
16 | submodules: true # Fetch Hugo themes (true OR recursive)
17 | fetch-depth: 0 # Fetch all history for .GitInfo and .Lastmod
18 | ref: main
19 |
20 | - name: Setup Hugo
21 | uses: peaceiris/actions-hugo@v3
22 | with:
23 | hugo-version: '0.128.2'
24 | extended: true
25 |
26 | - name: Cache Hugo modules
27 | uses: actions/cache@v4
28 | with:
29 | path: /tmp/hugo_cache
30 | key: ${{ runner.os }}-hugomod-${{ hashFiles('**/go.sum') }}
31 | restore-keys: |
32 | ${{ runner.os }}-hugomod-
33 |
34 | - name: Setup Node
35 | uses: actions/setup-node@v4
36 | with:
37 | node-version: '20'
38 |
39 | - name: Cache dependencies
40 | uses: actions/cache@v4
41 | with:
42 | path: ~/.npm
43 | key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
44 | restore-keys: |
45 | ${{ runner.os }}-node-
46 |
47 | - name: Build
48 | run: |
49 | npm ci
50 | hugo --minify
51 | # run: hugo --gc
52 |
53 | - name: Deploy
54 | uses: peaceiris/actions-gh-pages@v4
55 | if: github.ref == 'refs/heads/main'
56 | with:
57 | github_token: ${{ secrets.GITHUB_TOKEN }}
58 | publish_dir: ./docs
59 | publish_branch: altinity-knowledgebase #forces this workflow to update the altinity-knowledgebase branch
60 | force_orphan: true
61 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-main-parsing-loop.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Kafka main parsing loop"
3 | linkTitle: "Kafka main parsing loop"
4 | description: >
5 | Kafka main parsing loop
6 | ---
7 | One of the threads from scheduled_pool (pre ClickHouse® 20.9) / `background_message_broker_schedule_pool` (after 20.9) does that in an infinite loop:
8 |
9 | 1. Batch poll (time limit: `kafka_poll_timeout_ms` 500ms, messages limit: `kafka_poll_max_batch_size` 65536)
10 | 2. Parse messages.
11 | 3. If we don't have enough data yet (rows limit: `kafka_max_block_size` 1048576) and the time limit (`kafka_flush_interval_ms` 7500ms) is not reached - continue polling (go to step 1)
12 | 4. Write a collected block of data to MV
13 | 5. Do commit (commit after write = at-least-once).
14 |
15 | On any error during that process, the Kafka client is restarted (leading to rebalancing: the consumer leaves the group and gets back in a few seconds).
16 |
17 | 
18 |
19 | ## Important settings
20 |
21 | These usually should not be adjusted:
22 |
23 | * `kafka_poll_max_batch_size` = max_block_size (65536)
24 | * `kafka_poll_timeout_ms` = stream_poll_timeout_ms (500ms)
25 |
26 | You may want to adjust those depending on your scenario:
27 |
28 | * `kafka_flush_interval_ms` = stream_flush_interval_ms (7500ms)
29 | * `kafka_max_block_size` = max_insert_block_size / kafka_num_consumers (for the single consumer: 1048576)
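A sketch of how those are typically set per table (broker / topic / column names here are assumptions):

```sql
CREATE TABLE kafka_queue
(
    `ts` DateTime,
    `message` String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-events',
         kafka_format = 'JSONEachRow',
         kafka_num_consumers = 1,
         kafka_max_block_size = 1048576,
         kafka_flush_interval_ms = 7500;
```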
30 |
31 | ## See also
32 |
33 | [https://github.com/ClickHouse/ClickHouse/pull/11388](https://github.com/ClickHouse/ClickHouse/pull/11388)
34 |
35 | ## Disable at-least-once delivery
36 |
37 | `kafka_commit_every_batch` = 1 changes the loop logic mentioned above: the consumed batch is committed to Kafka first, and the block of rows is sent to the materialized views only after that. This resembles an at-most-once delivery mode, as it prevents duplicate creation but allows loss of data in case of failures.
38 |
39 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-schema-design/codecs/codecs-speed.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Codecs speed"
3 | linkTitle: "Codecs speed"
4 | description: >
5 | Codecs speed
6 | ---
7 | ```sql
8 | create table test_codec_speed engine=MergeTree
9 | ORDER BY tuple()
10 | as select cast(now() + rand()%2000 + number, 'DateTime') as x from numbers(1000000000);
11 |
12 | option 1: CODEC(LZ4) (same as default)
13 | option 2: CODEC(DoubleDelta) (`alter table test_codec_speed modify column x DateTime CODEC(DoubleDelta)`);
14 | option 3: CODEC(T64, LZ4) (`alter table test_codec_speed modify column x DateTime CODEC(T64, LZ4)`)
15 | option 4: CODEC(Delta, LZ4) (`alter table test_codec_speed modify column x DateTime CODEC(Delta, LZ4)`)
16 | option 5: CODEC(ZSTD(1)) (`alter table test_codec_speed modify column x DateTime CODEC(ZSTD(1))`)
17 | option 6: CODEC(T64, ZSTD(1)) (`alter table test_codec_speed modify column x DateTime CODEC(T64, ZSTD(1))`)
18 | option 7: CODEC(Delta, ZSTD(1)) (`alter table test_codec_speed modify column x DateTime CODEC(Delta, ZSTD(1))`)
19 | option 8: CODEC(T64, LZ4HC(1)) (`alter table test_codec_speed modify column x DateTime CODEC(T64, LZ4HC(1))`)
20 | option 9: CODEC(Gorilla) (`alter table test_codec_speed modify column x DateTime CODEC(Gorilla)`)
21 |
22 | Result may be not 100% reliable (checked on my laptop, need to be repeated in lab environment)
23 |
24 |
25 | OPTIMIZE TABLE test_codec_speed FINAL (second run - i.e. read + write the same data)
26 | 1) 17 sec.
27 | 2) 30 sec.
28 | 3) 16 sec
29 | 4) 17 sec
30 | 5) 29 sec
31 | 6) 24 sec
32 | 7) 31 sec
33 | 8) 35 sec
34 | 9) 19 sec
35 |
36 | compressed size
37 | 1) 3181376881
38 | 2) 2333793699
39 | 3) 1862660307
40 | 4) 3408502757
41 | 5) 2393078266
42 | 6) 1765556173
43 | 7) 2176080497
44 | 8) 1810471247
45 | 9) 2109640716
46 |
47 | select max(x) from test_codec_speed
48 | 1) 0.597
49 | 2) 2.756 :(
50 | 3) 1.168
51 | 4) 0.752
52 | 5) 1.362
53 | 6) 1.364
54 | 7) 1.752
55 | 8) 1.270
56 | 9) 1.607
57 | ```
58 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/timeouts-during-optimize-final.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Timeouts during OPTIMIZE FINAL"
3 | linkTitle: "Timeouts during OPTIMIZE FINAL"
4 | weight: 100
5 | description: >-
6 | `Timeout exceeded ...` or `executing longer than distributed_ddl_task_timeout` during `OPTIMIZE FINAL`.
7 | ---
8 |
9 | ## `Timeout exceeded ...` or `executing longer than distributed_ddl_task_timeout` during `OPTIMIZE FINAL`
10 |
11 | A timeout may occur
12 | 1) because the client reaches its timeout interval
13 | - in case of TCP / native clients - you can change send_timeout / receive_timeout + tcp_keep_alive_timeout + driver timeout settings
14 | - in case of HTTP clients - you can change http_send_timeout / http_receive_timeout + tcp_keep_alive_timeout + driver timeout settings
15 |
16 | 2) (in the case of ON CLUSTER queries) because the timeout for query execution by the shards expires
17 | - see the setting `distributed_ddl_task_timeout`
18 |
19 | In the first case you may additionally get the misleading messages: `Cancelling query. ... Query was cancelled.`
20 |
21 | In both cases, this does NOT stop the execution of the OPTIMIZE command. It continues to work even after
22 | the client is disconnected. You can see the progress of that in `system.processes` / `show processlist` / `system.merges` / `system.query_log`.
23 |
24 | The same applies to queries like:
25 |
26 | - `INSERT ... SELECT`
27 | - `CREATE TABLE ... AS SELECT`
28 | - `CREATE MATERIALIZED VIEW ... POPULATE ...`
29 |
30 | It is possible to run a query with some special `query_id` and then poll the status from the processlist (in the case of a cluster, it can be a bit more complicated).
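A sketch of that approach (the query_id value is an assumption; set it from your client, e.g. via clickhouse-client's `--query_id` option or the HTTP `query_id` parameter):

```sql
-- poll the running statement by its known query_id
SELECT query_id, elapsed, read_rows, memory_usage
FROM system.processes
WHERE query_id = 'optimize-mytable-2024-01-01';

-- and watch the merges it triggered
SELECT table, elapsed, progress
FROM system.merges
WHERE table = 'mytable';
```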
31 |
32 | See also
33 | - https://github.com/ClickHouse/ClickHouse/issues/6093
34 | - https://github.com/ClickHouse/ClickHouse/issues/7794
35 | - https://github.com/ClickHouse/ClickHouse/issues/28896
36 | - https://github.com/ClickHouse/ClickHouse/issues/19319
37 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/asynchronous_metrics_descr.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Description of asynchronous_metrics"
3 | linkTitle: "Description of asynchronous_metrics"
4 | weight: 100
5 | description: >-
6 | Description of asynchronous_metrics
7 | ---
8 |
9 | ```
10 | CompiledExpressionCacheCount -- number of compiled cached expressions (if CompiledExpressionCache is enabled)
11 |
12 | jemalloc -- parameters of jemalloc allocator, they are not very useful, and not interesting
13 |
14 | MarkCacheBytes / MarkCacheFiles -- cache for .mrk files (default size is 5GB); you can see whether it uses all 5GB or not
15 |
16 | MemoryCode -- how much memory allocated for ClickHouse® executable
17 |
18 | MemoryDataAndStack -- virtual memory allocated for data and stack
19 |
20 | MemoryResident -- real memory used by ClickHouse ( the same as top RES/RSS)
21 |
22 | MemoryShared -- shared memory used by ClickHouse
23 |
24 | MemoryVirtual -- virtual memory used by ClickHouse ( the same as top VIRT)
25 |
26 | NumberOfDatabases
27 |
28 | NumberOfTables
29 |
30 | ReplicasMaxAbsoluteDelay -- important parameter - replica max absolute delay in seconds
31 |
32 | ReplicasMaxRelativeDelay -- replica max relative delay (from other replicas) in seconds
33 |
34 | ReplicasMaxInsertsInQueue -- max number of parts to fetch for a single Replicated table
35 |
36 | ReplicasSumInsertsInQueue -- sum of parts to fetch for all Replicated tables
37 |
38 | ReplicasMaxMergesInQueue -- max number of merges in queue for a single Replicated table
39 |
40 | ReplicasSumMergesInQueue -- total number of merges in queue for all Replicated tables
41 |
42 | ReplicasMaxQueueSize -- max number of tasks for a single Replicated table
43 |
44 | ReplicasSumQueueSize -- total number of tasks in replication queue
45 |
46 | UncompressedCacheBytes/UncompressedCacheCells -- allocated memory for uncompressed cache (disabled by default)
47 |
48 | Uptime -- uptime seconds
49 | ```
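To look at the current values of these metrics, query the `system.asynchronous_metrics` table, for example:

```sql
SELECT metric, value
FROM system.asynchronous_metrics
WHERE metric LIKE 'Replicas%'
ORDER BY metric;
```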
50 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/literal-decimal-or-float.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Imprecise parsing of literal Decimal or Float64"
3 | linkTitle: "Imprecise literal Decimal or Float64 values"
4 | weight: 100
5 | description: >-
6 | Imprecise parsing of literal Decimal or Float64
7 | ---
8 |
9 | ## Decimal
10 |
11 | ```sql
12 | SELECT
13 | 9.2::Decimal64(2) AS postgresql_cast,
14 | toDecimal64(9.2, 2) AS to_function,
15 | CAST(9.2, 'Decimal64(2)') AS cast_float_literal,
16 | CAST('9.2', 'Decimal64(2)') AS cast_string_literal
17 |
18 | ┌─postgresql_cast─┬─to_function─┬─cast_float_literal─┬─cast_string_literal─┐
19 | │ 9.2 │ 9.19 │ 9.19 │ 9.2 │
20 | └─────────────────┴─────────────┴────────────────────┴─────────────────────┘
21 | ```
22 |
23 |
24 | > When we try to type cast 64.32 to Decimal128(2) the resulted value is 64.31.
25 |
26 | When the parser sees a number with a decimal separator, it interprets it as a `Float64` literal (where `64.32` has no accurate representation, and actually you get something like `64.319999999999999999`), and later that Float is cast to Decimal by removing the extra precision.
27 |
28 | The workaround is very simple: wrap the number in quotes (it will then be considered a string literal by the query parser and transformed to Decimal directly), or use the postgres-like casting syntax:
29 |
30 | ```sql
31 | select cast(64.32,'Decimal128(2)') a, cast('64.32','Decimal128(2)') b, 64.32::Decimal128(2) c;
32 |
33 | ┌─────a─┬─────b─┬─────c─┐
34 | │ 64.31 │ 64.32 │ 64.32 │
35 | └───────┴───────┴───────┘
36 | ```
37 |
38 | ## Float64
39 |
40 | ```sql
41 | SELECT
42 | toFloat64(15008753.) AS to_func,
43 | toFloat64('1.5008753E7') AS to_func_scientific,
44 | CAST('1.5008753E7', 'Float64') AS cast_scientific
45 |
46 | ┌──to_func─┬─to_func_scientific─┬────cast_scientific─┐
47 | │ 15008753 │ 15008753.000000002 │ 15008753.000000002 │
48 | └──────────┴────────────────────┴────────────────────┘
49 | ```
50 |
51 |
--------------------------------------------------------------------------------
/static/images/hetzner-logo.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/multiple-date-column-in-partition-key.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Multiple aligned date columns in PARTITION BY expression"
3 | linkTitle: "Multiple aligned date columns in PARTITION BY expression"
4 | weight: 100
5 | description: >-
6 | How to put multiple correlated date-like columns in the partition key without generating a lot of partitions when they don't match exactly.
7 | ---
8 |
9 | Alternative to doing that by [minmax skip index](https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/minmax/#multiple-datedatetime-columns-can-be-used-in-where-conditions).
10 |
11 | ```sql
12 | CREATE TABLE part_key_multiple_dates
13 | (
14 | `key` UInt32,
15 | `date` Date,
16 | `time` DateTime,
17 | `created_at` DateTime,
18 | `inserted_at` DateTime
19 | )
20 | ENGINE = MergeTree
21 | PARTITION BY (toYYYYMM(date), ignore(created_at, inserted_at))
22 | ORDER BY (key, time);
23 |
24 |
25 | INSERT INTO part_key_multiple_dates SELECT
26 | number,
27 | toDate(x),
28 | now() + intDiv(number, 10) AS x,
29 | x - (rand() % 100),
30 | x + (rand() % 100)
31 | FROM numbers(100000000);
32 |
33 | SELECT count()
34 | FROM part_key_multiple_dates
35 | WHERE date > (now() + toIntervalDay(105));
36 |
37 | ┌─count()─┐
38 | │ 8434210 │
39 | └─────────┘
40 |
41 | 1 rows in set. Elapsed: 0.022 sec. Processed 11.03 million rows, 22.05 MB (501.94 million rows/s., 1.00 GB/s.)
42 |
43 | SELECT count()
44 | FROM part_key_multiple_dates
45 | WHERE inserted_at > (now() + toIntervalDay(105));
46 |
47 | ┌─count()─┐
48 | │ 9279818 │
49 | └─────────┘
50 |
51 | 1 rows in set. Elapsed: 0.046 sec. Processed 11.03 million rows, 44.10 MB (237.64 million rows/s., 950.57 MB/s.)
52 |
53 | SELECT count()
54 | FROM part_key_multiple_dates
55 | WHERE created_at > (now() + toIntervalDay(105));
56 |
57 | ┌─count()─┐
58 | │ 9279139 │
59 | └─────────┘
60 |
61 | 1 rows in set. Elapsed: 0.043 sec. Processed 11.03 million rows, 44.10 MB (258.22 million rows/s., 1.03 GB/s.)
62 | ```
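To confirm that the columns wrapped in `ignore()` don't multiply the number of partitions, something like this can be checked:

```sql
SELECT partition, count() AS parts, sum(rows) AS rows
FROM system.parts
WHERE active AND (table = 'part_key_multiple_dates')
GROUP BY partition
ORDER BY partition;
```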
63 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-useful-queries/connection-issues-distributed-parts.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Notes on Various Errors with respect to replication and distributed connections"
3 | linkTitle: "Notes on Various Errors with respect to replication and distributed connections"
4 | description: >
5 | Notes on errors related to replication and distributed connections
6 | keywords:
7 | - replication
8 | - distributed connections
9 | ---
10 |
11 | ## `ClickHouseDistributedConnectionExceptions`
12 |
13 | This alert usually indicates that one of the nodes isn’t responding or that there’s an interconnectivity issue. Debug steps:
14 |
15 | ### 1. Check Cluster Connectivity
16 | Verify connectivity inside the cluster by running:
17 | ```
18 | SELECT count() FROM clusterAllReplicas('{cluster}', cluster('{cluster}', system.one))
19 | ```
20 |
21 | ### 2. Check for Errors
22 | Run the following queries to see if any nodes report errors:
23 |
24 | ```
25 | SELECT hostName(), * FROM clusterAllReplicas('{cluster}', system.clusters) WHERE errors_count > 0;
26 | SELECT hostName(), * FROM clusterAllReplicas('{cluster}', system.errors) WHERE last_error_time > now() - 3600 ORDER BY value;
27 | ```
28 |
29 | Depending on the results, ensure that the affected node is up and responding to queries. Also, verify that connectivity (DNS, routes, delays) is functioning correctly.
30 |
31 | ## `ClickHouseReplicatedPartChecksFailed` & `ClickHouseReplicatedPartFailedFetches`
32 |
33 | Unless you’re seeing huge numbers, these alerts can generally be ignored. They’re often a sign of temporary replication issues that ClickHouse resolves on its own. However, if the issue persists or increases rapidly, follow the steps to debug replication issues:
34 |
35 | * Check the replication status using tables such as system.replicas and system.replication_queue.
36 | * Examine server logs, system.errors, and system load for any clues.
37 | * Try to restart the replica (`SYSTEM RESTART REPLICA db_name.table_name` command) and, if necessary, contact Altinity support.
38 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/ttl/ttl-recompress-example.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "TTL Recompress example"
3 | linkTitle: "TTL Recompress example"
4 | description: >
5 | TTL Recompress example
6 | ---
7 |
8 | *See also [the Altinity Knowledge Base article on testing different compression codecs](../../../altinity-kb-schema-design/codecs/altinity-kb-how-to-test-different-compression-codecs).*
9 |
10 | ## Example how to create a table and define recompression rules
11 |
12 | ```sql
13 | CREATE TABLE hits
14 | (
15 | `banner_id` UInt64,
16 | `event_time` DateTime CODEC(Delta, Default),
17 | `c_name` String,
18 | `c_cost` Float64
19 | )
20 | ENGINE = MergeTree
21 | PARTITION BY toYYYYMM(event_time)
22 | ORDER BY (banner_id, event_time)
23 | TTL event_time + toIntervalMonth(1) RECOMPRESS CODEC(ZSTD(1)),
24 | event_time + toIntervalMonth(6) RECOMPRESS CODEC(ZSTD(6));
25 | ```
26 |
27 | Default compression is LZ4. See [the ClickHouse® documentation](https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#server-settings-compression) for more information.
28 |
29 | These TTL rules recompress data after 1 and 6 months.
30 |
31 | CODEC(Delta, Default) -- **Default** means to use default compression (LZ4 -> ZSTD1 -> ZSTD6) in this case.
32 |
33 | ## Example how to define recompression rules for an existing table
34 |
35 | ```sql
36 | CREATE TABLE hits
37 | (
38 | `banner_id` UInt64,
39 | `event_time` DateTime CODEC(Delta, LZ4),
40 | `c_name` String,
41 | `c_cost` Float64
42 | )
43 | ENGINE = MergeTree
44 | PARTITION BY toYYYYMM(event_time)
45 | ORDER BY (banner_id, event_time);
46 |
47 | ALTER TABLE hits
48 | modify column event_time DateTime CODEC(Delta, Default),
49 | modify TTL event_time + toIntervalMonth(1) RECOMPRESS CODEC(ZSTD(1)),
50 | event_time + toIntervalMonth(6) RECOMPRESS CODEC(ZSTD(6));
51 | ```
52 |
53 | All columns have the implicit default compression from the server config, except `event_time`; that's why we need to change the compression to `Default` for this column, otherwise it won't be recompressed.
54 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/how_to_recreate_table.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "How to recreate a table in case of total corruption of the replication queue"
3 | linkTitle: "How to recreate a table"
4 | weight: 100
5 | description: >-
6 | How to recreate a table in case of total corruption of the replication queue.
7 | ---
8 |
9 | ## How to fix a replication using hard-reset way
10 |
11 | 1. Find the best replica (the replica with the most fresh/consistent data).
12 | 2. Backup the table `alter table mydatabase.mybadtable freeze;`
13 | 3. Stop all applications!!! Stop ingestion. Stop queries - table will be empty for some time.
14 | 4. Check that detached folder is empty or clean it.
15 | ```sql
16 | SELECT concat('alter table ', database, '.', table, ' drop detached part \'', name, '\' settings allow_drop_detached=1;')
17 | FROM system.detached_parts
18 | WHERE (database = 'mydatabase') AND (table = 'mybadtable')
19 | FORMAT TSVRaw;
20 | ```
21 | 5. Make sure that detached folder is empty `select count() from system.detached_parts where database='mydatabase' and table ='mybadtable';`
22 | 6. Detach all parts (table will become empty)
23 | ```sql
24 | SELECT concat('alter table ', database, '.', table, ' detach partition id \'', partition_id, '\';') AS detach
25 | FROM system.parts
26 | WHERE (active = 1) AND (database = 'mydatabase') AND (table = 'mybadtable')
27 | GROUP BY detach
28 | ORDER BY detach ASC
29 | FORMAT TSVRaw;
30 | ```
31 | 7. Make sure that table is empty `select count() from mydatabase.mybadtable;`
32 | 8. Attach all parts back
33 | ```sql
34 | SELECT concat('alter table ', database, '.', table, ' attach part \'', a.name, '\';')
35 | FROM system.detached_parts AS a
36 | WHERE (database = 'mydatabase') AND (table = 'mybadtable')
37 | FORMAT TSVRaw;
38 | ```
39 | 9. Make sure that data is consistent at all replicas
40 | ```sql
41 | SELECT
42 | formatReadableSize(sum(bytes)) AS size,
43 | sum(rows),
44 | count() AS part_count,
45 | uniqExact(partition) AS partition_count
46 | FROM system.parts
47 | WHERE (active = 1) AND (database = 'mydatabase') AND (table = 'mybadtable');
48 | ```
49 |
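50 | To compare all replicas without running the check on each node separately, here is a minimal sketch using the `clusterAllReplicas` table function (the cluster name `mycluster` is an example; use the name defined in your `remote_servers`):
51 | ```sql
52 | -- Run the same totals from step 9 on every replica at once and compare the rows.
53 | SELECT
54 |     hostName() AS replica,
55 |     formatReadableSize(sum(bytes)) AS size,
56 |     sum(rows) AS total_rows,
57 |     count() AS part_count,
58 |     uniqExact(partition) AS partition_count
59 | FROM clusterAllReplicas('mycluster', system.parts)
60 | WHERE (active = 1) AND (database = 'mydatabase') AND (table = 'mybadtable')
61 | GROUP BY replica
62 | ORDER BY replica;
63 | ```
64 |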
--------------------------------------------------------------------------------
/content/en/upgrade/removing-empty-parts.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Removing empty parts"
3 | linkTitle: "Removing empty parts"
4 | description: >
5 | Removing empty parts
6 | ---
7 | Removal of empty parts is a feature introduced in ClickHouse® 20.12.
8 | Earlier versions leave empty parts (with 0 rows) behind if TTL removes all rows from a part ([https://github.com/ClickHouse/ClickHouse/issues/5491](https://github.com/ClickHouse/ClickHouse/issues/5491)).
9 | If you have set up TTL for your data, it is likely that there are quite a few empty parts in your system.
10 |
11 | The new version notices empty parts and tries to remove all of them immediately.
12 | This is a one-time operation which runs right after an upgrade.
13 | After that TTL will remove empty parts on its own.
14 |
15 | There is a problem when different replicas of the same table start to remove empty parts at the same time: because of a bug, they can block each other ([https://github.com/ClickHouse/ClickHouse/issues/23292](https://github.com/ClickHouse/ClickHouse/issues/23292)).
16 |
17 | What we can do to avoid this problem during an upgrade:
18 |
19 | 1) Drop empty partitions before upgrading to decrease the number of empty parts in the system (a query to estimate how many empty parts you currently have is shown at the end of this page).
20 |
21 | ```sql
22 | SELECT concat('alter table ',database, '.', table, ' drop partition id ''', partition_id, ''';')
23 | FROM system.parts WHERE active
24 | GROUP BY database, table, partition_id
25 | HAVING count() = countIf(rows=0)
26 | ```
27 |
28 | 2) Upgrade/restart one replica (in a shard) at a time.
29 | If only one replica is cleaning empty parts there will be no deadlock because of replicas waiting for one another.
30 | Restart one replica, wait for replication queue to process, then restart the next one.
31 |
32 | Removal of empty parts can be disabled by adding `remove_empty_parts=0` to the default profile.
33 |
34 | ```markup
35 | $ cat /etc/clickhouse-server/users.d/remove_empty_parts.xml
36 | <clickhouse>
37 |     <profiles>
38 |         <default>
39 |             <remove_empty_parts>0</remove_empty_parts>
40 |         </default>
41 |     </profiles>
42 | </clickhouse>
43 | ```
44 |
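45 | To estimate how many empty parts you currently have (related to item 1 above), a simple check against `system.parts`:
46 |
47 | ```sql
48 | -- Count active parts with 0 rows per table; tables at the top benefit the most
49 | -- from dropping empty partitions before the upgrade.
50 | SELECT database, table, count() AS empty_parts
51 | FROM system.parts
52 | WHERE active AND (rows = 0)
53 | GROUP BY database, table
54 | ORDER BY empty_parts DESC;
55 | ```
56 |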
--------------------------------------------------------------------------------
/content/en/altinity-kb-schema-design/preaggregations.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Pre-Aggregation approaches"
3 | linkTitle: "Pre-Aggregation approaches"
4 | weight: 100
5 | description: >-
6 | ETL vs Materialized Views vs Projections in ClickHouse®
7 | ---
8 |
9 | ## Pre-Aggregation approaches: ETL vs Materialized Views vs Projections
10 |
11 |
12 | | | ETL | MV | Projections |
13 | |:-|:-----------------------------------------------------------------|:-|:-|
14 | | Realtime | no | yes | yes |
15 | | How complex queries can be used to build the preaggregation | any | complex | very simple |
16 | | Impacts the insert speed | no | yes | yes |
17 | | Are inconsistencies possible | Depends on the ETL. If it processes errors properly - no. | yes (no transactions / atomicity) | no |
18 | | Lifetime of aggregation | any | any | Same as the raw data |
19 | | Requirements | need external tools/scripting | is a part of database schema | is a part of table schema |
20 | | How complex to use in queries | Depends on aggregation, usually simple, querying a separate table | Depends on aggregation, sometimes quite complex, querying a separate table | Very simple, querying the main table |
21 | | Can work correctly with ReplacingMergeTree as a source | Yes | No | No |
22 | | Can work correctly with CollapsingMergeTree as a source | Yes | For simple aggregations | For simple aggregations |
23 | | Can be chained | Yes (usually with DAGs / special scripts) | Yes (but may not be straightforward, and is often a bad idea) | No |
24 | | Resources needed to calculate the increment | May be significant | Usually tiny | Usually tiny |
25 |
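26 | As a quick illustration of the two in-database options from the table above, here is a minimal sketch (table and column names are made up for the example; older versions may also need `allow_experimental_projection_optimization = 1` for the projection to be used):
27 |
28 | ```sql
29 | -- Raw table
30 | CREATE TABLE events (ts DateTime, user_id UInt64, cost Float64)
31 | ENGINE = MergeTree ORDER BY (user_id, ts);
32 |
33 | -- Materialized view: a separate pre-aggregated table that you query directly
34 | -- (it only aggregates rows inserted after it is created).
35 | CREATE MATERIALIZED VIEW events_daily_mv
36 | ENGINE = SummingMergeTree ORDER BY (user_id, day)
37 | AS SELECT user_id, toDate(ts) AS day, sum(cost) AS cost
38 | FROM events GROUP BY user_id, day;
39 |
40 | -- Projection: stored inside the main table and picked by the optimizer automatically.
41 | ALTER TABLE events ADD PROJECTION daily
42 |     (SELECT user_id, toDate(ts), sum(cost) GROUP BY user_id, toDate(ts));
43 | ALTER TABLE events MATERIALIZE PROJECTION daily;
44 |
45 | -- With the MV you query the separate table (sum() is still needed because parts merge in the background);
46 | -- with the projection you just query the main table.
47 | SELECT user_id, day, sum(cost) FROM events_daily_mv GROUP BY user_id, day;
48 | SELECT user_id, toDate(ts) AS day, sum(cost) FROM events GROUP BY user_id, day;
49 | ```
50 |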
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/change-me.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Replication: Can not resolve host of another ClickHouse® server"
3 | linkTitle: "Replication: Can not resolve host of another ClickHouse® server"
4 | weight: 100
5 | description: >-
6 | ---
7 |
8 | ### Symptom
9 |
10 | When configuring replication, the ClickHouse® cluster nodes experience communication issues, and an error message appears in the log stating that the ClickHouse host cannot be resolved.
11 |
12 | ```
13 | DNSResolver: Cannot resolve host (xxxxx), error 0: DNS error.
14 | auto DB::StorageReplicatedMergeTree::processQueueEntry(ReplicatedMergeTreeQueue::SelectedEntryPtr)::(anonymous class)::operator()(DB::StorageReplicatedMergeTree::LogEntryPtr &) const: Code: 198. DB::Exception: Not found address of host: xxxx. (DNS_ERROR),
15 | ```
16 |
17 | ### Cause:
18 |
19 | The error message indicates that the hostname of one of the cluster nodes cannot be resolved by other cluster members, causing communication issues between the nodes.
20 |
21 | Each node in the replication setup pushes its Fully Qualified Domain Name (FQDN) to Zookeeper, and if other nodes cannot access it using its FQDN, this can cause issues.
22 |
23 | ### Action:
24 |
25 | There are two possible solutions to this problem:
26 |
27 | 1. Change the FQDN to allow other nodes to access it. This solution can also help to keep the environment more organized. To do this, use the following command to edit the hostname file:
28 |
29 | ```sh
30 | sudo vim /etc/hostname
31 | ```
32 |
33 | Or use the following command to change the hostname:
34 |
35 | ```sh
36 | sudo hostnamectl set-hostname ...
37 | ```
38 |
39 | 2. Use the configuration parameter `interserver_http_host` to specify the IP address or hostname that the nodes can use to communicate with each other. This solution can have some issues, such as the one described in this link: https://github.com/ClickHouse/ClickHouse/issues/2154.
40 | To configure this parameter, refer to the documentation for more information: https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/#interserver-http-host.
41 |
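42 | A quick way to check which name a node will advertise (a minimal sketch; run it on each node):
43 |
44 | ```sql
45 | -- fqdn() is typically the name the replica registers in ZooKeeper,
46 | -- so the other nodes must be able to resolve it.
47 | SELECT hostName(), fqdn();
48 | ```
49 |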
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-memory-configuration-settings.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "memory configuration settings"
3 | linkTitle: "memory configuration settings"
4 | description: >
5 | memory configuration settings
6 | ---
7 | ## max_memory_usage. Single query memory usage
8 |
9 | `max_memory_usage` - the maximum amount of memory a **single query** is allowed to use. By default it's 10Gb. The default value is good; don't adjust it in advance.
10 |
11 | There are scenarios when you need to relax the limit for particular queries (if you hit 'Memory limit (for query) exceeded'), or use a lower limit if you need to discipline the users or increase the number of simultaneous queries.
12 |
13 | ## Server memory usage
14 |
15 | Server memory usage = constant memory footprint (used by different caches, dictionaries, etc.) + sum of memory temporarily used by running queries (the theoretical limit is the number of simultaneous queries multiplied by max_memory_usage).
16 |
17 | Since 20.4 you can set up a global limit using the `max_server_memory_usage` setting. If **something** hits that limit, you will see 'Memory limit (total) exceeded' in **random places**.
18 |
19 | By default it is 90% of the physical RAM of the server.
20 | [https://clickhouse.tech/docs/en/operations/server-configuration-parameters/settings/\#max_server_memory_usage](https://clickhouse.tech/docs/en/operations/server-configuration-parameters/settings/#max_server_memory_usage)
21 | [https://github.com/ClickHouse/ClickHouse/blob/e5b96bd93b53d2c1130a249769be1049141ef386/programs/server/config.xml\#L239-L250](https://github.com/ClickHouse/ClickHouse/blob/e5b96bd93b53d2c1130a249769be1049141ef386/programs/server/config.xml#L239-L250)
22 |
23 | You can decrease it in some scenarios (for example, if you need to leave more free RAM for the page cache or for other software).
24 |
25 | ### How to check what is using my RAM?
26 |
27 | [altinity-kb-who-ate-my-memory.md]({{}})
28 |
29 | ### Mark cache
30 |
31 | [https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup39/mark-cache.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup39/mark-cache.pdf)
32 |
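33 | Going back to the per-query limit from the first section, a minimal sketch of relaxing it for one heavy query only (table and column names are hypothetical) instead of changing the global default:
34 |
35 | ```sql
36 | -- ~20 GB just for this query; other queries keep the default max_memory_usage.
37 | SELECT some_column, count()
38 | FROM big_table
39 | GROUP BY some_column
40 | SETTINGS max_memory_usage = 20000000000;
41 | ```
42 |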
--------------------------------------------------------------------------------
/content/en/engines/altinity-kb-atomic-database-engine/how-to-convert-ordinary-to-atomic.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "How to Convert Ordinary to Atomic"
3 | linkTitle: "How to Convert Ordinary to Atomic"
4 | weight: 100
5 | ---
6 |
7 | ## New, official way
8 |
9 | * Automatic conversion of the database engine from `Ordinary` to `Atomic` is implemented in ClickHouse® Server 22.8+. Create an empty `convert_ordinary_to_atomic` file in the `flags` directory and all `Ordinary` databases will be converted automatically on the next server start.
10 | * The conversion does not happen just because you upgraded; you need to set the flag as explained below:
11 | ```
12 | Warnings:
13 | * Server has databases (for example `test`) with Ordinary engine, which was deprecated. To convert this database to the new Atomic engine, create a flag /var/lib/clickhouse/flags/convert_ordinary_to_atomic and make sure that ClickHouse has write permission for it.
14 | Example: sudo touch '/var/lib/clickhouse/flags/convert_ordinary_to_atomic' && sudo chmod 666 '/var/lib/clickhouse/flags/convert_ordinary_to_atomic'
15 | ```
16 | * Resolves [#39546](https://github.com/ClickHouse/ClickHouse/issues/39546). [#39933](https://github.com/ClickHouse/ClickHouse/pull/39933) ([Alexander Tokmakov](https://github.com/tavplubix))
17 |
18 | * There can be some problems if the `default` database is Ordinary and fails for some reason. You can add:
19 |
20 | ```
21 |
22 | 1
23 |
24 | ```
25 | [More detailed info here](https://github.com/ClickHouse/ClickHouse/blob/f01a285f6091265cfae72bb7fbf3186269804891/src/Interpreters/loadMetadata.cpp#L150)
26 |
27 | Don't forget to remove detached parts from all Ordinary databases, or you can get the error:
28 | ```
29 | │ 2025.01.28 11:34:57.510330 [ 7 ] {} Application: Code: 219. DB::Exception: Cannot drop: filesystem error: in remove: Directory not empty ["/var/lib/clickhouse/data/db/"]. Probably data │
30 | │ base contain some detached tables or metadata leftovers from Ordinary engine. If you want to remove all data anyway, try to attach database back and drop it again with enabled force_remove_data_recursively_ │
31 | ```
32 |
33 |
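34 | A couple of queries that help before the conversion (both use standard system tables):
35 |
36 | ```sql
37 | -- Which databases are still on the Ordinary engine?
38 | SELECT name, engine FROM system.databases WHERE engine = 'Ordinary';
39 |
40 | -- Detached parts that should be cleaned up first (see the note above).
41 | SELECT database, table, name, reason FROM system.detached_parts;
42 | ```
43 |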
--------------------------------------------------------------------------------
/content/en/altinity-kb-interfaces/altinity-kb-clickhouse-client.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "clickhouse-client"
3 | linkTitle: "clickhouse-client"
4 | keywords:
5 | - clickhouse client
6 | description: >
7 | ClickHouse® client
8 | ---
9 | Q. How can I input multi-line SQL code? Can you give me an example?
10 |
11 | A. Just run clickhouse-client with the `-m` switch; it starts executing only after you finish a line with a semicolon.
12 |
13 | Q. How can I use a pager with clickhouse-client?
14 |
15 | A. Here is an example: `clickhouse-client --pager 'less -RS'`
16 |
17 | Q. Data is returned in chunks / several tables.
18 |
19 | A. Data gets streamed from the server in blocks; every block is formatted individually when the default `PrettyCompact` format is used. You can use the `PrettyCompactMonoBlock` format instead, using one of these options:
20 |
21 | * start clickhouse-client with an extra flag: `clickhouse-client --format=PrettyCompactMonoBlock`
22 | * add `FORMAT PrettyCompactMonoBlock` at the end of your query.
23 | * change clickhouse-client default format in the config. See [https://github.com/ClickHouse/ClickHouse/blob/976dbe8077f9076387528e2f40b6174f6d8a8b90/programs/client/clickhouse-client.xml\#L42](https://github.com/ClickHouse/ClickHouse/blob/976dbe8077f9076387528e2f40b6174f6d8a8b90/programs/client/clickhouse-client.xml#L42)
24 |
25 | Q. How can I customize the client config?
26 |
27 | A. You can change it globally (for all users of the workstation):
28 |
29 | ```markup
30 | nano /etc/clickhouse-client/conf.d/user.xml
31 |
32 |
33 | default1
34 | default1
35 |
36 | true
37 | true
38 |
39 | See also https://github.com/ClickHouse/ClickHouse/blob/976dbe8077f9076387528e2f40b6174f6d8a8b90/programs/client/clickhouse-client.xml#L42
40 | ```
41 |
42 | or for particular users - by adjusting one of these files:
43 |
44 | ```markup
45 | ./clickhouse-client.xml
46 | ~/.clickhouse-client/config.xml
47 | ```
48 |
49 | Also, it’s possible to have several client config files and pass one of them to the clickhouse-client command explicitly.
50 |
51 | References:
52 |
53 | * [https://clickhouse.com/docs/en/interfaces/cli](https://clickhouse.com/docs/en/interfaces/cli)
54 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-rewind-fast-forward-replay.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Rewind / fast-forward / replay"
3 | linkTitle: "Rewind / fast-forward / replay"
4 | description: >
5 | Rewind / fast-forward / replay
6 | ---
7 | * Step 1: Detach Kafka tables in ClickHouse®
8 | ```
9 | DETACH TABLE db.kafka_table_name ON CLUSTER '{cluster}';
10 | ```
11 | * Step 2: `kafka-consumer-groups.sh --bootstrap-server kafka:9092 --topic topic:0,1,2 --group id1 --reset-offsets --to-latest --execute`
12 | * More samples: [https://gist.github.com/filimonov/1646259d18b911d7a1e8745d6411c0cc](https://gist.github.com/filimonov/1646259d18b911d7a1e8745d6411c0cc)
13 | * Step 3: Attach Kafka tables back
14 | ```
15 | ATTACH TABLE db.kafka_table_name ON CLUSTER '{cluster}';
16 | ```
17 |
18 | See also these configuration settings:
19 |
20 | ```markup
21 | <kafka>
22 |     <auto_offset_reset>smallest</auto_offset_reset>
23 | </kafka>
24 | ```
25 | ### About Offset Consuming
26 |
27 | When a consumer joins the consumer group, the broker checks whether it already has a committed offset; if it does, the consumer resumes from that committed offset. Both ClickHouse and librdkafka documentation state that the default value for `auto_offset_reset` is `largest` (or `latest` in new Kafka versions), but that is not what happens if the consumer is new:
28 |
29 | https://github.com/ClickHouse/ClickHouse/blob/f171ad93bcb903e636c9f38812b6aaf0ab045b04/src/Storages/Kafka/StorageKafka.cpp#L506
30 |
31 | `conf.set("auto.offset.reset", "earliest"); // If no offset stored for this group, read all messages from the start`
32 |
33 | If there is no offset stored for that particular consumer group, or the stored offset is out of range, the consumer will start consuming from the beginning (`earliest`); if there is an offset stored, the consumer simply resumes from it.
34 | The log retention policy influences which offset values correspond to the `earliest` and `latest` positions. Consider a scenario where a topic has a retention policy set to 1 hour. Initially, you produce 5 messages, and then, after an hour, you publish 5 more messages. The latest offset is unaffected by the retention cleanup, but because Kafka has removed the earlier messages, the earliest available offset will not be 0; instead, it will be 5.
35 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/zookeeper-monitoring.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "ZooKeeper Monitoring"
3 | linkTitle: "ZooKeeper Monitoring"
4 | description: >
5 | ZooKeeper Monitoring
6 | ---
7 |
8 | ## ZooKeeper
9 |
10 | ### scrape metrics
11 |
12 | * embedded exporter since version 3.6.0
13 | * [https://zookeeper.apache.org/doc/r3.6.2/zookeeperMonitor.html](https://zookeeper.apache.org/doc/r3.6.2/zookeeperMonitor.html)
14 | * standalone exporter
15 | * [https://github.com/dabealu/zookeeper-exporter](https://github.com/dabealu/zookeeper-exporter)
16 |
17 | ### Install dashboards
18 |
19 | * embedded exporter [https://grafana.com/grafana/dashboards/10465](https://grafana.com/grafana/dashboards/10465)
20 | * dabealu exporter [https://grafana.com/grafana/dashboards/11442](https://grafana.com/grafana/dashboards/11442)
21 |
22 | See also [https://grafana.com/grafana/dashboards?search=ZooKeeper&dataSource=prometheus](https://grafana.com/grafana/dashboards?search=ZooKeeper&dataSource=prometheus)
23 |
24 | ### setup alert rules
25 |
26 | * embedded exporter [link](https://github.com/Altinity/clickhouse-operator/blob/master/deploy/prometheus/prometheus-alert-rules-zookeeper.yaml)
27 |
28 | ### See also
29 |
30 | * [https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/\#zookeeper-metrics](https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/#zookeeper-metrics)
31 | * [https://dzone.com/articles/monitoring-apache-zookeeper-servers](https://dzone.com/articles/monitoring-apache-zookeeper-servers) - note exhibitor is no longer maintained
32 | * [https://github.com/samber/awesome-prometheus-alerts/blob/c3ba0cf1997c7e952369a090aeb10343cdca4878/\_data/rules.yml\#L1146-L1170](https://github.com/samber/awesome-prometheus-alerts/blob/c3ba0cf1997c7e952369a090aeb10343cdca4878/_data/rules.yml#L1146-L1170) \(or [https://awesome-prometheus-alerts.grep.to/rules.html\#zookeeper](https://awesome-prometheus-alerts.grep.to/rules.html#zookeeper) \)
33 | * [https://alex.dzyoba.com/blog/prometheus-alerts/](https://alex.dzyoba.com/blog/prometheus-alerts/)
34 | * [https://docs.datadoghq.com/integrations/zk/?tab=host](https://docs.datadoghq.com/integrations/zk/?tab=host)
35 | * [https://statuslist.app/uptime-monitoring/zookeeper/](https://statuslist.app/uptime-monitoring/zookeeper/)
36 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/s3disk.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "S3Disk"
3 | linkTitle: "S3Disk"
4 | weight: 100
5 | description: >-
6 |
7 | ---
8 |
9 | ## Settings
10 |
11 | ```xml
12 | <clickhouse>
13 |     <storage_configuration>
14 |         <disks>
15 |             <s3disk> <!-- example disk name -->
16 |                 <type>s3</type>
17 |                 <endpoint>http://s3.us-east-1.amazonaws.com/BUCKET_NAME/test_s3_disk/</endpoint>
18 |                 <access_key_id>ACCESS_KEY_ID</access_key_id>
19 |                 <secret_access_key>SECRET_ACCESS_KEY</secret_access_key>
20 |                 <skip_access_check>true</skip_access_check>
21 |                 <send_metadata>true</send_metadata>
22 |             </s3disk>
23 |         </disks>
24 |     </storage_configuration>
25 | </clickhouse>
26 | ```
27 |
28 | * skip_access_check — if true, it's possible to use read-only credentials with a regular MergeTree table. But you would need to disable merges (`prefer_not_to_merge` setting) on the s3 volume as well.
29 |
30 | * send_metadata — if true, ClickHouse® will populate the s3 object with the initial part & file path, which allows you to recover metadata from s3 and makes debugging easier.
31 |
32 |
33 | ## Restore metadata from S3
34 |
35 | ### Default
36 |
37 | Limitations:
38 | 1. ClickHouse needs RW access to this bucket
39 |
40 | In order to restore metadata, you need to create a restore file in the `metadata_path/_s3_disk_name_` directory:
41 |
42 | ```bash
43 | touch /var/lib/clickhouse/disks/_s3_disk_name_/restore
44 | ```
45 |
46 | In that case ClickHouse will restore to the same bucket and path, updating only the metadata files in the s3 bucket.
47 |
48 | ### Custom
49 |
50 | Limitations:
51 | 1. ClickHouse needs RO access to the old bucket and RW to the new.
52 | 2. ClickHouse will copy objects in case of restoring to a different bucket or path.
53 |
54 | If you would like to change the bucket or path, you need to populate the restore file with settings in key=value format:
55 |
56 | ```bash
57 | cat /var/lib/clickhouse/disks/_s3_disk_name_/restore
58 |
59 | source_bucket=s3disk
60 | source_path=vol1/
61 | ```
62 |
63 | ## Links
64 |
65 | * https://altinity.com/blog/integrating-clickhouse-with-minio
66 | * https://altinity.com/blog/clickhouse-object-storage-performance-minio-vs-aws-s3
67 | * https://altinity.com/blog/tips-for-high-performance-clickhouse-clusters-with-s3-object-storage
68 |
--------------------------------------------------------------------------------
/layouts/_default/page-meta-links.html:
--------------------------------------------------------------------------------
1 | {{ if .Path }}
2 | {{ $pathFormatted := replace .Path "\\" "/" }}
3 | {{ $gh_repo := ($.Param "github_repo") }}
4 | {{ $gh_subdir := ($.Param "github_subdir") }}
5 | {{ $gh_project_repo := ($.Param "github_project_repo") }}
6 | {{ $gh_branch := (default "master" ($.Param "github_branch")) }}
7 |
41 | {{ end }}
42 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/useful-setting-to-turn-on.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Useful settings to turn on/Defaults that should be reconsidered"
3 | linkTitle: "Useful settings to turn on"
4 | weight: 100
5 | description: >-
6 | Useful settings to turn on.
7 | ---
8 |
9 | ## Useful settings to turn on/Defaults that should be reconsidered
10 |
11 | Some settings that are not enabled by default.
12 |
13 | * [ttl_only_drop_parts](https://clickhouse.com/docs/operations/settings/merge-tree-settings#ttl_only_drop_parts)
14 |
15 | Enables or disables complete dropping of data parts where all rows are expired in MergeTree tables.
16 |
17 | When ttl_only_drop_parts is disabled (the default), the ClickHouse® server only deletes expired rows according to their TTL.
18 |
19 | When ttl_only_drop_parts is enabled, the ClickHouse server drops a whole part when all rows in it are expired.
20 |
21 | Dropping whole parts instead of partially cleaning TTL'd rows allows having shorter merge_with_ttl_timeout times and a lower impact on system performance.
22 |
23 | * [join_use_nulls](https://clickhouse.com/docs/en/operations/settings/settings/#join_use_nulls)
24 |
25 | You might not expect that during a JOIN missing columns are filled with default values for their types (instead of classic NULLs); see the example at the end of this page.
26 |
27 | Sets the type of JOIN behaviour. When merging tables, empty cells may appear. ClickHouse fills them differently based on this setting.
28 |
29 | Possible values:
30 |
31 | 0 — The empty cells are filled with the default value of the corresponding field type.
32 | 1 — JOIN behaves the same way as in standard SQL. The type of the corresponding field is converted to Nullable, and empty cells are filled with NULL.
33 |
34 | * [aggregate_functions_null_for_empty](https://clickhouse.com/docs/en/operations/settings/settings/#aggregate_functions_null_for_empty)
35 |
36 | The default behaviour is not compatible with ANSI SQL (ClickHouse avoids Nullable types for performance reasons):
37 |
38 | ```sql
39 | select sum(x), avg(x) from (select 1 x where 0);
40 | ┌─sum(x)─┬─avg(x)─┐
41 | │ 0 │ nan │
42 | └────────┴────────┘
43 |
44 | set aggregate_functions_null_for_empty=1;
45 |
46 | select sum(x), avg(x) from (select 1 x where 0);
47 | ┌─sumOrNull(x)─┬─avgOrNull(x)─┐
48 | │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │
49 | └──────────────┴──────────────┘
50 | ```
51 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/skip-indexes/skip-indexes-examples.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Skip indexes examples"
3 | linkTitle: "Skip indexes examples"
4 | ---
5 | ## bloom\_filter
6 |
7 | ```sql
8 | create table bftest (k Int64, x Int64) Engine=MergeTree order by k;
9 |
10 | insert into bftest select number, rand64()%565656 from numbers(10000000);
11 | insert into bftest select number, rand64()%565656 from numbers(100000000);
12 |
13 | select count() from bftest where x = 42;
14 | ┌─count()─┐
15 | │ 201 │
16 | └─────────┘
17 | 1 rows in set. Elapsed: 0.243 sec. Processed 110.00 million rows
18 |
19 |
20 | alter table bftest add index ix1(x) TYPE bloom_filter GRANULARITY 1;
21 |
22 | alter table bftest materialize index ix1;
23 |
24 |
25 | select count() from bftest where x = 42;
26 | ┌─count()─┐
27 | │ 201 │
28 | └─────────┘
29 | 1 rows in set. Elapsed: 0.056 sec. Processed 3.68 million rows
30 | ```
31 |
32 | ## minmax
33 |
34 | ```sql
35 | create table bftest (k Int64, x Int64) Engine=MergeTree order by k;
36 |
37 | -- data in the x column is correlated with the primary key
38 | insert into bftest select number, number * 2 from numbers(100000000);
39 |
40 | alter table bftest add index ix1(x) TYPE minmax GRANULARITY 1;
41 | alter table bftest materialize index ix1;
42 |
43 | select count() from bftest where x = 42;
44 | 1 rows in set. Elapsed: 0.004 sec. Processed 8.19 thousand rows
45 | ```
46 |
47 | ## projection
48 |
49 | ```sql
50 | create table bftest (k Int64, x Int64, S String) Engine=MergeTree order by k;
51 | insert into bftest select number, rand64()%565656, '' from numbers(10000000);
52 | insert into bftest select number, rand64()%565656, '' from numbers(100000000);
53 | alter table bftest add projection p1 (select k,x order by x);
54 | alter table bftest materialize projection p1 settings mutations_sync=1;
55 | set allow_experimental_projection_optimization=1 ;
56 |
57 | -- projection
58 | select count() from bftest where x = 42;
59 | 1 rows in set. Elapsed: 0.002 sec. Processed 24.58 thousand rows
60 |
61 | -- no projection
62 | select * from bftest where x = 42 format Null;
63 | 0 rows in set. Elapsed: 0.432 sec. Processed 110.00 million rows
64 |
65 | -- projection
66 | select * from bftest where k in (select k from bftest where x = 42) format Null;
67 | 0 rows in set. Elapsed: 0.316 sec. Processed 1.50 million rows
68 | ```
69 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-kafka/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Kafka"
3 | linkTitle: "Kafka"
4 | description: >
5 | Kafka
6 | ---
7 | ```bash
8 | git log -- contrib/librdkafka | git name-rev --stdin
9 | ```
10 |
11 | | **ClickHouse® version** | **librdkafka version** |
12 | | :--- | :--- |
13 | | 25.3+ ([\#63697](https://github.com/ClickHouse/ClickHouse/issues/63697)) | [2.8.0](https://github.com/confluentinc/librdkafka/blob/v2.8.0/CHANGELOG.md) + few [fixes](https://gist.github.com/filimonov/ad252aa601d4d99fb57d4d76f14aa2bf) |
14 | | 21.10+ ([\#27883](https://github.com/ClickHouse/ClickHouse/pull/27883)) | [1.6.1](https://github.com/edenhill/librdkafka/blob/v1.6.1/CHANGELOG.md) + snappy fixes + boring ssl + illumos_build fixes + edenhill#3279 fix|
15 | | 21.6+ ([\#23874](https://github.com/ClickHouse/ClickHouse/pull/23874)) | [1.6.1](https://github.com/edenhill/librdkafka/blob/v1.6.1/CHANGELOG.md) + snappy fixes + boring ssl + illumos_build fixes|
16 | | 21.1+ ([\#18671](https://github.com/ClickHouse/ClickHouse/pull/18671)) | [1.6.0-RC3](https://github.com/edenhill/librdkafka/blob/v1.6.0-RC3/CHANGELOG.md) + snappy fixes + boring ssl |
17 | | 20.13+ ([\#18053](https://github.com/ClickHouse/ClickHouse/pull/18053)) | [1.5.0](https://github.com/edenhill/librdkafka/blob/v1.5.0/CHANGELOG.md) + msan fixes + snappy fixes + boring ssl |
18 | | 20.7+ ([\#12991](https://github.com/ClickHouse/ClickHouse/pull/12991)) | [1.5.0](https://github.com/edenhill/librdkafka/blob/v1.5.0/CHANGELOG.md) + msan fixes |
19 | | 20.5+ ([\#11256](https://github.com/ClickHouse/ClickHouse/pull/11256)) | [1.4.2](https://github.com/edenhill/librdkafka/blob/v1.4.2/CHANGELOG.md) |
20 | | 20.2+ ([\#9000](https://github.com/ClickHouse/ClickHouse/pull/9000)) | [1.3.0](https://github.com/edenhill/librdkafka/releases?after=v1.4.0-PRE1) |
21 | | 19.11+ ([\#5872](https://github.com/ClickHouse/ClickHouse/pull/5872)) | [1.1.0](https://github.com/edenhill/librdkafka/releases?after=v1.1.0-selfstatic-test12) |
22 | | 19.5+ ([\#4799](https://github.com/ClickHouse/ClickHouse/pull/4799)) | [1.0.0](https://github.com/edenhill/librdkafka/releases?after=v1.0.1-RC1) |
23 | | 19.1+ ([\#4025](https://github.com/ClickHouse/ClickHouse/pull/4025)) | 1.0.0-RC5 |
24 | | v1.1.54382+ ([\#2276](https://github.com/ClickHouse/ClickHouse/pull/2276)) | [0.11.4](https://github.com/edenhill/librdkafka/releases?after=v0.11.4-adminapi-post1) |
25 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/altinity-kb-zookeeper-backup.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "ZooKeeper backup"
3 | linkTitle: "ZooKeeper backup"
4 | description: >
5 | ZooKeeper backup
6 | ---
7 |
8 | Question: Do I need to back up the Zookeeper database, since it’s pretty important for ClickHouse®?
9 |
10 | TLDR answer: **NO, just backup ClickHouse data itself, and do SYSTEM RESTORE REPLICA during recovery to recreate zookeeper data**
11 |
12 | Details:
13 |
14 | Zookeeper does not store any data; it stores the STATE of the distributed system ("that replica has those parts", "still need 2 merges to do", "alter is being applied", etc.). That state always changes, and you cannot capture / back up / recover it in a safe manner. So even a backup from a few seconds ago represents some 'old state from the past' which is INCONSISTENT with the actual state of the data.
15 |
16 | In other words - if ClickHouse is working, then the state of the distributed system always changes, and it's almost impossible to collect the current state of zookeeper (while you are collecting it, it will change many times). The only exception is a 'stop-the-world' scenario - i.e. shut down all ClickHouse nodes together with all other zookeeper clients, then shut down all the zookeepers, and only then take the backups; in that scenario the backups of zookeeper & ClickHouse will be consistent. In that case restoring the backup is as simple as (and equal to) starting all the nodes which were stopped before. But usually that scenario is very impractical because it requires huge downtime.
17 |
18 | So what to do instead? It's enough to back up the ClickHouse data itself; to recover the state of zookeeper you can just run `SYSTEM RESTORE REPLICA` **AFTER** restoring the ClickHouse data. That will recreate the state of the replica in zookeeper as it exists on the filesystem after backup recovery.
19 |
20 | Normally a Zookeeper ensemble consists of 3 nodes, which is enough to survive hardware failures.
21 |
22 | On older versions (which don't have the `SYSTEM RESTORE REPLICA` command) it can be done manually, using these instructions: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/#converting-from-mergetree-to-replicatedmergetree. At scale you can try [https://github.com/Altinity/clickhouse-zookeeper-recovery](https://github.com/Altinity/clickhouse-zookeeper-recovery)
23 |
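24 | A minimal sketch of the recovery step on a freshly restored node (database/table names are examples):
25 |
26 | ```sql
27 | -- Restore the replica metadata in ZooKeeper for one table...
28 | SYSTEM RESTORE REPLICA mydatabase.mytable;
29 |
30 | -- ...or generate a statement for every replicated table that came up read-only.
31 | SELECT concat('SYSTEM RESTORE REPLICA ', database, '.', table, ';')
32 | FROM system.replicas
33 | WHERE is_readonly
34 | FORMAT TSVRaw;
35 | ```
36 |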
--------------------------------------------------------------------------------
/.github/workflows/cla.yml:
--------------------------------------------------------------------------------
1 | name: "CLA Assistant"
2 | on:
3 | issue_comment:
4 | types: [created]
5 | pull_request_target:
6 | types: [opened,closed,synchronize]
7 |
8 | jobs:
9 | CLAssistant:
10 | runs-on: ubuntu-latest
11 | steps:
12 | - name: "CLA Assistant"
13 | if: (github.event.comment.body == 'recheck' || github.event.comment.body == 'I have read the CLA Document and I hereby sign the CLA') || github.event_name == 'pull_request_target'
14 | # Beta Release
15 | uses: cla-assistant/github-action@v2.4.0
16 | env:
17 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
18 | # the below token should have repo scope and must be manually added by you in the repository's secret
19 | PERSONAL_ACCESS_TOKEN : ${{ secrets.PERSONAL_ACCESS_TOKEN }}
20 | with:
21 | path-to-signatures: 'signatures/version1/cla.json'
22 | path-to-document: 'https://altinity.com/legal/content-licensing-agreement-cla/' # e.g. a CLA or a DCO document
23 | # branch should not be protected
24 | branch: 'main'
25 | allowlist: johnhummelAltinity,bot*
26 |
27 | #below are the optional inputs - If the optional inputs are not given, then default values will be taken
28 | #remote-organization-name: enter the remote organization name where the signatures should be stored (Default is storing the signatures in the same repository)
29 | #remote-repository-name: enter the remote repository name where the signatures should be stored (Default is storing the signatures in the same repository)
30 | #create-file-commit-message: 'For example: Creating file for storing CLA Signatures'
31 | #signed-commit-message: 'For example: $contributorName has signed the CLA in #$pullRequestNo'
32 | #custom-notsigned-prcomment: 'pull request comment with Introductory message to ask new contributors to sign'
33 | #custom-pr-sign-comment: 'The signature to be committed in order to sign the CLA'
34 | #custom-allsigned-prcomment: 'pull request comment when all contributors has signed, defaults to **CLA Assistant Lite bot** All Contributors have signed the CLA.'
35 | #lock-pullrequest-aftermerge: false - if you don't want this bot to automatically lock the pull request after merging (default - true)
36 | #use-dco-flag: true - If you are using DCO instead of CLA
37 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/altinity-kb-kill-query.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "KILL QUERY"
3 | linkTitle: "KILL QUERY"
4 | description: >
5 | KILL QUERY
6 | ---
7 | Unfortunately not all queries can be killed.
8 | `KILL QUERY` only sets a flag that must be checked by the query.
9 | The query pipeline checks this flag before switching to the next block. If the pipeline is stuck somewhere in the middle, the query cannot be killed.
10 | If a query does not stop, the only way to get rid of it is to restart ClickHouse®.
11 |
12 | See also:
13 |
14 | * [https://github.com/ClickHouse/ClickHouse/issues/3964](https://github.com/ClickHouse/ClickHouse/issues/3964)
15 | * [https://github.com/ClickHouse/ClickHouse/issues/1576](https://github.com/ClickHouse/ClickHouse/issues/1576)
16 |
17 | ## How to replace a running query
18 |
19 | > Q. We are trying to abort running queries when they are being replaced with a new one. We are setting the same query id for this. In some cases this error happens:
20 | >
21 | > Query with id = e213cc8c-3077-4a6c-bc78-e8463adad35d is already running and can't be stopped
22 | >
23 | > The query is still being killed but the new one is not being executed. Do you know anything about this and if there is a fix or workaround for it?
24 |
25 | I guess you use replace_running_query + replace_running_query_max_wait_ms.
26 |
27 | Unfortunately it's not always possible to kill the query at a random moment in time.
28 |
29 | KILL doesn't send any signals, it just sets a flag, which gets (synchronously) checked at certain moments of query execution, mostly after finishing processing one block and before starting another.
30 |
31 | At certain stages (e.g. while executing a scalar sub-query) the query cannot be killed at all. This is a known issue and requires an architectural change to fix it.
32 |
33 | > I see. Is there a workaround?
34 | >
35 | > This is our use case:
36 | >
37 | > A user requests an analytics report which has a query that takes several seconds; the user makes changes to the report (e.g. to filters, metrics, dimensions...). Since the user changed what he is looking for, the query results from the initial query are never used and we would like to cancel it when starting the new query (edited)
38 |
39 | You can just use 2 commands:
40 |
41 | ```sql
42 | KILL QUERY WHERE query_id = ' ... ' ASYNC
43 |
44 | SELECT ... new query ....
45 | ```
46 |
47 | in that case you don't need to care when the original query will be stopped.
48 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/slow_select_count.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Why is simple `SELECT count()` Slow in ClickHouse®?"
3 | linkTitle: "Slow `SELECT count()`"
4 | weight: 100
5 | description: >-
6 | ---
7 |
8 | ClickHouse is a columnar database that provides excellent performance for analytical queries. However, in some cases, a simple count query can be slow. In this article, we'll explore the reasons why this can happen and how to optimize the query.
9 |
10 | ### Three Strategies for Counting Rows in ClickHouse
11 |
12 | There are three ways to count rows in a table in ClickHouse:
13 |
14 | 1. `optimize_trivial_count_query`: This strategy extracts the number of rows from the table metadata. It's the fastest and most efficient way to count rows, but it only works for simple count queries.
15 |
16 | 2. `allow_experimental_projection_optimization`: This strategy uses a virtual projection called _minmax_count_projection to count rows. It's faster than scanning the table but slower than the trivial count query.
17 |
18 | 3. Scanning the smallest column in the table and reading rows from that. This is the slowest strategy and is only used when the other two strategies can't be used.
19 |
20 | ### Why Does ClickHouse Sometimes Choose the Slowest Counting Strategy?
21 |
22 | In some cases, ClickHouse may choose the slowest counting strategy even when there are faster options available. Here are some possible reasons why this can happen:
23 |
24 | 1. Row policies are used on the table: If row policies are used, ClickHouse needs to filter rows to give the proper count. You can check if row policies are used by selecting from system.row_policies.
25 |
26 | 2. Experimental light-weight delete feature was used on the table: If the experimental light-weight delete feature was used, ClickHouse may use the slowest counting strategy. You can check this by looking into parts_columns for the column named _row_exists. To do this, run the following query:
27 |
28 | ```sql
29 | SELECT DISTINCT database, table FROM system.parts_columns WHERE column = '_row_exists';
30 | ```
31 |
32 | You can also refer to this issue on GitHub for more information: https://github.com/ClickHouse/ClickHouse/issues/47930.
33 |
34 | 3. `SELECT FINAL` or `final=1` setting is used.
35 |
36 | 4. `max_parallel_replicas > 1` is used.
37 |
38 | 5. Sampling is used.
39 |
40 | 6. Some other features like `allow_experimental_query_deduplication` or `empty_result_for_aggregation_by_empty_set` are used.
41 |
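42 | To see which strategy a given query actually uses, a minimal sketch (`my_table` is a placeholder):
43 |
44 | ```sql
45 | -- With the trivial-count optimization the plan reads the count from table metadata
46 | -- instead of scanning a column.
47 | EXPLAIN SELECT count() FROM my_table;
48 |
49 | -- Force the slow path for comparison.
50 | SELECT count() FROM my_table SETTINGS optimize_trivial_count_query = 0;
51 | ```
52 |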
--------------------------------------------------------------------------------
/content/en/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Altinity® Knowledge Base for ClickHouse®"
3 | linkTitle: "Altinity® Knowledge Base for ClickHouse®"
4 | description: "Up-to-date ClickHouse® knowledge base for every ClickHouse user."
5 | keywords:
6 | - ClickHouse Knowledge Base
7 | - Altinity Knowledge Base
8 | - ClickHouse users
9 | no_list: true
10 | cascade:
11 | - type: "docs"
12 | _target:
13 | path: "/**"
14 | ---
15 | ## Welcome to the Altinity® Knowledge Base (KB) for ClickHouse®
16 |
17 | This knowledge base is supported by [Altinity](http://altinity.com/) engineers to provide quick answers to common questions and issues involving ClickHouse.
18 |
19 | The [Altinity Knowledge Base is licensed under Apache 2.0](https://github.com/Altinity/altinityknowledgebase/blob/main/LICENSE), and available to all ClickHouse users. The information and code samples are available freely and distributed under the Apache 2.0 license.
20 |
21 | For more detailed information about Altinity services support, see the following:
22 |
23 | * [Altinity](https://altinity.com/): Provider of Altinity.Cloud, offering SOC-2 certified support for ClickHouse.
24 | * [Altinity.com Documentation](https://docs.altinity.com): Detailed guides on working with:
25 | * [Altinity.Cloud](https://docs.altinity.com/altinitycloud/)
26 | * [Altinity.Cloud Anywhere](https://docs.altinity.com/altinitycloudanywhere/)
27 | * [The Altinity Cloud Manager](https://docs.altinity.com/altinitycloud/quickstartguide/clusterviewexplore/)
28 | * [The Altinity Kubernetes Operator for ClickHouse](https://docs.altinity.com/releasenotes/altinity-kubernetes-operator-release-notes/)
29 | * [The Altinity Sink Connector for ClickHouse](https://docs.altinity.com/releasenotes/altinity-sink-connector-release-notes/) and
30 | * [Altinity Backup for ClickHouse](https://docs.altinity.com/releasenotes/altinity-backup-release-notes/)
31 | * [Altinity Blog](https://altinity.com/blog/): Blog posts about ClickHouse the database and Altinity services.
32 |
33 | The following sites are also useful references regarding ClickHouse:
34 |
35 | * [ClickHouse.com documentation](https://clickhouse.com/docs/en/): Official documentation from ClickHouse Inc.
36 | * [ClickHouse at Stackoverflow](https://stackoverflow.com/questions/tagged/clickhouse): Community driven responses to questions regarding ClickHouse
37 | * [Google groups (Usenet) yes we remember it](https://groups.google.com/g/clickhouse): The grandparent of all modern discussion boards.
38 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/cluster-production-configuration-guide/hardware-requirements.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Hardware Requirements"
3 | linkTitle: "Hardware Requirements"
4 | description: >
5 | Hardware Requirements
6 | ---
7 | ### ClickHouse®
8 |
9 | ClickHouse will use all available hardware to maximize performance. So the more hardware - the better. As of this publication, the hardware requirements are:
10 |
11 | * Minimum Hardware: 4-core CPU with support of SSE4.2, 16 Gb RAM, 1Tb HDD.
12 | * Recommended for development and staging environments.
13 | * SSE4.2 is required, and going below 4 Gb of RAM is not recommended.
14 | * Recommended Hardware: >=16-cores, >=64Gb RAM, HDD-raid or SSD.
15 | * For processing up to hundreds of millions / billions of rows.
16 |
17 | For clouds: disk throughput is the more important factor compared to IOPS. Be aware of burst / baseline disk speed difference.
18 |
19 | See also: [https://benchmark.clickhouse.com/hardware/](https://benchmark.clickhouse.com/hardware/)
20 |
21 | ### **Zookeeper**
22 |
23 | Zookeeper requires separate servers from those used for ClickHouse. Zookeeper has poor performance when installed on the same node as ClickHouse.
24 |
25 | Hardware Requirements for Zookeeper:
26 |
27 | * Fast disk speed (ideally NVMe, 128Gb should be enough).
28 | * Any modern CPU (one core, better 2)
29 | * 4Gb of RAM
30 |
31 | For clouds - be careful with burstable network disks (like gp2 on AWS): you may need up to 1000 IOPS on the disk over a long run, so gp3 with a 3000 IOPS baseline is a better choice.
32 |
33 | The number of Zookeeper instances depends on the environment:
34 |
35 | * Production: 3 is an optimal number of zookeeper instances.
36 | * Development and Staging: 1 zookeeper instance is sufficient.
37 |
38 | See also:
39 |
40 | * [https://docs.altinity.com/operationsguide/clickhouse-zookeeper/](https://docs.altinity.com/operationsguide/clickhouse-zookeeper/)
41 | * [altinity-kb-proper-setup]({{}})
42 | * [zookeeper-monitoring]({{}})
43 |
44 | #### ClickHouse Hardware Configuration
45 |
46 | Configure the servers according to the recommendations in the [ClickHouse Usage Recommendations](https://clickhouse.yandex/docs/en/operations/tips/).
47 |
48 | #### **Test Your Hardware**
49 |
50 | Be sure to test the following:
51 |
52 | * RAM speed.
53 | * Network speed.
54 | * Storage speed.
55 |
56 | It’s better to find any performance issues before installing ClickHouse.
57 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/altinity-kb-clickhouse-copier/_index.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "clickhouse-copier"
3 | linkTitle: "clickhouse-copier"
4 | description: >
5 | clickhouse-copier
6 | ---
7 | The description of the utility and its parameters, as well as examples of the config files that you need to create for the copier are in the official repo for the [ClickHouse® copier utility](https://github.com/clickhouse/copier/)
8 |
9 | The steps to run a task:
10 |
11 | 1. Create a config file for `clickhouse-copier` (zookeeper.xml)
12 | 2. Create a config file for the task (task1.xml)
13 | 3. Create the task in ZooKeeper and start an instance of `clickhouse-copier`
14 |
15 | `clickhouse-copier --daemon --base-dir=/opt/clickhouse-copier --config=/opt/clickhouse-copier/zookeeper.xml --task-path=/clickhouse/copier/task1 --task-file=/opt/clickhouse-copier/task1.xml`
16 |
17 | If the node in ZooKeeper already exists and you want to change it, you need to add the `task-upload-force` parameter:
18 |
19 | `clickhouse-copier --daemon --base-dir=/opt/clickhouse-copier --config=/opt/clickhouse-copier/zookeeper.xml --task-path=/clickhouse/copier/task1 --task-file=/opt/clickhouse-copier/task1.xml --task-upload-force=1`
20 |
21 | If you want to run another instance of `clickhouse-copier` for the same task, you need to copy the config file (zookeeper.xml) to another server, and run this command:
22 |
23 | `clickhouse-copier --daemon --base-dir=/opt/clickhouse-copier --config=/opt/clickhouse-copier/zookeeper.xml --task-path=/clickhouse/copier/task1`
24 |
25 | The number of simultaneously running instances is controlled by the `max_workers` parameter in your task configuration file. If you run more workers, the superfluous ones will sleep and log messages like this:
26 |
27 | ` ClusterCopier: Too many workers (1, maximum 1). Postpone processing`
28 |
29 | ### See also
30 |
31 | * https://github.com/clickhouse/copier/
32 | * Никита Михайлов. Кластер ClickHouse ctrl-с ctrl-v. HighLoad++ Весна 2021 [slides]( https://raw.githubusercontent.com/ClickHouse/clickhouse-presentations/master/highload2021/copier.pdf)
33 | * 21.7 have a huge bulk of fixes / improvements. https://github.com/ClickHouse/ClickHouse/pull/23518
34 | * https://altinity.com/blog/2018/8/22/clickhouse-copier-in-practice
35 | * https://github.com/getsentry/snuba/blob/master/docs/clickhouse-copier.md
36 | * https://hughsite.com/post/clickhouse-copier-usage.html
37 | * https://www.jianshu.com/p/c058edd664a6
38 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-schema-design/codecs/codecs-on-array-columns.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Codecs on array columns"
3 | linkTitle: "Codecs on array columns"
4 | description: >
5 | Codecs on array columns
6 | ---
7 |
8 |
9 | {{% alert title="Info" color="info" %}}
10 | Supported since 20.10 (PR [\#15089](https://github.com/ClickHouse/ClickHouse/pull/15089)). On older versions you will get exception:
11 | `DB::Exception: Codec Delta is not applicable for Array(UInt64) because the data type is not of fixed size.`
12 | {{% /alert %}}
13 |
14 | ```sql
15 | DROP TABLE IF EXISTS array_codec_test SYNC
16 |
17 | create table array_codec_test( number UInt64, arr Array(UInt64) ) Engine=MergeTree ORDER BY number;
18 | INSERT INTO array_codec_test SELECT number, arrayMap(i -> number + i, range(100)) from numbers(10000000);
19 |
20 |
21 | /**** Default LZ4 *****/
22 |
23 | OPTIMIZE TABLE array_codec_test FINAL;
24 | --- Elapsed: 3.386 sec.
25 |
26 |
27 | SELECT * FROM system.columns WHERE (table = 'array_codec_test') AND (name = 'arr')
28 | /*
29 | Row 1:
30 | ──────
31 | database: default
32 | table: array_codec_test
33 | name: arr
34 | type: Array(UInt64)
35 | position: 2
36 | default_kind:
37 | default_expression:
38 | data_compressed_bytes: 173866750
39 | data_uncompressed_bytes: 8080000000
40 | marks_bytes: 58656
41 | comment:
42 | is_in_partition_key: 0
43 | is_in_sorting_key: 0
44 | is_in_primary_key: 0
45 | is_in_sampling_key: 0
46 | compression_codec:
47 | */
48 |
49 |
50 |
51 | /****** Delta, LZ4 ******/
52 |
53 | ALTER TABLE array_codec_test MODIFY COLUMN arr Array(UInt64) CODEC (Delta, LZ4);
54 |
55 | OPTIMIZE TABLE array_codec_test FINAL
56 | --0 rows in set. Elapsed: 4.577 sec.
57 |
58 | SELECT * FROM system.columns WHERE (table = 'array_codec_test') AND (name = 'arr')
59 |
60 | /*
61 | Row 1:
62 | ──────
63 | database: default
64 | table: array_codec_test
65 | name: arr
66 | type: Array(UInt64)
67 | position: 2
68 | default_kind:
69 | default_expression:
70 | data_compressed_bytes: 32458310
71 | data_uncompressed_bytes: 8080000000
72 | marks_bytes: 58656
73 | comment:
74 | is_in_partition_key: 0
75 | is_in_sorting_key: 0
76 | is_in_primary_key: 0
77 | is_in_sampling_key: 0
78 | compression_codec: CODEC(Delta(8), LZ4)
79 | */
80 | ```
81 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/ttl/what-are-my-ttls.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "What are my TTL settings?"
3 | linkTitle: "What are my TTL settings"
4 | weight: 100
5 | description: >-
6 | What are my TTL settings?
7 | ---
8 |
9 | ## Using `SHOW CREATE TABLE`
10 |
11 | If you just want to see the current TTL settings on a table, you can look at the schema definition.
12 | ```
13 | SHOW CREATE TABLE events2_local
14 | FORMAT Vertical
15 |
16 | Query id: eba671e5-6b8c-4a81-a4d8-3e21e39fb76b
17 |
18 | Row 1:
19 | ──────
20 | statement: CREATE TABLE default.events2_local
21 | (
22 | `EventDate` DateTime,
23 | `EventID` UInt32,
24 | `Value` String
25 | )
26 | ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/default/events2_local', '{replica}')
27 | PARTITION BY toYYYYMM(EventDate)
28 | ORDER BY (EventID, EventDate)
29 | TTL EventDate + toIntervalMonth(1)
30 | SETTINGS index_granularity = 8192
31 | ```
32 | This works even when there's no data in the table. It does not tell you when the TTLs expire or anything specific to data in one or more of the table parts.
33 |
34 | ## Using system.parts
35 |
36 | If you want to see the actual TTL values for specific data, run a query on system.parts.
37 | There are columns listing all currently applicable TTL limits for each part.
38 | (It does not work if the table is empty because there aren't any parts yet.)
39 | ```
40 | SELECT *
41 | FROM system.parts
42 | WHERE (database = 'default') AND (table = 'events2_local')
43 | FORMAT Vertical
44 |
45 | Query id: 59106476-210f-4397-b843-9920745b6200
46 |
47 | Row 1:
48 | ──────
49 | partition: 202203
50 | name: 202203_0_0_0
51 | ...
52 | database: default
53 | table: events2_local
54 | ...
55 | delete_ttl_info_min: 2022-04-27 21:26:30
56 | delete_ttl_info_max: 2022-04-27 21:26:30
57 | move_ttl_info.expression: []
58 | move_ttl_info.min: []
59 | move_ttl_info.max: []
60 | default_compression_codec: LZ4
61 | recompression_ttl_info.expression: []
62 | recompression_ttl_info.min: []
63 | recompression_ttl_info.max: []
64 | group_by_ttl_info.expression: []
65 | group_by_ttl_info.min: []
66 | group_by_ttl_info.max: []
67 | rows_where_ttl_info.expression: []
68 | rows_where_ttl_info.min: []
69 | rows_where_ttl_info.max: []
70 | ```
71 |
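72 | To avoid wading through the full `SELECT *` output, a trimmed variant that keeps only the TTL-related columns shown above:
73 | ```
74 | SELECT
75 |     name,
76 |     delete_ttl_info_min,
77 |     delete_ttl_info_max,
78 |     move_ttl_info.expression,
79 |     recompression_ttl_info.expression,
80 |     group_by_ttl_info.expression,
81 |     rows_where_ttl_info.expression
82 | FROM system.parts
83 | WHERE (database = 'default') AND (table = 'events2_local') AND active
84 | FORMAT Vertical
85 | ```
86 |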
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/altinity-kb-clickhouse-in-docker.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "ClickHouse® in Docker"
3 | linkTitle: "ClickHouse® in Docker"
4 | description: >
5 | ClickHouse® in Docker
6 | ---
7 | ## Do you have documentation on Docker deployments?
8 |
9 | Check
10 |
11 | * [https://hub.docker.com/r/clickhouse/clickhouse-server](https://hub.docker.com/r/clickhouse/clickhouse-server)
12 | * [https://docs.altinity.com/clickhouseonkubernetes/](https://docs.altinity.com/clickhouseonkubernetes/)
13 | * sources of entry point - [https://github.com/ClickHouse/ClickHouse/blob/master/docker/server/entrypoint.sh](https://github.com/ClickHouse/ClickHouse/blob/master/docker/server/entrypoint.sh)
14 |
15 | Important things:
16 |
17 | * use a concrete version tag (avoid using latest)
18 | * if possible use `--network=host` (due to performance reasons)
19 | * you need to mount the folder `/var/lib/clickhouse` to have persistence.
20 | * you MAY also mount the folder `/var/log/clickhouse-server` to have logs accessible outside of the container.
21 | * Also, you may mount in some files or folders in the configuration folder:
22 | * `/etc/clickhouse-server/config.d/listen_ports.xml`
23 | * `--ulimit nofile=262144:262144`
24 | * You can also grant some Linux capabilities to enable some extra ClickHouse® features (not obligatory): `SYS_PTRACE NET_ADMIN IPC_LOCK SYS_NICE`
25 | * you may also mount in the folder `/docker-entrypoint-initdb.d/` - all SQL or bash scripts there will be executed during container startup.
26 | * if you use cgroup limits - it may misbehave https://github.com/ClickHouse/ClickHouse/issues/2261 (set up `` manually)
27 | * there are several ENV switches, see: [https://github.com/ClickHouse/ClickHouse/blob/master/docker/server/entrypoint.sh](https://github.com/ClickHouse/ClickHouse/blob/master/docker/server/entrypoint.sh)
28 |
29 | TLDR version: use it as a starting point:
30 |
31 | ```bash
32 | docker run -d \
33 | --name some-clickhouse-server \
34 | --ulimit nofile=262144:262144 \
35 | --volume=$(pwd)/data:/var/lib/clickhouse \
36 | --volume=$(pwd)/logs:/var/log/clickhouse-server \
37 | --volume=$(pwd)/configs/memory_adjustment.xml:/etc/clickhouse-server/config.d/memory_adjustment.xml \
38 | --cap-add=SYS_NICE \
39 | --cap-add=NET_ADMIN \
40 | --cap-add=IPC_LOCK \
41 | --cap-add=SYS_PTRACE \
42 | --network=host \
43 | clickhouse/clickhouse-server:latest
44 |
45 | docker exec -it some-clickhouse-server clickhouse-client
46 | docker exec -it some-clickhouse-server bash
47 | ```
48 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-integrations/altinity-kb-kafka/error-handling.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Error handling"
3 | linkTitle: "Error handling"
4 | description: >
5 | Error handling
6 | ---
7 | ## Pre 21.6
8 |
9 | There are couple options:
10 |
11 | Certain formats which have a schema built into them (like JSONEachRow) can silently skip any unexpected fields after enabling the setting `input_format_skip_unknown_fields`.
12 |
13 | It's also possible to skip up to N malformed messages for each block using the setting `kafka_skip_broken_messages`, but it does not support all possible formats either.
14 |
15 | ## After 21.6
16 |
17 | It's possible to stream messages which could not be parsed. This behavior can be enabled via the setting `kafka_handle_error_mode='stream'`, and ClickHouse® will write the error and the message from Kafka itself to two new virtual columns: `_error`, `_raw_message`.
18 |
19 | So you can create another Materialized View which collects all errors that happen while parsing into a separate table, with all the important information like the offset and the content of the message.
20 |
21 | ```sql
22 | CREATE TABLE default.kafka_engine
23 | (
24 | `i` Int64,
25 | `s` String
26 | )
27 | ENGINE = Kafka
28 | SETTINGS kafka_broker_list = 'kafka:9092',
29 | kafka_topic_list = 'topic',
30 | kafka_group_name = 'clickhouse',
31 | kafka_format = 'JSONEachRow',
32 | kafka_handle_error_mode='stream';
33 |
34 | CREATE TABLE default.kafka_errors
35 | (
36 | `topic` String,
37 | `partition` Int64,
38 | `offset` Int64,
39 | `raw` String,
40 | `error` String
41 | )
42 | ENGINE = MergeTree
43 | ORDER BY (topic, partition, offset)
44 | SETTINGS index_granularity = 8192
45 |
46 |
47 | CREATE MATERIALIZED VIEW default.kafka_errors_mv TO default.kafka_errors
48 | AS
49 | SELECT
50 | _topic AS topic,
51 | _partition AS partition,
52 | _offset AS offset,
53 | _raw_message AS raw,
54 | _error AS error
55 | FROM default.kafka_engine
56 | WHERE length(_error) > 0
57 | ```
58 |
59 | ## Since 25.8
60 |
61 | A dead letter queue can be used via the setting `kafka_handle_error_mode='dead_letter'`: [https://github.com/ClickHouse/ClickHouse/pull/68873](https://github.com/ClickHouse/ClickHouse/pull/68873)
62 |
63 |
64 |
65 | 
66 |
67 | [https://github.com/ClickHouse/ClickHouse/pull/20249](https://github.com/ClickHouse/ClickHouse/pull/20249)
68 |
69 | [https://github.com/ClickHouse/ClickHouse/pull/21850](https://github.com/ClickHouse/ClickHouse/pull/21850)
70 |
71 | [https://altinity.com/blog/clickhouse-kafka-engine-faq](https://altinity.com/blog/clickhouse-kafka-engine-faq)
72 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-schema-design/ingestion-performance-and-formats.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Ingestion performance and formats"
3 | linkTitle: "Ingestion performance and formats"
4 | ---
5 | ```sql
6 | clickhouse-client -q 'select toString(number) s, number n, number/1000 f from numbers(100000000) format TSV' > speed.tsv
7 | clickhouse-client -q 'select toString(number) s, number n, number/1000 f from numbers(100000000) format RowBinary' > speed.RowBinary
8 | clickhouse-client -q 'select toString(number) s, number n, number/1000 f from numbers(100000000) format Native' > speed.Native
9 | clickhouse-client -q 'select toString(number) s, number n, number/1000 f from numbers(100000000) format CSV' > speed.csv
10 | clickhouse-client -q 'select toString(number) s, number n, number/1000 f from numbers(100000000) format JSONEachRow' > speed.JSONEachRow
11 | clickhouse-client -q 'select toString(number) s, number n, number/1000 f from numbers(100000000) format Parquet' > speed.parquet
12 | clickhouse-client -q 'select toString(number) s, number n, number/1000 f from numbers(100000000) format Avro' > speed.avro
13 |
14 | -- Engine=Null has no I/O or sorting overhead,
15 | -- so we test only the parsing performance of each format.
16 |
17 | create table n (s String, n UInt64, f Float64) Engine=Null
18 |
19 |
20 | -- clickhouse-client parses the formats itself,
21 | -- which allows us to see user CPU time (the total CPU time used across all threads).
22 | -- Another option is to disable parallel parsing: `--input_format_parallel_parsing=0`.
23 | -- real is wall-clock time.
24 |
25 | time clickhouse-client -t -q 'insert into n format TSV' < speed.tsv
26 | 2.693 real 0m2.728s user 0m14.066s
27 |
28 | time clickhouse-client -t -q 'insert into n format RowBinary' < speed.RowBinary
29 | 3.744 real 0m3.773s user 0m4.245s
30 |
31 | time clickhouse-client -t -q 'insert into n format Native' < speed.Native
32 | 2.359 real 0m2.382s user 0m1.945s
33 |
34 | time clickhouse-client -t -q 'insert into n format CSV' < speed.csv
35 | 3.296 real 0m3.328s user 0m18.145s
36 |
37 | time clickhouse-client -t -q 'insert into n format JSONEachRow' < speed.JSONEachRow
38 | 8.872 real 0m8.899s user 0m30.235s
39 |
40 | time clickhouse-client -t -q 'insert into n format Parquet' < speed.parquet
41 | 4.905 real 0m4.929s user 0m5.478s
42 |
43 | time clickhouse-client -t -q 'insert into n format Avro' < speed.avro
44 | 11.491 real 0m11.519s user 0m12.166s
45 | ```
46 |
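47 | As mentioned in the comments above, you can also disable parallel parsing so that user CPU time is easier to attribute, for example (a sketch):
48 |
49 | ```bash
50 | # parse the input with a single thread
51 | time clickhouse-client -t --input_format_parallel_parsing=0 -q 'insert into n format TSV' < speed.tsv
52 | ```
53 |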
47 | As you can see, JSONEachRow is the worst format in terms of CPU usage (user 0m30.235s) for this synthetic dataset, and Native is the best (user 0m1.945s). TSV / CSV are good in wall-clock time but spend a lot of CPU (user time).
48 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-setup-and-maintenance/cgroups_k8s.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "cgroups and kubernetes cloud providers"
3 | linkTitle: "cgroups and k8s"
4 | weight: 100
5 | description: >-
6 | cgroups and kubernetes cloud providers.
7 | ---
8 |
9 | Why my ClickHouse® is slow after upgrade to version 22.2 and higher?
10 |
11 | The probable reason is that ClickHouse 22.2 started to respect cgroups (Respect cgroups limits in max_threads autodetection, [#33342](https://github.com/ClickHouse/ClickHouse/pull/33342) by [JaySon](https://github.com/JaySon-Huang)).
12 |
13 | You can observe that `max_threads = 1`:
14 |
15 | ```sql
16 | SELECT
17 | name,
18 | value
19 | FROM system.settings
20 | WHERE name = 'max_threads'
21 |
22 | ┌─name────────┬─value─────┐
23 | │ max_threads │ 'auto(1)' │
24 | └─────────────┴───────────┘
25 | ```
26 |
27 | This makes ClickHouse execute all queries with a single thread (normal behavior is half of the available CPU cores; with 64 cores that would be 'auto(32)').
28 |
29 | We observe this cgroups behavior in AWS EKS (Kubernetes) environments with the [Altinity
30 | ClickHouse Operator](https://github.com/Altinity/clickhouse-operator) when requests.cpu and limits.cpu are not set for a resource.
31 |
32 | ## Workaround
33 |
34 | We suggest setting requests.cpu = `half of available CPU cores` and limits.cpu = `CPU cores`.
35 |
36 |
37 | For example, in the case of 16 CPU cores:
38 |
39 | ```yaml
40 | resources:
41 | requests:
42 | memory: ...
43 | cpu: 8
44 | limits:
45 | memory: ....
46 | cpu: 16
47 | ```
48 |
49 |
50 | Then you should get a new result:
51 |
52 | ```sql
53 | SELECT
54 | name,
55 | value
56 | FROM system.settings
57 | WHERE name = 'max_threads'
58 |
59 | ┌─name────────┬─value─────┐
60 | │ max_threads │ 'auto(8)' │
61 | └─────────────┴───────────┘
62 | ```
63 |
64 | ## In depth
65 |
66 | For some reason, when requests.cpu & limits.cpu are empty, AWS EKS sets the cgroup kernel parameters to the following values:
67 |
68 | ```bash
69 | # cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
70 | -1
71 |
72 | # cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
73 | 100000
74 |
75 | # cat /sys/fs/cgroup/cpu/cpu.shares
76 | 2
77 | ```
78 |
79 | This makes ClickHouse set `max_threads = 1` because of:
80 |
81 | ```text
82 | cgroup_share = /sys/fs/cgroup/cpu/cpu.shares (2)
83 | PER_CPU_SHARES = 1024
84 | share_count = ceil( cgroup_share / PER_CPU_SHARES ) ---> ceil(2 / 1024) ---> 1
85 | ```
86 |
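87 | Applying the same formula to the workaround above (with cgroups v1, Kubernetes turns requests.cpu = 8 into cpu.shares = 8 * 1024):
88 |
89 | ```text
90 | cgroup_share = 8 * 1024 = 8192
91 | share_count  = ceil( 8192 / 1024 ) ---> 8   ---> max_threads = 'auto(8)'
92 | ```
93 |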
87 | ## Fix
88 |
89 | The incorrect calculation was fixed in https://github.com/ClickHouse/ClickHouse/pull/35815 and works correctly in newer releases.
90 |
--------------------------------------------------------------------------------
/layouts/partials/page-meta-links.html:
--------------------------------------------------------------------------------
1 | {{ partial "search-input.html" . }}
2 | {{ if .File }}
3 | {{ $pathFormatted := replace .File.Path "\\" "/" }}
4 | {{ $gh_repo := ($.Param "github_repo") }}
5 | {{ $gh_subdir := ($.Param "github_subdir") }}
6 | {{ $gh_project_repo := ($.Param "github_project_repo") }}
7 | {{ $gh_branch := (default "master" ($.Param "github_branch")) }}
8 |
42 | {{ end }}
43 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-queries-and-syntax/mutations.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Mutations"
3 | linkTitle: "Mutations"
4 | description: >
5 | ALTER UPDATE / DELETE
6 | ---
7 | ## How to know if `ALTER TABLE … DELETE/UPDATE mutation ON CLUSTER` was finished successfully on all the nodes?
8 |
9 | A. Mutation status in `system.mutations` is local to each replica, so use:
10 |
11 | ```sql
12 | SELECT hostname(), * FROM clusterAllReplicas('your_cluster_name', system.mutations);
13 | -- you can also add WHERE conditions to that query if needed.
14 | ```
15 |
16 | Look at the `is_done` and `latest_fail_reason` columns.
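17 |
18 | For example, to list only the mutations that are still running or have failed somewhere on the cluster (a sketch; the cluster name is a placeholder):
19 |
20 | ```sql
21 | SELECT hostname() AS host, database, table, mutation_id, is_done, latest_fail_reason
22 | FROM clusterAllReplicas('your_cluster_name', system.mutations)
23 | WHERE NOT is_done OR latest_fail_reason != '';
24 | ```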
17 |
18 | ## Are mutations run in parallel or sequentially in ClickHouse® (in the scope of one table)?
19 |
20 | 
21 |
22 | ClickHouse runs mutations sequentially, but it can combine several mutations into a single one and apply all of them in one merge.
23 | Sometimes this can lead to problems when the combined expression ClickHouse needs to execute becomes really big (for example, if ClickHouse combined thousands of mutations into one).
24 |
25 |
26 | Because ClickHouse stores data in independent parts, it is able to run mutation merges for each part independently and in parallel.
27 | This can also lead to high resource utilization, especially memory usage, if you use `x IN (SELECT ... FROM big_table)` statements in a mutation, because each merge will build and keep its own HashSet in memory. You can avoid this problem by using the [Dictionary approach](../update-via-dictionary) for such mutations.
28 |
29 | Parallelism of mutations is controlled by the following settings:
30 |
31 | ```sql
32 | SELECT *
33 | FROM system.merge_tree_settings
34 | WHERE name LIKE '%mutation%'
35 |
36 | ┌─name───────────────────────────────────────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─type───┐
37 | │ max_replicated_mutations_in_queue │ 8 │ 0 │ How many tasks of mutating parts are allowed simultaneously in ReplicatedMergeTree queue. │ UInt64 │
38 | │ number_of_free_entries_in_pool_to_execute_mutation │ 20 │ 0 │ When there is less than specified number of free entries in pool, do not execute part mutations. This is to leave free threads for regular merges and avoid "Too many parts" │ UInt64 │
39 | └────────────────────────────────────────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────┘
40 | ```
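41 |
42 | Since these are MergeTree-level settings, they can also be tuned per table if needed, for example (a sketch; the table name is hypothetical):
43 |
44 | ```sql
45 | ALTER TABLE example MODIFY SETTING max_replicated_mutations_in_queue = 16;
46 | ```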
41 |
--------------------------------------------------------------------------------
/content/en/engines/mergetree-table-engine-family/merge-performance-final-optimize-by.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Merge performance and OPTIMIZE FINAL"
3 | linkTitle: "Merge performance and OPTIMIZE FINAL"
4 | ---
5 |
6 | ## Merge Performance
7 |
8 | Main things affecting the merge speed are:
9 |
10 | * Schema (especially compression codecs, some bad types, sorting order...)
11 | * Horizontal vs Vertical merge
12 |   * Horizontal = read all columns at once, do a merge sort, write the new part
13 |   * Vertical = first read the columns from ORDER BY, do a merge sort, write them to disk, remember the permutation, then process the rest of the columns one by one, applying the permutation.
14 | * Compact vs wide parts
15 | * Other things like server load, concurrent merges...
16 |
17 | ```sql
18 | SELECT name, value
19 | FROM system.merge_tree_settings
20 | WHERE name LIKE '%vert%';
21 |
22 | │ enable_vertical_merge_algorithm │ 1
23 | │ vertical_merge_algorithm_min_rows_to_activate │ 131072
24 | │ vertical_merge_algorithm_min_columns_to_activate │ 11
25 | ```
26 |
27 | * **Vertical merge** will be used if the part has at least 131072 rows and the table has at least 11 columns.
28 |
29 | ```sql
30 | -- Disable Vertical Merges
31 | ALTER TABLE test MODIFY SETTING enable_vertical_merge_algorithm = 0
32 | ```
33 |
34 | * **Horizontal merge** is used by default; it will use more memory if there are more than 80 columns in the table.
35 |
36 | ## OPTIMIZE TABLE example FINAL DEDUPLICATE BY expr
37 |
38 | When using the
39 | [deduplicate](/altinity-kb-schema-design/row-level-deduplication/)
40 | feature of `OPTIMIZE FINAL`, the question is: which row will remain, and which rows will be deduplicated?
41 |
42 | For SELECT operations ClickHouse® does not guarantee the order of the result set unless you specify ORDER BY. This non-deterministic ordering is affected by different parameters, for example `max_threads`.
43 |
44 | In a merge operation ClickHouse reads rows sequentially in storage order, which is determined by the ORDER BY specified in the CREATE TABLE statement, and only the first unique row in that order survives deduplication. So it is a bit different from how SELECT actually works. When the FINAL clause is used, ClickHouse merges all rows across all partitions (if it is not specified, the merge operation is done per partition), so the first unique row of the first partition survives deduplication. Merges are single-threaded because it is too complicated to apply merge operations in parallel, and it generally makes no sense.
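45 |
46 | A minimal illustration (the table and column names are hypothetical):
47 |
48 | ```sql
49 | CREATE TABLE example
50 | (
51 |     key UInt64,
52 |     value String,
53 |     inserted_at DateTime
54 | )
55 | ENGINE = MergeTree
56 | ORDER BY key;
57 |
58 | -- keep one row per (key, value) pair; the surviving row is the first one
59 | -- in storage (ORDER BY key) order, not an arbitrary SELECT order
60 | OPTIMIZE TABLE example FINAL DEDUPLICATE BY key, value;
61 | ```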
45 |
46 | * [https://github.com/ClickHouse/ClickHouse/pull/17846](https://github.com/ClickHouse/ClickHouse/pull/17846)
47 | * [https://clickhouse.com/docs/en/sql-reference/statements/optimize/](https://clickhouse.com/docs/en/sql-reference/statements/optimize/)
48 |
--------------------------------------------------------------------------------
/content/en/upgrade/removing-lost-parts.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Removing lost parts"
3 | linkTitle: "Removing lost parts"
4 | description: >
5 | Removing lost parts
6 | ---
7 |
8 | ## There might be parts left in ZooKeeper that don't exist on disk
9 |
10 | The explanation is here https://github.com/ClickHouse/ClickHouse/pull/26716
11 |
12 | The problem was introduced in ClickHouse® 20.1.
13 |
14 | The problem was fixed in 21.8 and backported to 21.3.16, 21.6.9, and 21.7.6.
15 |
16 | ## Regarding the procedure to reproduce the issue
17 |
18 | The procedure was not confirmed, but I think it should work.
19 |
20 | 1) Wait for a merge on a particular partition (or run an OPTIMIZE to trigger one).
21 | At this point you can collect the names of the parts participating in the merge from the system.merges table or the system.parts table (see the example query after this list).
22 |
23 | 2) When the merge finishes, stop one of the replicas before the inactive parts are dropped (or detach the table).
24 |
25 | 3) Bring the replica back up (or attach the table).
26 | Check that there are no inactive parts in system.parts, but they stayed in ZooKeeper.
27 | Also check that the inactive parts got removed from ZooKeeper for another replica.
28 | Here is the query to check ZooKeeper:
29 | ```sql
30 | select name, ctime from system.zookeeper
31 | where path='/replicas//parts/'
32 | and name like ''
33 | ```
34 |
35 | 4) Drop the partition on the replica that DOES NOT have those extra parts in ZooKeeper.
36 | Check the list of parts in ZooKeeper.
37 | We hope that after this the parts on disk will be removed on all replicas, but one of the replicas will still have some parts left in ZooKeeper.
38 | If this happens, then we think that after a restart of the replica with extra parts in ZooKeeper it will try to download them from another replica.
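39 |
40 | For step 1, the parts participating in a currently running merge can be collected with a query like this (a sketch):
41 |
42 | ```sql
43 | SELECT database, table, partition_id, source_part_names, result_part_name
44 | FROM system.merges;
45 | ```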
39 |
40 | ## A query to find 'forgotten' parts
41 |
42 | https://kb.altinity.com/altinity-kb-useful-queries/parts-consistency/#compare-the-list-of-parts-in-zookeeper-with-the-list-of-parts-on-disk
43 |
44 | ## A query to drop empty partitions with failing replication tasks
45 |
46 | ```sql
47 | select 'alter table '||database||'.'||table||' drop partition id '''||partition_id||''';'
48 | from (
49 | select database, table, splitByChar('_',new_part_name)[1] partition_id
50 | from system.replication_queue
51 | where type='GET_PART' and not is_currently_executing and create_time < toStartOfDay(yesterday())
52 | group by database, table, partition_id) q
53 | left join
54 | (select database, table, partition_id, countIf(active) cnt_active, count() cnt_total
55 | from system.parts group by database, table, partition_id
56 | ) p using database, table, partition_id
57 | where cnt_active=0
58 | ```
59 |
--------------------------------------------------------------------------------
/content/en/altinity-kb-dictionaries/altinity-kb-sparse_hashed-vs-hashed.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "SPARSE_HASHED VS HASHED vs HASHED_ARRAY"
3 | linkTitle: "SPARSE_HASHED VS HASHED vs HASHED_ARRAY"
4 | description: >
5 | SPARSE_HASHED VS HASHED VS HASHED_ARRAY
6 | ---
7 | The sparse_hashed and hashed_array layouts are supposed to save memory but have some downsides. We can test this with the following:
8 |
9 | ```sql
10 | create table orders(id UInt64, price Float64)
11 | Engine = MergeTree() order by id;
12 |
13 | insert into orders select number, 0 from numbers(5000000);
14 |
15 | CREATE DICTIONARY orders_hashed (id UInt64, price Float64)
16 | PRIMARY KEY id SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000
17 | TABLE orders DB 'default' USER 'default'))
18 | LIFETIME(MIN 0 MAX 0) LAYOUT(HASHED());
19 |
20 | CREATE DICTIONARY orders_sparse (id UInt64, price Float64)
21 | PRIMARY KEY id SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000
22 | TABLE orders DB 'default' USER 'default'))
23 | LIFETIME(MIN 0 MAX 0) LAYOUT(SPARSE_HASHED());
24 |
25 | CREATE DICTIONARY orders_hashed_array (id UInt64, price Float64)
26 | PRIMARY KEY id SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000
27 | TABLE orders DB 'default' USER 'default'))
28 | LIFETIME(MIN 0 MAX 0) LAYOUT(HASHED_ARRAY());
29 |
30 | SELECT
31 | name,
32 | type,
33 | status,
34 | element_count,
35 | formatReadableSize(bytes_allocated) AS RAM
36 | FROM system.dictionaries
37 | WHERE name LIKE 'orders%'
38 | ┌─name────────────────┬─type─────────┬─status─┬─element_count─┬─RAM────────┐
39 | │ orders_hashed_array │ HashedArray │ LOADED │ 5000000 │ 68.77 MiB │
40 | │ orders_sparse │ SparseHashed │ LOADED │ 5000000 │ 76.30 MiB │
41 | │ orders_hashed │ Hashed │ LOADED │ 5000000 │ 256.00 MiB │
42 | └─────────────────────┴──────────────┴────────┴───────────────┴────────────┘
43 |
44 | SELECT sum(dictGet('default.orders_hashed', 'price', toUInt64(number))) AS res
45 | FROM numbers(10000000)
46 | ┌─res─┐
47 | │ 0 │
48 | └─────┘
49 | 1 rows in set. Elapsed: 0.546 sec. Processed 10.01 million rows ...
50 |
51 | SELECT sum(dictGet('default.orders_sparse', 'price', toUInt64(number))) AS res
52 | FROM numbers(10000000)
53 | ┌─res─┐
54 | │ 0 │
55 | └─────┘
56 | 1 rows in set. Elapsed: 1.422 sec. Processed 10.01 million rows ...
57 |
58 | SELECT sum(dictGet('default.orders_hashed_array', 'price', toUInt64(number))) AS res
59 | FROM numbers(10000000)
60 | ┌─res─┐
61 | │ 0 │
62 | └─────┘
63 | 1 rows in set. Elapsed: 0.558 sec. Processed 10.01 million rows ...
64 | ```
65 |
66 | As you can see, **SPARSE_HASHED** is memory efficient and uses about 3 times less memory (!!!) but is almost 3 times slower as well. On the other hand, **HASHED_ARRAY** is even more efficient in terms of memory usage and maintains almost the same performance as the **HASHED** layout.
67 |
--------------------------------------------------------------------------------