├── README.md
├── binlog_parse_clickhouse.py
├── config.yaml
├── mysql_to_clickhouse_schema.py
├── mysql_to_clickhouse_schema_all.py
├── mysql_to_clickhouse_schema_test.py
└── mysql_to_clickhouse_sync_pagination.py

/README.md:
--------------------------------------------------------------------------------
 1 | #### Use case: parse the binlog from MySQL 8.0 in real time and replicate it to ClickHouse — an ETL tool for migrating MySQL 8.0 to ClickHouse
 2 | 
 3 | #### How it works:
 4 | 
 5 | Binlog parsing and SQL execution run in two separate threads.
 6 | 
 7 | After parsing each event, the binlog-parsing thread pushes the resulting SQL statement onto a queue;
 8 | the SQL-execution thread pops statements from the queue and runs them in order, which guarantees serial execution of the SQL.
 9 | 
10 | -----------------------------------
11 | #### Using ClickHouse:
12 | 
13 | #### 1) Installation:
14 | 
15 | ```shell> pip3 install clickhouse-driver pymysql mysql-replication -i "http://mirrors.aliyun.com/pypi/simple" --trusted-host "mirrors.aliyun.com"```
16 | 
17 | Note: the clickhouse_driver library requires SSL. Python 3.10 and later no longer support building SSL against LibreSSL, so OpenSSL 1.1.1 or newer is required.
18 | 
19 | See: how to fix the SSL failure when compiling and installing Python 3.10
20 | 
21 | https://blog.csdn.net/mdh17322249/article/details/123966953
22 | 
23 | #### 2) Convert MySQL table schemas to ClickHouse table schemas
24 | ``` shell> vim mysql_to_clickhouse_schema.py``` (edit the configuration inside the script)
25 | 
26 | ##### Note: mysql_to_clickhouse_schema_test.py is for single-table testing only.
27 | ##### mysql_to_clickhouse_schema_all.py converts all databases in the MySQL instance into the corresponding databases in the ClickHouse instance.
28 | 
29 | Run:
30 | 
31 | ``` shell> python3 mysql_to_clickhouse_schema.py```
32 | 
33 | How it works: connect to MySQL to fetch each table's schema, then run the corresponding CREATE TABLE statement in ClickHouse.
34 | 
35 | #### 3) Full data migration from MySQL to ClickHouse:
36 | ```shell> python3 mysql_to_clickhouse_sync_pagination.py --mysql_host 192.168.198.239 --mysql_port 3336 --mysql_user admin --mysql_password 123456 --mysql_db yourDB --clickhouse_host 192.168.176.204 --clickhouse_port 9000 --clickhouse_user admin --clickhouse_password 123456 --clickhouse_database yourDB --batch_size 1000 --max_workers 10```
37 | 
38 | ##### Note: tables without an auto-increment primary key are fetched with LIMIT offset, limit pagination. By default, 10 tables are exported in parallel and each round fetches 1000 rows.
39 | 
40 | A metadata.txt file is generated in the tool's directory (it records the binlog file name, position and GTID).
41 | 
42 | #### 4) Incremental migration from MySQL 8.0 to ClickHouse (ETL tool)
43 | ``` shell> vim binlog_parse_clickhouse.py``` (edit the configuration inside the script)
44 | 
45 | Run in the foreground:
46 | 
47 | ```shell> python3 binlog_parse_clickhouse.py -f config.yaml```
48 | 
49 | Run in the background:
50 | 
51 | ```shell> nohup python3 binlog_parse_clickhouse.py -f config.yaml > from_mysql_to_clickhouse.log 2>&1 &```
52 | 
53 | 
54 | 
55 | 
--------------------------------------------------------------------------------

/binlog_parse_clickhouse.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | """
 3 | - Parses the binlog from MySQL 8.0 in real time and replicates it to ClickHouse — an ETL tool for migrating MySQL 8.0 to ClickHouse
 4 | - Supports DDL and DML statements (note: converting a parsed CREATE TABLE into ClickHouse syntax is complex and not supported yet)
 5 | - Supports replicating only selected tables, or ignoring selected tables
 6 | - Fix for the SSL failure when compiling Python 3.10:
 7 | - https://blog.csdn.net/mdh17322249/article/details/123966953
 8 | """
 9 | 
10 | import os, sys
11 | import pymysql
12 | import signal
13 | import atexit
14 | import re
15 | import time, datetime
16 | import json
17 | from queue import Queue
18 | from threading import Thread
19 | from pymysqlreplication import BinLogStreamReader
20 | from pymysqlreplication.row_event import (
21 |     WriteRowsEvent,
22 |     UpdateRowsEvent,
23 |     DeleteRowsEvent
24 | )
25 | from pymysqlreplication.event import QueryEvent
26 | from pymysqlreplication.event import GtidEvent
27 | from clickhouse_driver import Client
28 | import logging
29 | import yaml
30 | import argparse
31 | 
32 | # Build the command-line argument parser
33 | parser = argparse.ArgumentParser()
34 | # -f/--file gives the path to the YAML config file
35 | parser.add_argument("-f", "--file", required=True, help="Path to db.yaml file")
36 | 
37 | # Parse the command-line arguments
38 | args = parser.parse_args()
39 | 
40 | # Path to the supplied YAML config file
41 | file_path = args.file
42 | 
43 | # Read the YAML configuration
44 | with open(file_path, 'r') as file:
45 |     config = yaml.safe_load(file)
46 | 
47 | # Turn the YAML settings into variables
48 | source_mysql_settings = config.get('source_mysql_settings', {})
49 | source_server_id = config.get('source_server_id', None)
50 | binlog_file = 
config.get('binlog_file', '')
51 | binlog_pos = config.get('binlog_pos', 4)
52 | ignore_tables = config.get('ignore_tables', None)  # default is None
53 | ignore_prefixes = config.get('ignore_prefixes', None)
54 | repl_tables = config.get('repl_tables', None)
55 | repl_prefixes = config.get('repl_prefixes', None)
56 | target_clickhouse_settings = config.get('target_clickhouse_settings', {})
57 | clickhouse_cluster_name = config.get('clickhouse_cluster_name', None)
58 | LOG_FILE = config.get('LOG_FILE', '')
59 | 
60 | # Configure logging
61 | logging.basicConfig(filename=LOG_FILE, level=logging.INFO, format='%(asctime)s %(levelname)s: %(message)s')
62 | 
63 | ################# No changes needed below this line #################
64 | def convert_mysql_to_clickhouse(mysql_sql):
65 |     # Mapping between MySQL and ClickHouse data types
66 |     type_mapping = {
67 |         'bit': 'UInt8',
68 |         'tinyint': 'Int8',
69 |         'smallint': 'Int16',
70 |         'int': 'Int32',
71 |         'bigint': 'Int64',
72 |         'float': 'Float32',
73 |         'double': 'Float64',
74 |         'decimal': 'Decimal',
75 |         'char': 'String',
76 |         'varchar': 'String',
77 |         'text': 'String',
78 |         'mediumtext': 'String',
79 |         'longtext': 'String',
80 |         'enum': 'String',
81 |         'set': 'String',
82 |         'blob': 'String',
83 |         'varbinary': 'String',
84 |         'time': 'FixedString(8)',
85 |         'datetime': 'DateTime',
86 |         'timestamp': 'DateTime',
87 |         'date': 'DateTime'
88 |         # add more mappings as needed...
89 |     }
90 | 
91 |     # Apply each mapping in turn
92 |     clickhouse_sql = mysql_sql
93 |     for mysql_type, clickhouse_type in type_mapping.items():
94 |         clickhouse_sql = re.sub(r'\b{}\b'.format(mysql_type), clickhouse_type, clickhouse_sql, flags=re.IGNORECASE)
95 | 
96 |     """
97 |     Converting a parsed binlog CREATE TABLE into ClickHouse syntax is complex; still to be solved.
98 |     Code block...
99 |     """
100 | 
101 |     if clickhouse_cluster_name:  # test the value itself; the old 'in globals()' check was always true here
102 |         clickhouse_sql = re.sub(r'\badd\s+column\b', f' ON CLUSTER {clickhouse_cluster_name} ADD COLUMN', clickhouse_sql, flags=re.IGNORECASE)  # specific pattern first
103 |         clickhouse_sql = re.sub(r'\badd\b(?!\s+column)', f' ON CLUSTER {clickhouse_cluster_name} ADD COLUMN', clickhouse_sql, flags=re.IGNORECASE)  # lookahead avoids rewriting ADD COLUMN twice
104 |         clickhouse_sql = re.sub(r'\bdrop\s+column\b', f' ON CLUSTER {clickhouse_cluster_name} DROP COLUMN', clickhouse_sql, flags=re.IGNORECASE)
105 |         clickhouse_sql = re.sub(r'\bdrop\b(?!\s+column)', f' ON CLUSTER {clickhouse_cluster_name} DROP COLUMN', clickhouse_sql, flags=re.IGNORECASE)
106 |         clickhouse_sql = re.sub(r'\bmodify\b', f' ON CLUSTER {clickhouse_cluster_name} MODIFY COLUMN', clickhouse_sql, flags=re.IGNORECASE)
107 |     else:
108 |         clickhouse_sql = re.sub(r'\badd\s+column\b', 'ADD COLUMN', clickhouse_sql, flags=re.IGNORECASE)
109 |         clickhouse_sql = re.sub(r'\badd\b(?!\s+column)', 'ADD COLUMN', clickhouse_sql, flags=re.IGNORECASE)
110 |         clickhouse_sql = re.sub(r'\bdrop\s+column\b', 'DROP COLUMN', clickhouse_sql, flags=re.IGNORECASE)
111 |         clickhouse_sql = re.sub(r'\bdrop\b(?!\s+column)', 'DROP COLUMN', clickhouse_sql, flags=re.IGNORECASE)
112 |         clickhouse_sql = re.sub(r'\bmodify\b', 'MODIFY COLUMN', clickhouse_sql, flags=re.IGNORECASE)
113 | 
114 |     # rename table t1 to t2 is left untouched; only alter table t1 rename cid to cid2 is rewritten
115 |     if re.search(r'(?
{binlog_pos}') 440 | except AttributeError as e: 441 | save_binlog_pos(current_binlog_file, binlog_pos) 442 | else: 443 | save_binlog_pos(current_binlog_file, binlog_pos) 444 | 445 | except KeyboardInterrupt: 446 | save_binlog_pos(current_binlog_file, binlog_pos) 447 | break 448 | 449 | except pymysql.err.OperationalError as e: 450 | print("MySQL Error {}: {}".format(e.args[0], e.args[1])) 451 | 452 | # 等待所有 SQL 语句执行完毕 453 | sql_queue.join() 454 | 455 | # 在程序退出时保存 binlog 位置 456 | atexit.register(exit_handler, stream, current_binlog_file, binlog_pos) 457 | 458 | # 接收 SIGTERM 和 SIGINT 信号 459 | signal.signal(signal.SIGTERM, save_binlog_pos_on_termination) 460 | signal.signal(signal.SIGINT, save_binlog_pos_on_termination) 461 | 462 | # 关闭连接 463 | atexit.register(target_conn.close) 464 | -------------------------------------------------------------------------------- /config.yaml: -------------------------------------------------------------------------------- 1 | source_mysql_settings: 2 | host: "192.168.198.239" 3 | port: 6666 4 | user: "admin" 5 | passwd: "123456" 6 | database: "test" 7 | charset: "utf8mb4" 8 | 9 | source_server_id: 66661 10 | 11 | binlog_file: "mysql-bin.000001" 12 | binlog_pos: 4 13 | 14 | # 可以根据需求取消以下注释并进行配置 15 | #ignore_tables: 16 | # - "t1" 17 | # - "yy" 18 | 19 | #ignore_prefixes: 20 | # - "^user_.*$" 21 | # - "^opt_.*$" 22 | 23 | #repl_tables: 24 | # - "nba" 25 | 26 | #repl_prefixes: 27 | # - "^rsz_.*$" 28 | 29 | target_clickhouse_settings: 30 | host: "192.168.176.204" 31 | port: 9000 32 | user: "admin" 33 | password: "123456" 34 | database: "cktest" 35 | 36 | # 如果需要设置ClickHouse集群名称,取消以下注释并修改相应值 37 | #clickhouse_cluster_name: "perftest_1shards_3replicas" 38 | 39 | LOG_FILE: "ck_repl_status.log" 40 | 41 | -------------------------------------------------------------------------------- /mysql_to_clickhouse_schema.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # 
MySQL表结构转换为ClickHouse表结构,仅为单库。 3 | 4 | import pymysql 5 | import re 6 | from clickhouse_driver import Client 7 | import logging 8 | 9 | #################修改以下配置配置信息################# 10 | # 配置信息 11 | MYSQL_HOST = "192.168.176.204" 12 | MYSQL_PORT = 3306 13 | MYSQL_USER = "admin" 14 | MYSQL_PASSWORD = "123456" 15 | MYSQL_DATABASE = "innodb_bts" 16 | 17 | CLICKHOUSE_HOST = "192.168.176.204" 18 | CLICKHOUSE_PORT = 9000 19 | CLICKHOUSE_USER = "admin" 20 | CLICKHOUSE_PASSWORD = "123456" 21 | CLICKHOUSE_DATABASE = "test_bts" 22 | 23 | # 设置ClickHouse集群的名字,这样方便在所有节点上同时创建表引擎ReplicatedMergeTree 24 | # 通过select * from system.clusters命令查看集群的名字 25 | #clickhouse_cluster_name = "perftest_1shards_3replicas" 26 | 27 | # 设置表引擎 28 | TABLE_ENGINE = "MySQL('192.168.176.204:3306', 'innodb_bts', 'tablelink', 'admin', '123456')" 29 | """ 30 | 说明:1) MySQL 表引擎可以提升复杂SQL查询性能,特别是对于小型到中型数据集,且执行频率不高的应用场景,例如BI报表凌晨跑批。 31 | 2) 权限要设置跟default用户一样,库名需要赋予ON *.* 32 | 3) tablelink为硬编码模板,后面要跟根据表名将其替换,这里需写死 33 | """ 34 | 35 | #TABLE_ENGINE = "MergeTree" 36 | #TABLE_ENGINE = "ReplicatedMergeTree" 37 | 38 | LOG_FILE = "convert_error.log" 39 | # 配置日志记录 40 | logging.basicConfig(filename=LOG_FILE, level=logging.INFO, format='%(asctime)s %(levelname)s: %(message)s') 41 | 42 | 43 | #################以下代码不用修改################# 44 | def convert_field_type(field_type): 45 | """ 46 | 将MySQL字段类型转换为ClickHouse字段类型 47 | """ 48 | field_type = field_type.split()[0] 49 | if "tinyint" in field_type: 50 | return "Int8" 51 | elif "smallint" in field_type: 52 | return "Int16" 53 | elif "mediumint" in field_type: 54 | return "Int32" 55 | elif field_type.startswith("int"): 56 | return "Int32" 57 | elif field_type.startswith("bigint"): 58 | return "Int64" 59 | elif "float" in field_type: 60 | return "Float32" 61 | elif "double" in field_type or "numeric" in field_type: 62 | return "Float64" 63 | elif "decimal" in field_type: 64 | precision_scale = re.search(r'\((.*?)\)', field_type).group(1) 65 | precision, scale = 
precision_scale.split(',') 66 | return f"Decimal({precision}, {scale})" 67 | elif "datetime" in field_type or "timestamp" in field_type or "date" in field_type: 68 | return "DateTime" 69 | elif "char" in field_type or "varchar" in field_type or "text" in field_type or "enum" in field_type or "set" in field_type: 70 | return "String" 71 | elif "bit" in field_type: 72 | return "UInt8" 73 | elif "time" in field_type: 74 | return "FixedString(8)" 75 | elif "blob" in field_type: 76 | return "String" 77 | elif "varbinary" in field_type: 78 | return "String" 79 | elif field_type.startswith("bit"): 80 | return "UInt8" 81 | else: 82 | raise ValueError(f"无法转化未知 MySQL 字段类型:{field_type}") 83 | 84 | 85 | def convert_mysql_to_clickhouse(mysql_conn, mysql_database, table_name, clickhouse_conn, clickhouse_database): 86 | """ 87 | 将MySQL表结构转换为ClickHouse 88 | """ 89 | global TABLE_ENGINE 90 | # 获取MySQL表结构 91 | mysql_cursor = mysql_conn.cursor() 92 | mysql_cursor.execute(f"SHOW KEYS FROM {mysql_database}.{table_name} WHERE Key_name = 'PRIMARY'") 93 | mysql_primary_key = [key[4] for key in mysql_cursor.fetchall()] 94 | 95 | mysql_cursor.execute(f"DESCRIBE {mysql_database}.{table_name}") 96 | mysql_columns = mysql_cursor.fetchall() 97 | 98 | # 创建ClickHouse表 99 | clickhouse_columns = [] 100 | if 'clickhouse_cluster_name' in globals() and TABLE_ENGINE == 'ReplicatedMergeTree': 101 | create_statement = f"CREATE TABLE IF NOT EXISTS {clickhouse_database}.{table_name} ON CLUSTER {clickhouse_cluster_name} (" 102 | else: 103 | create_statement = "CREATE TABLE IF NOT EXISTS " + clickhouse_database + "." 
+ table_name + " (" 104 | for mysql_column in mysql_columns: 105 | column_name = mysql_column[0] 106 | column_type = mysql_column[1] 107 | 108 | # 转换字段类型 109 | clickhouse_type = convert_field_type(column_type) 110 | 111 | # 拼接SQL语句 112 | create_statement += f"{column_name} {clickhouse_type}," 113 | 114 | # 添加到columns列表中 115 | clickhouse_columns.append(column_name) 116 | 117 | # 设置主键 118 | primary_key_str = ",".join(mysql_primary_key) 119 | if 'mysql' not in TABLE_ENGINE.lower(): 120 | create_statement += f"PRIMARY KEY ({primary_key_str})" 121 | 122 | if 'mysql' in TABLE_ENGINE.lower(): 123 | # 保存初始值 124 | original_table_engine = TABLE_ENGINE 125 | 126 | TABLE_ENGINE = TABLE_ENGINE.replace('tablelink', table_name) 127 | create_statement = create_statement[:-1] 128 | create_statement += ") ENGINE = " + TABLE_ENGINE 129 | 130 | # 恢复为初始值 131 | TABLE_ENGINE = original_table_engine 132 | 133 | elif TABLE_ENGINE == "MergeTree": 134 | # 设置存储引擎为 MergeTree 135 | create_statement += ") ENGINE = MergeTree ORDER BY " + ','.join(mysql_primary_key) 136 | else: 137 | # 设置存储引擎为 ReplicatedMergeTree 138 | create_statement += f") ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{table_name}', '{{replica}}') ORDER BY " + ','.join(mysql_primary_key) 139 | # 双括号{{ }}来表示'{shard}'和'{replica}'是作为字符串文本插入的固定值。 140 | 141 | # 执行SQL语句 142 | try: 143 | clickhouse_cursor = clickhouse_conn.execute(create_statement) 144 | # 输出ClickHouse表结构 145 | print(f"ClickHouse create statement: {create_statement}" + "\n") 146 | except Exception as e: 147 | print(f"{table_name}表 - 执行SQL语句失败!详见当前目录下生成的错误日志. 
\n") 148 | logging.error(f"{table_name}表 - 执行SQL语句失败:{create_statement} \n") 149 | logging.error(f"错误信息:{e}") 150 | 151 | 152 | def convert_mysql_database_to_clickhouse(mysql_conn, mysql_database, clickhouse_conn, clickhouse_database): 153 | """ 154 | 将MySQL数据库表结构转换为ClickHouse 155 | """ 156 | # 获取MySQL库中所有表 157 | mysql_cursor = mysql_conn.cursor() 158 | mysql_cursor.execute(f"SHOW TABLES FROM {mysql_database}") 159 | tables = mysql_cursor.fetchall() 160 | 161 | # 遍历所有表进行转换 162 | for table in tables: 163 | table_name = table[0] 164 | convert_mysql_to_clickhouse(mysql_conn, mysql_database, table_name, clickhouse_conn, clickhouse_database) 165 | 166 | if __name__ == "__main__": 167 | # 连接MySQL数据库 168 | mysql_conn = pymysql.connect( 169 | host=MYSQL_HOST, 170 | port=MYSQL_PORT, 171 | user=MYSQL_USER, 172 | password=MYSQL_PASSWORD, 173 | database=MYSQL_DATABASE 174 | ) 175 | 176 | # 连接ClickHouse数据库 177 | clickhouse_conn = Client(host=CLICKHOUSE_HOST, port=CLICKHOUSE_PORT, user=CLICKHOUSE_USER, password=CLICKHOUSE_PASSWORD, database=CLICKHOUSE_DATABASE) 178 | 179 | # 转换表结构 180 | convert_mysql_database_to_clickhouse(mysql_conn, MYSQL_DATABASE, clickhouse_conn, CLICKHOUSE_DATABASE) 181 | -------------------------------------------------------------------------------- /mysql_to_clickhouse_schema_all.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # MySQL表结构转换为ClickHouse表结构,将MySQL实例下的所有库,转换到ClickHouse实例的相应库下。 3 | import pymysql 4 | import re 5 | from clickhouse_driver import Client 6 | import logging 7 | 8 | #################修改以下配置配置信息################# 9 | # 配置信息 10 | MYSQL_HOST = "192.168.198.239" 11 | MYSQL_PORT = 3336 12 | MYSQL_USER = "admin" 13 | MYSQL_PASSWORD = "hechunyang" 14 | MYSQL_DATABASE = "hcy" 15 | 16 | CLICKHOUSE_HOST = "192.168.176.204" 17 | CLICKHOUSE_PORT = 9000 18 | CLICKHOUSE_USER = "hechunyang" 19 | CLICKHOUSE_PASSWORD = "123456" 20 | CLICKHOUSE_DATABASE = "hcy" 21 | 22 | # 
设置ClickHouse集群的名字,这样方便在所有节点上同时创建表引擎ReplicatedMergeTree 23 | # 通过select * from system.clusters命令查看集群的名字 24 | #clickhouse_cluster_name = "perftest_1shards_3replicas" 25 | 26 | # 设置表引擎 27 | TABLE_ENGINE = "MergeTree" 28 | #TABLE_ENGINE = "ReplicatedMergeTree" 29 | 30 | LOG_FILE = "convert_error.log" 31 | # 配置日志记录 32 | logging.basicConfig(filename=LOG_FILE, level=logging.INFO, format='%(asctime)s %(levelname)s: %(message)s') 33 | 34 | #################以下代码不用修改################# 35 | 36 | def convert_field_type(field_type): 37 | """ 38 | 将MySQL字段类型转换为ClickHouse字段类型 39 | """ 40 | field_type = field_type.split()[0] 41 | if "tinyint" in field_type: 42 | return "Int8" 43 | elif "smallint" in field_type: 44 | return "Int16" 45 | elif "mediumint" in field_type: 46 | return "Int32" 47 | elif field_type.startswith("int"): 48 | return "Int32" 49 | elif field_type.startswith("bigint"): 50 | return "Int64" 51 | elif "float" in field_type: 52 | return "Float32" 53 | elif "double" in field_type or "numeric" in field_type: 54 | return "Float64" 55 | elif "decimal" in field_type: 56 | precision_scale = re.search(r'\((.*?)\)', field_type).group(1) 57 | precision, scale = precision_scale.split(',') 58 | return f"Decimal({precision}, {scale})" 59 | elif "datetime" in field_type or "timestamp" in field_type or "date" in field_type: 60 | return "DateTime" 61 | elif "char" in field_type or "varchar" in field_type or "text" in field_type or "enum" in field_type or "set" in field_type: 62 | return "String" 63 | elif "bit" in field_type: 64 | return "UInt8" 65 | elif "time" in field_type: 66 | return "FixedString(8)" 67 | elif "blob" in field_type: 68 | return "String" 69 | elif "varbinary" in field_type: 70 | return "String" 71 | elif field_type.startswith("bit"): 72 | return "UInt8" 73 | else: 74 | raise ValueError(f"无法转化未知 MySQL 字段类型:{field_type}") 75 | 76 | 77 | def convert_mysql_to_clickhouse(mysql_conn, mysql_database, table_name, clickhouse_conn, clickhouse_database): 78 | """ 79 | 
将MySQL表结构转换为ClickHouse 80 | """ 81 | # 获取MySQL表结构 82 | mysql_cursor = mysql_conn.cursor() 83 | mysql_cursor.execute(f"SHOW KEYS FROM {mysql_database}.{table_name} WHERE Key_name = 'PRIMARY'") 84 | mysql_primary_key = [key[4] for key in mysql_cursor.fetchall()] 85 | 86 | mysql_cursor.execute(f"DESCRIBE {mysql_database}.{table_name}") 87 | mysql_columns = mysql_cursor.fetchall() 88 | 89 | # 创建ClickHouse表 90 | clickhouse_columns = [] 91 | if 'clickhouse_cluster_name' in globals() and TABLE_ENGINE == 'ReplicatedMergeTree': 92 | create_statement = f"CREATE TABLE IF NOT EXISTS {clickhouse_database}.{table_name} ON CLUSTER {clickhouse_cluster_name} (" 93 | else: 94 | create_statement = "CREATE TABLE IF NOT EXISTS " + clickhouse_database + "." + table_name + " (" 95 | 96 | for mysql_column in mysql_columns: 97 | column_name = mysql_column[0] 98 | column_type = mysql_column[1] 99 | 100 | # 转换字段类型 101 | clickhouse_type = convert_field_type(column_type) 102 | 103 | # 拼接SQL语句 104 | create_statement += f"{column_name} {clickhouse_type}," 105 | 106 | # 添加到columns列表中 107 | clickhouse_columns.append(column_name) 108 | 109 | # 设置主键 110 | primary_key_str = ",".join(mysql_primary_key) 111 | create_statement += f"PRIMARY KEY ({primary_key_str})" 112 | 113 | if TABLE_ENGINE == "MergeTree": 114 | # 设置存储引擎为 MergeTree 115 | create_statement += ") ENGINE = MergeTree ORDER BY " + ','.join(mysql_primary_key) 116 | else: 117 | # 设置存储引擎为 ReplicatedMergeTree 118 | create_statement += f") ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{table_name}', '{{replica}}') ORDER BY " + ','.join(mysql_primary_key) 119 | # 双括号{{ }}来表示'{shard}'和'{replica}'是作为字符串文本插入的固定值。 120 | 121 | # 执行SQL语句 122 | try: 123 | clickhouse_cursor = clickhouse_conn.execute(create_statement) 124 | except Exception as e: 125 | logging.error(f"执行SQL语句失败:{create_statement}") 126 | # logging.error(f"错误信息:{e}") 127 | 128 | # 输出ClickHouse表结构 129 | # print(f"ClickHouse create statement: {create_statement}") 130 | 131 | 132 
| def convert_mysql_database_to_clickhouse(mysql_conn, clickhouse_conn, excluded_databases=( 133 | "mysql", "sys", "information_schema", "performance_schema", "test")): 134 | """ 135 | 将MySQL实例下的所有数据库表结构转换为ClickHouse 136 | """ 137 | # 获取MySQL实例下所有数据库 138 | mysql_cursor = mysql_conn.cursor() 139 | mysql_cursor.execute("SHOW DATABASES") 140 | databases = mysql_cursor.fetchall() 141 | 142 | # 遍历所有数据库和其中的表进行转换,排除掉被过滤的数据库 143 | for database in databases: 144 | database_name = database[0] 145 | if database_name not in excluded_databases: 146 | # 创建相应的ClickHouse数据库,请确保ClickHouse的账户权限正确 147 | try: 148 | if 'clickhouse_cluster_name' in globals() and TABLE_ENGINE == 'ReplicatedMergeTree': 149 | create_database_statement = f"CREATE DATABASE IF NOT EXISTS {database_name} ON CLUSTER {clickhouse_cluster_name}" 150 | else: 151 | create_database_statement = f"CREATE DATABASE IF NOT EXISTS {database_name}" 152 | clickhouse_conn.execute(create_database_statement) 153 | except Exception as e: 154 | logging.error(f"创建ClickHouse数据库{database_name}失败:{e}") 155 | raise 156 | 157 | # 切换到当前数据库 158 | mysql_cursor.execute(f"USE {database_name}") 159 | 160 | # 获取当前数据库下的所有表 161 | mysql_cursor.execute("SHOW TABLES") 162 | tables = mysql_cursor.fetchall() 163 | 164 | # 遍历所有表进行转换 165 | for table in tables: 166 | table_name = table[0] 167 | convert_mysql_to_clickhouse(mysql_conn, database_name, table_name, clickhouse_conn, database_name) 168 | 169 | 170 | if __name__ == "__main__": 171 | # 连接MySQL数据库 172 | mysql_conn = pymysql.connect( 173 | host=MYSQL_HOST, 174 | port=MYSQL_PORT, 175 | user=MYSQL_USER, 176 | password=MYSQL_PASSWORD 177 | ) 178 | 179 | # 连接ClickHouse数据库 180 | clickhouse_conn = Client( 181 | host=CLICKHOUSE_HOST, 182 | port=CLICKHOUSE_PORT, 183 | user=CLICKHOUSE_USER, 184 | password=CLICKHOUSE_PASSWORD 185 | ) 186 | 187 | # 转化表结构(将MySQL实例下的所有库 转换到 ClickHouse实例的相应库下) 188 | convert_mysql_database_to_clickhouse(mysql_conn, clickhouse_conn) 189 | 
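The `convert_field_type` chain above resolves a MySQL type by substring tests, so the order of the `elif` branches matters: `bigint` must be checked before `int`, and `datetime` before `date`. The same rules can be written as an ordered lookup table. A minimal standalone sketch, simplified to a subset of the types handled above (`map_mysql_type` is an illustrative name, not a function from these scripts):

```python
import re

# Ordered (substring, ClickHouse type) rules; first match wins. The order
# matters because these are substring tests: "bigint" must precede "int",
# and "datetime" must precede "date". Simplified subset for illustration.
RULES = [
    ("tinyint", "Int8"), ("smallint", "Int16"), ("mediumint", "Int32"),
    ("bigint", "Int64"), ("int", "Int32"),
    ("double", "Float64"), ("float", "Float32"),
    ("datetime", "DateTime"), ("timestamp", "DateTime"), ("date", "DateTime"),
    ("varchar", "String"), ("char", "String"), ("text", "String"),
]

def map_mysql_type(field_type):
    t = field_type.lower()
    if t.startswith("decimal"):
        # Carry precision and scale over, e.g. decimal(10,2) -> Decimal(10, 2)
        precision, scale = re.search(r"\((.*?)\)", t).group(1).split(",")
        return f"Decimal({precision.strip()}, {scale.strip()})"
    for substring, ch_type in RULES:
        if substring in t:
            return ch_type
    raise ValueError(f"unsupported MySQL type: {field_type}")

print(map_mysql_type("bigint(20) unsigned"))  # Int64
print(map_mysql_type("decimal(10,2)"))        # Decimal(10, 2)
```

Keeping the rules in one ordered list makes the precedence explicit and easy to extend when new column types turn up.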
-------------------------------------------------------------------------------- /mysql_to_clickhouse_schema_test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # MySQL表结构转换为ClickHouse表结构,该工具仅为单库单表测试使用 3 | 4 | import pymysql 5 | import re 6 | from clickhouse_driver import Client 7 | 8 | #################修改以下配置配置信息################# 9 | # MySQL数据库配置 10 | MYSQL_HOST = "192.168.198.239" 11 | MYSQL_PORT = 3336 12 | MYSQL_USER = "admin" 13 | MYSQL_PASSWORD = "hechunyang" 14 | MYSQL_DATABASE = "hcy" 15 | 16 | # ClickHouse数据库配置 17 | CLICKHOUSE_HOST = "192.168.176.204" 18 | CLICKHOUSE_PORT = 9000 19 | CLICKHOUSE_USER = "hechunyang" 20 | CLICKHOUSE_PASSWORD = "123456" 21 | CLICKHOUSE_DATABASE = "hcy" 22 | 23 | # 要操作的表名 24 | TABLE_NAME = "user" 25 | 26 | # 设置ClickHouse集群的名字,这样方便在所有节点上同时创建表引擎ReplicatedMergeTree 27 | # 通过select * from system.clusters命令查看集群的名字 28 | #clickhouse_cluster_name = "perftest_1shards_3replicas" 29 | 30 | # 设置表引擎 31 | TABLE_ENGINE = "MergeTree" 32 | #TABLE_ENGINE = "ReplicatedMergeTree" 33 | 34 | #################以下代码不用修改################# 35 | def convert_field_type(field_type): 36 | #print(field_type) 37 | """ 38 | 将MySQL字段类型转换为ClickHouse字段类型 39 | """ 40 | field_type = field_type.split()[0] 41 | if "tinyint" in field_type: 42 | return "Int8" 43 | elif "smallint" in field_type: 44 | return "Int16" 45 | elif "mediumint" in field_type: 46 | return "Int32" 47 | elif field_type.startswith("int"): 48 | return "Int32" 49 | elif field_type.startswith("bigint"): 50 | return "Int64" 51 | elif "float" in field_type: 52 | return "Float32" 53 | elif "double" in field_type or "numeric" in field_type: 54 | return "Float64" 55 | elif "decimal" in field_type: 56 | precision_scale = re.search(r'\((.*?)\)', field_type).group(1) 57 | precision, scale = precision_scale.split(',') 58 | return f"Decimal({precision}, {scale})" 59 | elif "datetime" in field_type or "timestamp" in field_type or "date" in 
field_type: 60 | return "DateTime" 61 | elif "char" in field_type or "varchar" in field_type or "text" in field_type or "enum" in field_type or "set" in field_type: 62 | return "String" 63 | elif "bit" in field_type: 64 | return "UInt8" 65 | elif "time" in field_type: 66 | return "FixedString(8)" 67 | elif "blob" in field_type: 68 | return "String" 69 | elif "varbinary" in field_type: 70 | return "String" 71 | elif field_type.startswith("bit"): 72 | return "UInt8" 73 | else: 74 | raise ValueError(f"无法转化未知 MySQL 字段类型:{field_type}") 75 | 76 | def convert_mysql_to_clickhouse(mysql_conn, mysql_database, mysql_table, clickhouse_conn, clickhouse_database): 77 | """ 78 | 将MySQL表结构转换为ClickHouse 79 | """ 80 | # 获取MySQL表结构 81 | mysql_cursor = mysql_conn.cursor() 82 | mysql_cursor.execute(f"SHOW KEYS FROM {mysql_database}.{mysql_table} WHERE Key_name = 'PRIMARY'") 83 | mysql_primary_key = [key[4] for key in mysql_cursor.fetchall()] 84 | 85 | mysql_cursor.execute(f"DESCRIBE {mysql_database}.{mysql_table}") 86 | mysql_columns = mysql_cursor.fetchall() 87 | 88 | # 创建ClickHouse表 89 | clickhouse_columns = [] 90 | #create_statement = "CREATE TABLE IF NOT EXISTS " + clickhouse_database + "." + mysql_table + " (" 91 | if 'clickhouse_cluster_name' in globals() and TABLE_ENGINE == 'ReplicatedMergeTree': 92 | create_statement = f"CREATE TABLE IF NOT EXISTS {clickhouse_database}.{mysql_table} ON CLUSTER {clickhouse_cluster_name} (" 93 | else: 94 | create_statement = "CREATE TABLE IF NOT EXISTS " + clickhouse_database + "." 
+ mysql_table + " (" 95 | for mysql_column in mysql_columns: 96 | column_name = mysql_column[0] 97 | column_type = mysql_column[1] 98 | 99 | # 转换字段类型 100 | #print(f"column_type: {column_type}") 101 | clickhouse_type = convert_field_type(column_type) 102 | 103 | # 拼接SQL语句 104 | create_statement += f"{column_name} {clickhouse_type}," 105 | 106 | # 添加到columns列表中 107 | clickhouse_columns.append(column_name) 108 | 109 | # 设置主键 110 | primary_key_str = ",".join(mysql_primary_key) 111 | create_statement += f"PRIMARY KEY ({primary_key_str})" 112 | 113 | if TABLE_ENGINE == "MergeTree": 114 | # 设置存储引擎为 MergeTree 115 | create_statement += ") ENGINE = MergeTree ORDER BY " + ','.join(mysql_primary_key) 116 | else: 117 | # 设置存储引擎为 ReplicatedMergeTree 118 | create_statement += f") ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{mysql_table}', '{{replica}}') ORDER BY " + ','.join(mysql_primary_key) 119 | # 双括号{{ }}来表示'{shard}'和'{replica}'是作为字符串文本插入的固定值。 120 | 121 | # 执行SQL语句 122 | try: 123 | clickhouse_cursor = clickhouse_conn.execute(create_statement) 124 | except Exception as e: 125 | print(f"执行SQL语句失败:{create_statement}") 126 | print(f"错误信息:{e}") 127 | 128 | # 输出ClickHouse表结构 129 | print(f"ClickHouse create statement: {create_statement}") 130 | 131 | if __name__ == "__main__": 132 | # 连接MySQL数据库 133 | mysql_conn = pymysql.connect( 134 | host=MYSQL_HOST, 135 | port=MYSQL_PORT, 136 | user=MYSQL_USER, 137 | password=MYSQL_PASSWORD, 138 | database=MYSQL_DATABASE 139 | ) 140 | 141 | # 连接ClickHouse数据库 142 | clickhouse_conn = Client( 143 | host=CLICKHOUSE_HOST, 144 | port=CLICKHOUSE_PORT, 145 | user=CLICKHOUSE_USER, 146 | password=CLICKHOUSE_PASSWORD, 147 | database=CLICKHOUSE_DATABASE 148 | ) 149 | 150 | # 转化表结构(将MySQL的hcy库的user表 转换为 ClickHouse的hcy库的user表) 151 | convert_mysql_to_clickhouse(mysql_conn, MYSQL_DATABASE, TABLE_NAME, clickhouse_conn, CLICKHOUSE_DATABASE) 152 | 153 | -------------------------------------------------------------------------------- 
/mysql_to_clickhouse_sync_pagination.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # MySQL全量数据导入到ClickHouse里,默认并行10张表同时导出数据,每次轮询取1000条数据。 3 | # 使用条件:表可以没有自增主键,测试环境MySQL 8.0 4 | """ 5 | shell> python3 mysql_to_clickhouse_sync.py --mysql_host 192.168.198.239 --mysql_port 3336 --mysql_user admin 6 | --mysql_password hechunyang --mysql_db hcy --clickhouse_host 192.168.176.204 7 | --clickhouse_port 9000 --clickhouse_user hechunyang --clickhouse_password 123456 8 | --clickhouse_database hcy --batch_size 1000 --max_workers 10 --exclude_tables "^table1" --include_tables "table2$" 9 | """ 10 | 11 | import argparse 12 | import pymysql.cursors 13 | from clickhouse_driver import Client 14 | from concurrent.futures import ThreadPoolExecutor 15 | import concurrent.futures 16 | import datetime 17 | import decimal 18 | import logging 19 | import sys 20 | import re 21 | 22 | 23 | # 创建日志记录器,将日志写入文件和控制台 24 | logger = logging.getLogger(__name__) 25 | logger.setLevel(logging.INFO) 26 | 27 | formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s') 28 | 29 | file_handler = logging.FileHandler('sync.log') 30 | file_handler.setLevel(logging.INFO) 31 | file_handler.setFormatter(formatter) 32 | 33 | stream_handler = logging.StreamHandler(sys.stdout) 34 | stream_handler.setLevel(logging.DEBUG) 35 | stream_handler.setFormatter(formatter) 36 | 37 | logger.addHandler(file_handler) 38 | logger.addHandler(stream_handler) 39 | 40 | def read_from_mysql(table_name, start_id, end_id, mysql_config): 41 | mysql_connection = pymysql.connect(**mysql_config, autocommit=False, cursorclass=pymysql.cursors.DictCursor) 42 | try: 43 | with mysql_connection.cursor() as cursor: 44 | query = "SELECT * FROM {} WHERE _rowid >= {} AND _rowid <= {}".format(table_name, start_id, end_id) 45 | cursor.execute(query) 46 | results = cursor.fetchall() 47 | return results 48 | except Exception as e: 49 | logger.error(e) 50 | return [] 51 | 
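For tables without an auto-increment primary key, this script falls back to `LIMIT offset, batch_size` pagination (see `read_from_mysql_with_limit` and the loop in `worker` further down). The shape of that loop — advance the offset until a page comes back empty — can be sketched without a database; `fetch_page` below is an in-memory stand-in for the MySQL query and is not part of the script:

```python
# Sketch of the LIMIT-offset pagination loop used for tables without an
# auto-increment primary key. fetch_page is an in-memory stand-in for the
# MySQL query (an assumption for illustration), not part of the script.
def paginate(fetch_page, batch_size=1000):
    """Yield batches, advancing the offset until a page comes back empty."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            break
        yield page
        offset += batch_size

rows = list(range(2500))  # pretend table with 2500 rows
batches = list(paginate(lambda off, lim: rows[off:off + lim]))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Strictly speaking, a `LIMIT` query without an `ORDER BY` clause has no guaranteed row order in SQL, so the pages are only as stable as the consistent snapshot the script opens in `main()`.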
# NOTE: these imports are required below; the top of the file (not shown here)
# may already include some of them.
import decimal
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor

def has_auto_increment(table_name, mysql_config):
    mysql_connection = pymysql.connect(**mysql_config, autocommit=False, cursorclass=pymysql.cursors.DictCursor)
    try:
        with mysql_connection.cursor() as cursor:
            query = f"SHOW COLUMNS FROM `{table_name}` WHERE `Key` = 'PRI' AND `Extra` = 'auto_increment'"
            cursor.execute(query)
            result = cursor.fetchone()
            return result is not None
    except Exception as e:
        logger.error(e)
        return False
    finally:
        mysql_connection.close()

def read_from_mysql_with_limit(table_name, offset, limit, mysql_config):
    mysql_connection = pymysql.connect(**mysql_config, autocommit=False, cursorclass=pymysql.cursors.DictCursor)
    try:
        with mysql_connection.cursor() as cursor:
            query = "SELECT * FROM `{}` LIMIT {}, {}".format(table_name, offset, limit)
            cursor.execute(query)
            results = cursor.fetchall()
            return results
    except Exception as e:
        logger.error(e)
        return []
    finally:
        mysql_connection.close()

def insert_into_clickhouse(table_name, records, clickhouse_config):
    clickhouse_client = Client(**clickhouse_config)
    query = ''  # initialize so the error handler can log the query even if building it fails
    try:
        column_names = list(records[0].keys())
        values_list = []
        for record in records:
            values = []
            for column_name in column_names:
                value = record[column_name]
                if isinstance(value, str):
                    value = value.replace("'", "''")  # escape embedded single quotes
                    values.append(f"'{value}'")
                elif isinstance(value, (datetime.datetime, datetime.date)):
                    values.append(f"'{value}'")
                elif value is None:
                    values.append("NULL")
                elif isinstance(value, (int, float)):
                    values.append(str(value))
                elif isinstance(value, decimal.Decimal):
                    values.append(str(value))
                else:
                    values.append(f"'{str(value)}'")
            values_list.append(f"({','.join(values)})")
        query = f"INSERT INTO {table_name} ({','.join(column_names)}) VALUES {','.join(values_list)}"
        """
        With ClickHouse decimal columns, inserting a value such as '4.00'
        truncates it to '4'; the fractional part '.00' is dropped.
        https://github.com/ClickHouse/ClickHouse/issues/51358
        https://github.com/ClickHouse/ClickHouse/issues/39153
        """
        # Workaround: keep trailing zeros when formatting decimals
        clickhouse_client.execute("set output_format_decimal_trailing_zeros=1")
        clickhouse_client.execute(query)
        ### For debugging:
        ### logger.info(f"Executed SQL: {query}")
    except Exception as e:
        logger.error(f"Error SQL query: {query}")  # log the failing SQL statement
        logger.error(f"Error inserting records into ClickHouse: {e}")
    finally:
        clickhouse_client.disconnect()

def worker(table_name, table_bounds, mysql_config, clickhouse_config, batch_size, max_workers):
    min_id, max_id = table_bounds[table_name]
    if min_id == max_id and min_id != 0:  # the table holds a single row: handle it directly
        records = read_from_mysql(table_name, min_id, max_id + 1, mysql_config)
        print(f"Retrieved {len(records)} record from MySQL table {table_name} with ID {min_id}")
        if len(records) > 0:
            insert_into_clickhouse(table_name, records, clickhouse_config)
        return

    row_count = 0
    if min_id != 0:
        row_count = max_id - min_id + 1

    if row_count <= 1000 or not has_auto_increment(table_name, mysql_config):
        # 1000 rows or fewer, or no auto-increment primary key:
        # page through the table with LIMIT (batch size fixed at 1000 here)
        batch_size = 1000
        offset = 0
        while True:
            records = read_from_mysql_with_limit(table_name, offset, batch_size, mysql_config)
            print(f"Retrieved {len(records)} records from MySQL table {table_name} with LIMIT {offset}, {batch_size}")
            if len(records) == 0:
                break
            insert_into_clickhouse(table_name, records, clickhouse_config)
            offset += batch_size
    else:
        # Auto-increment primary key: split the ID range into batches and insert them in parallel
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            for start_id in range(min_id, max_id, batch_size):
                end_id = min(start_id + batch_size - 1, max_id)
                records = read_from_mysql(table_name, start_id, end_id, mysql_config)
                print(f"Retrieved {len(records)} records from MySQL table {table_name} between ID {start_id} and {end_id}")
                if len(records) > 0:
                    executor.submit(insert_into_clickhouse, table_name, records, clickhouse_config)

def main(args):
    mysql_config = {
        'host': args.mysql_host,
        'port': args.mysql_port,
        'user': args.mysql_user,
        'password': args.mysql_password,
        'db': args.mysql_db,
        'charset': 'utf8mb4',
        'connect_timeout': 60,
        'read_timeout': 60
    }

    clickhouse_config = {
        'host': args.clickhouse_host,
        'port': args.clickhouse_port,
        'user': args.clickhouse_user,
        'password': args.clickhouse_password,
        'database': args.clickhouse_database
    }

    exclude_pattern = re.compile(args.exclude_tables) if args.exclude_tables else None
    include_pattern = re.compile(args.include_tables) if args.include_tables else None

    table_bounds = {}  # initialized here so it exists even if the snapshot block below fails

    mysql_connection = pymysql.connect(**mysql_config, autocommit=False, cursorclass=pymysql.cursors.DictCursor)
    mysql_connection.begin()
    try:
        with mysql_connection.cursor() as cursor:
            cursor.execute("FLUSH TABLES WITH READ LOCK")
            cursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ")
            cursor.execute("START TRANSACTION WITH CONSISTENT SNAPSHOT")  # take a consistent snapshot
            cursor.execute("SHOW TABLES")
            result = cursor.fetchall()
            tables = [val for d in result for val in d.values()
                      if (not exclude_pattern or not exclude_pattern.search(val))
                      and (not include_pattern or include_pattern.search(val))]
            for table_name in tables:
                try:
                    cursor.execute("SHOW COLUMNS FROM `{}` WHERE Extra = 'auto_increment'".format(table_name))
                    auto_increment = bool(cursor.fetchone())  # renamed so it does not shadow has_auto_increment()
                    if not auto_increment:
                        # No auto-increment primary key: worker falls back to read_from_mysql_with_limit
                        table_bounds[table_name] = (0, 0)
                    else:
                        cursor.execute(
                            "SELECT IFNULL(MIN(_rowid), 0) AS `MIN(id)`, IFNULL(MAX(_rowid), 0) AS `MAX(id)` FROM `{}`".format(
                                table_name))
                        row = cursor.fetchone()
                        min_id, max_id = row['MIN(id)'], row['MAX(id)']
                        table_bounds[table_name] = (min_id, max_id)
                except pymysql.Error as err:
                    error_message = str(err)
                    if "_rowid" in error_message or "Unknown column" in error_message:
                        logger.error("Table {} has no auto-increment primary key".format(table_name))
                    else:
                        logger.error("Error while executing query: {}".format(error_message))

            cursor.execute("SHOW MASTER STATUS")  # capture the current binlog file name and position
            binlog_row = cursor.fetchone()
            binlog_file, binlog_position, gtid = binlog_row['File'], binlog_row['Position'], binlog_row['Executed_Gtid_Set']

            # Save the binlog file name, position and GTID set to metadata.txt
            with open('metadata.txt', 'w') as f:
                f.write('{}\n{}\n{}'.format(binlog_file, binlog_position, gtid))

            cursor.execute("UNLOCK TABLES")

    except Exception as e:
        logger.error(e)
    finally:
        mysql_connection.close()

    tables = list(table_bounds.keys())

    # Import tables concurrently, up to max_workers at a time (default: ten)
    with ThreadPoolExecutor(max_workers=args.max_workers) as executor:
        task_list = [
            executor.submit(worker, table_name, table_bounds, mysql_config,
                            clickhouse_config, args.batch_size, args.max_workers)
            for table_name in tables
        ]

        # Handle tasks as they finish; result() re-raises any exception from the worker
        for future in concurrent.futures.as_completed(task_list):
            try:
                future.result()
            except Exception as e:
                logger.error(e)

    logger.info("All tasks completed.")

def parse_args():
    parser = argparse.ArgumentParser(description='MySQL to ClickHouse data synchronization')
    parser.add_argument('--mysql_host', type=str, required=True, help='MySQL host')
    parser.add_argument('--mysql_port', type=int, required=True, help='MySQL port')
    parser.add_argument('--mysql_user', type=str, required=True, help='MySQL username')
    parser.add_argument('--mysql_password', type=str, required=True, help='MySQL password')
    parser.add_argument('--mysql_db', type=str, required=True, help='MySQL database')
    parser.add_argument('--clickhouse_host', type=str, required=True, help='ClickHouse host')
    parser.add_argument('--clickhouse_port', type=int, required=True, help='ClickHouse port')
    parser.add_argument('--clickhouse_user', type=str, required=True, help='ClickHouse username')
    parser.add_argument('--clickhouse_password', type=str, required=True, help='ClickHouse password')
    parser.add_argument('--clickhouse_database', type=str, required=True, help='ClickHouse database')
    parser.add_argument('--batch_size', type=int, default=1000, help='Batch size for data import (default: 1000)')
    parser.add_argument('--max_workers', type=int, default=10, help='Maximum number of worker threads (default: 10)')
    parser.add_argument('--exclude_tables', type=str, default='', help='Tables to exclude (regular expression)')
    parser.add_argument('--include_tables', type=str, default='', help='Tables to include (regular expression)')
    return parser.parse_args()

if __name__ == '__main__':
    args = parse_args()
    main(args)

--------------------------------------------------------------------------------