├── README.md └── pt-archiver.php /README.md: -------------------------------------------------------------------------------- 1 | Percona pt-archiver重构版--大表数据归档工具 2 | 3 | 相信很多小伙伴们,在日常对接开发时,有很多大表在业务上并没有采取任何形式的切分,数据不停地往一张表里灌入,迟早有一天,磁盘空间报警。作为一个DBA,侧重点是对数据库的操作性能(大表增加字段/索引,QPS等)和存储容量加以考虑,我们会建议开发对数据库里的大表进行数据归档处理,例如将3个月内的订单表保留在当前表,历史数据切分后保存在归档表中,之后归档表从主库上移走以便腾出磁盘空间,并将其迁移至备份机中(有条件的可以将其转换为TokuDB引擎),以便提供大数据部门抽取至HDFS上。 4 | 5 | #### 一张大表,我们姑且说1亿条记录,原表我要保存近7天的数据。Percona pt-archiver工具是这样做的,逐条把历史数据insert到归档表,同时删除原表数据。7天数据比如说只有10万行,那么原表会直接删除9990万行记录,操作成本太高,固需要考虑重构。 6 | 7 | ### 重构版是这样做的,提取你要保留的7天数据至临时表,然后老表和临时表交换名字,这样大大缩减了可用时间。 8 | -------------------------------------------------------------------------------------------------------------------------------- 9 | ![归档工具架构图](https://github.com/user-attachments/assets/b257dbad-c0fc-4e43-92bc-4194cf6b0b92) 10 | 11 | 具体的工作原理: 12 | 13 | 1、如果表有触发器、或者表有外键、或者表没有主键、或者binlog_format设置的值不是ROW格式,工具将直接退出,不予执行。 14 | 15 | 2、创建一个归档临时表和原表一样的空表结构。 16 | 17 | ```CREATE TABLE IF NOT EXISTS ${mysql_table}_tmp like ${mysql_table};``` 18 | 19 | 20 | 3、在原表上创建增,删,改三个触发器将数据拷贝的过程中,原表产生的数据变更更新到临时表里。 21 | 22 | ``` 23 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_insert; 24 | 25 | CREATE TRIGGER pt_archiver_${mysql_database}_${mysql_table}_insert AFTER INSERT 26 | ON ${mysql_table} FOR EACH ROW 27 | REPLACE INTO ${mysql_database}.${mysql_table}_tmp ($column) VALUES ($new_column); 28 | 29 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_update; 30 | 31 | CREATE TRIGGER pt_archiver_${mysql_database}_${mysql_table}_update AFTER UPDATE 32 | ON ${mysql_table} FOR EACH ROW 33 | REPLACE INTO ${mysql_database}.${mysql_table}_tmp ($column) VALUES ($new_column); 34 | 35 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_delete; 36 | 37 | CREATE TRIGGER pt_archiver_${mysql_database}_${mysql_table}_delete AFTER DELETE 38 | ON ${mysql_table} FOR EACH ROW 39 | DELETE IGNORE FROM ${mysql_database}.${mysql_table}_tmp 40 | WHERE ${mysql_database}.${mysql_table}_tmp.id <=> OLD.id; 41 | ``` 42 | 43 | 这三个触发器分别对应于INSERT、UPDATE、DELETE三种操作: 44 | 45 | (1)INSERT操作,所有的INSERT INTO转换为REPLACE INTO,当有新的记录插入到原表时,如果触发器还未把该记录同步到临时表,而这条记录之前因某种原因已经存在了,那么我们就可以利用REPLACE INTO进行覆盖,这样数据也是一致的; 46 | 47 | (2)UPDATE操作,所有的UPDATE也转换为REPLACE INTO,如果临时表不存在原表更新的该记录,那么我们就直接插入该条记录;如果该记录已经同步到临时表了,那么直接进行覆盖插入即可,所有数据与原表也是一致的; 48 | 49 | (3)DELETE操作,原表有删除操作,会触发至临时表执行删除。如果删除的记录还未同步到临时表,那么可以不在临时表执行,因为原表中该行的数据已经被删除了,这样数据也是一致的。 50 | 51 | 4、拷贝原表数据到临时表(默认1000条一批次插入并休眠1秒) 52 | 53 | ``` 54 | INSERT LOW_PRIORITY IGNORE INTO ${mysql_database}.${mysql_table}_tmp 55 | SELECT * FROM ${mysql_database}.${mysql_table} WHERE id>=".$begin_Id." 56 | AND id<".($begin_Id=$begin_Id+$limit_chunk)." LOCK IN SHARE MODE; 57 | ``` 58 | 59 | 通过主键id(主键名字可以是非id)进行范围查找,分批次控制插入行数,已减少对原表的锁定时间(读锁/共享锁)---将大事务拆分成若干块小事务,如果临时表已经存在该记录将会忽略插入,并且在数据导入时,我们能通过sleep参数控制休眠时间,以减少对磁盘IO的冲击。 60 | 61 | 5、Rename原表为_bak,临时表Rename为原表,名字互换。 62 | 63 | ```RENAME TABLE ${mysql_table} to ${mysql_table}_bak, ${mysql_table}_tmp to ${mysql_table};``` 64 | 65 | 执行表改名字,会加table metadata lock元数据表锁,但基本是瞬间结束,故对线上影响不大。 66 | 67 | 6、删除原表上的三个触发器。 68 | 69 | ``` 70 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_insert; 71 | 72 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_update; 73 | 74 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_delete; 75 | ``` 76 | 77 | 至此全部过程结束,类似pt-osc原理。 78 | 79 | 注:考虑到删库跑路等安全性问题,工具没有对原表进行任何删除归档数据的操作。 80 | -------------------------------------------------------------------------------------------------------------------------------- 81 | 82 | # 视频演示 83 | https://www.douyin.com/video/7299863080652049715 84 | 85 | # 工具使用 86 | #### 1) PHP环境安装(Centos 7系统) 87 | ``` 88 | shell> yum install php php-mysqlnd -y 89 | ``` 90 | 91 | #### 2) pt-archiver.php执行 92 | ``` 93 | shell> php pt-archiver.php -h 192.168.0.10 -u admin -p "123456" -d test -P 3306 -t sbtest1 -w "id>=99900000" --limit 10000 --sleep 1 94 | ``` 95 | ``` 96 | 注:-w 过滤条件,例如"update_time >= DATE_FORMAT(DATE_SUB(now(),interval 10 day),'%Y-%m-%d')" 97 | 98 | --limit 分批次插入,默认一批插入10000行 99 | 100 | --sleep 每次插完1000行休眠1秒 101 | ``` 102 | 103 | # 有网友反馈5.7环境有问题,请执行下面的2条语句重跑即可。 104 | ``` 105 | mysql> set global show_compatibility_56=on; 106 | mysql> set global sql_mode=''; 107 | ``` 108 | --------------------------------------------------------------------------------------------- 109 | ### 如果对原表进行删除归档数据,可以借助原生工具 pt-archiver 进行分批缓慢删除。 110 | ``` 111 | shell> pt-archiver --source h=127.0.0.1,P=3306,u=admin,p='hechunyang',D=test,t=sbtest1 --purge --charset=utf8 --where "id <= 500000" --progress=200 --limit=200 --sleep=1 --txn-size=200 --statistics 112 | ``` 113 | 解释:删除test库,sbtest1表数据,字符集为utf8,删除条件是 id <= 5000000,每次取出200行进行处理,每处理200行则进行一次提交,每完成一次处理休眠1秒。 114 | -------------------------------------------------------------------------------- /pt-archiver.php: -------------------------------------------------------------------------------- 1 | -u <用户名> -p <密码> -d <数据库名> -P <端口> -t <表名> -w <过滤条件> --limit <分批次插入数量> --sleep <插入后的休眠时间>\n"; 14 | exit; 15 | } 16 | 17 | // 检查版本号参数 18 | if (isset($options['v'])) { 19 | echo "更新时间:2023-11-09 版本号: 1.1.1\n"; 20 | exit; 21 | } 22 | 23 | // 检查必选参数 24 | if (!isset($options['h'], $options['u'], $options['p'], $options['d'], $options['P'], $options['t'], $options['w'], $options['limit'], $options['sleep'])) { 25 | die("缺少参数,请使用 --help 选项查看使用说明。\n"); 26 | } 27 | 28 | $mysql_server = $options['h']; 29 | $mysql_username = $options['u']; 30 | $mysql_password = $options['p']; 31 | $mysql_database = $options['d']; 32 | $mysql_port = $options['P']; 33 | $mysql_table = $options['t']; 34 | $where_column = $options['w']; 35 | $limit_chunk = $options['limit']; 36 | $insert_sleep = $options['sleep']; 37 | 38 | ############################################### 39 | 40 | 41 | ######下面的代码不用更改!!!###### 42 | //########################################################// 43 | ini_set('date.timezone','Asia/Shanghai'); 44 | header("Content-type:text/html;charset=utf-8;"); 45 | $old_c=array(); 46 | $new_c=array(); 47 | 48 | $conn=mysqli_connect($mysql_server,$mysql_username,$mysql_password,$mysql_database,$mysql_port) or die("error connecting"); 49 | 50 | if (!$conn){ 51 | die("连接错误: " . mysqli_connect_error()); 52 | } 53 | 54 | mysqli_query($conn,"set names 'utf8'"); 55 | 56 | $check_trigger_sql = "SELECT * FROM information_schema.TRIGGERS WHERE TRIGGER_SCHEMA = '".$mysql_database."' AND TRIGGER_NAME like 'pt_archiver%';"; 57 | 58 | $query_trigger=mysqli_query($conn,$check_trigger_sql); 59 | 60 | if(mysqli_affected_rows($conn)>0){ 61 | die("检测到表已有触发器,退出主程序。". PHP_EOL); 62 | } 63 | 64 | $check_primary_key_id = "SELECT COLUMN_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = '".$mysql_database."' AND TABLE_NAME = '".$mysql_table."' AND COLUMN_KEY = 'PRI'"; 65 | 66 | $query_pri=mysqli_query($conn,$check_primary_key_id); 67 | 68 | if(mysqli_affected_rows($conn)<=0){ 69 | die("检测到表没有主键,退出主程序。". PHP_EOL); 70 | } 71 | 72 | $check_foreign_key = "SELECT TABLE_NAME,REFERENCED_TABLE_NAME FROM information_schema.KEY_COLUMN_USAGE WHERE TABLE_SCHEMA = '${mysql_database}' AND REFERENCED_TABLE_NAME IS NOT NULL"; 73 | 74 | $query_fk=mysqli_query($conn,$check_foreign_key); 75 | 76 | while($row_fk = mysqli_fetch_array($query_fk)) { 77 | if($row_fk[0] == $mysql_table) { 78 | echo "检测到子表含有外键,子表是:" . $row_fk[0] . ",他的父表是:".$row_fk[1]."。退出主程序。". PHP_EOL; 79 | exit; 80 | } 81 | if($row_fk[1] == $mysql_table) { 82 | echo "检测到父表含有外键,父表是:" . $row_fk[1] . ",他的子表是:".$row_fk[0]."。退出主程序。". PHP_EOL; 83 | exit; 84 | } 85 | } 86 | 87 | //$check_binlog_format = "SELECT VARIABLE_NAME,VARIABLE_VALUE FROM information_schema.GLOBAL_VARIABLES WHERE VARIABLE_NAME = 'BINLOG_FORMAT' AND VARIABLE_VALUE = 'ROW'"; 88 | //兼容MySQL 8.0 89 | $check_binlog_format = "show variables where Variable_name = 'BINLOG_FORMAT' and Value = 'ROW'"; 90 | 91 | $query_binlog_format=mysqli_query($conn,$check_binlog_format); 92 | 93 | if(mysqli_affected_rows($conn)<=0){ 94 | die("检测到binlog_format设置的值不是ROW格式,退出主程序。". PHP_EOL); 95 | } 96 | 97 | ######----------------------------------------------------------------------###### 98 | $sql_create_tmp = "create table IF NOT EXISTS ${mysql_table}_tmp like ${mysql_table}"; 99 | $result1 = mysqli_query($conn,$sql_create_tmp); 100 | 101 | if ($result1) { 102 | echo "${mysql_table}_tmp临时表创建成功" . PHP_EOL; 103 | } else { 104 | die("${mysql_table}_tmp临时表创建失败" . PHP_EOL); 105 | } 106 | 107 | 108 | $sql_get_column = "SELECT COLUMN_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = '".$mysql_database."' AND TABLE_NAME = '".$mysql_table."'"; 109 | $result2 = mysqli_query($conn,$sql_get_column); 110 | 111 | 112 | while($row = mysqli_fetch_array($result2)){ 113 | array_push($old_c,$row[0]); 114 | array_push($new_c,"NEW.".$row[0]); 115 | } 116 | 117 | 118 | $column=join(",",$old_c); 119 | $new_column=join(",",$new_c); 120 | 121 | 122 | $trigger= " 123 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_insert; 124 | CREATE TRIGGER pt_archiver_${mysql_database}_${mysql_table}_insert AFTER INSERT 125 | ON ${mysql_table} FOR EACH ROW 126 | REPLACE INTO ${mysql_database}.${mysql_table}_tmp ($column) VALUES ($new_column); 127 | 128 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_update; 129 | CREATE TRIGGER pt_archiver_${mysql_database}_${mysql_table}_update AFTER UPDATE 130 | ON ${mysql_table} FOR EACH ROW 131 | REPLACE INTO ${mysql_database}.${mysql_table}_tmp ($column) VALUES ($new_column); 132 | 133 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_delete; 134 | CREATE TRIGGER pt_archiver_${mysql_database}_${mysql_table}_delete AFTER DELETE 135 | ON ${mysql_table} FOR EACH ROW 136 | DELETE IGNORE FROM ${mysql_database}.${mysql_table}_tmp WHERE ${mysql_database}.${mysql_table}_tmp._rowid <=> OLD._rowid; 137 | "; 138 | 139 | echo "$trigger". PHP_EOL; 140 | 141 | if (mysqli_multi_query($conn, $trigger)) { 142 | do { 143 | if ($result = mysqli_store_result($conn)) { 144 | while ($row = mysqli_fetch_row($result)) { 145 | } 146 | mysqli_free_result($result); 147 | } 148 | if (mysqli_more_results($conn)) { 149 | } 150 | } while (mysqli_next_result($conn)); 151 | echo "${mysql_table}表触发器创建成功" . PHP_EOL; 152 | } 153 | else{ 154 | die("${mysql_table}表触发器创建失败" .mysqli_error($conn) . PHP_EOL); 155 | } 156 | 157 | 158 | //抽取历史数据到临时表 159 | 160 | $sql_get_Id = "SELECT _rowid,(SELECT max(_rowid) FROM ${mysql_database}.${mysql_table}) AS max_rowid FROM ${mysql_database}.${mysql_table} WHERE ${where_column} order by _rowid asc LIMIT 1"; 161 | echo $sql_get_Id . PHP_EOL; 162 | 163 | $result3 = mysqli_query($conn,$sql_get_Id); 164 | while($row1 = mysqli_fetch_array($result3)){ 165 | $begin_Id=number_format($row1['0'] ,0 ,'' ,''); //防止转换为科学计数法 166 | $max_Id=number_format($row1['1'] ,0 ,'' ,''); //防止转换为科学计数法 167 | } 168 | while(1==1){ 169 | $insert_select_tmp = "INSERT LOW_PRIORITY IGNORE INTO ${mysql_database}.${mysql_table}_tmp SELECT * FROM ${mysql_database}.${mysql_table} WHERE ${where_column} AND (_rowid>=".$begin_Id." AND _rowid<".($begin_Id=$begin_Id+$limit_chunk).") LOCK IN SHARE MODE "; 170 | echo $insert_select_tmp . PHP_EOL; 171 | 172 | mysqli_query($conn,"SET tx_isolation = 'REPEATABLE-READ'"); 173 | 174 | $result4 = mysqli_query($conn,$insert_select_tmp); 175 | 176 | echo "". PHP_EOL; 177 | echo "插入行数是: " . mysqli_affected_rows($conn) . PHP_EOL; 178 | 179 | if ($result4) { 180 | if(mysqli_affected_rows($conn)>=1){ 181 | echo "${mysql_table}_tmp临时表插入成功" . PHP_EOL; 182 | sleep($insert_sleep); 183 | } 184 | else if($begin_Id<$max_Id){ 185 | continue; 186 | } 187 | else{ 188 | $exec_sql="RENAME TABLE ${mysql_table} to ${mysql_table}_bak_".date('Ymd').", ${mysql_table}_tmp to ${mysql_table}; 189 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_insert; 190 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_update; 191 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_delete;"; 192 | if (mysqli_multi_query($conn, $exec_sql)) { 193 | do { 194 | if ($result5 = mysqli_store_result($conn)) { 195 | while ($row2 = mysqli_fetch_row($result5)) { 196 | } 197 | mysqli_free_result($result5); 198 | } 199 | if (mysqli_more_results($conn)) { 200 | } 201 | } while (mysqli_next_result($conn)); 202 | 203 | echo "${mysql_table}表归档成功" . PHP_EOL; 204 | break; 205 | } 206 | else{ 207 | echo mysqli_error($conn) . PHP_EOL; 208 | //脚本失败的时候,触发器自动删除------------------------------ 209 | $drop_trigger="DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_insert; 210 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_update; 211 | DROP TRIGGER IF EXISTS pt_archiver_${mysql_database}_${mysql_table}_delete;"; 212 | if (mysqli_multi_query($conn, $drop_trigger)) { 213 | do { 214 | if ($result6 = mysqli_store_result($conn)) { 215 | while ($row2 = mysqli_fetch_row($result6)) { 216 | } 217 | mysqli_free_result($result6); 218 | } 219 | } while (mysqli_next_result($conn)); 220 | } 221 | //------------------------------------------------------------ 222 | die("${mysql_table}表归档失败" .mysqli_error($conn) . PHP_EOL); 223 | } 224 | } 225 | } 226 | else{ 227 | die("${mysql_table}_tmp临时表插入失败 " .mysqli_error($conn). PHP_EOL); 228 | } 229 | } 230 | 231 | mysqli_close($conn); 232 | 233 | ?> 234 | --------------------------------------------------------------------------------