├── InnoDB ├── 001-InnoDB简介.md ├── InnoDB-page-merging-and-page-splitting.md ├── innodb-alter-table-add-index-and-insert-performance.md └── mysql-innodb-sorted-index-builds.md ├── README.md ├── contribution ├── 20180410-用了并行复制居然还有延迟.md ├── 20180424-关于MySQL 8.0几个重点,都在这里.md ├── 20180501-whats-new-in-mysql-8-0-GA.md └── 20180609-浅析MySQL主从复制技术(异步复制,同步复制,半同步复制).md └── mysql ├── 0-en-what-s-new-mysql-replication-mysql-80.md ├── 0-zh-what-s-new-mysql-replication-mysql-80.md ├── 1-en-Looking at Disk Utilization and Saturation.md ├── 1-zh-Looking at Disk Utilization and Saturation.md ├── 10-MySQL查询性能优化_不只是索引.md ├── 11-Four Ways MySQL Executes GROUP BY.md ├── 12-using-generated-columns-in-mysql-5-7-to-increase-query-performance.md ├── 13-MyRocks Engine Things to Know Before You Start.md ├── 14-What-To-Do-When-MySQL-Runs-Out-of-Memory: Troubleshooting-Guide.md ├── 15-Chunk Change: InnoDB Buffer Pool Resizing.md ├── 16-MySQL 8 and The FRM Drop… How To Recover Table DDL.md ├── 17-Why You Should Avoid Using “CREATE TABLE AS SELECT” Statement.md ├── 18-How to Restore MySQL Logical Backup at Maximum Speed.md ├── 19-MySQL5.6升级到5.7遇到的问题总结.md ├── 2-en-mysql-8-0-2-more-flexible-undo-tablespace-management.md ├── 2-zh-mysql-8-0-2-more-flexible-undo-tablespace-management.md ├── 20-Tuning InnoDB Primary Keys.md ├── 21-One-Billion-Tables-in-MySQL-8.0-with-ZFS.md ├── 22-How-Network-Bandwidth-Affects-MySQL-Performance.md ├── 22.A Look at MyRocks Performance ├── 24-Replace MariaDB 10.3 by MySQL 8.0.md ├── 25-MySQL InnoDB Cluster – consistency levels.md ├── 26-MySQL Memory Management, Memory Allocators and Operating System.md ├── 27-MySQL8.0-INFORMATION_SCHEMA增强.md ├── 28-mysql-got-an-error-reading-communication-packet-errors.md ├── 3-zh-mysql-8-0-2-replication-new-feature.md ├── 4-zh-mysql-5-7-initial-flushing-analysis-and-why-performance-schema-data-is-incomplete.md ├── 5-zh-Percona-Server-5.7-performance-improvements.md ├── 6-zh-Percona-Server-5.7 ├── 7-zh-Percona-Server-5.7-parallel-doublewrite.md ├── 8-zh-the-mysql-8.0.3-release-candidate-is-available.md └── 9-How to Choose the MySQL innodb_log_file_size.md /InnoDB/001-InnoDB简介.md: -------------------------------------------------------------------------------- 1 | # 15.InnoDB简介 2 | 3 | ``` 4 | 译者:徐晨亮 5 | 6 | 团队:无名队 7 | 8 | 说明:转载请注明出处 9 | ``` 10 | 11 | InnoDB是一种兼于高可靠性和高性能的通用存储引擎。在MySQL8.0中,InnoDB是默认的存储引擎。除非你配置了不同的默认存储引擎,否则在CREATE TABLE时不带ENGINE=子句时都会创建InnoDB表。 12 | 13 | ## InnoDB的主要优势 14 | 15 | - DML操作遵循ACID模型,具有提交、回滚和崩溃恢复功能的实务来保护用户数据。 16 | - 行级锁定和Oracle风格的一致性读取可提高多用户并发性和性能。 17 | - InnoDB表将你的数据排列在磁盘上,以根据主键优化查询。每个InnoDB表都有一个称为聚集索引的主键索引,为主键查询最小化I/O来组织数据。 18 | - 要保持数据完整性,InnoDB支持外键约束。使用外键,将检查插入,更新和删除,以确保它们不会导致不同表之间的不一致。 19 | 20 | 表15.1InnoDB存储引擎功能 21 | 22 | | 特征 | 支持 | 23 | | ----------------------------------------------------- | ------------------------------------------------------------ | 24 | | 备份/时间点恢复(在服务器中实现,而不是在存储引擎中) | 是 | 25 | | 集群数据库支持 | 没有 | 26 | | 聚集索引 | 是 | 27 | | 压缩数据 | 是 | 28 | | 数据缓存 | 是 | 29 | | 加密数据 | 是(通过加密功能在server层实现;在MySQL5.7级更高的版本中,支持静态数据表空间加密) | 30 | | 外键支持 | 是 | 31 | | 全文搜索索引 | 是(在MySQL5.6及更高版本中可以使用InnoDB对FULLTEXT索引的支持) | 32 | | 地理空间数据类型支持 | 是 | 33 | | 地理空间索引支持 | 是(在MySQL5.7及更高版本中可以使用InnoDB对地理空间索引的支持) | 34 | | 哈希索引 | 否(InnoDB在内部利用哈希索引来实现其自适应哈希索引功能) | 35 | | 索引缓存 | 是 | 36 | | 锁定粒度 | 行 | 37 | | MVCC | 是 | 38 | | 复制支持 | 是 | 39 | | 存储限制 | 64TB | 40 | | T树索引 | 没有 | 41 | | 事务支持 | 是 | 42 | | 更新数据字典的统计信息 | 是 | 43 | 44 | ##InnoDB增强功能和新功能 45 | 46 | 有关InnoDB增强功能和新功能的信息,请参阅: 47 | 48 | - 
"[MySQL8.0中的新InnoDB功能](https://dev.mysql.com/doc/refman/8.0/en/mysql-nutshell.html)"中的增强功能列表 49 | - [发行说明](https://dev.mysql.com/doc/relnotes/mysql/8.0/en/) 50 | 51 | ## 额外的InnoDB信息和资源 52 | 53 | - 有关InnoDB相关术语和定义吗,请参阅[MySQL词汇表](https://dev.mysql.com/doc/refman/8.0/en/glossary.html) 54 | - 对于专用于InnoDB存储引擎的论坛,请参阅[MySQL论坛::InnoDB](http://forums.mysql.com/list.php?22) 55 | 56 | ## 15.1 使用InnoDB表的好处 57 | 58 | 使用InnoDB表你可能会发现以下好处: 59 | 60 | - 如果你的服务器由于硬件或者软件问题崩溃,无论当时数据库中发生了什么,你都无需在重新启动数据库后执行任何特殊操作。InnoDB崩溃恢复回自动完成崩溃之前提交的所有变更,并撤销正在进行但仍未提交的任何变更。只需要在你离开的地方重启并继续。 61 | - InnoDB存储引擎维护了自己的缓冲池,在主要的内存中当数据被访问时用来缓冲表和索引数据。频繁被使用的数据直接从内存读取处理。此缓存适用于许多类型的信息并加快处理速度。在专用数据库服务器上,通常会将最多80%的物理内存分配给缓冲池。 62 | - 如果将相关数据拆分到不同的表中,则可以设置外键以此来强制引用完整性。更新或删除数据,其他表上的相关数据也将会自动更新或删除。尝试将主表上没有的数据插入到子表中,那么错误的数据将会自动剔除。 63 | - 如果数据在磁盘或内存中损坏,checksum机制将会提醒你在使用前伪造数据。 64 | - 当设计数据库为每个表设定合适的主键列,涉及到这些列的操作将会自动优化。在WHERE子句,ORDER BY子句,GROUP BY子句和join操作时使用主键会非常快。 65 | - 插入,更新和删除将会由change buffer机制进行优化。InnoDB不仅允许对同一个表进行并发读写访问,还可以缓存已经更改的数据以简化磁盘I/O。 66 | - 性能优势不仅仅局限于在超大表上运行长时间的查询。当一个表上的相同行被一次又一次地访问,AHI特性将会使这些操作变得非常快就像它们来自哈希表一样。 67 | - 你可以使用压缩表和关联的索引。 68 | - 你可以创建或者删除索引,并且对性能和可用性的影响非常小。 69 | - truncate每个独立表空间非常快,并且可以将释放的空间给操作系统使用,而系统表空间释放的空间只能给InnoDB重用。 70 | - 在Dynamic的行格式下,对`BLOB`和`long text`字段存储表数据的方式将更有效。 71 | - 你可以通过查询`INFORMATION_SCHEMA`表来监控存储引擎的内部工作方式 72 | - 你可以通过查询`performance schema`来监控存储引擎的性能详情 73 | - 你可以自由地混合InnoDB表和其他引擎表,甚至可以在同一语句中。例如,你可以在一个简单查询中使用join操作来组合InnoDB和MEMORY表数据。 74 | - InnoDB是为在处理大量数据提升cpu效率和最大化性能而设计 75 | - InnoDB表可以处理大量数据,即使在文件大小限制为2GB的操作系统上也是如此。 76 | 77 | 对于InnoDB你可以在应用程序代码中使用InnoDB特定的优化技术,可以参照[Optimizing for InnoDB Tables](https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb.html) 78 | 79 | ## 15.1.2 InnoDB表最佳实践 80 | 81 | 本节介绍使用InnoDB表的最佳实践 82 | 83 | - 为每个表在最常用的单列或多列创建主键,如果没有明显的主键,则指定一个自增的值 84 | - 基于ID从多个表获取数据时使用join。为了获得快速连接性能,在join列上定义外键并且在每个表上定义这些列时声明相同的数据类型。添加外键可以确保引用的列建立索引从而提高性能。外键还会将删除或更新操作广播到所有受影响的表上,并且如果父表中不存在相应的ID,会阻止在子表中插入数据。 85 | - 关闭自动提交。每秒几百次的提交限制性能(受存储设备写入速度的限制) 86 | - 通过START TRANSACTION和COMMIT语句将相关的DML操作放到一组事务中。当你不想频繁地提交,你也不想发出大批量的INSERT,UPDATE或者DELETE而跑几个小时并且不提交。 87 | - 不要使用LOCK TABLES语句。InnoDB能够同时处理多个会话,同时读取和写入同一个表,而不会牺牲可靠性或高性能。要获得对同一组行的排他写权限,请使用SELECT … FOR UPDATE语法来锁定你想更新的行。 88 | - 使用独立表空间或者使用通用表空间的多个文件来存放数据和索引,而不是系统表空间。innodb_file_per_table选项是默认的。 89 | - 评估你的数据和访问模式是否受益于InnoDB表或页面压缩功能。你可以在InnoDB不牺牲读/写功能的情况下使用压缩表。 90 | - 使用`--sql_mode=NO_ENGINE_SUBSTITUTION`选项来运行你的MySQL,避免在CREATE TABLE使用ENGINE=子句中指定的引擎出现问题时使用其他存储引擎来创建表。 91 | 92 | ## 15.1.3确认InnoDB是默认的存储引擎 93 | 94 | 使用`SHOW ENGINES`语句可以查看可用的MySQL存储引擎。查看DEFAULT是否InnoDB所在行 95 | 96 | ```mysql 97 | mysql>SHOW ENGINES; 98 | ``` 99 | 100 | 或者,可以查询`INFORMATION_SCHEMA.ENGINES`表 101 | 102 | ```mysql 103 | mysql>SELECT * FROM INFORMATION_SCHEMA.ENGINES; 104 | ``` 105 | 106 | ## 15.1.4测试和压测InnoDB 107 | 108 | 如果InnoDB不是你的默认存储引擎,那么可以通过重启server并且在命令行中指定`--default-storage-engine=InnoDB`或者在MySQL配置文件的[mysqld]部分定义`default-storage-engine=innodb`来确定你的数据库服务器或者应用程序是否正确使用了InnoDB。 109 | 110 | 由于更改默认存储引擎只会影响新创建的表,因此请运行所有程序安装步骤确保正确安装。然后测试所有的程序特性来确保所有的数据加载,编辑和查询特性都能工作。如果一个表依赖于其他引擎的特性,你将会收到错误提示。在CREATE TABLE语句中添加ENGINE=other_engine_name子句来避免错误。 111 | 112 | 如果你没有对存储引擎做出深思熟虑的决定,并且想要预览某些特定InnoDB表如何工作,为每个表使用命令`ALTER TABLE table_name ENGINE=InnoDB`;或者,在不打扰原始表的情况下运行测试查询和其他语句,请复制: 113 | 114 | ```mysql 115 | CREATE TABLE InnoDB_Table (...) 
ENGINE=InnoDB AS SELECT * FROM other_engine_table; 116 | ``` 117 | 118 | 要在实际工作负载下评估完整应用程序性能,请安装最新的MySQL服务器并运行基准测试。 119 | 120 | 测试应用的完整生命周期,从安装到大量使用,服务重启。在数据库忙于模拟电源故障时终止服务器进程,并在重新启动服务器时验证数据是否已成功恢复。 121 | 122 | 测试任何复制配置,尤其是你在master和slave上使用不同的MySQL版本和选项。 -------------------------------------------------------------------------------- /InnoDB/InnoDB-page-merging-and-page-splitting.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: InnoDB 页面的合并和拆分 3 | tags: InnoDB 4 | categories: InnoDB 5 | --- 6 | 7 | > 原文作者:Marco Tusa 8 | 发布日期:2017年4月10日 9 | 关键词: InnoDB,Innodb interals,Insight for DBAs, MySQL 10 | 原文链接: https://www.percona.com/blog/2017/04/10/innodb-page-merging-and-page-splitting/ 11 | 12 | 13 | 如果你遇到了全球为数不多的MySQL顾问之一,并要求他/她审查您的查询和/或模式,我相信他/她会告诉你有关良好主键设计的重要性,尤其是InnoDB,我相信他们开始向你解释索引合并和页面拆分。这两个概念和性能密切相关,并且在你设计任何索引(不仅仅是主键)时,都应该考虑这种关系。 14 | 15 | 那对你来说听起来像是胡言乱语,你可能是对的。这不是一件容易的事情,尤其是在讨论内部原理的时候。这不是你经常处理的事情,通常你根本不想处理它。 16 | 17 | 但有时这是必要的,如果是这样,这篇文章适合你。 18 | 19 | 在这篇文章中,我想解释一下InnoDB中一些最不清晰的内部操作:页面索引创建、页面合并和页面拆分。 20 | 21 | ***InnoDB中所有数据即索引***,你可能也听说过这句话,对吧? 但这到底意味着什么呢? 22 | 23 | 24 | 25 | ## 文件表组件(File-Table Components) 26 | 27 | 假设你己经安装了MySQL 5.7的最新版本(比如:[Percona Server for MySQL](https://www.percona.com/software/mysql-database/percona-server)),并且在你的模式 ***windmills*** 中有一张命名为 ***wmills*** 的表,在数据目录(默认是/var/lib/mysql目录)中,你将看到它包含: 28 | ```bash 29 | data/ 30 | windmills/ 31 | wmills.ibd 32 | wmills.frm 33 | ``` 34 | 35 | 这是因为从MySQL 5.6开始参数 ***innodb_file_per_table*** 设置成1的结果,使用该设置,模式中的每个表都由一个文件表示(如果表是分区的,则由多个文件表示)。 36 | 37 | 这里重要的是,物理容器是一个名为 ***wmills.ibd*** 的文件。这个文件被分成N个段。每个段都与一个索引相关联。 38 | 39 | 虽然文件的大小不会因为行删除而收缩,但是段本身可以相对于名为区段的子元素增长或收缩。区段只能存在于段中,并且具有1MB的固定大小(在默认页面大小的情况下)。页面是区段的子元素,默认大小为16KB。 40 | 41 | 因此,一个区段最多可以包含64页,一个页可以包含两到N行。一个页面可以包含的行数和行大小相关,这由表模式定义,InnoDB中有一条规则说,一个页面至少要有2行,因此,我们的行大小限制为8000字节。 42 | 43 | 如果你认为这听起来像**俄罗斯套娃(请自行搜图^_^)**,你是对的,下面的图片将有助于你理解: 44 | 45 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/segment_extent-e1491345857803.png) 46 | 47 | 48 | InnoDB使用B树来跨区段组织页面内部的数据。 49 | 50 | ## 根(Roots),分支(Branches)和叶子(Leaves) 51 | 每个页面(叶子)包含由主键组织的2-N行,树有专门的页面来管理不同的分支。这些被称为内部节点(inode)。 52 | 53 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Bplustree.png) 54 | 55 | 这张图片只是一个例子,并不代表下面的实际输出。 56 | 57 | 让我们来看看细节: 58 | ```bash 59 | ROOT NODE #3: 4 records, 68 bytes 60 | NODE POINTER RECORD ≥ (id=2) → #197 61 | INTERNAL NODE #197: 464 records, 7888 bytes 62 | NODE POINTER RECORD ≥ (id=2) → #5 63 | LEAF NODE #5: 57 records, 7524 bytes 64 | RECORD: (id=2) → (uuid="884e471c-0e82-11e7-8bf6-08002734ed50", millid=139, kwatts_s=1956, date="2017-05-01", location="For beauty's pattern to succeeding men.Yet do thy", active=1, time="2017-03-21 22:05:45", strrecordtype="Wit") 65 | ``` 66 | 表结构如下: 67 | ```sql 68 | CREATE TABLE `wmills` ( 69 | `id` bigint(11) NOT NULL AUTO_INCREMENT, 70 | `uuid` char(36) COLLATE utf8_bin NOT NULL, 71 | `millid` smallint(6) NOT NULL, 72 | `kwatts_s` int(11) NOT NULL, 73 | `date` date NOT NULL, 74 | `location` varchar(50) COLLATE utf8_bin DEFAULT NULL, 75 | `active` tinyint(2) NOT NULL DEFAULT '1', 76 | `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, 77 | `strrecordtype` char(3) COLLATE utf8_bin NOT NULL, 78 | PRIMARY KEY (`id`), 79 | KEY `IDX_millid` (`millid`) 80 | ) ENGINE=InnoDB; 81 | ``` 82 | 83 | 所有类型的B树都有一个入口点,称为根节点。我们在这里将其标识为第3页。根页面包含索引ID、inode数量等信息。INode页面包含关于页面本身及其值范围等信息。最后,我们有叶节点,这是我们可以找到数据的地方。在这个例子中,我们可以看到叶子节点#5有57条记录,总共7524个字节。在这一行下面是一条记录,您可以看到行数据。 
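
顺带一提,上面这种 ROOT / INTERNAL / LEAF 节点的树形输出,可以用文末致谢中提到的 innodb_ruby 项目里的 innodb_space 工具得到。下面只是一个示意用法(假设已安装 innodb_ruby、表使用独立表空间;具体模式名与参数请以该项目文档为准):

```bash
# 示意:递归打印 windmills/wmills 主键索引的树结构(根节点、内部节点、叶子节点)
innodb_space -s ibdata1 -T windmills/wmills -I PRIMARY index-recurse

# 示意:按页汇总该表空间中各索引页的记录数与空间使用情况
innodb_space -s ibdata1 -T windmills/wmills space-index-pages-summary
```
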
84 | 85 | 这里的概念是,当你用表和行来组织数据时,InnoDB用分支、页面和记录来组织数据。记住InnoDB不是在单行基础上工作的,这一点非常重要。InnoDB总是对页面进行操作。加载页面后,它将扫描页面以查找请求的行/记录。 86 | 87 | 现在明白了吗?好,让我们继续。 88 | 89 | 90 | ## 页面内部(Page Internals) 91 | 92 | 一个页面可以是空的或是完全填满的(100%),行记录由主键组织,例如,如果你的表使用AUTO_INCREMENT,那么序列ID将是1、2、3、4,等等。 93 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_1.png) 94 | 95 | 一个页面还有一个重要的属性:***MERGE_THRESHOLD*** ,该参数的默认值是页面的50%,在InnoDB合并活动中起着非常重要的作用。 96 | 97 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_2.png) 98 | 99 | 当你插入数据时,如果传入的记录可以容纳在页面中,那么页面将按顺序被填满。 100 | 101 | 当一个页面被填满时,下一条记录将被插入到下一个页面: 102 | 103 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_4.png) 104 | 105 | 鉴于B树的性质,该结构不仅可以自上而下沿着树枝进行浏览,还可以横向浏览叶子节点。这是因为每个叶子节点页都有一个指向包含序列中下一个记录值的页的指针。 106 | 107 | 例如,第5页有对下一页(第6页)的引用。第6页向后引用前一页(第5页),并向前引用下一页(第7页)。 108 | 109 | 链表的这种机制允许快速、有序的扫描(即:范围扫描)。如前所述,这是在插入和基于AUTO_INCREMENT的主键时发生的情况。但是如果我开始删除值呢? 110 | 111 | ## 页面合并(Page Merging) 112 | 当你删除一条记录,该条记录不会物理删除,相反,它将记录标记为已删除,并且所使用的空间可以回收。 113 | 114 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_3.png) 115 | 116 | 当一个页面接收到足够的删除操作以匹配**MERGE_THRESHOLD**(默认为页面大小的50%)时,InnoDB开始查看最近的页面(下一个和前一个),看看是否有机会通过合并这两个页面来优化空间利用率。 117 | 118 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_4.png) 119 | 120 | 在这个例子中,第6页使用了不到一半的空间。第5页收到了许多删除,现在使用的也不到50%。从InnoDB的角度来看,它们是可合并的: 121 | 122 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_5.png) 123 | 124 | 合并操作的结果是第5页包含先前的数据和第6页的数据。第6页变为空页,可用于新数据。 125 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_6.png) 126 | 127 | 当我们更新一条记录并且新记录的大小使页面低于阈值时,也会发生同样的过程。 128 | 129 | 规则是: 合并发生在涉及紧密链接页面的删除和更新操作上。如果合并操作成功,则 ***INFORMATION_SCHEMA.INNODB_METRICS*** 中的 ***index_page_merge_successful*** 指标将递增。 130 | 131 | ## 页面拆分(Page Splits) 132 | 如上所述,一个页面可以被填充到100%。当这种情况发生时,下一页将记录新记录。 133 | 134 | 但是如果我们有以下情况呢? 135 | 136 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_7.png) 137 | 138 | 第10页没有足够的空间容纳新的(或更新的)记录。按照下一页的逻辑,记录应该在第11页。然而: 139 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_9.png) 140 | 141 | 第11页也是满的,不能无序插入数据。那么我们能做些什么呢? 142 | 143 | 还记得我们讲过的链表吗?此时,第10页的Prev=9, Next=11。 144 | 145 | InnoDB要做的是: 146 | 1. 创建一个新页面 147 | 2. 确定可以将原始页面(第10页)分割到哪里(在记录级别) 148 | 3. 移动记录 149 | 4. 
重新定义页面关系 150 | 151 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_8.png) 152 | 153 | 创建一个新的页面#12: 154 | 155 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/Locality_10.png) 156 | 157 | 第11页保持原样。改变的是页面之间的关系: 158 | - 页面#10 Prev=9,Next=12 159 | - 页面#12 Prev=10, Next=11 160 | - 页面#11 Prev=12,Next=13 161 | 162 | b树的路径仍然可以看到一致性,因为它遵循逻辑组织。然而,页面的物理位置是无序的,并且在大多数情况下处于不同的区段。 163 | 164 | 作为一个规则,我们可以说: 页面分裂发生在插入或更新时,并导致页面错位(在许多情况下,在不同的区段上)。 165 | 166 | InnoDB跟踪 ***INFORMATION_SCHEMA.INNODB_METRICS*** 中的页面分割数量。可查看 ***index_page_split*** 和 ***index_page_reorg_tries/success*** 指标。 167 | 168 | 一旦创建了分割页面,回退的唯一方法是将创建的页面放到合并阈值以下。当这种情况发生时,InnoDB使用合并操作将数据从拆分页面移动。 169 | 170 | 另一种方法是通过**OPTIMIZE**表来重新组织数据。这可能是一个非常繁重和漫长的过程,但是,通常这是从过多页面位于稀疏区段的情况中恢复的惟一方法。 171 | 172 | 要记住的另一个方面是,在合并和拆分操作期间,InnoDB获得索引树的x-latch。在繁忙的系统中,这很容易成为一个关注点。这可能导致索引latch争用。如果没有合并和分割(又名写操作)只接触单个页面,这在InnoDB中称为“乐观”更新,latch是共享的。合并和分割称为“悲观”更新,latch是排他的。 173 | 174 | ## 我的主键 175 | 一个好的主键(PK)不仅对于检索数据很重要,而且在写入时正确地分布区段内的数据也很重要(这对于拆分和合并操作也很重要)。 176 | 177 | 在第一个例子,我有一个简单的自动主键。在第二个例子中,我的PK基于ID(范围1-200)和一个自自增值。在第三个示例中,我有相同的ID(范围1-200),但与UUID关联。 178 | 179 | 当插入时,InnoDB必须添加页面。这是一个分割操作: 180 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/split_1.png) 181 | 182 | 根据我使用的主键的类型,行为是非常不同的。 183 | 184 | 前两种示例中的数据分布将更加“紧凑”。这意味着它们也将有更好的空间利用率,而UUID的半随机特性将导致显著的“稀疏”页面分布(导致更多的页面和相关的分割操作)。 185 | 186 | 在合并的情况下,尝试合并的次数因PK类型而异。 187 | 188 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2017/04/merges_1-1024x542.png) 189 | 190 | 在插入-更新-删除操作中,与其他两种类型相比,自增主键具有更少的页面合并尝试和9.45%的成功率。带有UUID的PK(在图片的另一边)有更高的合并尝试次数,但同时也有显著更高的成功率,为22.34%,这是由于“稀疏”分布使得许多页面部分为空。 191 | 192 | 具有类似数字的PK值也来自二级索引。 193 | 194 | ## 讨论 195 | MySQL/InnoDB经常执行这些操作,而您对它们的可见性非常有限。但是它们会咬你一口,而且咬得很厉害,特别是如果使用主轴存储与SSD(顺便说一下,这两者有不同的问题)。 196 | 197 | 可悲的是,我们也几乎无法在服务器端使用参数或其他一些神奇的方法来优化它。但好消息是,在设计时可以做很多事情。 198 | 199 | 使用适当的主键并设计一个二级索引,记住不要滥用它们。计划适当的表维护窗口,你将有高效率的插入/删除/更新。 200 | 201 | 在InnoDB中,你不会有碎片化的记录,但是在页面-区段级别上,您可能遭遇噩梦。忽略表维护将导致IO级、内存级和InnoDB缓冲池级的更多工作。这是要记住的重要一点。 202 | 203 | 你必须定期重新构建一些表。使用它需要的任何技巧,包括分区和外部工具(pt-osc)。不要让一张表变得巨大和完全碎片化。 204 | 205 | 浪费磁盘空间? 需要加载三个页面而不是一个来检索您需要的记录集? 每次搜索都会显著增加读取量? 206 | 207 | 这是你的错,做事马虎没有借口! 208 | 209 | 祝大家享受MySQL! 
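
补充一个可以直接上手的观察方法:文中多次提到的合并/拆分计数器都在 INFORMATION_SCHEMA.INNODB_METRICS 里,MERGE_THRESHOLD 也可以按表调整。下面是一个最小示例(MySQL 5.7/8.0,表名为假设值;部分计数器默认未开启,需要先用 innodb_monitor_enable 打开):

```sql
-- 查看页面合并 / 拆分 / 重组相关计数器
SELECT NAME, COUNT, STATUS, COMMENT
  FROM INFORMATION_SCHEMA.INNODB_METRICS
 WHERE NAME LIKE 'index_page_merge%'
    OR NAME LIKE 'index_page_split%'
    OR NAME LIKE 'index_page_reorg%';

-- 如计数器显示 disabled,可按通配符开启(会带来少量开销,观察完可再关闭)
SET GLOBAL innodb_monitor_enable = 'index_page%';

-- 示意:把某张表所有索引页的 MERGE_THRESHOLD 从默认的 50 调整为 45
ALTER TABLE wmills COMMENT = 'MERGE_THRESHOLD=45';
```
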
210 | 211 | ## 致谢 212 | Laurynas Biveinis: 他花时间耐心地向我解释一些内部原理。 213 | Jeremy Cole: 他的项目[InnoDB_ruby](https://github.com/jeremycole/innodb_ruby)(我经常使用)。 -------------------------------------------------------------------------------- /InnoDB/innodb-alter-table-add-index-and-insert-performance.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: InnoDB ALTER TABLE INDEX和INSERT之性能 3 | tags: InnoDB 4 | categories: InnoDB 5 | --- 6 | 7 | > 原文作者:Satya Bodapati 8 | 发布日期:2019年6月27日 9 | 关键词: InnoDB, MySQL 10 | 原文链接: https://www.percona.com/blog/2019/06/27/innodb-alter-table-add-index-and-insert-performance/ 11 | 12 | 13 | 14 | 15 | 在我的前一篇博客[InnoDB排序索引构建](https://www.percona.com/blog/2019/05/08/mysql-innodb-sorted-index-builds/)中,我解释了排序索引构建的内部处理过程。文章最后我说了“没有缺点”。 16 | 17 | 从MySQL 5.6开始,包括ALTER TABLE ADD INDEX在内的许多ddl都变成了“在线模式”。这意味着,当**ALTER操作**正在进行时,可以有并发的select和DMLs。请参阅MySQL文档[online DDL](https://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl-operations.html)。从文档中,我们可以看到,DDL操作 ***ALTER TABLE ADD INDEX*** 允许并发的DML。 18 | 19 | 在5.7中引入的排序索引构建的主要缺点是在进行更改时插入性能降低,在这篇文章中,我们特别讨论了正在执行 ***ALTER ADD INDEX*** 的表上的单线程插入性能。 20 | 21 | 如果表很大,比如大约有6亿行或更多行,插入甚至会导致服务器崩溃。对于运行数小时且并发插入等待超过600秒的更改尤其如此。InnoDB的监视器线程使服务器崩溃,声明插入等待latch超过600秒。它被报告为MySQL[Bug#82940](https://bugs.mysql.com/bug.php?id=82940) 22 | 23 | ## 它修复了吗? 24 | 这个问题从5.7 GA开始就存在了,并且在Percona Server for MySQL的最新版本 5.7.26-29和8.0.15-6中得到了修复,这是[PS-3410](https://jira.percona.com/browse/PS-3410) bug修复的一部分。完成插入的数量取决于表是压缩的还是未压缩的,以及页面大小。 25 | 26 | Percona的补丁提供给了上游[Oracle MySQL](https://github.com/mysql/mysql-server/pull/268),但是还没有包含其中。为了MySQL社区的利益,我们希望Oracle在下一个5.7版本中包含这个修复。 27 | 28 | 如果不能升级到PS-5.7.26,一个合理的解决方案是使用[pt-online-schema-change](https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html)。使用此工具,确保磁盘空间至少等于原始表空间大小。 29 | 30 | ## 改善了多少? 31 | 改进的百分比取决于测试场景、机器配置等。请参阅下面的详细信息。 32 | 33 | 对于未压缩的表,修复版本(5.7.26)运行 ***ALTER ADD INDEX*** 时完成的插入比5.7.25(官方版)要多58%。 34 | 35 | 对于压缩表,当 ***ALTER ADD INDEX*** 运行时,完成的插入数要多322%。 36 | 37 | ## 与5.6相比如何? 
38 | 修复之后,对于未压缩的表,***ALTER ADD INDEX*** 期间(来自单个连接)完成的插入数与5.6相同。 39 | 40 | 对于压缩表,使用5.6.44完成的插入数比5.7.26(有一个固定值)多43%。这有点令人惊讶,需要做更多的分析才能找到原因。这是另一个话题。 41 | 42 | ## 从设计的角度来看问题 43 | 作为排序索引构建的一部分,索引正在构建时,index->lock获得X(排他)模式,此锁在已排序索引构建的整个过程中一直持有,是的,你读的没错,完整的信息见[PS-3410](https://jira.percona.com/browse/PS-3410)。 44 | 45 | 并发插入将能够看到正在构建一个“新索引”。对于这样的索引,插入到[在线ALTER日志](https://jira.percona.com/browse/PS-3410)中,稍后在ALTER末尾执行这些日志。作为此操作的一部分,INSERT尝试以S (shared)模式获取index->lock,以查看索引是否处于online或中止状态。 46 | 47 | 由于排序索引构建过程在整个过程中都以X模式持有index->lock,并发插入操作等待这个latch,如果索引很大,insert线程的等待时间将超过600秒,这会导致服务器崩溃。 48 | 49 | ## 修复 50 | 解决方法相当简单。排序索引构建不需要在X模式下获取index->lock。在此阶段,未提交索引上没有并发读取。并发插入不会干扰已排序的索引构建。他们在线修改日志。因此,不获取正在构建的索引的index->lock是安全的。 51 | 52 | ## 测试用例 53 | 54 | 下面的MTR测试用例显示了运行ALTER时并发执行的insert的数量。注意,只有一个连接执行插入。 55 | 56 | 所有版本的测试都使用 ***innodb_buffer_pool_size = 1G*** 运行。使用了两个版本的表。一个具有常规16K页面大小,另一个具有4K页面大小的压缩表。 57 | 58 | 数据目录存储在RAM中,用于所有测试。您可以保存以下文件(例如mysql-test/t/ alter_insert_concurrent .test),并运行MTR测试用例: 59 | ```bash 60 | ./mtr --mem main.alter_insert_concurrency --mysqld=--innodb_buffer_pool_size=1073741824 61 | ``` 62 | 63 | 该测试向表插入1000万行,并创建索引(与 ***ALTER table t1 ADD INDEX*** 相同),在另一个连接中,一个接一个地执行插入,直到ALTER完成。 64 | ```sql 65 | --source include/have_innodb.inc 66 | --source include/count_sessions.inc 67 | 68 | connect (con1,localhost,root,,); 69 | 70 | CREATE TABLE t1( 71 | class INT, 72 | id INT, 73 | title VARCHAR(100), 74 | title2 VARCHAR(100) 75 | ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4; 76 | 77 | DELIMITER |; 78 | CREATE PROCEDURE populate_t1() 79 | BEGIN 80 | DECLARE i int DEFAULT 1; 81 | 82 | START TRANSACTION; 83 | WHILE (i <= 1000000) DO 84 | INSERT INTO t1 VALUES (i, i, uuid(), uuid()); 85 | SET i = i + 1; 86 | END WHILE; 87 | COMMIT; 88 | END| 89 | 90 | CREATE PROCEDURE conc_insert_t1() 91 | BEGIN 92 | DECLARE i int DEFAULT 1; 93 | 94 | SELECT COUNT(*) INTO @val FROM INFORMATION_SCHEMA.PROCESSLIST WHERE ID != CONNECTION_ID() AND info LIKE "CREATE INDEX%"; 95 | 96 | IF @val > 0 THEN 97 | SELECT "ALTER STARTED"; 98 | END IF; 99 | 100 | WHILE (@val > 0) DO 101 | INSERT INTO t1 VALUES (i, i, uuid(), uuid()); 102 | SET i = i + 1; 103 | SELECT COUNT(*) INTO @val FROM INFORMATION_SCHEMA.PROCESSLIST WHERE ID != CONNECTION_ID() AND info LIKE "CREATE INDEX%"; 104 | END WHILE; 105 | SELECT concat('Total number of inserts is ', i); 106 | END| 107 | DELIMITER ;| 108 | 109 | --disable_query_log 110 | CALL populate_t1(); 111 | --enable_query_log 112 | 113 | --connection con1 114 | --send CREATE INDEX idx_title ON t1(title, title2); 115 | 116 | --connection default 117 | --sleep 1 118 | --send CALL conc_insert_t1(); 119 | 120 | --connection con1 121 | --reap 122 | 123 | --connection default 124 | --reap 125 | 126 | --disconnect con1 127 | 128 | DROP TABLE t1; 129 | 130 | DROP PROCEDURE populate_t1; 131 | DROP PROCEDURE conc_insert_t1; 132 | --source include/wait_until_count_sessions.inc 133 | ``` 134 | 135 | ## 测试数据 136 | ```bash 137 | compressed 4k : number of concurrent inserts (Avg of 6 runs) 138 | ============== ============================ 139 | PS 5.7.25 (and earlier) : 2315 140 | PS 5.7.26 (fix version) : 9785.66 (322% improvement compared to 5.7.25) (43% worse compared to 5.6) 141 | PS 5.6 : 17341 142 | 143 | 144 | 16K page size 145 | ============= 146 | PS 5.7.25 (and earlier) : 3007 147 | PS 5.7.26 (fix version) : 4768.33 (58.5% improvement compared to 5.7.25) (3.4% worse compared to 5.6) 148 | PS 5.6 : 4939 149 | ``` 
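
最后补充一下前文提到的规避方案:如果暂时无法升级到包含该修复的版本,可以改用 pt-online-schema-change 在线加索引。下面只是一个命令示意(库名、表名、连接方式均为假设值;执行前请先 --dry-run,并确认磁盘剩余空间至少等于原表空间大小):

```bash
# 示意:用 pt-osc 为 test.t1 在线添加 (title, title2) 索引
pt-online-schema-change \
  --alter "ADD INDEX idx_title (title, title2)" \
  D=test,t=t1 \
  --chunk-size=1000 \
  --dry-run      # 确认执行计划无误后,把 --dry-run 换成 --execute
```
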
-------------------------------------------------------------------------------- /InnoDB/mysql-innodb-sorted-index-builds.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: InnoDB 排序索引的构建 3 | tags: InnoDB 4 | categories: InnoDB 5 | --- 6 | 7 | > 原文作者:Satya Bodapati 8 | 发布日期:2019年5月08日 9 | 关键词: InnoDB, MySQL 10 | 原文链接: https://www.percona.com/blog/2019/05/08/mysql-innodb-sorted-index-builds/ 11 | 12 | 13 | 我们不需要了解MySQL®和Percona Server如何构建索引。然而,如果你对构建过程有一定的了解,那么当你想为数据插入保留适当的空间时,它将会有所帮助。从MySQL5.7开始,开发者改变了他们为InnoDB构建二级索引的方式,应用自底而上的方法,而不是早期版本中使用的自顶而下方法。在这篇文章中,我将通过一个示例演示如何构建InnoDB索引。我将解释如何使用该认识来为参数 ***innodb_fill_factor*** 设置适当的值。 14 | 15 | 16 | 17 | ## 索引构建过程 18 | 要在已包含数据的表上构建索引,在InnoDB中有以下几个阶段: 19 | 1. 读取阶段(从聚集索引读取并构建辅助索引条目) 20 | 2. 合并排序阶段 21 | 3. 插入阶段(将已排序的记录插入辅助索引) 22 | 23 | 在5.6版本之前,MySQL每次插入一条记录来构建二级索引。这是一种“自上而下”的方法。对插入位置的搜索从根(顶部)开始,并到达相应的叶子页面(底部)。记录被插入到由游标指向的叶子页面上。在查找插入位置和执行页面分割和合并(在根节点和非根节点)方面,是有非常高的代价的。你知道有多少页面会拆分和合并吗,你可以阅读我同事**Marco Tusa**早期的博客[InnoDB页面拆分和合并](https://www.percona.com/blog/2017/04/10/innodb-page-merging-and-page-splitting/)。 24 | 25 | 从MySQL 5.7开始,添加索引期间的插入阶段使用“排序索引构建”,也称为“批量加载索引”。在这种方法中,索引是“自底而上”构建的。也就是说,首先构建叶子页面(底部),然后构建非叶子层,直到根(顶部)。 26 | 27 | ## 用例 28 | 在这些情况下使用排序索引构建: 29 | - ALTER TABLE t1 ADD INDEX (or CREATE INDEX) 30 | - ALTER TABLE t1 ADD FULLTEXT INDEX 31 | - ALTER TABLE t1 ADD COLUMN, ALGORITHM=INPLACE 32 | - OPTIMIZE TABLE t1 33 | 34 | 对于最后两个用例,ALTER创建一个中间表。中间表索引(主建索引和辅助索引)是使用“排序索引构建”来构建的。 35 | 36 | ## 算法 37 | 1. 在0级创建一个页面。还要为这个页面创建一个游标。 38 | 2. 使用第0级的游标插入页面,直到填满为止。 39 | 3. 一旦页面满了,创建一个同级页面(不会插入同级页面)。 40 | 4. 为当前完整页面创建一个节点指针(子页面中的最小键值、子页面号),并将一个节点指针插入到上面的一层(父页面)。 41 | 5. 在上层,检查游标是否已经定位。如果没有,为该级别创建父页面和游标。 42 | 6. 在父页面上插入节点指针 43 | 7. 如果父页面也满了,将重复步骤3、4、5、6 44 | 8. 现在插入到同级页面,并使游标指向同级页面。 45 | 9. 在所有插入的最后,每一层都有指针指向最右边的页面。提交所有游标(意味着提交修改页面的小事务,释放所有latch)。 46 | 47 | 为了简单起见,上面的算法跳过了关于压缩页面和blob(外部存储的blob)处理的细节。 48 | 49 | ## 自底而上构建索引的过程 50 | 51 | 通过一个示例,让我们看看如何自底向上构建辅助索引。同样,为了简单起见,假设叶子和非叶子页面中允许的最大记录数为3。 52 | ```sql 53 | CREATE TABLE t1 (a INT PRIMARY KEY, b INT, c BLOB); 54 | 55 | INSERT INTO t1 VALUES (1, 11, 'hello111'); 56 | INSERT INTO t1 VALUES (2, 22, 'hello222'); 57 | INSERT INTO t1 VALUES (3, 33, 'hello333'); 58 | INSERT INTO t1 VALUES (4, 44, 'hello444'); 59 | INSERT INTO t1 VALUES (5, 55, 'hello555'); 60 | INSERT INTO t1 VALUES (6, 66, 'hello666'); 61 | INSERT INTO t1 VALUES (7, 77, 'hello777'); 62 | INSERT INTO t1 VALUES (8, 88, 'hello888'); 63 | INSERT INTO t1 VALUES (9, 99, 'hello999'); 64 | INSERT INTO t1 VALUES (10, 1010, 'hello101010'); 65 | ALTER TABLE t1 ADD INDEX k1(b); 66 | ``` 67 | InnoDB将主键字段附加到辅助索引。二级索引k1的记录为格式(b, a),排序阶段后,记录为: 68 | (11,1), (22,2), (33,3), (44,4), (55,5), (66,6), (77,7), (88,8), (99,9), (1010, 10) 69 | 70 | ## 初始插入阶段 71 | 我们从记录(11,1)开始。 72 | 1. 在0级(叶子级)创建一个页面。 73 | 2. 创建页面的游标。 74 | 3. 所有插入都转到该页,直到该页满为止。 75 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2019/04/1.png) 76 | 77 | 箭头显示游标当前指向的位置。它当前位于第5页,接下来的插入将转到该页。 78 | 79 | 还有两个空闲槽,因此插入记录(22,2)和(33,3)是很简单的。 80 | 81 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2019/04/2.png) 82 | 83 | 对于下一个记录(44,4),第5页已满。以下是步骤 84 | 85 | ## 当页面被填满时索引构建 86 | 87 | 1. 创建一个同级页面——第6页。 88 | 2. 暂时不要插入到同级页面。 89 | 3. 在游标处提交页面,即小事务提交、释放latch等。 90 | 4. 作为提交的一部分,创建一个节点指针并将其插入父页面[当前级别+ 1],即在1级。 91 | 5. 节点指针的格式是(子页面中的最小键值,子页面号)。第5页的最小键值是(11,1)。在父级插入记录((11,1),5)。 92 | 6. 第1级的父页面还不存在。MySQL创建第7页和指向第7页的游标。 93 | 7. 将 ((11,1),5) 插入第7页。 94 | 8. 现在,返回到第0级并创建从第5页到第6页的链接,反之亦然。 95 | 9. 第0级的游标现在指向同级别的第6页。 96 | 10. 
插入(44,4)到第6页。 97 | 98 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2019/04/bulk_load_44.png) 99 | 100 | 接下来的插入—(55,5)和(66,6)—很简单,它们转到第6页。 101 | 102 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2019/04/bulk_load_55_66.png) 103 | 104 | 插入记录(77,7)类似于(44,4),只是父页面(第7页)已经存在,并且它还有空间容纳另外两条记录。首先将节点指针(44,4),6)插入第7页,然后将节点指针(77,7)记录到第8页。 105 | 106 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2019/04/bulk_load_77-1.png) 107 | 108 | 插入记录(88,8)和(99,9)非常简单,因为第8页有两个空闲插槽。 109 | 110 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2019/04/bulk_load_88_99-1.png) 111 | 112 | 下一个插入(1010,10)。将节点指针((77,7),8)插入到第1级的父页面(第7页)。 113 | 114 | MySQL在0级创建同级页面9。将记录(1010,10)插入到第9页,并将游标标更改为该页。 115 | 116 | ![avatar](https://www.percona.com/blog/wp-content/uploads/2019/04/Bulk_Load-Page-5.png) 117 | 118 | 在所有级别提交游标。在上面的示例中,数据库在第0级提交第9页,在第1级提交第7页。我们现在有一个完整的B+-tree索引,它是自底向上构建的! 119 | 120 | ## 索引填充因素 121 | 122 | 全局变量 ***innodb_fill_factor*** 设置B树页面中用于插入的空间量。默认值为100,这意味着使用了整个页面(不包括页头、页尾)。聚集索引具有 ***innodb_fill_factor = 100*** 的豁免权。在这种情况下,聚集索引页空间的1/16保持空闲。即:6.25%的空间预留给未来的DML操作。 123 | 124 | 值设置为80表示MySQL使用80%的页面进行插入,剩下20%用于将来的更新。 125 | 126 | 如果 ***innodb_fill_factor*** 设置为100,没有多余的空间留给将来插入辅助索引。如果你希望在添加索引之后表上有更多dml,这可能会导致页面再次拆分和合并。在这种情况下,建议使用80-90之间的值。使用 ***optimization TABLE*** 或 ***ALTER TABLE DROP COLUMN ,ALGORITHM=INPLACE*** 语句,这个变量值还影响索引重建。 127 | 128 | 你不应使用太低的值,例如:低于50,因为索引会占用更多的磁盘空间。对于较低的值,索引中有更多的页面和索引统计数据抽样可能不是最优的。优化器可能会选择具有次优统计信息的错误查询计划。 129 | 130 | ## 排序索引构建的优点 131 | 132 | 1. 没有页面拆分(表压缩除外)和合并。 133 | 2. 不重复搜索插入位置。 134 | 3. 插入没有重做日志记录(除了页面分配),所以重做日志子系统的压力较小。 135 | 136 | ## 缺点 137 | 没有......好吧,有一个,它值得一篇单独的文章^_^,敬请期待。 138 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 「知数堂-藏经阁」项目说明 2 | 3 | 本项目由专业优质在线培训品牌「知数堂」发起,项目定名 **「知数堂-藏经阁」** ,旨在将优秀的英文技术文档翻译成中文,在努力提升自身技术、英文水平的同时,力所能及地帮助国内同行,获得行业认可并打造个人品牌,也是展示个人品牌的良好窗口。此外,我们也会向大家推荐优质的中文MySQL文章,同样欢迎自荐。 4 | 5 | 有愿意一起付出的同行,也欢迎加入,可联系叶师傅微信:4700963(加好友时请注明用意)。 6 | 7 | # 项目成员 8 | - 「琅琊阁」,琅琊阁-小剑伯、琅琊阁-江b、琅琊阁-简小鹿 9 | - 「天一阁」,天一阁-冷锋、天一阁-神谕、天一阁-jack、天一阁-Judy 10 | - 「星耀队」,星耀队-芬达、星耀队-刘莉、星耀队-Ziroro、星耀队-M哥、星耀队-顺子 11 | - 「菜鸟盟」,菜鸟盟–hades、菜鸟盟-bruce、菜鸟盟-冰焰 12 | - 「Geek队」,Geek队-黎明、Geek队-Ace、Geek队-海潮、Geek队-Yun 13 | - 「无名队」,无名队-新平、无名队-徐晨亮 14 | - 「云译社」,云译社-胡金、云译社-辛佳宇 15 | 16 | # 文档索引 17 | 18 | 1. 中: [MySQL 8.0 中复制的新特性](mysql/0-zh-what-s-new-mysql-replication-mysql-80.md), 英: [What’s New With MySQL Replication in MySQL 8.0](mysql/0-en-what-s-new-mysql-replication-mysql-80.md) 19 | 1. 中:[深入理解磁盘IO利用率及饱和度](mysql/1-zh-Looking%20at%20Disk%20Utilization%20and%20Saturation.md), 英:[Looking at Disk Utilization and Saturation](mysql/1-en-Looking%20at%20Disk%20Utilization%20and%20Saturation.md) 20 | 1. 
中:[MySQL 8.0.2中更灵活的UNDO表空间管理方式](mysql/2-zh-mysql-8-0-2-more-flexible-undo-tablespace-management.md),英:[MySQL 8.0.2 More Flexible Undo Tablespace Management](mysql/2-en-mysql-8-0-2-more-flexible-undo-tablespace-management.md) 21 | -------------------------------------------------------------------------------- /contribution/20180410-用了并行复制居然还有延迟.md: -------------------------------------------------------------------------------- 1 | # 用了并行复制居然还有延迟 2 | 3 | ## 环境描述 4 | 5 | 前端是ogg 后端是mariadb galera cluster 2个节点,其中一个galera节点挂了一个slave从库。 6 | 7 | 大概环境是这样的 8 | 9 | ogg -->mariadb galera cluster*2 -->slave。 10 | 11 | 简单理解就是 12 | mariadb master-->slave 直接的延迟。(另外一个galera节点没有用到)。 13 | 14 | mariadb版本为10.1.14。开启了并行复制 模式为optimistic。线程数为20 15 | 16 | 在通过业务层面模拟压力测试。发现mariadb master-->slave 延迟大约为1分钟左右。希望MySQL DBA帮整延迟延迟时间最好限制在3秒甚至没有延迟。 17 | 18 | 19 | ## mariadb 并行复制概念描述 20 | 21 | mariadb并行复制总体来说分为三种:按照顺序并行保守模式(默认模式),乱序并行复制,乐观模式的有序并行复制。 22 | 23 | 个人理解: 24 | 25 | 有序方式:并行执行事物,但对commit顺序进行排序。以确保主库上事物提交顺序和从库顺序一致。这种模式的并行复制对应用来说完全透明。 26 | 27 | 无序方式:并行执行事物,主库的提交的事物顺序可能跟从库的执行顺序不一致。无序方式只能在gtid模式和应用明确指定使用无序的时候才会被使用。无序方式性能是最好,但需要GTID和应用的配合。 28 | 29 | 乐观并行复制模式:任何DML语句都可以并行运行,这可能会在slave导致冲突,如果二个事物视图修改同一行,检测到这样的冲突。这二个事物,后者会回滚,前者会执行,一旦前着执行,后者事物重新尝试。这种模式会受到```slave_domain_parallel_threads```限制。 官方描述如下 30 | ```Any transactional DML (INSERT/UPDATE/DELETE) is allowed to run in parallel, up to the limit of @@slave_domain_parallel_threads. This may cause conflicts on the slave, eg. if two transactions try to modify the same row. Any such conflict is detected, and the latter of the two transactions is rolled back, allowing the former to proceed. The latter transaction is then re-tried once the former has completed.``` 31 | 32 | ## 问题排查 33 | 34 | 在业务层模拟压力测试的时候通过dstat, 观察发现 io利用率不超过50%,cpu利用率不超过50%。通过dstat观察发现瓶颈可能不在主从服务的性能上,而是参数的配置。从而进行参数的调整。 35 | 首先是并行复制的模式,把乐观并行复制模式(optimistic)调整为conservative,并行的线程数进行调整为12个。 36 | 本人用sysbench 模拟客户的环境进行压力测试。我的环境为1C2G 版本跟线上环境一致,threads 为200。测试结果 37 | 38 | optimistic | conservative 39 | ---|--- 40 | 5s | 0s 41 | 14s | 7s 42 | 24s | 15s 43 | 33s | 24s 44 | 42s | 32s 45 | 51s | 41s 46 | 60s | 49s 47 | 69s | 58s 48 | 78s | 67s 49 | 87s | 75s 50 | 51 | 发现确实通过对模式的修改,能够缓解主从复制的延迟,但不能彻底的解决。在次测试,通过```pt-ioprofile```发现redo写入占大部分。通过对参数的调整 52 | ``` 53 | sync_binlog = 0 54 | innodb_flush_log_at_trx_commit =0 55 | master_info_repository = TABLE 56 | relay_log_info_repository = TABLE 57 | log_slave_updates=off 58 | ``` 59 | 再次进行测试 60 | 61 | conservative_old | conservative_new 62 | ---|--- 63 | 0s | 2s 64 | 7s | 4s 65 | 15s | 0s 66 | 24s | 1s 67 | 32s | 0s 68 | 41s | 0s 69 | 49s | 0s 70 | 58s | 2s 71 | 67s | 0s 72 | 75s | 0s 73 | 通过这次调整,发现确实已经几乎没有延迟了。那么为什么调整```innodb_flush_log_at_trx_commit```参数会对主从延迟有影响呢。 74 | ``` 75 | 0 每秒刷出一次log,避免性能问题。 76 | 1 在事务提交的时候,强制必须刷出所有log才算提交成功。 77 | 2 在0和1之间自动调整。 78 | ``` 79 | 在从库的环境中设置了```innodb_flush_log_at_trx_commit=0```和```sync_binlog=0```,在主从切换的过程中可以在脚本中把这二个参数修改为1,避免在切换主库后,主库宕机导致事物的丢失。 80 | 81 | ## 问题解决 82 | 83 | 按照剧本来说,调整了上面的参数,理论上能够解决了复制延迟,然而 然而 实际情况并没有。只是减少了延迟度,并没有根本解决延迟,这时候就很奇怪了,测试环境已经解决了延迟,为什么在实际环境中没有解决呢。,通过解析binlog 发现binlog的事物比没有ogg的环境中要大。后来,我ORACLE的同事协助排查。居然发现是OGG那层进行事物合并,把原来小的事物进行合并成了大事物,然后因为大事物产生了延迟。后来他调整了OGG的参数`GROUPTRANSOP`,终于搞定。 84 | \#注 GROUPTRANSOPS为以事务传输时,事务合并的单位,减少IO操作; 85 | 86 | ## 导致延迟的因素 87 | 88 | MySQL或者mariadb复制的瓶颈点? 
89 | 90 | 默认的 MySQL的从库 io线程和sql线程 都是单线程。而MySQL的主库是并行写入。 91 | MySQL的并行复制是增加多个SQL线程,其原理大概是 首先主库必须标记某几个事物是同时提交,也就是last_commited的值是相同的才能在从库上并行回放。从库会有N个线程来等待事物处理。```slave_parallel_threads``` 值建议为8到12个为最佳 如果大于12个,会增加MySQL维护sql线程的成本,反而会影响性能。 92 | 93 | 是什么导致 MySQL或mariadb复制? 94 | 95 | 大体分类为 主库的表没有主键 唯一 普通索引 或者有大事物在阻塞。从库的延迟因素有 96 | 从库的性能跟不上主库,主从之间网络延迟,抖动阻塞,从库的参数配置不争取等。 97 | 98 | 那些参数可以减少主从延迟? 99 | 1.增大从库innodb_buffer_pool_size 可以缓存更多数据 减少由于转换导致的io压力 100 | 2.增大 innodb_log_size innodb_log_files_in_group的值 减少buffer pool的刷盘io 提示写性能 101 | 3.修改参数 innodb_flush_method 为o_DIRECT 提升写入性能 102 | 4.把从库的binlog关闭 或者关闭log_slave_updates 103 | 5.修改参数innodb_flush_log_at_trx_commit 为0或2 104 | 6.如果binlog没关闭 修改sync_binlog 参数为0或者一个很大的值 减少磁盘io压力 105 | 7.如果binlog格式为row 则需要加上主键 106 | 8.binlog格式为statement模式 存在ddl复制 可用讲tmpdir参数改到内存中 比如 dev shm 107 | 9.修改参数 master_info_repository relay_log_info_repository 为table 减少直接io导致的磁盘压力 108 | 109 | -------------------------------------------------------------------------------- /contribution/20180424-关于MySQL 8.0几个重点,都在这里.md: -------------------------------------------------------------------------------- 1 | 2 | > 提醒:部分连接需要翻墙哦 3 | > 持续更新请关注「3306π」公众号,也欢迎交流关于MySQL8.0的新特性和bug 4 | 5 | # 一 关于MySQL Server的改进 6 | 7 | ## 1.1 redo log 重构 8 | 在MySQL8.0中 重新设计了redo log。主要改进 fsync效率更高,减少锁,优化flush机制,不会频繁flush。支持更高用户并发请求 9 | ``` 10 | http://dimitrik.free.fr/blog/archives/2017/10/mysql-performance-80-redesigned-redo-log-readwrite-workloads-scalability.html 11 | ``` 12 | ## 1.2 MySQL DDL 13 | 在MySQL8.0中实现了DDL的原子性。 14 | ``` 15 | https://mysqlserverteam.com/atomic-ddl-in-mysql-8-0/ 16 | ``` 17 | ## 1.3 直方图 18 | 在MySQL8.0中添加了直方图的概念。用于索引的统计和分析。 19 | ``` 20 | https://mysqlserverteam.com/histogram-statistics-in-mysql/ 21 | ``` 22 | ## 1.4 降序索引 23 | MySQL 8.0 开始提供按降序啦~ 24 | ``` 25 | https://dev.mysql.com/doc/refman/8.0/en/descending-indexes.html 26 | ``` 27 | ## 1.5 隐藏索引 28 | MySQL8.0支持隐藏索引,在对索引的添加和修改,可以通过隐藏索引来实现,方便了索引的管理 29 | ``` 30 | https://dev.mysql.com/doc/refman/8.0/en/invisible-indexes.html 31 | ``` 32 | ## 1.6 临时表的改进 33 | 在5.7以来,所有内部临时表成为"ibtmp1"的共享表空间。此外临时表的元数据也存储在内存中。 34 | 在MySQL8.0中 MEMORY存储引擎也将被TempTable存储引擎替换为内部临时表的默认存储引擎。这个新引擎为ARCHAR和VARBINARY列提供更高效的存储空间。 35 | ``` 36 | https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html 37 | ``` 38 | 39 | 1.7 持久的全局变量 40 | MySQL8.0 通过新语法restart,使下次重启仍然生效。 41 | ``` 42 | http://lefred.be/content/mysql-8-0-changing-configuration-easily-and-cloud-friendly/ 43 | ``` 44 | ## 1.8 redo和undo的加密 45 | 在MysQL 5.7中,可以为每个表的表空间进行加密。而在MySQL8.0中,还可以为UNDO和REDO LOG进行加密。从而提高了MySQL的安全性 46 | ``` 47 | https://dev.mysql.com/doc/refman/8.0/en/innodb-tablespace-encryption.html#innodb-tablespace-encryption-redo-log 48 | ``` 49 | 50 | ## 1.9 Innodb 锁的修改 51 | 在SQL里添加参数`FOR UPDATE NOWAIT`和`FOR UPDATE SKIP LOCKED` 可以设置跳过锁的等待,或者跳过锁定。 52 | ``` 53 | https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html#innodb-locking-reads-nowait-skip-locked 54 | ``` 55 | ## 1.10 窗口函数 56 | 在MySQL8.0中 添加了窗口函数。更好的用于数据分析 57 | ``` 58 | http://elephantdolphin.blogspot.com/2017/09/mysql-8s-windowing-function-part-1.html 59 | ``` 60 | ## 1.11 新的优化器 61 | 在MySQL 8.0.3中引入了新的优化器SET_VAR,用于在SQL中指定参数配置。 62 | ``` 63 | https://mysqlserverteam.com/new-optimizer-hint-for-changing-the-session-system-variable/ 64 | ``` 65 | ## 1.12 角色 66 | 在MySQL8.0中 添加了角色的功能。更方便了用户的管理 67 | ``` 68 | http://datacharmer.blogspot.com/2017/09/revisiting-roles-in-mysql-80.html 69 | ``` 70 | ## 1.13 字符集的修改 71 | 在MySQL8.0.1中 MySQL支持了Unicode 9.0,并且修改了默认字符集为utf8mb4 72 | ``` 73 | http://lefred.be/content/mysql-clients-and-emojis/ 
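
上面第一部分列出的 Server 层新特性,大多可以直接用几条 SQL 体验,下面给出一组语法示意(表名、列名、数值均为假设,仅演示写法,细节以官方文档为准):

```sql
-- 1.4 / 1.5 降序索引与隐藏索引
CREATE INDEX idx_c1_desc ON t1 (c1 DESC);
ALTER TABLE t1 ALTER INDEX idx_c1_desc INVISIBLE;

-- 1.3 直方图统计信息
ANALYZE TABLE t1 UPDATE HISTOGRAM ON c2 WITH 32 BUCKETS;

-- 1.7 持久化全局变量(写入 mysqld-auto.cnf,重启后依然生效)
SET PERSIST max_connections = 500;

-- 1.11 SET_VAR 优化器提示(只对当前语句生效)
SELECT /*+ SET_VAR(sort_buffer_size = 16M) */ c1 FROM t1 ORDER BY c1;

-- 1.12 角色
CREATE ROLE 'app_read';
GRANT SELECT ON db1.* TO 'app_read';
```
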
74 | ``` 75 | 76 | # 二 关于MySQL 复制的改进 77 | 78 | ## 2.1 复制方面修改 79 | 在MySQL8.0.3中 关于binlog和复制方面的新的改变。 80 | ``` 81 | http://datacharmer.blogspot.com/2017/09/revisiting-roles-in-mysql-80.html 82 | 83 | ``` 84 | ## 2.2 更高效josn复制 85 | 在MySQL8.0.3中关于JSON复制更高效,并提供了新的json功能。在binlog中只记录了update修改的内容,而不是全部记录。 86 | ``` 87 | https://mysqlhighavailability.com/efficient-json-replication-in-mysql-8-0/ 88 | ``` 89 | ## 2.3 复制增强 90 | 在MySQL8.0.3关于对复制的默认参数的修改和对组复制增加了 动态跟踪和调试日志和更多的性能方面的监控工具 91 | ``` 92 | https://mysqlhighavailability.com/replication-features-in-mysql-8-0-3/ 93 | ``` 94 | ## 2.4 MySQL 复制新功能 95 | 在MySQL8.0中对于复制的改进,增加了可观察性,提供了复制的效率(基于WRITESET的并行复制) 96 | ``` 97 | https://mysqlhighavailability.com/mysql-8-0-new-features-in-replication/ 98 | ``` 99 | 100 | # 三 关于MySQL MGR的改进 101 | 102 | ## 3.1 组复制白名单的支持 103 | 在MySQL8.0.4 中组复制的白名单的支持。有效的提高了组复制更加安全。 104 | ``` 105 | https://mysqlhighavailability.com/hostname-support-in-group-replication-whitelist/ 106 | ``` 107 | ## 3.2 MySQL INNODB Cluster 新功能 108 | 在MySQL INNODB Cluster 新添加的功能,了解一下 109 | ``` 110 | https://mysqlserverteam.com/mysql-innodb-cluster-whats-new-in-the-8-0-ga-release/ 111 | ``` 112 | ## 3.3 MySQL MGR的监控 113 | 提高了MySQL组复制的可观察性和可管理性,在MySQL8.0.4中增强了相关工具 114 | ``` 115 | https://mysqlhighavailability.com/more-p_s-instrumentation-for-group-replication/ 116 | ``` 117 | 118 | # 四 关于MySQL bug修复 119 | 120 | ## 4.1 自增列bug修复(199) 121 | 在MySQL8.0 关于自增列的bug的修复。不在采用max(自增id)+1的做法来确定下一个自增id。 122 | ``` 123 | http://lefred.be/content/bye-bye-bug-199/ 124 | 125 | ``` 126 | -------------------------------------------------------------------------------- /contribution/20180609-浅析MySQL主从复制技术(异步复制,同步复制,半同步复制).md: -------------------------------------------------------------------------------- 1 | # Preface # 2 | 3 | 4 | ---------- 5 | * as we all know,there're three kinds of replication in MySQL nowadays.such as,asynchronous replication,(fully) synchronous replication,semisynchronous replication.what's the difference between them?first of all,let's see the intact architecture picture of MySQL replication: 6 | 7 | ![](https://images2018.cnblogs.com/blog/645357/201806/645357-20180608154121323-871776738.png) 8 | 9 | **what will client do?** 10 | 11 | - genrates transactions 12 | - commits transactions to master 13 | - returns result to client 14 | 15 | **what will master do?** 16 | 17 | - executes transactions 18 | - generates binary logs 19 | - dump thread sends contents(binary logs) to IO Thread of slave 20 | 21 | **what will slave do?** 22 | 23 | - connects to master 24 | - IO Thread asks for data(binary logs) and gets it 25 | - generates relay logs 26 | - SQL Thread applies data(relay logs) 27 | 28 | # Method of different MySQL Replication # 29 | 30 | 31 | ---------- 32 | * generally speaking,the data changed on master will continuously be sent to slave,so the data on slave seems the equal with the master.this mechanism is usually used to backup on slave(reduce pressure of master),construct HA architecture(failover or dispatch w/r opertions),etc.nevertheless,because of all kinds of reasons,slave frequently defers in most scenario what's often grumbled by MySQL dba.below are different kind of replication methods,let's see the detail. 
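
Before walking through each method, here is a minimal sketch of how the IO Thread / SQL Thread described above can be set up and observed on a slave (host, user and password are placeholders; a GTID-based setup is assumed):

```sql
-- on the slave: point it to the master and start both replication threads
CHANGE MASTER TO
  MASTER_HOST = '192.168.1.10',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '***',
  MASTER_AUTO_POSITION = 1;
START SLAVE;

-- check the state of the IO thread and SQL thread, and the current lag
SHOW SLAVE STATUS\G

-- per-worker applier status (useful once multi-threaded replication is enabled)
SELECT * FROM performance_schema.replication_applier_status_by_worker\G
```
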
33 | 34 | ## **asynchronous replication** ## 35 | 36 | * since MySQL 3.2.22,this kind of replication was supported with statement format of binlog.then,untill MySQL 5.1.5 row format of binlog was supported.the mechanism of it is that as sonn as the master dump thread has sent the binary logs to the slave,the server returns the result from client.there is nothing to guarantee the binary logs is normally received by the slave(maybe the network failure occurs simultaneously).so it's unsafe in consistency what means your transaction will lost in the replication.this is also the original replication of MySQL.here's the picture about the procedure: 37 | 38 | ![](https://images2018.cnblogs.com/blog/645357/201806/645357-20180608153455322-912870836.png) 39 | 40 | 1. client send dml operations to the master meanwhile the transaction starts. 41 | 1. master execute these dml operations from client in transaction. 42 | 1. generate some binary logs which contain the transaction information. 43 | 1. master will return results to the client immediately after dump thread has sent these binary logs to slave. 44 | 1. slave receive the binary log by IO_Thread and apply the de relay log by SQL_Thread. 45 | 46 | 47 | in step 4,master won't judge whether slave has received the binary logs (which are sent by itself) or not.if the master crashs suddenly after it has sent the binary logs,but slave does not receive them at all on account of network delay,only if the slave take over the application at this time,the committed transactions will miss which means data loss.this is not commonly acceptalbe in most important product system especially the finacial about. 48 | 49 | ## **synchronous replication** ## 50 | 51 | * synchronous replication require master returns result to client only after the transactions be committed by all the slaves(receive and apply).this method will severely lead to bad performance on master unless you can guarantee the slaves can commit immediate without any delay(infact it's tough).nowadays,the only solution of synchronous replication is still the MySQL NDB Cluster.therefore,it's not recommended to use synchronous replication method. 52 | 53 | ## **semi-synchronous replication** ## 54 | 55 | * semi-synchronous replication seems the workaround solution of above two method which can strongly increase the consistency bween master and slave.it's supported since MySQL 5.5 and enhanced in MySQL 5.7.what's the mechanism of semi-sychronouos replication?master is permitted to return the result to client merely after only one slave has received binary logs,write them to the relay logs and returns an ACK signal to master.there're two ways of it,such as after_commit & after_sync,let's see the difference or them: 56 | 57 | ### after_commit(Since MySQL 5.5): ### 58 | 59 | ![](https://images2018.cnblogs.com/blog/645357/201806/645357-20180608153559604-795661315.jpg) 60 | 61 | * in this mechod,master performances a commit before it receives ACK signal from slave.let's suppose a scenario that once master crashs after it commits a transaction but it hasn't receive the ACK signal from slave.meanwhile,failover makes slave become the new master.how does the slave deal with then?will the transaction lost?it depends.there're two situations:* 62 | 63 | * slave has received the binary log,then turns it into relay log and applys it.there's no transaction loss. 
64 | slave hasn't received the binary log,the transaction commited by master just now will lost,but the client will failure.* 65 | 66 | * therefore,after_commit cannot guarantee lossless replication.after_commit is the default mode(actually it's the only mode can be use) which is supported by MySQL 5.5 & 5.6.* 67 | 68 | ### after_sync(since MySQL 5.7): ### 69 | 70 | ![](https://images2018.cnblogs.com/blog/645357/201806/645357-20180608153613184-1323313780.jpg) 71 | 72 | * in the picture above,the t1 transaction shouldn't be lost because of the master merely commits to the storage engine after receive the ACK signal from slave.in spite of master may crash before receiving ACK signal,no transaction will lost as the master hasn't commit at all.meanwhile,the t2 transaction also get consistent query here. 73 | 74 | * in order to improve the data consistency(since after_commit has avoidless deficiency),MySQL official enhances the semi-synchronous replication which can be called "loss-less semi-synchronous replication" in MySQL 5.7 by add after_sync mode in parameter "rpl_semi_sync_master_wait_point". 75 | 76 | * caution,semi-synchronous replication may turn into asynchronous replication whenever the delay time of slave surpass the value which is specified in parameter "rpl_semi_sync_master_timeout"(default values is 10000 milliseconds).why it is permitted?i'm afraid to give consideration to performance of master.notwithstanding,you can also play a trick to prevent it from being converted over by set a infinite number in this parameter such as "10000000" or above.especially in case that your product is too important to lost data.* 77 | 78 | * further more,to configure semi-sychronous replication,you should implement the optional plugin component "rpl_semi_sync_master",which can be check by using command "show plugins;"* 79 | 80 | # **Summary:** # 81 | 82 | 83 | ---------- 84 | - commonly,semi-sync replication is strongly recommended when implements MySQL replication nowadays. 85 | - i utterly recommend to upgrade product system to MySQL 5.7 in order to use after_sync mode which can avoid data loss. 86 | - be careful of specify an inappropriate value in parameter "rpl_semi_sync_master_timeout" which will lead to conver semi-sync to async replication. 87 | 88 | -------------------------------------------------------------------------------- /mysql/0-en-what-s-new-mysql-replication-mysql-80.md: -------------------------------------------------------------------------------- 1 | ## [What’s New With MySQL Replication in MySQL 8.0](https://severalnines.com/blog/what-s-new-mysql-replication-mysql-80) 2 | 3 | MySQL 8.0, which as of now (August 2017) is still in beta state, brings some nice improvements to replication. Originally, it was developed for Group Replication (GR), but as GR uses regular replication under the hood, “normal” MySQL replication benefited from it. The improvement we mentioned is dependency tracking information stored in the binary log. What happens is that MySQL 8.0 now has a way to store information about which rows were affected by a given transaction (so called writeset), and it compares writesets from different transactions. This makes it possible to identify those transactions which did not work on the same subset of rows and, therefore, these may be applied in parallel. This may allow to increase the parallelization level by several times compared to the implementation from MySQL 5.7. 
What you need to keep in mind is that, eventually, a slave will see a different view of the data, one that never appeared on the master. This is because transactions may be applied in a different order than on the master. This should not be a problem though. The current implementation of multithreaded replication in MySQL 5.7 may also cause this issue unless you explicitly enable slave-preserve-commit-order. 4 | 5 | To control this new behavior, a variable binlog_transaction_dependency_tracking has been introduced. It can take three values: 6 | 7 | COMMIT_ORDER: this is the default one, it uses the default mechanism available in MySQL 5.7. 8 | WRITESET: It enables better parallelization and the master starts to store writeset data in binary log. 9 | WRITESET_SESSION: This ensures that transactions will be executed on the slave in order and the issue with a slave that sees a state of database which never was seen on the master is eliminated. It reduces parallelization but it still can provide better throughput than the default settings. 10 | Benchmark 11 | 12 | In July, on mysqlhighavailability.com, Vitor Oliveira wrote a post where he tried to measure the performance of new modes. He used the best case scenario - no durability whatsoever, to showcase the difference between old and new modes. We decided to use the same approach, this time in a more real-world setup: binary log enabled with log_slave_updates. Durability settings were left to default (so, sync_binlog=1 - that’s new default in MySQL 8.0, doublewrite buffer enabled, InnoDB checksums enabled etc.) Only exception in durability was innodb_flush_log_at_trx_commit set to 2. 13 | 14 | We used m4.2xl instances, 32G, 8 cores (so slave_parallel_workers was set to 8). We also used sysbench, oltp_read_write.lua script. 16 million rows in 32 tables were stored on 1000GB gp2 volume (that’s 3000 IOPS). We tested the performance of all of the modes for 1, 2, 4, 8, 16 and 32 concurrent sysbench connections. Process was as follows: stop slave, execute 100k transactions, start slave and calculate how long it takes to clear the slave lag. 15 | 16 | ![](https://severalnines.com/sites/default/files/blog/node_5091/image1.jpg) 17 | 18 | First of all, we don’t really know what happened when sysbench was executed using 1 thread only. Each test was executed five times after a warmup run. This particular configuration was tested two times - results are stable: single-threaded workload was the fastest. We will be looking into it further to understand what happened. 19 | 20 | Other than that, the rest of the results are in line with what we expected. COMMIT_ORDER is the slowest one, especially for low traffic, 2-8 threads. WRITESET_SESSION performs typically better than COMMIT_ORDER but it’s slower than WRITESET for low-concurrent traffic. 21 | 22 | How it can help me? 23 | 24 | The first advantage is obvious: if your workload is on the slow side yet your slaves have tendency to fall back in replication, they can benefit from improved replication performance as soon as the master will be upgraded to 8.0. Two notes here: first - this feature is backward compatible and 5.7 slaves can also benefit from it. Second - a reminder that 8.0 is still in beta state, we don’t encourage you to use beta software on production, although in dire need, this is an option to test. This feature can help you not only when your slaves are lagging. They may be fully caught up but when you create a new slave or reprovision existing one, that slave will be lagging. 
Having the ability to use “WRITESET” mode will make the process of provisioning a new host much faster. 25 | 26 | All in all, this feature will have much bigger impact that you may think. Given all of the benchmarks showing regressions in performance when MySQL handles traffic of low concurrency, anything which can help to speed up the replication in such environments is a huge improvement. 27 | 28 | If you use intermediate masters, this is also a feature to look for. Any intermediate master adds some serialization into how transactions are handled and executed - in real world, the workload on an intermediate master will almost always be less parallel than on the master. Utilizing writesets to allow better parallelization not only improves parallelization on the intermediate master but it also can improve parallelization on all of its slaves. It is even possible (although it would require serious testing to verify all pieces will fit correctly) to use an 8.0 intermediate master to improve replication performance of your slaves (please keep in mind that MySQL 5.7 slave can understand writeset data and use it even though it cannot generate it on its own). Of course, replicating from 8.0 to 5.7 sounds quite tricky (and it’s not only because 8.0 is still beta). Under some circumstances, this may work and can speed up CPU utilization on your 5.7 slaves. 29 | 30 | Other changes in MySQL replication 31 | 32 | Introducing writesets, while it is the most interesting, it is not the only change that happened to MySQL replication in MySQL 8.0. Let’s go through some other, also important changes. If you happen to use a master older than MySQL 5.0, 8.0 won’t support its binary log format. We don’t expect to see many such setups, but if you use some very old MySQL with replication, it’s definitely a time to upgrade. 33 | 34 | Default values have changed to make sure that replication is as crash-safe as possible: master_info_repository and relay_log_info_repository are set to TABLE. Expire_log_days has also been changed - now the default value is 30. In addition to expire_log_days, a new variable has been added, binlog_expire_log_seconds, which allows for more fine-grained binlog rotation policy. Some additional timestamps have been added to the binary log to improve observability of replication lag, introducing microsecond granularity. 35 | 36 | By all means, this is not a full list of changes and features related to MySQL replication. If you’d like to learn more, you can check the MySQL changelogs. Make sure you reviewed all of them - so far, features have been added in all 8.0 versions. 37 | 38 | As you can see, MySQL replication is still changing and becoming better. As we said at the beginning, it has to be a slow-paced process but it’s really great to see what is ahead. It’s also nice to see the work for Group Replication trickling down and reused in the “regular” MySQL replication. 39 | -------------------------------------------------------------------------------- /mysql/0-zh-what-s-new-mysql-replication-mysql-80.md: -------------------------------------------------------------------------------- 1 | # [MySQL 8.0复制新特性](https://github.com/zhishutech/tech-blog-en2zh/blob/master/mysql/what-s-new-mysql-replication-mysql-80.md) 2 | 3 | 4 | # 导读 5 | > MySQL 8.0 复制功能有很大改进提升,并行复制性能与5.7相比可能提高数倍,是不是很期待? 
6 | > 7 | > 翻译团队:知数堂藏经阁项目 - 琅琊阁 8 | > 团队成员:琅琊阁-小剑伯、 琅琊阁-江b 、琅琊阁-简小鹿 9 | > 原文出处:https://severalnines.com/blog/what-s-new-mysql-replication-mysql-80 10 | > 原文作者:Krzysztof Ksiazek 11 | > 备注:发稿时,小编发现作者修改了原文,略尴尬哈 12 | 13 | 截至目前(2017年8月),MySQL 8.0 仍然是 beta 版本,复制功能有一些很棒的改进。 14 | 15 | 最初,这些改进是为组复制(GR)开发的,但由于 GR 在底层使用常规复制,所以传统的 MySQL 复制也能由此获益。 16 | 17 | 我们这里提到的改进是存储在binlog中的依赖关系跟踪信息。MySQL 8.0 使用某种方法来存储那些受事务影响的行的相关信息(这些行被称为`writeset`),且它会比较来自不同事务中的`writesets`。这样就能识别出那些修改的数据行没有交集的事务,那么这些事务就可以在从库上被并行回放。与 MySQL 5.7 的实现相比,这也许能增加数倍的并行化程度。 18 | 19 | 要注意,从库上可能会出现在主库上出现过的数据视图(比如查询数据时默认的显示顺序和在主库上查询结果不同)。这是因为事务可能被按照与主库不同的顺序去回放。当然,这其实没有什么问题。目前在 MySQL 5.7 中实现的多线程复制也可能会导致这个问题,除非您明确地启用 `slave-preserve-commit-order` 参数。 20 | >MySQL 8.0, which as of now (August 2017) is still in beta state, brings some nice improvements to replication. 21 | > 22 | > Originally, it was developed for Group Replication (GR), but as GR uses regular replication under the hood, “normal” MySQL replication benefited from it. 23 | > 24 | > The improvement we mentioned is dependency tracking information stored in the binary log. 25 | What happens is that MySQL 8.0 now has a way to store information about which rows were affected by a given transaction (so called writeset), and it compares writesets from different transactions. This makes it possible to identify those transactions which did not work on the same subset of rows and, therefore, these may be applied in parallel. 26 | This may allow to increase the parallelization level by several times compared to the implementation from MySQL 5.7. What you need to keep in mind is that, eventually, a slave will see a different view of the data, one that never appeared on the master. 27 | > 28 | > This is because transactions may be applied in a different order than on the master. This should not be a problem though. The current implementation of multithreaded replication in MySQL 5.7 may also cause this issue unless you explicitly enable slave-preserve-commit-order. 29 | 30 | 31 | 为了控制这个新的行为(从库上数据回放顺序),新增选项:`binlog_transaction_dependency_tracking`。 它可以取三个值: 32 | 33 | * COMMIT_ORDER:默认值,它使用 MySQL 5.7 中可用的默认机制。 34 | * WRITESET:它能实现更好的并行化,并且在主库的二进制日志中存储writeset数据。 35 | * WRITE_SESSION:它确保事务在从库中按顺序执行,并且消除了从库中看到主库从未出现过的数据库状态的问题。它降低了并行化程度,但是仍然提供了比默认设置更好吞吐量。 36 | 37 | >To control this new behavior, a variable `binlog_transaction_dependency_tracking` has been introduced. It can take three values: 38 | > 39 | >COMMIT\_ORDER: this is the default one, it uses the default mechanism available in MySQL 5.7. 40 | > 41 | >WRITESET: It enables better parallelization and the master starts to store writeset data in binary log. 42 | > 43 | >WRITESET\_SESSION: This ensures that transactions will be executed on the slave in order and the issue with a slave that sees a state of database which never was seen on the master is eliminated. It reduces parallelization but it still can provide better throughput than the default settings. 
44 | 45 | 46 | ### 基准测试 47 | 48 | 7月份,在 mysqlhighavailability.com 网站上,Vitor Oliveira 写了一篇文章分析了他对新模式进行性能测试的情况。他使用 MySQL 中性能最好的情况 —— 无持久性设置,展示了新旧模式之间的区别。我们决定使用相同的方法进行测试,但同时设置一个更加真实的配置,启用 `log_slave_updates` 参数使得slave上也记录binlog。关于持久性设置,除了将 `innodb_flush_log_at_trx_commit` 设置为 2 ,其他均保留默认值(所以,`sync_binlog`=1 —— 这是 MySQL 8.0 中的新默认值,启用 `doublewrite buffer`,启用 `InnoDB checksums` ,等等)。 49 | >In July, on mysqlhighavailability.com, Vitor Oliveira wrote a post where he tried to measure the performance of new modes.He used the best case scenario - no durability whatsoever, to showcase the difference between old and new modes.We decided to use the same approach, this time in a more real-world setup: binary log enabled with `log_slave_updates`. Durability settings were left to default (so, `sync_binlog`=1 - that’s new default in MySQL 8.0, doublewrite buffer enabled, InnoDB checksums enabled etc.) Only exception in durability was `innodb_flush_log_at_trx_commit` set to 2. 50 | 51 | 52 | 我们使用 m4.2xl 实例,32G,8核(所以参数 `slave_parallel_workers` 也设置为 8)。我们同样使用 `sysbench`,`oltp_read_write.lua` 脚本。32 张表中的 1600 万行数据,存储在 1000GB gp2 卷(IOPS指标为3000)上。我们测试了 1、2、4、8、16、32 个并发连接下的所有模式,过程如下:关闭从库,执行 10 万个事务,启动从库并计算从库消除延迟所需的时间。 53 | >We used m4.2xl instances, 32G, 8 cores (so `slave_parallel_workers` was set to 8). We also used sysbench, `oltp_read_write.lua` script. 16 million rows in 32 tables were stored on 1000GB gp2 volume (that’s 3000 IOPS). We tested the performance of all of the modes for 1, 2, 4, 8, 16 and 32 concurrent sysbench connections. Process was as follows: stop slave, execute 100k transactions, start slave and calculate how long it takes to clear the slave lag. 54 | 55 | ![](https://severalnines.com/sites/default/files/blog/node_5091/image1.jpg) 56 | 57 | 58 | 首先,我们其实并不清楚使用单线程的 `sysbench` 压测时数据库到底发生了什么。每一次测试我们都在给数据库预热后,再执行了5次。这个特殊参数配置我们测试了2次,结果值是稳定一致的。使用单线程的 `sysbench` 压测是最快的。我们将会进一步研究,以了解发生了什么。 59 | >First of all, we don’t really know what happened when sysbench was executed using 1 thread only. Each test was executed five times after a warmup run. This particular configuration was tested two times - results are stable: single-threaded workload was the fastest. We will be looking into it further to understand what happened. 60 | 61 | 62 | 除此之外,其余的结果都符合我们的预期。`COMMIT_ORDER` 是最慢的,特别是低并发时(2-8线程)。`WRITESET_SESSION` 通常比 `COMMIT_ORDER` 更好,但是对于低并发流量,它比 `WRITESET` 慢。 63 | >Other than that, the rest of the results are in line with what we expected. `COMMIT_ORDER` is the slowest one, especially for low traffic, 2-8 threads. `WRITESET_SESSION` performs typically better than `COMMIT_ORDER` but it’s slower than `WRITESET` for low-concurrent traffic. 64 | 65 | 66 | ### 报告信息总结 67 | 68 | 第一个优点是显而易见的:如果主库工作负载较低,且从库复制速度倾向于变慢,那么只要你将主库升级到 8.0 就能从其改进的复制性能中获益。 69 | >The first advantage is obvious: if your workload is on the slow side yet your slaves have tendency to fall back in replication, they can benefit from improved replication performance as soon as the master will be upgraded to 8.0. 70 | 71 | 72 | 这里有两个注意事项: 73 | 74 | 1. 这个特性是向后兼容的,所以 5.7 的从库也能从中获益; 75 | 2. 请注意 MySQL 8.0 依然是 beta 版本,我们不鼓励您在生产环境中使用测试版,尽管你非常需要这些新功能。 76 | 77 | >Two notes here: first - this feature is backward compatible and 5.7 slaves can also benefit from it. Second - a reminder that 8.0 is still in beta state, we don’t encourage you to use beta software on production, although in dire need, this is an option to test. 
78 | 79 | 80 | 这个特性不仅在你从库延迟时有作用,在你创建一个全新的从库或者重新配置一个已有的从库时,它也完全能跟得上。有了使用 "WRITESET" 模式的能力,配置一个新主机的过程将会变得更快。 81 | >This feature can help you not only when your slaves are lagging. They may be fully caught up but when you create a new slave or reprovision existing one, that slave will be lagging. Having the ability to use “WRITESET” mode will make the process of provisioning a new host much faster. 82 | 83 | 84 | 总而言之,这个特性带来的影响可能会产生超乎你想象。鉴于所有基准测试显示当 MySQL 处理低并发时性能较差,任何有助于加速在这种环境中复制的改进都将是巨大的进步。 85 | >All in all, this feature will have much bigger impact that you may think. Given all of the benchmarks showing regressions in performance when MySQL handles traffic of low concurrency, anything which can help to speed up the replication in such environments is a huge improvement. 86 | 87 | 88 | 如果你使用中继主库,这同样是你需要的特性。任何复制架构的中继主库都会在处理和执行事务时增加一些串行化信息。 —— 在真实的生产环境中,中继主库的工作负载几乎都是比主库的并行化程度低。利用 `writesets` 来实现更好的并行化,不仅能提高中继主库的并行化程度,而且可以提高所有从库的并行化程度。甚至可以使用 8.0 中继主库(虽然要经过严格的测试以验证所有的功能正常使用)来提高从库(请注意 MySQL 5.7 从库虽然不能自己生成 `writeset` 数据,但是它能识别和使用 `writeset` 数据)的复制性能。当然,从 8.0 复制到 5.7 听起来有点诡异(不仅仅是因为 8.0 还是 beta 版本)。在某些情况下,这可能会起作用,并可以加快 5.7 从库上的 CPU 利用率。 89 | 90 | >If you use intermediate masters, this is also a feature to look for.Any intermediate master adds some serialization into how transactions are handled and executed . - in real world, the workload on an intermediate master will almost always be less parallel than on the master.Utilizing writesets to allow better parallelization not only improves parallelization on the intermediate master but it also can improve parallelization on all of its slaves.It is even possible (although it would require serious testing to verify all pieces will fit correctly) to use an 8.0 intermediate master to improve replication performance of your slaves (please keep in mind that MySQL 5.7 slave can understand writeset data and use it even though it cannot generate it on its own).Of course, replicating from 8.0 to 5.7 sounds quite tricky (and it’s not only because 8.0 is still beta). Under some circumstances, this may work and can speed up CPU utilization on your 5.7 slaves. 91 | 92 | 93 | ### MySQL 复制的其他变化 94 | 95 | 除了最有趣的 `writesets` 新特性,MySQL 8.0 中关于 MySQL 复制的其他变化也是值得关注的。我们来看看其他的一些重要变化。 96 | 97 | 如果你碰巧使用了一个比 5.0 版本还老的MySQL,请注意 MySQL 8.0 将不再支持它的二进制日志格式。我们非常不建议采用这种 MySQL 复制方式,但是如果你真的在复制架构中使用一些非常老的 MySQL 版本,那真的是时候去升级了。 98 | >Introducing writesets, while it is the most interesting, it is not the only change that happened to MySQL replication in MySQL 8.0. Let’s go through some other, also important changes. If you happen to use a master older than MySQL 5.0, 8.0 won’t support its binary log format. We don’t expect to see many such setups, but if you use some very old MySQL with replication, it’s definitely a time to upgrade. 99 | 100 | 101 | 为了尽可能的保证复制架构中的MySQL数据库崩溃恢复时的数据库的安全性,MySQL 8.0 中一些默认值已更改: 102 | * `master_info_repository` 和 `relay_log_info_repository` 默认设置为 TABLE; 103 | * `expire_log_days` 的默认值也变成了 30; 104 | * 除了 `expire_log_days` 之外,还添加了一个新的参数 `binlog_expire_log_seconds`,它允许更细粒度的 binlog 轮换策略; 105 | * 二进制日志中添加了一些额外的时间戳,使复制延迟时可以更好地被观察,同时还引入了微秒级别的粒度。 106 | 107 | >Default values have changed to make sure that replication is as crash-safe as possible: `master_info_repository` and `relay_log_info_repository` are set to TABLE. `Expire_log_days` has also been changed - now the default value is 30. 108 | In addition to `expire_log_days`, a new variable has been added, `binlog_expire_log_seconds`, which allows for more fine-grained binlog rotation policy. 
Some additional timestamps have been added to the binary log to improve observability of replication lag, introducing microsecond granularity. 109 | 110 | 111 | 然而,这并不是所有 MySQL 8.0 复制相关的心功能的完整列表。如果你想了解更多信息,可以查看 [MySQL changelogs](https://dev.mysql.com/doc/relnotes/mysql/8.0/en/)。以确保您已经查看到所有相关信息 —— 到目前为止,所有 8.0 版本都添加了这些特性。 112 | >By all means, this is not a full list of changes and features related to MySQL replication. If you’d like to learn more, you can check the MySQL changelogs. Make sure you reviewed all of them - so far, features have been added in all 8.0 versions. 113 | 114 | 115 | 正如你所看到的,MySQL 复制仍然在变化而且越来越好。正如我们刚才所说的那样,MySQL 复制变化是一个缓慢的进程,但是 MySQL 复制的前景是非常好的。我们很高兴看到组复制的工作成果能在常规的 MySQL 复制中使用并且用得很好。 116 | >As you can see, MySQL replication is still changing and becoming better. As we said at the beginning, it has to be a slow-paced process but it’s really great to see what is ahead. It’s also nice to see the work for Group Replication trickling down and reused in the “regular” MySQL replication. 117 | -------------------------------------------------------------------------------- /mysql/10-MySQL查询性能优化_不只是索引.md: -------------------------------------------------------------------------------- 1 | # MySQL 查询性能优化:不只是索引 2 | 3 | 原文:[MySQL Query Performance: Not Just Indexes](https://www.percona.com/blog/2018/01/30/is-indexing-always-the-key-to-mysql-query-performance) 4 | 5 | 译者:魏新平 6 | 7 | 8 | 在本文当中,我将研究优化索引是否总是提高MySQL查询性能的关键。(剧透一下,不是) 9 | 10 | 当我们优化MySQL查询语句的时候,我们首先关心的通常是该查询是否使用了正确的索引来获取数据。如果获取数据永远是语句执行当中最耗时的操作,那么这种思路没有问题。但是,情况并非总是如此。 11 | 12 | 让我们看一下下面这个例子: 13 | ```sql 14 | mysql> show create table tbl G 15 | *************************** 1. row *************************** 16 |       Table: tbl 17 | Create Table: CREATE TABLE `tbl` ( 18 |  `id` int(11) NOT NULL AUTO_INCREMENT, 19 |  `k` int(11) NOT NULL DEFAULT '0', 20 |  `g` int(10) unsigned NOT NULL, 21 |  PRIMARY KEY (`id`), 22 |  KEY `k_1` (`k`) 23 | ) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1 24 | 1 row in set (0.00 sec) 25 | mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 G 26 | *************************** 1. row *************************** 27 |           id: 1 28 |  select_type: SIMPLE 29 |        table: tbl 30 |   partitions: NULL 31 |         type: ALL 32 | possible_keys: k_1 33 |          key: NULL 34 |      key_len: NULL 35 |          ref: NULL 36 |         rows: 998490 37 |     filtered: 50.00 38 |        Extra: Using where; Using temporary; Using filesort 39 | 1 row in set, 1 warning (0.00 sec) 40 | mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7; 41 | +--------+----+ 42 | | g      | c  | 43 | +--------+----+ 44 | |  28846 |  8 | 45 | | 139660 |  8 | 46 | | 153286 |  8 | 47 | ... 48 | | 934984 |  8 | 49 | +--------+----+ 50 | 22 rows in set (6.80 sec) 51 | ``` 52 | 当看到这样的语句,很多人会认为主要的原因是全表扫描。那么有人就会奇怪了,为什么MySQL的优化器不使用索引呢(那是因为该语句查询的数据不具备选择性,简单点说就是获取的数据相对于整张表来说太多了)这样的想法就会导致有些人强制使用索引,但是效果只会更差。 53 | ```sql 54 | mysql> select g,count(*) c from tbl force index(k) where k<1000000 group by g having c>7; 55 | +--------+----+ 56 | | g      | c  | 57 | +--------+----+ 58 | |  28846 |  8 | 59 | | 139660 |  8 | 60 | ... 
61 | | 934984 |  8 | 62 | +--------+----+ 63 | 22 rows in set (9.37 sec) 64 | ``` 65 | 或许有些人会把基于k列的单列索引扩展成基于k和g列的联合索引。事实上,这样并不会有任何效果。 66 | ```sql 67 | mysql> alter table tbl drop key k_1, add key(k,g); 68 | Query OK, 0 rows affected (5.35 sec) 69 | Records: 0 Duplicates: 0 Warnings: 0 70 | 71 | mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 G 72 | *************************** 1. row *************************** 73 | id: 1 74 | select_type: SIMPLE 75 | table: tbl 76 | partitions: NULL 77 | type: range 78 | possible_keys: k 79 | key: k 80 | key_len: 4 81 | ref: NULL 82 | rows: 499245 83 | filtered: 100.00 84 | Extra: Using where; Using index; Using temporary; Using filesort 85 | 1 row in set, 1 warning (0.00 sec) 86 | 87 | mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7; 88 | +--------+----+ 89 | | g | c | 90 | +--------+----+ 91 | | 28846 | 8 | 92 | | 139660 | 8 | 93 | ... 94 | | 915436 | 8 | 95 | | 934984 | 8 | 96 | +--------+----+ 97 | 22 rows in set (6.80 sec) 98 | ``` 99 | 上述两种错误的优化思路完全是因为我们一直在错误的方向浪费精力:即迅速获取满足k<1000000的行。然而真正的问题并不是快速获取这些数据。如果我们把group by子句去掉,我们会的发现执行速度快了惊人的10倍。 100 | ```sql 101 | mysql> select sum(g) from tbl where k<1000000; 102 | +--------------+ 103 | | sum(g)       | 104 | +--------------+ 105 | | 500383719481 | 106 | +--------------+ 107 | 1 row in set (0.68 sec) 108 | ``` 109 | 针对这个特殊的语句,是否使用索引获取数据并不是主要的问题,我们应该关注于如何优化GROUP BY,这才是问题的所在。 110 | 在我的下篇博客当中,我将介绍4种MySQL执行GROUP BY的方法,来帮助更好的优化类似语句。 111 | 112 | -------------------------------------------------------------------------------- /mysql/12-using-generated-columns-in-mysql-5-7-to-increase-query-performance.md: -------------------------------------------------------------------------------- 1 | # 使用MySQL 5.7的生成列来提高查询性能 2 | 3 | 原文: Using MySQL 5.7 Generated Columns to Increase Query Performance 4 | 5 | 作者: Alexander Rubin 6 | 7 | 翻译: 星耀队@知数堂 8 | 9 | 在这篇博客中,我们将看看如何使用MySQL 5.7的虚拟列来提高查询性能。 10 | >In this blog post, we’ll look at ways you can use MySQL 5.7 generated columns (or virtual columns) to improve query performance. 11 | 12 | ### 说明 13 | 大约两年前,我发表了一个在MySQL5.7版本上关于虚拟列的文章。从那时开始,它成为MySQL5.7发行版当中,我最喜欢的一个功能点。原因很简单:在虚拟列的帮助下,我们可以创建间接索引(fine-grained indexes),可以显著提高查询性能。我要告诉你一些技巧,可以潜在地解决那些使用了GROUP BY 和 ORDER BY而慢的报表查询。 14 | >About two years ago I published a blog post about Generated (Virtual) Columns in MySQL 5.7. Since then, it’s been one of my favorite features in the MySQL 5.7 release. The reason is simple: with the help of virtual columns, we can create fine-grained indexes that can significantly increase query performance. I’m going to show you some tricks that can potentially fix slow reporting queries with GROUP BY and ORDER BY. 15 | 16 | ### 问题 17 | 最近我正在协助一位客户,他正挣扎于这个查询上: 18 | >Recently I was working with a customer who was struggling with this query: 19 | ```sql 20 | SELECT 21 | CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call', 22 | COUNT(*) as 'No. of API Calls', 23 | AVG(ExecutionTime) as 'Avg. Execution Time', 24 | COUNT(distinct AccountId) as 'No. Of Accounts', 25 | COUNT(distinct ParentAccountId) as 'No. Of Parents' 26 | FROM ApiLog 27 | WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' 28 | GROUP BY CONCAT(verb, ' - ', replace(url,'.xml','')) 29 | HAVING COUNT(*) >= 1 ; 30 | ``` 31 | 32 | 这个查询运行了一个多小时,并且使用和撑满了整个 tmp目录(需要用到临时文件完成排序)。 33 | >The query was running for more than an hour and used all space in the tmp directory (with sort files). 
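（补充说明，非原文内容。）遇到这种“临时目录被撑满”的情况时，一个简单的旁证是观察磁盘临时表和排序归并相关的状态计数器在查询期间是否快速增长，例如：

```sql
-- 仅为示意：在问题查询运行前后各查一次，对比增量
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';
```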
34 | 35 | 表结构如下: 36 | >The table looked like this: 37 | ```sql 38 | CREATE TABLE `ApiLog` ( 39 | `Id` int(11) NOT NULL AUTO_INCREMENT, 40 | `ts` timestamp DEFAULT CURRENT_TIMESTAMP, 41 | `ServerName` varchar(50) NOT NULL default '', 42 | `ServerIP` varchar(50) NOT NULL default '', 43 | `ClientIP` varchar(50) NOT NULL default '', 44 | `ExecutionTime` int(11) NOT NULL default 0, 45 | `URL` varchar(3000) NOT NULL COLLATE utf8mb4_unicode_ci NOT NULL, 46 | `Verb` varchar(16) NOT NULL, 47 | `AccountId` int(11) NOT NULL, 48 | `ParentAccountId` int(11) NOT NULL, 49 | `QueryString` varchar(3000) NOT NULL, 50 | `Request` text NOT NULL, 51 | `RequestHeaders` varchar(2000) NOT NULL, 52 | `Response` text NOT NULL, 53 | `ResponseHeaders` varchar(2000) NOT NULL, 54 | `ResponseCode` varchar(4000) NOT NULL, 55 | ... // other fields removed for simplicity 56 | PRIMARY KEY (`Id`), 57 | KEY `index_timestamp` (`ts`), 58 | ... // other indexes removed for simplicity 59 | ) ENGINE=InnoDB; 60 | ``` 61 | 62 | 我们发现查询没有使用时间戳字段(“TS”)的索引: 63 | >We found out the query was not using an index on the timestamp field (“ts”): 64 | 65 | ```sql 66 | mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call', COUNT(*) as 'No. of API Calls', avg(ExecutionTime) as 'Avg. Execution Time', count(distinct AccountId) as 'No. Of Accounts', count(distinct ParentAccountId) as 'No. Of Parents' FROM ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' GROUP BY CONCAT(verb, ' - ', replace(url,'.xml','')) HAVING COUNT(*) >= 1G 67 | *************************** 1. row *************************** 68 | id: 1 69 | select_type: SIMPLE 70 | table: ApiLog 71 | partitions: NULL 72 | type: ALL 73 | possible_keys: ts 74 | key: NULL 75 | key_len: NULL 76 | ref: NULL 77 | rows: 22255292 78 | filtered: 50.00 79 | Extra: Using where; Using filesort 80 | 1 row in set, 1 warning (0.00 sec) 81 | ``` 82 | 83 | 原因很简单:符合过滤条件的行数太大了,以至于影响一次索引扫描扫描的效率(或者至少优化器是这样认为的): 84 | >The reason for that is simple: the number of rows matching the filter condition was too large for an index scan to be efficient (or at least the optimizer thinks that): 85 | 86 | ```sql 87 | mysql> select count(*) from ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' ; 88 | +----------+ 89 | | count(*) | 90 | +----------+ 91 | | 7948800 | 92 | +----------+ 93 | 1 row in set (2.68 sec) 94 | ``` 95 | 96 | 总行数:21998514。查询需要扫描的总行数的36%(7948800/21998514)(按:当预估扫描行数超过20% ~ 30%时,即便有索引,优化器通常也会强制转成全表扫描)。 97 | >Total number of rows: 21998514. The query needs to scan 36% of the total rows (7948800 / 21998514). 98 | 99 | 在这种情况下,我们有许多处理方法: 100 | 101 | 1. 创建时间戳列和group by列的联合索引; 102 | 2. 创建一个覆盖索引(包含所有查询字段); 103 | 3. 仅对group列创建索引; 104 | 4. 创建索引松散索引扫描。 105 | >In this case, we have a number of approaches: 106 | >1. Create a combined index on timestamp column + group by fields 107 | >2. Create a covered index (including fields that are selected) 108 | >3. Create an index on just GROUP BY fields 109 | >4. Create an index for loose index scan 110 | 111 | 然而,如果我们仔细观察查询中“group by”部分,我们很快就意识到,这些方案都不能解决问题。以下是我们的GROUP BY部分: 112 | >However, if we look closer at the “GROUP BY” part of the query, we quickly realize that none of those solutions will work. Here is our GROUP BY part: 113 | ```sql 114 | GROUP BY CONCAT(verb, ' - ', replace(url,'.xml','')) 115 | ``` 116 | 117 | 这里有两个问题: 118 | 119 | 1. 它是计算列,所以MySQL不能扫描verb + url的索引。它首先需要连接两个字段,然后组成连接字符串。这就意味着用不到索引; 120 | 2. 
URL被定义为“varchar(3000) COLLATE utf8mb4_unicode_ci NOT NULL”,不能被完全索引(即使在全innodb_large_prefix= 1 参数设置下,这是UTF8启用下的默认参数)。我们能做部分索引,这对group by的sql优化并没有什么帮助。 121 | >There are two problems here: 122 | >1. It is using a calculating field, so MySQL can’t just scan the index on verb + url. It needs to first concat two fields, and then group on the concatenated string. That means that the index won’t be used. 123 | >2. The URL is declared as “varchar(3000) COLLATE utf8mb4_unicode_ci NOT NULL” and can’t be indexed in full (even with innodb_large_prefix=1 option, which is the default as we have utf8 enabled). We can only do a partial index, which won’t be helpful for GROUP BY optimization. 124 | 125 | 在这里,我尝试去对URL列添加一个完整的索引,在innodb_large_prefix=1参数下: 126 | >Here, I’m trying to add a full index on the URL with innodb_large_prefix=1: 127 | ```sql 128 | mysql> alter table ApiLog add key verb_url(verb, url); 129 | ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes 130 | ``` 131 | 132 | 嗯,通过修改“GROUP BY CONCAT(verb, ‘ – ‘, replace(url,’.xml’,”))”为 “GROUP BY verb, url” 会帮助(假设我们把字段定义从 varchar(3000)调小一些,不管业务上允许或不允许)。然而,这将改变结果,因URL字段不会删除 .xml扩展名了。 133 | >Well, changing the “GROUP BY CONCAT(verb, ‘ – ‘, replace(url,’.xml’,”))” to “GROUP BY verb, url” could help (assuming that we somehow trim the field definition from varchar(3000) to something smaller, which may or may not be possible). However, it will change the results as it will not remove the .xml extension from the URL field. 134 | 135 | ### 解决方案 136 | 好消息是,在MySQL 5.7中我们有虚拟列。所以我们可以在“CONCAT(verb, ‘ – ‘, replace(url,’.xml’,”))”之上创建一个虚拟列。最好的部分:我们不需要执行一组完整的字符串(可能大于3000字节)。我们可以使用MD5哈希(或更长的哈希,例如SHA1 / SHA2)作为group by的对象。 137 | >The good news is that in MySQL 5.7 we have virtual columns. So we can create a virtual column on top of “CONCAT(verb, ‘ – ‘, replace(url,’.xml’,”))”. The best part: we do not have to perform a GROUP BY with the full string (potentially > 3000 bytes). We can use an MD5 hash (or longer hashes, i.e., sha1/sha2) for the purposes of the GROUP BY. 138 | 139 | 下面是解决方案: 140 | >Here is the solution: 141 | ```sql 142 | alter table ApiLog add verb_url_hash varbinary(16) GENERATED ALWAYS AS (unhex(md5(CONCAT(verb, ' - ', replace(url,'.xml',''))))) VIRTUAL; 143 | alter table ApiLog add key (verb_url_hash); 144 | ``` 145 | 146 | 所以我们在这里做的是: 147 | 148 | 1. 声明虚拟列,类型为varbinary(16); 149 | 2. 在CONCAT(verb, ‘ – ‘, replace(url,’.xml’,”)上创建虚拟列,并且使用MD5哈希转化后再使用unhex转化32位十六进制为16位二进制; 150 | 3. 对上面的虚拟列创建索引。 151 | >So what we did here is: 152 | >1. Declared the virtual column with type varbinary(16) 153 | >2. Created a virtual column on CONCAT(verb, ‘ – ‘, replace(url,’.xml’,”), and used an MD5 hash on top plus an unhex to convert 32 hex bytes to 16 binary bytes 154 | >3. Created and index on top of the virtual column 155 | 156 | 现在我们可以修改查询语句,GROUP BY verb_url_hash列: 157 | >Now we can change the query and GROUP BY verb_url_hash column: 158 | ```sql 159 | mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) 160 | AS 'API Call', COUNT(*) as 'No. of API Calls', 161 | avg(ExecutionTime) as 'Avg. Execution Time', 162 | count(distinct AccountId) as 'No. Of Accounts', 163 | count(distinct ParentAccountId) as 'No. 
Of Parents' 164 | FROM ApiLog 165 | WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' 166 | GROUP BY verb_url_hash 167 | HAVING COUNT(*) >= 1; 168 | ERROR 1055 (42000): Expression #1 of SELECT list is not in 169 | GROUP BY clause and contains nonaggregated column 'ApiLog.ApiLog.Verb' 170 | which is not functionally dependent on columns in GROUP BY clause; 171 | this is incompatible with sql_mode=only_full_group_by 172 | ``` 173 | 174 | MySQL 5.7的严格模式是默认启用的,我们可以只针对这次查询修改一下。 175 | 176 | 现在解释计划看上去好多了: 177 | >MySQL 5.7 has a strict mode enabled by default, which we can change for that query only. 178 | 179 | >Now the explain plan looks much better: 180 | 181 | ```sql 182 | mysql> select @@sql_mode; 183 | +-------------------------------------------------------------------------------------------------------------------------------------------+ 184 | | @@sql_mode | 185 | +-------------------------------------------------------------------------------------------------------------------------------------------+ 186 | | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION | 187 | +-------------------------------------------------------------------------------------------------------------------------------------------+ 188 | 1 row in set (0.00 sec) 189 | mysql> set sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'; 190 | Query OK, 0 rows affected (0.00 sec) 191 | mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call', COUNT(*) as 'No. of API Calls', avg(ExecutionTime) as 'Avg. Execution Time', count(distinct AccountId) as 'No. Of Accounts', count(distinct ParentAccountId) as 'No. Of Parents' FROM ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' GROUP BY verb_url_hash HAVING COUNT(*) >= 1G 192 | *************************** 1. row *************************** 193 | id: 1 194 | select_type: SIMPLE 195 | table: ApiLog 196 | partitions: NULL 197 | type: index 198 | possible_keys: ts,verb_url_hash 199 | key: verb_url_hash 200 | key_len: 19 201 | ref: NULL 202 | rows: 22008891 203 | filtered: 50.00 204 | Extra: Using where 205 | 1 row in set, 1 warning (0.00 sec) 206 | ``` 207 | 208 | MySQL可以避免排序,速度更快。它将最终还是要扫描所有表的索引的顺序。响应时间明显更好:只需大概38秒而不再是大于一小时。 209 | >MySQL will avoid any sorting, which is much faster. It will still have to eventually scan all the table in the order of the index. The response time is significantly better: ~38 seconds as opposed to > an hour. 210 | 211 | ### 覆盖索引 212 | 现在我们可以尝试做一个覆盖索引,这将相当大: 213 | >Now we can attempt to do a covered index, which will be quite large: 214 | 215 | ```sql 216 | mysql> alter table ApiLog add key covered_index (`verb_url_hash`,`ts`,`ExecutionTime`,`AccountId`,`ParentAccountId`, verb, url); 217 | Query OK, 0 rows affected (1 min 29.71 sec) 218 | Records: 0 Duplicates: 0 Warnings: 0 219 | ``` 220 | 221 | 我们添加了一个“verb”和“URL”,所以之前我不得不删除表定义的COLLATE utf8mb4_unicode_ci。现在执行计划表明,我们使用了覆盖索引: 222 | >We had to add a “verb” and “url”, so beforehand I had to remove the COLLATE utf8mb4_unicode_ci from the table definition. Now explain shows that we’re using the index: 223 | ```sql 224 | mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call', COUNT(*) as 'No. of API Calls', AVG(ExecutionTime) as 'Avg. Execution Time', COUNT(distinct AccountId) as 'No. Of Accounts', COUNT(distinct ParentAccountId) as 'No. 
Of Parents' FROM ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59' GROUP BY verb_url_hash HAVING COUNT(*) >= 1G 225 | *************************** 1. row *************************** 226 | id: 1 227 | select_type: SIMPLE 228 | table: ApiLog 229 | partitions: NULL 230 | type: index 231 | possible_keys: ts,verb_url_hash,covered_index 232 | key: covered_index 233 | key_len: 3057 234 | ref: NULL 235 | rows: 22382136 236 | filtered: 50.00 237 | Extra: Using where; Using index 238 | 1 row in set, 1 warning (0.00 sec) 239 | ``` 240 | 响应时间下降到约12秒!但是,索引的大小明显地比仅verb_url_hash的索引(每个记录16字节)要大得多。 241 | >The response time dropped to ~12 seconds! However, the index size is significantly larger compared to just verb_url_hash (16 bytes per record). 242 | 243 | ### 结论 244 | MySQL 5.7的生成列提供一个有价值的方法来提高查询性能。如果你有一个有趣的案例,请在评论中分享。 245 | >MySQL 5.7 generated columns provide a valuable way to improve query performance. If you have an interesting case, please share in the comments. 246 | -------------------------------------------------------------------------------- /mysql/13-MyRocks Engine Things to Know Before You Start.md: -------------------------------------------------------------------------------- 1 | # MyRocks Engine: Things to Know Before You Start 2 | 原文:[MySQL Query Performance: Not Just Indexes](https://www.percona.com/blog/2018/02/01/myrocks-engine-things-know-start/) 3 | 4 | 译者:魏新平 5 | 6 | Percona 最近发布了Percona Server with MyRocks的GA版本。你能看到Facebook是怎样解释了在生产环境中使用MyRocks 取得的成功(直接翻译为你能够了解到为什么Facebook在生产环境使用MyRocks了是不是更好)。如果你使用[Percona repositories](https://www.percona.com/doc/percona-server/LATEST/myrocks/install.html) ,你能够简单的安装MyRocks插件并且用ps-admin --enable-rocksdb来启动它。 7 | 8 | >Percona recently released Percona Server with MyRocks as GA. You can see how Facebook explains wins they see in production with MyRocks. Now if you use Percona repositories, you can simply install MyRocks plugin and enable it with ps-admin --enable-rocksdb. 9 | 10 | 将它和典型的InnoDB进行比较时,存在一些主要和次要区别,我想在此强调一下。第一个主要的区别是MyRocks (based on RocksDB) 使用Log Structured Merge Tree数据结构,不是InnoDB的B+ tree数据结构。 11 | >There are some major and minor differences when comparing it to typical InnoDB deployments, and I want to highlight them here. The first important difference is that MyRocks (based on RocksDB) uses Log Structured Merge Tree data structure, not a B+ tree like InnoDB. 12 | 13 | 你能够在我发布在DZone的[文章](https://dzone.com/articles/how-three-fundamental-data-structures-impact-stora) 当中了解到更多关于LSM引擎的信息 。总的来说,LSM引擎更适合写密集型的应用场景,读取速度可能会比较慢,全表扫描对于引擎来说负担会太重。当使用MyRocks作为应用底层时,需要特别注意这一点。MyRocks 不是加强版的InnoDB,也不能在所有应用场景下替换InnoDB。他有自己的优势/局限,就像InnoDB一样,你需要根据你数据的存取模式来选择使用哪一个引擎。 14 | 15 | >You learn more about the LSM engine in my article for DZone.The summary is that an LSM data structure is good for write-intensive workloads, with the expense that reads might slow down (both point reads and especially range reads) and full table scans might be too heavy for the engine. This is important to keep in mind when designing applications for MyRocks. MyRocks is not an enhanced InnoDB, nor a one-size-fits-all replacement for InnoDB. It has its own pros/cons just like InnoDB. You need to decide which engine to use based on your applications data access patterns. 16 | 17 | 还有什么其他需要注意的区别的吗? 18 | >What other differences should you be aware of? 19 | 20 | 让我们看一下目录结构。当前,所有的表和数据库都是存储在mysqldir的.rocksdb隐藏目录当中。名字和地址可以改变,但是所有的数据库当中的所有表还是存储在一系列的.sst文件当中,没有per-table / per-database的区分。 21 | 22 | >Let’s look at the directory layout. 
Right now, all tables and all databases are stored in a hidden .rocksdb directory inside mysqldir. The name and location can be changed, but still all tables from all databases are stored in just a series of .sst files. There is no per-table / per-database separation. 23 | 24 | 默认情况下,MyRocks 使用LZ4来压缩所有的表。能够通过改变`rocksdb_default_cf_options`当中的变量来改变压缩的设置。默认值为`compression=kLZ4Compression;bottommost_compression=kLZ4Compression`。我们选择 LZ4,是因为它在很小的cpu负载下提供了可接受的压缩比。其他的压缩方式包括Zlib 和 ZSTD,或者直接不压缩。你能够在Peter和我的文章当中学习到更多关于[压缩比VS速度](https://www.percona.com/blog/2016/04/13/evaluating-database-compression-methods-update/) 的信息。为了比较装载了来自我自制路由器软件的流量统计数据的MyRocks表的物理大小,我使用了为pmacct收集器软件创建的下表。 25 | 26 | >By default in Percona Server for MySQL, MyRocks will use LZ4 compression for all tables. You can change compression settings by changing the rocksdb_default_cf_options server variable. By default it set to compression=kLZ4Compression;bottommost_compression=kLZ4Compression. We chose LZ4 compression as it provides acceptable compression level with very little CPU overhead. Other possible compression methods are Zlib and ZSTD, or no compression at all. You can learn more about compression ratio vs. speed in Peter’s and my post.To compare the data size of a MyRocks table loaded with traffic statistic data from my homebrew router, I’ve used the following table created for pmacct collector: 27 | ```sql 28 | CREATE TABLE `acct_v9` ( 29 | `tag` int(4) unsigned NOT NULL, 30 | `class_id` char(16) NOT NULL, 31 | `class` varchar(255) DEFAULT NULL, 32 | `mac_src` char(17) NOT NULL, 33 | `mac_dst` char(17) NOT NULL, 34 | `vlan` int(2) unsigned NOT NULL, 35 | `as_src` int(4) unsigned NOT NULL, 36 | `as_dst` int(4) unsigned NOT NULL, 37 | `ip_src` char(15) NOT NULL, 38 | `ip_dst` char(15) NOT NULL, 39 | `port_src` int(2) unsigned NOT NULL, 40 | `port_dst` int(2) unsigned NOT NULL, 41 | `tcp_flags` int(4) unsigned NOT NULL, 42 | `ip_proto` char(6) NOT NULL, 43 | `tos` int(4) unsigned NOT NULL, 44 | `packets` int(10) unsigned NOT NULL, 45 | `bytes` bigint(20) unsigned NOT NULL, 46 | `flows` int(10) unsigned NOT NULL, 47 | `stamp_inserted` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, 48 | `id` int(11) NOT NULL AUTO_INCREMENT, 49 | PRIMARY KEY (`id`) 50 | ) ENGINE=ROCKSDB AUTO_INCREMENT=20127562 51 | ``` 52 | 53 | 正如你所看见的,表中有大概2000万条数据。MyRocks (用默认的 LZ4 压缩)使用了828MB。 InnoDB (默认,未压缩) 使用了3760MB。 54 | >as you can see, there are about 20mln records in this table. MyRocks (with default LZ4 compression) uses 828MB. InnoDB (uncompressed) uses 3760MB. 55 | 56 | 你能够在 .rocksdb 目录的LOG文件当中找到RocksDB 实例的详细信息。查看这些日志,可以进行更详细的诊断。你也能够使用SHOW ENGINE ROCKSDB STATUS命令,但是这会比SHOW ENGINE INNODB STATUS返回的内容更复杂,需要消耗大量的精力和时间去理解。 57 | 58 | >You can find very verbose information about your RocksDB instance in the LOG file located in .rocksdb directory. Check this file for more diagnostics. You can also try the SHOW ENGINE ROCKSDB STATUS command, but it is even more cryptic than SHOW ENGINE INNODB STATUS. It takes time to parse and to understand it. 59 | 60 | 注意,现在MyRocks只支持 READ-COMMITTED 隔离级别。并没有 REPEATABLE-READ 隔离级别,也没有像InnoDB里一样的gap锁。理论上, 61 | RocksDB 只支持 SNAPSHOT 隔离级别。然而,MySQL 当中并没有SNAPSHOT 隔离级别的概念,所以我们没有实现特殊的语法去支持。如果你对这个感兴趣,请联系我们。 62 | >Keep in mind that at this time MyRocks supports only READ-COMMITTED isolation levels. There is no REPEATABLE-READ isolation level and no gap locking like in InnoDB. In theory, RocksDB should support SNAPSHOT isolation level. 
However, there is no notion of SNAPSHOT isolation in MySQL so we have not implemented the special syntax to support this level. Please let us know if you would be interested in this. 63 | 64 | 当你试图加载大量的数据到MyRocks 当中时,你可能会遇到问题(不幸的是这个可能是你使用MyRocks 时的首次工作,当你使用LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table 或者 ALTER TABLE innodb_table ENGINE=ROCKSDB)。假如你的表太大,并且你没有足够的内存,RocksDB 就会崩溃。在生产环境中,你应该为你加载数据的session设置rocksdb_bulk_load=1。了解更多请查看文章:https://github.com/facebook/mysql-5.6/wiki/data-loading。 65 | 66 | >For bulk loads, you may face problems trying to load large amounts of data into MyRocks (and unfortunately this might be the very first operation when you start playing with MyRocks as you try to LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table or ALTER TABLE innodb_table ENGINE=ROCKSDB). If your table is big enough and you do not have enough memory, RocksDB crashes. As a workaround, you should set rocksdb_bulk_load=1 for the session where you load data. See more on this page: https://github.com/facebook/mysql-5.6/wiki/data-loading. 67 | 68 | 在MyRocks中的Block cache有点类似于innodb_buffer_pool_size,但是对于MyRocks它主要有利于读取数据。您可能需要调整rocksdb_block_cache_size设置。另外,它默认使用buffered reads,在这种情况下,操作系统的cache缓存着压缩的数据,而RockDB block cache 会缓存未压缩的数据。你可以保持这种两层的缓存机制,或者你可以修改rocksdb_use_direct_reads=ON关闭缓存,强制block cache直接读取。LSM树的本质要求当一层变满时,有一个合并过程将压缩数据推到下一层。这个过程可能相当密集并会影响用户查询速度。可以将其调整为不那么密集。 69 | 70 | >Block cache in MyRocks is somewhat similar to innodb_buffer_pool_size, however for MyRocks it’s mainly beneficial for reads. You may want to tune the rocksdb_block_cache_size setting. Also keep in mind it uses buffered reads by default, and in this case the OS cache contains cached compressed data and RockDB block cache will contain uncompressed data. You may keep this setup to have two levels of cache, or you can disable buffering by forcing block cache to use direct reads with rocksdb_use_direct_reads=ON. 71 | The nature of LSM trees requires that when a level becomes full, there is a merge process that pushes compacted data to the next level. This process can be quite intensive and affect user queries. It is possible to tune it to be less intensive. 72 | 73 | 现在还没有类似于Percona XtraBackup一样的热备软件来执行MyRocks表的热备份(我们正在研究这个)。你可以使用mysqldump 来进行逻辑备份,或者使用文件系统层面的snapshots ,比如LVM 或 ZFS 。 74 | 75 | >Right now there is no hot backup software like Percona XtraBackup to perform a hot backup of MyRocks tables (we are looking into this). At this time you can use mysqldump for logical backups, or use filesystem-level snapshots like LVM or ZFS. 76 | 77 | 在我们的[官方文档](https://www.percona.com/doc/percona-server/LATEST/myrocks/limitations.html.)当中,你可以了解到更多关于MyRocks 的优势和局限性. 78 | 79 | >You can find more MyRocks specifics and limitations in our docs at https://www.percona.com/doc/percona-server/LATEST/myrocks/limitations.html. 80 | 81 | 我们期待大家的反馈 。 82 | >We are looking for feedback on your MyRocks experience! 83 | 84 | ##更新(2018-02-12) 85 | 在获得Facebook MyRocks 团队的反馈后,我对原来的文章进行了更新。 86 | >UPDATES (12-Feb-2018) 87 | Updates to the original post with the feedback provided by Facebook MyRocks team 88 | 89 | 1. 隔离级别 90 | MyRocks 支持READ COMMITTED 和 REPEATABLE READ隔离级别,不支持 91 | SERIALIZABLE。想了解更详细的信息可以阅读https://github.com/facebook/mysql-5.6/wiki/Transaction-Isolation。MyRocks 实现REPETABLE READ的方法和InnoDB不一样 — MyRocks 使用类似PostgreSQL 的snapshot isolation。 92 | 在Percona server 中,不允许在MyRocks 表上使用 REPEATABLE READ 隔离级别,因为REPEATABLE READ 隔离级别在innodb和myrocks db上的处理方式不一样。 93 | >1. 
Isolation Levels 94 | MyRocks supports READ COMMITTED and REPEATABLE READ. MyRocks does not support SERIALIZABLE. 95 | Please read https://github.com/facebook/mysql-5.6/wiki/Transaction-Isolation for details. 96 | The way to implement REPETABLE READ was different from MyRocks and InnoDB — MyRocks used 97 | PostgreSQL style snapshot isolation. 98 | In Percona Server we do not allow REPEATABLE READ for MyRocks tables, as the behavior will be different from InnoDB. 99 | 100 | 2. 在线二进制备份工具 101 | 网上有一个开源的在线二进制备份工具—myrocks_hotabackup:https://github.com/facebook/mysql-5.6/blob/fb-mysql-5.6.35/scripts/myrocks_hotbackup 102 | >2. Online Binary Backup Tool 103 | There is an open source online binary backup tool for MyRocks — myrocks_hotabackup 104 | https://github.com/facebook/mysql-5.6/blob/fb-mysql-5.6.35/scripts/myrocks_hotbackup 105 | -------------------------------------------------------------------------------- /mysql/14-What-To-Do-When-MySQL-Runs-Out-of-Memory: Troubleshooting-Guide.md: -------------------------------------------------------------------------------- 1 | >作者:Alexander Rubin 2 | >发布日期:2018-06-28 3 | >适用范围:MySQL, Percona Server for MySQL 4 | >关键词:memory, memory leaks, Memory Usage, MySQL server memory usage, MySQL Troubleshooting, Troubleshooting MySQL, troubleshooting tips 5 | 6 | 7 | Troubleshooting crashes is never a fun task, especially if MySQL does not report the cause of the crash. For example, when MySQL runs out of memory. Peter Zaitsev wrote a blog post in 2012: Troubleshooting MySQL Memory Usage with a lots of useful tips. With the new versions of MySQL (5.7+) and performance_schema we have the ability to troubleshoot MySQL memory allocation much more easily。 8 | 崩溃故障诊断绝不是一个有趣的任务,尤其如果MySQL没有报告崩溃的原因时,比如,MySQL运行时内存溢出。 Peter Zaitsev 在2012年写了一篇文章[Troubleshooting MySQL Memory Usage ](https://www.percona.com/blog/2012/03/21/troubleshooting-mysql-memory-usage/)里面有很多有用的提示.使用新版本MySQL(5.7+)结合performance_schema,我们可以更轻松地解决MySQL内存分配问题。 9 | 10 | In this blog post I will show you how to use it. 11 | 12 | First of all, there are 3 major cases when MySQL will crash due to running out of memory: 13 | 14 | 1. MySQL tries to allocate more memory than available because we specifically told it to do so. For example: you did not set innodb_buffer_pool_size correctly. This is very easy to fix 15 | 2. There is some other process(es) on the server that allocates RAM. It can be the application (java, python, php), web server or even the backup (i.e. mysqldump). When the source of the problem is identified, it is straightforward to fix. 16 | 3. Memory leaks in MySQL. This is a worst case scenario, and we need to troubleshoot. 17 | 18 | 在这篇博文中,我将向你展示如何使用它。 19 | 20 | 首先,MySQL因为内存溢出发生崩溃主要有以下三种情况: 21 | 22 | 1. MySQL试图分配比可用内存更多的内存,因为我们特意告诉它这样做.比如你没有正确的设置innodb_buffer_pool_size.这种情况很好解决。 23 | 2. 服务器上有其他一些进程分配了RAM内存.可能是应用程序(java,python,php),web服务器,或者甚至备份(比如mysqldump).确定问题的根源 24 | 后,可以直接修复。 25 | 3. MySQL内存泄漏.这时最糟糕的情况,这时需要我们进行故障诊断。 26 | 27 | ### Where to start troubleshooting MySQL memory leaks 28 | ### 从哪里开始诊断MySQL内存泄漏的问题 29 | 30 | Here is what we can start with (assuming it is a Linux server): 31 | 假设是一个linux服务器,我们可以从以下开始: 32 | 33 | **Part 1: Linux OS and config check** 34 | 1. Identify the crash by checking mysql error log and Linux log file (i.e. /var/log/messages or /var/log/syslog). You may see an entry saying that OOM Killer killed MySQL. Whenever MySQL has been killed by OOM “dmesg” also shows details about the circumstances surrounding it. 35 | 2. 
Check the available RAM: 36 | * free -g 37 | * cat /proc/meminfo 38 | 3. Check what applications are using RAM: “top” or “htop” (see the resident vs virtual memory) 39 | 4. Check mysql configuration: check /etc/my.cnf or in general /etc/my* (including /etc/mysql/* and other files). MySQL may be running with the different my.cnf (run ps ax| grep mysql ) 40 | 5. Run vmstat 5 5 to see if the system is reading/writing via virtual memory and if it is swapping 41 | 6. For non-production environments we can use other tools (like Valgrind, gdb, etc) to examine MySQL usage 42 | 43 | **第一部分: Linux 系统和配置检查** 44 | 45 | 1. 通过检查mysql error日志和linux日志(比如,/var/log/messages 或者 /var/log/syslog)确认崩溃. 46 | 你可能会看到一条条目说OOM Killer杀死了MySQL.每当MySQL被OOM杀死时,“dmesg”也会显示有关它周围情况的详细信息 47 | 2. 检查可用的RAM内存: 48 | * free -g 49 | * cat /proc/meminfo 50 | 3. 检查什么程序在使用内存:"top"或者htop(看resident和virtual列) 51 | 4. 检查mysql的配置:检查/etc/my.cnf或者一般的/etc/my*(包括/etc/mysql/*和其他文件). 52 | MySQL 可能跟着不同的my.cnf运行(用ps ax | grep mysql) 53 | 5. 运行vmstat 5 5 查看系统是否通过虚拟内存进行读写以及是否正在进行swap交换 54 | 6. 对于非生产环境,我们可以使用其他工具(如Valgrind,gdb等)来检查MySQL的使用情况. 55 | 56 | **Part 2: Checks inside MySQL** 57 | Now we can check things inside MySQL to look for potential MySQL memory leaks. 58 | MySQL allocates memory in tons of places. Especially: 59 | * Table cache 60 | * Performance_schema (run: show engine performance_schema status and look at the last line). That may be the cause for the systems with small amount of RAM, i.e. 1G or less 61 | * InnoDB (run show engine innodb status and check the buffer pool section, memory allocated for buffer_pool and related caches) 62 | * Temporary tables in RAM (find all in-memory tables by running: select * from information_schema.tables where engine='MEMORY' ) 63 | * Prepared statements, when it is not deallocated (check the number of prepared commands via deallocate command by running show global status like ‘ Com_prepare_sql';show global status like 'Com_dealloc_sql' ) 64 | 65 | **第二部分: 检查MySQL内部** 66 | 67 | 现在我们可以检查MySQL内部的东西来寻找潜在的MySQL内存泄漏情况: 68 | MySQL在很多地方分配内存.尤其: 69 | * 表缓存 70 | * Performance_schema(运行:show engine performance_schema status 然后看最后一行).这可能在系统RAM比较少(1G或更少)时的可能原因. 71 | * InnoDB(运行show engine innodb status 检查 buffer pool部分,为buffer pool及相关缓存分配的内存) 72 | * 内存中的临时表(查看所有内存表:select * from information_schema.tables where engine='MEMORY') 73 | * 预处理语句,当他们没有被释放时(通过运行show global status like 'Com_prepare_sql'和show global status like 'Com_dealloc_sql'来检查通过deallocate命令释放的预处理语句) 74 | 75 | The good news is: starting with MySQL 5.7 we have memory allocation in performance_schema. Here is how we can use it. 76 | 好消息是,从5.7开始我们可以通过performance_schema查看内存的分配情况.下面就展示如何使用它. 77 | 78 | 1. First, we need to enable collecting memory metrics. Run: 79 | 80 | ``` 81 | UPDATE setup_instruments SET ENABLED = 'YES' WHERE NAME LIKE 'memory/%'; 82 | ``` 83 | 84 | 2. Run the report from sys schema: 85 | 86 | ``` 87 | select event_name, current_alloc, high_alloc from sys.memory_global_by_current_bytes where current_count > 0; 88 | ``` 89 | 90 | 3. Usually this will give you the place in code when memory is allocated. It is usually self-explanatory. In some cases we can search for bugs or we might need to check the MySQL source code. 91 | --- 92 | 93 | 1. 首先,我们需要启用收集内存指标,运行如下语句: 94 | 95 | ``` 96 | UPDATE setup_instruments SET ENABLED = 'YES' WHERE NAME LIKE 'memory/%'; 97 | ``` 98 | 99 | 2. 
运行sys schema里面的报告 100 | 101 | ``` 102 | select event_name,current_alloc,high_alloc from sys.memory_global_by_current_bytes where current_count > 0; 103 | ``` 104 | 105 | 3. 通常,这将在分配内存时为你提供代码,它通常是不言自明的.在某些情况下,我们可以搜索错误,或者我们可能需要检查MySQL源代码. 106 | 107 | For example, for the bug where memory was over-allocated in triggers ([https://bugs.mysql.com/bug.php?id=86821](https://bugs.mysql.com/bug.php?id=86821)) the select shows: 108 | 例如,有一个过度为触发器分配内存的bug([https://bugs.mysql.com/bug.php?id=86821](https://bugs.mysql.com/bug.php?id=86821)) 109 | 110 | ``` 111 | mysql> select event_name, current_alloc, high_alloc from memory_global_by_current_bytes where current_count > 0; 112 | +--------------------------------------------------------------------------------+---------------+-------------+ 113 | | event_name | current_alloc | high_alloc | 114 | +--------------------------------------------------------------------------------+---------------+-------------+ 115 | | memory/innodb/buf_buf_pool | 7.29 GiB | 7.29 GiB | 116 | | memory/sql/sp_head::main_mem_root | 3.21 GiB | 3.62 GiB | 117 | ... 118 | ``` 119 | 120 | 查询的显示如下: 121 | 122 | ``` 123 | mysql> select event_name, current_alloc, high_alloc from memory_global_by_current_bytes where current_count > 0; 124 | +--------------------------------------------------------------------------------+---------------+-------------+ 125 | | event_name | current_alloc | high_alloc | 126 | +--------------------------------------------------------------------------------+---------------+-------------+ 127 | | memory/innodb/buf_buf_pool | 7.29 GiB | 7.29 GiB | 128 | | memory/sql/sp_head::main_mem_root | 3.21 GiB | 3.62 GiB | 129 | ... 130 | ``` 131 | 132 | The largest chunk of RAM is usually the buffer pool but ~3G in stored procedures seems to be too high. 133 | 分配最大一块内存通常是buffer pool,但是约3G的存储过程似乎有点太高了. 134 | 135 | According to the [MySQL source code documentation](https://dev.mysql.com/doc/dev/mysql-server/8.0.0/classsp__head.html#details), sp_head represents one instance of a stored program which might be of any type (stored procedure, function, trigger, event). In the above case we have a potential memory leak. 136 | 根据[MySQL source code documentation](https://dev.mysql.com/doc/dev/mysql-server/8.0.0/classsp__head.html#details),sp_head表示存储程序里面的一个实例(比如存储过程,函数,触发器,事件).在上面的例子,我们有潜在的内存泄漏的风险. 137 | 138 | In addition we can get a total report for each higher level event if we want to see from the birds eye what is eating memory: 139 | 另外,我们想要鸟瞰什么吃掉了内存,我们可以获得每个事件更高级别活动的总体报告. 140 | 141 | ``` 142 | mysql> select substring_index( 143 | -> substring_index(event_name, '/', 2), 144 | -> '/', 145 | -> -1 146 | -> ) as event_type, 147 | -> round(sum(CURRENT_NUMBER_OF_BYTES_USED)/1024/1024, 2) as MB_CURRENTLY_USED 148 | -> from performance_schema.memory_summary_global_by_event_name 149 | -> group by event_type 150 | -> having MB_CURRENTLY_USED>0; 151 | +--------------------+-------------------+ 152 | | event_type | MB_CURRENTLY_USED | 153 | +--------------------+-------------------+ 154 | | innodb | 0.61 | 155 | | memory | 0.21 | 156 | | performance_schema | 106.26 | 157 | | sql | 0.79 | 158 | +--------------------+-------------------+ 159 | 4 rows in set (0.00 sec) 160 | ``` 161 | 162 | I hope those simple steps can help troubleshoot MySQL crashes due to running out of memory. 
163 | 我希望这些简单的步骤可以帮助解决由于内存溢出导致的MySQL崩溃问题。 164 | -------------------------------------------------------------------------------- /mysql/17-Why You Should Avoid Using “CREATE TABLE AS SELECT” Statement.md: -------------------------------------------------------------------------------- 1 | ## 为什么要避免使用“CREATE TABLE AS SELECT”语句 2 | 3 | >作者: Alexander Rubin 4 | 发布日期:2018-01-10 5 | 关键词:create table as select, metadata locks, MySQL, open source database, row locking, table locking 6 | 适用范围: Insight for DBAs, MySQL 7 | 原文 http://www.percona.com/blog/2018/01/10/why-avoid-create-table-as-select-statement/ 8 | 9 | 10 | >In this blog post, I’ll provide an explanation why you should avoid using the CREATE TABLE AS SELECT statement. 11 | 12 | 在这篇博文中,我将解释为什么你应该避免使用CREATE TABLE AS SELECT语句。 13 | 14 | >The SQL statement “create table as select …” is used to create a normal or temporary table and materialize the result of the select. Some applications use this construct to create a copy of the table. This is one statement that will do all the work, so you do not need to create a table structure or use another statement to copy the structure. 15 | 16 | SQL语句“create table as select ...”用于创建普通表或临时表,并物化select的结果。某些应用程序使用这种结构来创建表的副本。一条语句完成所有工作,因此您无需创建表结构或使用其他语句来复制结构。 17 | 18 | >At the same time there are a number of problems with this statement: 19 | >1. You don’t create indexes for the new table 20 | >2. You are mixing transactional and non-transactional statements in one transaction. As with any DDL, it will commit current and unfinished transactions 21 | >3. CREATE TABLE … SELECT is not supported when using GTID-based replication 22 | >4. Metadata locks won’t release until the statement is finished 23 | 24 | 与此同时,这种语句存在许多问题: 25 | 26 | 1. 您不为新表创建索引 27 | 2. 您在一个事务中混合了事务性和非事务性语句时,与任何DDL一样,它将提交当前和未完成的事务 28 | 3. 使用基于GTID的复制时不支持 CREATE TABLE ... SELECT 29 | 4. 在语句完成之前,元数据锁不会释放 30 | 31 | ### CREATE TABLE AS SELECT语句可以把事物变得很糟糕 32 | 33 | >Let’s imagine we need to transfer money from one account to another (classic example). But in addition to just transferring funds, we need to calculate fees. The developers decide to create a table to perform a complex calculation. 34 | 35 | >Then the transaction looks like this: 36 | 37 | 让我们想象一下,我们需要将钱从一个账户转移到另一个账户(经典示例)。但除了转移资金外,我们还需要计算费用。开发人员决定创建一个表来执行复杂的计算。 38 | 39 | 然后事务看起来像这样: 40 | ```sql 41 | begin; 42 | update accounts set amount = amount - 100000 where account_id=123; 43 | -- now we calculate fees 44 | create table as select ... join ... 45 | update accounts set amount = amount + 100000 where account_id=321; 46 | commit; 47 | ``` 48 | >The “create table as select … join … ” commits a transaction that is not safe. In case of an error, the second account obviously will not be credited by the second account debit that has been already committed! 49 | 50 | >Well, instead of “create table … “, we can use “create temporary table …” which fixes the issue, as temporary table creation is allowed. 51 | 52 | “create table as select ... join ...”会提交一个事务,这是不安全的。如果出现错误,第二个帐户显然不会被已经提交的第二个帐户借记贷记! 53 | 54 | 好吧,我们可以使用“create temporary table …”来修复问题,而不是“create table … ”,因为允许临时表创建。 55 | 56 | ### GTID问题 57 | >If you try to use CREATE TABLE AS SELECT when GTID is enabled (and ENFORCE_GTID_CONSISTENCY = 1) you get this error: 58 | 59 | 如果在启用GTID时尝试使用CREATE TABLE AS SELECT(并且ENFORCE_GTID_CONSISTENCY = 1),则会出现此错误: 60 | ```sql 61 | General error: 1786 CREATE TABLE ... SELECT is forbidden when @@GLOBAL.ENFORCE_GTID_CONSISTENCY = 1. 62 | ``` 63 | >The application code may break. 
64 | 65 | 应用程序代码可能会中断。 66 | 67 | ### 元数据锁问题 68 | >Metadata lock issue for CREATE TABLE AS SELECT is less known. ([More information about the metadata locking in general](https://dev.mysql.com/doc/refman/5.7/en/metadata-locking.html)). Please note: MySQL metadata lock is different from InnoDB deadlock, row-level locking and table-level locking. 69 | 70 | >This quick simulation demonstrates metadata lock: 71 | 72 | CREATE TABLE AS SELECT的元数据锁定问题鲜为人知。([有关元数据锁定的更多信息](https://dev.mysql.com/doc/refman/5.7/en/metadata-locking.html))。 请注意:MySQL元数据锁与InnoDB死锁、行级锁、表级锁是不同的。 73 | 74 | 以下速模拟演示了元数据锁定: 75 | 76 | **会话1:** 77 | ``` 78 | mysql> create table test2 as select * from test1; 79 | ``` 80 | **会话2:** 81 | ``` 82 | mysql> select * from test2 limit 10; 83 | ``` 84 | >-- blocked statement 85 | 86 | 语句被阻塞 87 | 88 | >This statement is waiting for the metadata lock: 89 | 90 | 此语句正在等待元数据锁: 91 | 92 | **会话3:** 93 | ``` 94 | mysql> show processlist; 95 | +----+------+-----------+------+---------+------+---------------------------------+------------------------------------------- 96 | | Id | User | Host | db | Command | Time | State | Info 97 | +----+------+-----------+------+---------+------+---------------------------------+------------------------------------------- 98 | | 2 | root | localhost | test | Query | 18 | Sending data | create table test2 as select * from test1 99 | | 3 | root | localhost | test | Query | 7 | Waiting for table metadata lock | select * from test2 limit 10 100 | | 4 | root | localhost | NULL | Query | 0 | NULL | show processlist 101 | +----+------+-----------+------+---------+------+---------------------------------+------------------------------------------- 102 | ``` 103 | >The same can happen another way: a slow select query can prevent some DDL operations (i.e., rename, drop, etc.): 104 | 105 | 同样地,可以采用另一种方式:慢查询可以阻塞某些DDL操作(即重命名,删除等): 106 | ``` 107 | mysql> show processlistG 108 | *************************** 1. row *************************** 109 | Id: 4 110 | User: root 111 | Host: localhost 112 | db: reporting_stage 113 | Command: Query 114 | Time: 0 115 | State: NULL 116 | Info: show processlist 117 | Rows_sent: 0 118 | Rows_examined: 0 119 | Rows_read: 0 120 | *************************** 2. row *************************** 121 | Id: 5 122 | User: root 123 | Host: localhost 124 | db: test 125 | Command: Query 126 | Time: 9 127 | State: Copying to tmp table 128 | Info: select count(*), name from test2 group by name order by cid 129 | Rows_sent: 0 130 | Rows_examined: 0 131 | Rows_read: 0 132 | *************************** 3. row *************************** 133 | Id: 6 134 | User: root 135 | Host: localhost 136 | db: test 137 | Command: Query 138 | Time: 5 139 | State: Waiting for table metadata lock 140 | Info: rename table test2 to test4 141 | Rows_sent: 0 142 | Rows_examined: 0 143 | Rows_read: 0 144 | 3 rows in set (0.00 sec) 145 | ``` 146 | >As we can see, CREATE TABLE AS SELECT can affect other queries. However, the problem here is not the metadata lock itself (the metadata lock is needed to preserve consistency). The problem is that the 147 | ***metadata lock will not be released until the statement is finished***. 148 | 149 | 我们可以看到,CREATE TABLE AS SELECT可以影响其他查询。但是,这里的问题不是元数据锁本身(需要元数据锁来保持一致性)。问题是 ***在语句完成之前不会释放元数据锁***。 150 | 151 | >The fix is simple: copy the table structure first by doing “create table new_table like old_table”, then do “insert into new_table select …”. 
The metadata lock is still held for the create table part (very short), but isn’t for the “insert … select” part (the total time to hold the lock is much shorter). To illustrate the difference, let’s look at two cases: 152 | 153 | >1. With “create table table_new as select … from table1“, other application connections can’t read from the destination table (table_new) for the duration of the statement (even “show fields from table_new” will be blocked) 154 | >2. With “create table new_table like old_table” + “insert into new_table select …”, other application connections can’t read from the destination table during the “insert into new_table select …” part. 155 | 156 | 157 | 修复很简单:首先复制表结构,执行“ create table new_table like old_table”,然后执行“insert into new_table select ...”。元数据锁仍然在创建表部分(非常短)持有,但“insert … select”部分不会持有(保持锁定的总时间要短得多)。为了说明不同之处,让我们看看以下两种情况: 158 | 1. 使用“create table table_new as select ... from table1 ”,其他应用程序连接 在语句的持续时间内 无法读取目标表(table_new)(甚至“show fields from table_new”将被阻塞) 159 | 2. 使用“create table new_table like old_table”+“insert into new_table select ...”,在“insert into new_table select ...”这部分期间,其他应用程序连接无法读取目标表。 160 | 161 | >In some cases, however, the table structure is not known beforehand. For example, we may need to materialize the result set of a complex select statement, involving joins and/or group by. In this case, we can use this trick: 162 | 163 | 然而,在某些情况下,表结构事先是未知的。例如,我们可能需要物化复杂select语句的结果集,包括joins、and/or、group by。在这种情况下,我们可以使用这个技巧: 164 | ``` 165 | create table new_table as select ... join ... group by ... limit 0; 166 | insert into new_table as select ... join ... group by ... 167 | ``` 168 | 169 | >The first statement creates a table structure and doesn’t insert any rows (LIMIT 0). The first statement places a metadata lock. However, it is very quick. The second statement actually inserts rows into the table and doesn’t place a metadata lock. 170 | 171 | 第一个语句创建一个表结构,不插入任何行(LIMIT 0)。第一个语句持有元数据锁。但是,它非常快。第二个语句实际上是在表中插入行,而不持有元数据锁。 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | -------------------------------------------------------------------------------- /mysql/19-MySQL5.6升级到5.7遇到的问题总结.md: -------------------------------------------------------------------------------- 1 | MySQL5.6升级到5.7遇到的问题 2 | 3 | 18年我们借助MHA+mydumper+myloader将生产环境MySQL从5.6全部升级到了5.7版本,整体还算平稳。但是也遇到一些问题,主要是以下三类问题: 4 | ``` 5 | (1)主从复制问题 6 | (2)range_optimizer_max_mem_size参数引起的性能问题 7 | (3)SQL兼容问题 8 | ``` 9 | 10 | 11 | 下面围绕这3个问题展开说明 12 | 13 | ### 1、主从复制问题 14 | MySQL5.7到小于5.6.22的复制存在bug(bug 74683),会导致复制中断,报错如下 15 | 16 | ``` 17 | 2018-12-20 10:40:02 35878 [ERROR] Slave I/O: Found a Gtid_log_event or Previous_gtids_log_event when @@GLOBAL.GTID_MODE = OFF. Error_code: 1784 18 | 2018-12-20 10:40:02 35878 [ERROR] Slave I/O: Relay log write failure: could not queue event from master, Error_code: 1595 19 | ``` 20 | 如果你的版本<5.6.23,建议你升级到>=5.6.23版本,因为这个bug是在5.6.23修复的,详细信息请见 21 | https://bugs.mysql.com/bug.php?id=74683 22 | 23 | ### 2、range_optimizer_max_mem_size参数引起的性能问题 24 | ##### 【问题描述】 25 | MySQL从5.6升级到5.7之后,开发反馈调度系统超时,和开发沟通后把问题SQL要了过来,SQL类似如下 26 | 27 | ``` 28 | select column1,column2 from tb123 where column1 in (3128611,3128612,3128613...这里省略30多万); 29 | ``` 30 | 其中 column1 字段有索引idx_column1(column1)。 31 | 32 | ##### 【原因分析】 33 | 分别在5.6 5.7上查看了这条SQL的执行计划和执行时间,发现这条SQL在5.6版本使用了column1索引,执行时间2秒,但是在5.7版本全表扫描,执行时间是18秒。 34 | 同时在5.7显示了一个warnings 35 | 36 | ``` 37 | >show warnings; 38 | Warning | 3170 | Memory capacity of 8388608 bytes for 'range_optimizer_max_mem_size' exceeded. 
Range optimization was not done for this query. 39 | ``` 40 | 从告警信息得知和参数 range_optimizer_max_mem_size 有关,查看官方文档得知: 41 | 这个参数是在mysql 5.7新增的,范围查询优化参数,这个参数限制范围查询优化使用的内存,默认值是8M,当使用内存超过8M则会放弃使用范围查询而采用其他方法比如用全表扫描来代替。而上面的SQL in里面的值太多,超出了8M,所以走了全表扫描。知道原因后,解决办法也很简单,就是增大这个值。 42 | ``` 43 | 最小值: 0 44 | 最大值: 18446744073709551615 45 | 默认值: 8388608 46 | ``` 47 | 下面我们测试一下这个参数对查询的影响 48 | ##### 【测试】 49 | 使用 range_optimizer_max_mem_size 的默认值,即8M 50 | 51 | ``` 52 | >show variables like '%range_optimizer_max_mem_size%'; 53 | +------------------------------+---------+ 54 | | Variable_name | Value | 55 | +------------------------------+---------+ 56 | | range_optimizer_max_mem_size | 8388608 | 57 | +------------------------------+---------+ 58 | 1 row in set (0.00 sec) 59 | 60 | >show create table sbtest1\G 61 | *************************** 1. row *************************** 62 | Table: sbtest1 63 | Create Table: CREATE TABLE `sbtest1` ( 64 | `id` int(11) NOT NULL AUTO_INCREMENT, 65 | `k` int(11) NOT NULL DEFAULT '0', 66 | `c` char(120) NOT NULL DEFAULT '', 67 | `pad` char(60) NOT NULL DEFAULT '', 68 | PRIMARY KEY (`id`), 69 | KEY `k_1` (`k`) 70 | ) ENGINE=InnoDB AUTO_INCREMENT=3000001 DEFAULT CHARSET=utf8 71 | 1 row in set (0.01 sec) 72 | 73 | >desc select * from sbtest1 where k in (369193,434819,486940)\G 74 | *************************** 1. row *************************** 75 | id: 1 76 | select_type: SIMPLE 77 | table: sbtest1 78 | partitions: NULL 79 | type: range 80 | possible_keys: k_1 81 | key: k_1 82 | key_len: 4 83 | ref: NULL 84 | rows: 3 85 | filtered: 100.00 86 | Extra: Using index condition 87 | 1 row in set, 1 warning (0.00 sec) 88 | ``` 89 | 从上面执行计划可以看到,使用到了二级索引 k_1。下面我们将range_optimizer_max_mem_size的值修改为2048测试 90 | 91 | ``` 92 | >set range_optimizer_max_mem_size=2048; 93 | Query OK, 0 rows affected (0.00 sec) 94 | 95 | >show variables like '%range_optimizer_max_mem_size%'; 96 | +------------------------------+-------+ 97 | | Variable_name | Value | 98 | +------------------------------+-------+ 99 | | range_optimizer_max_mem_size | 2048 | 100 | +------------------------------+-------+ 101 | 1 row in set (0.00 sec) 102 | 103 | >desc select * from sbtest1 where k in (369193,434819,486940)\G 104 | *************************** 1. row *************************** 105 | id: 1 106 | select_type: SIMPLE 107 | table: sbtest1 108 | partitions: NULL 109 | type: ALL 110 | possible_keys: k_1 111 | key: NULL 112 | key_len: NULL 113 | ref: NULL 114 | rows: 2884885 115 | filtered: 30.00 116 | Extra: Using where 117 | 1 row in set, 2 warnings (0.00 sec) 118 | 119 | >show warnings\G 120 | *************************** 1. row *************************** 121 | Level: Warning 122 | Code: 3170 123 | Message: Memory capacity of 2048 bytes for 'range_optimizer_max_mem_size' exceeded. Range optimization was not done for this query. 124 | *************************** 2. 
row *************************** 125 | Level: Note 126 | Code: 1003 127 | Message: /* select#1 */ select `sysbench`.`sbtest1`.`id` AS `id`,`sysbench`.`sbtest1`.`k` AS `k`,`sysbench`.`sbtest1`.`c` AS `c`,`sysbench`.`sbtest1`.`pad` AS `pad` from `sysbench`.`sbtest1` where (`sysbench`.`sbtest1`.`k` in (369193,434819,486940)) 128 | 2 rows in set (0.00 sec) 129 | ``` 130 | 从上面执行计划得知,走了全表扫描,没使用到二级索引k_1。 131 | 132 | ##### 【解决办法】 133 | 知道原因后,解决办法就很简单了,将 range_optimizer_max_mem_size 修改为100M后,SQL执行响应时间为2秒。 134 | 135 | range_optimizer_max_mem_size修改为多大合适,可以参考官网计算公式 136 | https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html 137 | 138 | ### 3、SQL兼容性问题 139 | 有一套系统从5.6升级到5.7之后,开发反馈相同的SQL在5.6可以正常显示内容,但是升级后不显示内容了。 140 | 下面我用类似的例子展现一下当时的情况 141 | 142 | ``` 143 | 表结构 144 | CREATE TABLE `t_star` ( 145 | `id` int(10) unsigned NOT NULL AUTO_INCREMENT, 146 | `name` varchar(20) NOT NULL DEFAULT '' COMMENT '名字', 147 | `gender` tinyint(4) NOT NULL DEFAULT '0' COMMENT '0:男,1:女', 148 | `city` varchar(10) NOT NULL DEFAULT '' COMMENT '所在城市', 149 | PRIMARY KEY (`id`) 150 | ) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8; 151 | 152 | 内容 153 | >select * from t_star; 154 | +----+-----------+--------+--------+ 155 | | id | name | gender | city | 156 | +----+-----------+--------+--------+ 157 | | 1 | 姚明 | 0 | 上海 | 158 | | 2 | 邓超 | 0 | 南昌 | 159 | | 3 | 刘德华 | 0 | 香港 | 160 | | 4 | 刘亦菲 | 1 | 香港 | 161 | | 5 | 江疏影 | 1 | 上海 | 162 | +----+-----------+--------+--------+ 163 | ``` 164 | MySQL 5.6版本 165 | 166 | ``` 167 | >select * from t_star where city = '上海' and gender in (x'ACED0005757200115B4C6A6176612E6C616E672E4C6F6E673B7DE10AB2BBBC632B0200007870000000017372000E6A6176612E6C616E672E4C6F6E673B8BE490CC8F23DF0200014A000576616C7565787200106A6176612E6C616E672E4E756D62657286AC951D0B94E08B02000078700000000000000001'); 168 | +----+-----------+--------+--------+ 169 | | id | name | gender | city | 170 | +----+-----------+--------+--------+ 171 | | 5 | 江疏影 | 1 | 上海 | 172 | +----+-----------+--------+--------+ 173 | ``` 174 | MySQL 5.7版本 175 | 176 | ``` 177 | >select * from t_star where city = '上海' and gender in (x'ACED0005757200115B4C6A6176612E6C616E672E4C6F6E673B7DE10AB2BBBC632B0200007870000000017372000E6A6176612E6C616E672E4C6F6E673B8BE490CC8F23DF0200014A000576616C7565787200106A6176612E6C616E672E4E756D62657286AC951D0B94E08B02000078700000000000000001'); 178 | Empty set, 1 warning (0.00 sec) 179 | >show warnings\G 180 | *************************** 1. row *************************** 181 | Level: Warning 182 | Code: 1292 183 | Message: Truncated incorrect BINARY value: 'x'aced0005757200115b4c6a6176612e6c616e672e4c6f6e673b7de10ab2bbbc632b0200007870000000017372000e6a6176612e6c616e672e4c6f6e673b8be4 184 | ``` 185 | 从上面测试可以看出,同一条SQL在5.6可以正常显示结果,但是在5.7没显示任何信息且打印了一条告警信息。这个原因是,在5.6把gender的值转换成了1,而在5.7转换成了-1,SQL书写不规范导致的。 186 | 187 | 因此,核心业务系统从5.6升级到5.7,如果无法提前找出这种SQL,很可能会对业务造成影响,那么如何尽量避免这种问题,提前找到有问题的SQL呢? 
188 | 这里提供一种思路,能从一定程度上减少这种事情的发生。 189 | 基本思路是:通过慢日志统计分析select,分别在5.6和5.7上运行,将运行结果输出到文件,对文件求MD5值,通过判断MD5值是否相同来判断相同SQL在5.6和5.7上执行结果是否相同。 190 | 下面是一种实现方法,可以参考。 191 | 192 | 假设要升级的MHA集群如下 193 | 194 | ``` 195 | 192.168.1.10:3306 master 5.6 196 | ---192.168.1.20:3306 slave1 5.6 197 | ---192.168.1.30:3306 slave2 5.6 198 | ``` 199 | 我们可以借助一台机器,在同一台机器上同时搭建5.6和5.7,如下所示 200 | 201 | ``` 202 | 192.168.1.10:3306 master 5.6 203 | ---192.168.1.20:3306 slave1 5.6 204 | ---192.168.1.30:3306 slave2 5.6 205 | ---192.168.1.40:3306(5.6)---192.168.1.40:3307(5.7) 206 | ``` 207 | 其中192.168.1.40是借助的机器,3306是5.6版本,3307是5.7版本,当检测SQL时,停止192.168.1.40:3306的复制即可,这样192.168.1.40:3306和192.168.1.40:3307数据是一致的。 208 | 209 | 下面是快速找出这种问题的简要步骤 210 | 211 | ``` 212 | 1、将线上实例的慢日志时间阈值改得很小,这里将long_query_time修改为0,收集1个小时慢日志 213 | set global long_query_time=0; 214 | 收集期间注意观察服务器性能,别影响线上业务。 215 | 216 | 2、利用pt-query-digest分析慢日志并入表 217 | pt-query-digest --user=username --password=password --limit=100% --charset=utf8 --progress percentage,1 --filter '$event->{fingerprint} =~ m/^select/i' --history h=aa.aa.aa.aa,P=3306,D=test,t=query_history --no-report /data/mysql3306/log/slow.log 218 | 219 | 其中query_history表结构如下 220 | CREATE TABLE `query_history` ( 221 | `checksum` bigint(20) unsigned NOT NULL, 222 | `sample` longtext NOT NULL, 223 | `db_min` varchar(100) NOT NULL DEFAULT '', 224 | `ts_min` datetime NOT NULL DEFAULT '0000-00-00 00:00:00', 225 | `ts_max` datetime NOT NULL DEFAULT '0000-00-00 00:00:00', 226 | `ts_cnt` float DEFAULT NULL, 227 | PRIMARY KEY (`checksum`,`ts_min`,`ts_max`) 228 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8 229 | 230 | 3、停止192.168.1.40:3306的复制,这样192.168.1.40:3306和192.168.1.40:3307数据是一致的。 231 | 232 | 4、执行下面脚本找出MD5值不同的SQL(脚本基于Python 2,依赖pymysql) 233 | #!/usr/bin/env python 234 | # -*- coding: utf-8 -*- 235 | check_hosts = ["192.168.1.40:3306","192.168.1.40:3307"] 236 | manager_host="aa.aa.aa.aa:3306" 237 | slow_log_to_file='/tmp/row.log' 238 | import pymysql as connector 239 | import traceback 240 | import hashlib 241 | red='\033[1;35m' 242 | end='\033[0m' 243 | lv1='\033[1;32m' 244 | lv2='\033[0m' 245 | def get_mysql_connection(server, db): 246 | try: 247 | dbconfig = { 248 | 'user': 'username', 249 | 'passwd': 'password', 250 | 'charset': 'utf8mb4', 251 | 'autocommit':True 252 | } 253 | host, port = server.split(':') 254 | port = int(port) 255 | dbconfig['host'] = host 256 | dbconfig['port'] = port 257 | dbconfig['db'] = db 258 | return connector.connect(**dbconfig) 259 | except Exception, e: 260 | print " get_mysql_connection() error : %s " % traceback.format_exc() 261 | raise e 262 | 263 | def md5sum(filename, blocksize=65536): 264 | hash = hashlib.md5() 265 | with open(filename, "rb") as f: 266 | for block in iter(lambda: f.read(blocksize), b""): 267 | hash.update(block) 268 | return hash.hexdigest() 269 | 270 | for check_host in check_hosts: 271 | print "%scheck mysql: %s %s" %(lv1,check_host,lv2) 272 | conn1 = get_mysql_connection(manager_host, 'test') 273 | sql_num = """select count(*) from query_history""" 274 | cur1 = conn1.cursor() 275 | cur1.execute(sql_num) 276 | sql_num = cur1.fetchall() 277 | sql_num = sql_num[0][0] 278 | print "sql_num: %s" %(sql_num) 279 | i = 0 280 | while (i < sql_num): 281 | dbname = """select db_min from query_history limit %s,1""" %(i) 282 | sql = """select sample from query_history limit %s,1""" %(i) 283 | cur1.execute(dbname) 284 | dbname = cur1.fetchall() 285 | cur1.execute(sql) 286 | sql = cur1.fetchall() 287 | if dbname[0][0] <> 'information_schema': 288 | print "[%s] dbname: %s" %(i,dbname[0][0]) 289 | conn2 = 
get_mysql_connection(check_host, dbname[0][0]) 290 | cur2 = conn2.cursor() 291 | cur2.execute(sql[0][0]) 292 | rows = cur2.fetchall() 293 | conn2.close() 294 | with open(slow_log_to_file, 'w') as slow_log: 295 | for row in rows: 296 | column=row[0] 297 | print >> slow_log, "%s" %(column) 298 | md5_value=md5sum(slow_log_to_file) 299 | print "%s" %(md5_value) 300 | i=i+1 301 | else: 302 | i=i+1 303 | pass 304 | conn1.close() 305 | 306 | 307 | 5、分析MD5值不同的SQL,推进改进 308 | ``` 309 | 310 | 本文总结了MySQL从5.6升级到5.7过程中遇到的一些问题,希望对大家有帮助。 311 | 312 | 313 | 314 | 315 | 316 | 317 | 318 | 319 | -------------------------------------------------------------------------------- /mysql/2-en-mysql-8-0-2-more-flexible-undo-tablespace-management.md: -------------------------------------------------------------------------------- 1 | 原文:[MySQL 8.0.2 More Flexible Undo Tablespace Management](http://mysqlserverteam.com/mysql-8-0-2-more-flexible-undo-tablespace-management/) 2 | 3 | 作者:[Kevin Lewis](http://mysqlserverteam.com/author/kevin/) 4 | 5 | 翻译团队:天一阁 6 | 7 | In MySQL 8.0.2 DMR we will introduce features which make managing undo tablespaces easier in InnoDB. 8 | 9 | The main improvement is that you can now create and drop undo tablespaces at any time. You can change the config file setting before any startup, whether recovery is needed or not. And you can either increase or decrease the number of undo tablespaces while the engine is busy. 10 | 11 | **innodb_undo_tablespaces:** Undo Tablespaces contain Rollback Segments which in turn contain undo logs. Undo Logs are used to ‘rollback’ transactions and to create earlier versions of data which is used by Multi Version Concurrency Control to present a consistent image of the database during a transaction. 12 | 13 | Previously, the number of undo tablespaces that InnoDB uses was established when the database was initialized. It can now be set to any value between 0 and 127 at any time; at startup in either the config file or on the command line, or while online by issuing ‘SET GLOBAL INNODB_UNDO_TABLESPACES=n’. 14 | 15 | When you choose zero undo tablespaces, all rollback segments are tracked by the system tablespace. This is the old way of storing Rollback Segments before separate Undo Tablespaces were added in version 5.6. We are trying to move away from using the system tablespace in this way, so the default value is not set to 2. In the near future, the minimum value will become 2, which means that the system tablespace will not be used for any rollback segments. So please do not keep innodb_undo_tablespaces=0 in your config files. 16 | 17 | **innodb_undo_log_truncate:** We chose a minimum of 2 undo tablespaces because you need at least 2 in order for one of them to be truncated. Undo truncation allows InnoDB to shrink the undo tablespace size after unusually large transactions. Previously, the innodb_undo_log_truncate setting defaulted to OFF. With version 8.0.2 it defaults to ON. 18 | 19 | **innodb_rollback_segments:** This can now be set to any value between 1 and 128 at any time; at startup in either the config file or the command line, or while online by issuing ‘SET GLOBAL INNODB_ROLLBACK_SEGMENTS=n’. 20 | 21 | This setting used to be the number of rollback segments that the whole server could support. It is now the number of rollback segments in each undo tablespace, allowing a greater number of rollback segments to be used by concurrent transactions. The default value is still 128. 
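For illustration only, a quick way to inspect and adjust these settings on a running 8.0.2 instance might look like the sketch below; the specific values are arbitrary examples, not recommendations.

```mysql
-- check the current settings
SHOW VARIABLES LIKE 'innodb_undo_tablespaces';
SHOW VARIABLES LIKE 'innodb_rollback_segments';

-- change them online (example values only)
SET GLOBAL innodb_undo_tablespaces = 4;
SET GLOBAL innodb_rollback_segments = 64;

-- list the undo tablespace files the instance currently uses
SELECT TABLESPACE_NAME, FILE_NAME
  FROM INFORMATION_SCHEMA.FILES
 WHERE FILE_TYPE LIKE 'UNDO LOG';
```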
22 | 23 | **innodb_undo_logs:** This setting was introduced in 5.6 as an alternate or alias of innodb_rollback_segments. It was a little confusing in terminology since in InnoDB, ‘Undo Logs’ are stored in Rollback Segments, which are file segments of an Undo Tablespace. In v8.0.2, we are dropping the use of this setting and requiring Innodb_rollback_segments to be used instead. The latest released version 5.7.19 contains deprecation warnings if it is used. 24 | 25 | **Undo Tablespace Name and Location:** Undo tablespaces are located in the directory specified by the setting innodb_undo_directory. If that setting is not used, they are created in the ‘datadir’ location. Previously they had names like ‘undo001’, ‘undo002’, etc. In v8.0.2 DMR they have names like ‘undo_001’, ‘undo_002’, etc. The reason for the name change is that these newer undo tablespaces contain a new header page that maps the locations of each rollback segment it contains. In version 5.6 when separate undo tablespaces were introduced, their rollback segment header page numbers were tracked in the system tablespace which limited the number of rollback segments for the whole instance to 128. 26 | 27 | Since each undo tablespace can now track its own rollback segments with this new page, these are really new types of undo tablespaces and need to have a different naming convention. This is also the reason that innodb_rollback_segments now defines the number of rollback segments per undo tablespace instead of the number for the whole MySQL instance. 28 | 29 | **Automatic Upgrade:** Before this change, the system tablespace tracked all rollback segments whether they were in the system tablespace or in undo tablespaces. If you start MySQL 8.0.2 on an existing database that uses the system tablespace to track rollback segments, at least 2 new undo tablespaces will be generated automatically. This will be common since the previous default value for innodb_undo_tablespaces was 0. Mysql 5.7 databases will go through an upgrade process which among other things will create the new DD from the old FRM files. As part of this process, at least 2 new undo tablespaces will be created as well. InnoDB can still use the existing rollback segments and undo tablespaces defined in the system tablespace if they have undo logs in them at startup. But it will not assign these old rollback segments to any new transactions. So once undo recovery is finished and these undo logs are not needed anymore, the old undo tablespaces are deleted. 30 | 31 | **Advantages and Benefits:** This change allows you to dynamically add more undo tablespaces and rollback segments as a database installation grows. With more undo tablespaces, it is easier to use undo tablespace truncation to minimize the disk space dedicated to rollback segments. Also, more rollback segments mean that concurrent transactions are more likely to use separate rollback segments for their undo logs which results in less contention for the same resources. 32 | 33 | Thanks for using MySQL! 
34 | 35 | -------------------------------------------------------------------------------- /mysql/2-zh-mysql-8-0-2-more-flexible-undo-tablespace-management.md: -------------------------------------------------------------------------------- 1 | 原文: [MySQL 8.0.2 More Flexible Undo Tablespace Management](http://mysqlserverteam.com/mysql-8-0-2-more-flexible-undo-tablespace-management/) 2 | 3 | 作者:[Kevin Lewis](http://mysqlserverteam.com/author/kevin/) 4 | 5 | 翻译团队:天一阁 6 | 7 | ## MySQL 8.0.2中更灵活的UNDO表空间管理方式 8 | 9 | In MySQL 8.0.2 DMR we will introduce features which make managing undo tablespaces easier in InnoDB. 10 | 11 | 在MySQL 8.0.2 DMR版本中,我们将提高InnoDB的UNDO表空间易管理性。 12 | 13 | The main improvement is that you can now create and drop undo tablespaces at any time. You can change the config file setting before any startup, whether recovery is needed or not. And you can either increase or decrease the number of undo tablespaces while the engine is busy. 14 | 15 | 主要有几点提升:可以随时自由地创建或删除UNDO表空间;也可以在启动前更改相关设置,无论是否需要进行InnoDB恢复;即便InnoDB引擎处于繁忙状态时,也可以增加或减少UNDO表空间的数量。 16 | 17 | **innodb_undo_tablespaces**: Undo Tablespaces contain Rollback Segments which in turn contain undo logs. Undo Logs are used to ‘rollback’ transactions and to create earlier versions of data which is used by Multi Version Concurrency Control to present a consistent image of the database during a transaction. 18 | 19 |  **innodb_undo_tablespaces**: UNDO表空间包括回滚段,而回滚段又包括UNDO日志。UNDO日志用于'回滚'事务和创建MVCC所需要的旧版本数据,以便在一个事务中保证数据库快照的一致性。 20 | 21 | Previously, the number of undo tablespaces that InnoDB uses was established when the database was initialized. It can now be set to any value between 0 and 127 at any time; at startup in either the config file or on the command line, or while online by issuing ‘SET GLOBAL INNODB_UNDO_TABLESPACES=n’. 22 | 23 | 在以前,当数据库初始化时,InnoDB的UNDO表空间的数量是确定的。而现在可以随时将其值设置为0~127之间的任意值,可通过启动时读取的配置文件,或者命令行,或者通过在线‘SET GLOBAL INNODB_UNDO_TABLESPACES=n’进行修改。 24 | 25 | When you choose zero undo tablespaces, all rollback segments are tracked by the system tablespace. This is the old way of storing Rollback Segments before separate Undo Tablespaces were added in version 5.6. We are trying to move away from using the system tablespace in this way, so the default value is not set to 2. In the near future, the minimum value will become 2, which means that the system tablespace will not be used for any rollback segments. So please do not keep innodb_undo_tablespaces=0 in your config files. 26 | 27 | 当UNDO表空间数量设置为0时(不使用独立UNDO表空间),所有的回滚段将存储在系统表空间中。这是在 5.6版本之前,无法支持独立UNDO表空间存储回滚段情景下的旧模式。我们尝试通过这种方式尽量不使用系统表空间,所以默认值不会设置为2。在将来的版本中该选项最小值是2,这表明系统表空间将不会被用作任何回滚段。所以请不要在你的配置文件中设置innodb_undo_tablespaces=0。 28 | 29 | **innodb_undo_log_truncate**: We chose a minimum of 2 undo tablespaces because you need at least 2 in order for one of them to be truncated. Undo truncation allows InnoDB to shrink the undo tablespace size after unusually large transactions. Previously, the innodb_undo_log_truncate setting defaulted to OFF. With version 8.0.2 it defaults to ON. 30 | 31 |  **innodb_undo_log_truncate**:我们将UNDO表空间的最小值设为2,因为当一个undo表空间被清空时,至少还需要有另一个undo表空间。InnoDB将在大事务结束后进行UNDO清除操作以收缩UNDO表空间大小。以前,innodb_undo_log_truncate的默认值为OFF,而在8.0.2版本该值默认为ON。 32 | 33 | **innodb_rollback_segments**: This can now be set to any value between 1 and 128 at any time; at startup in either the config file or the command line, or while online by issuing ‘SET GLOBAL INNODB_ROLLBACK_SEGMENTS=n’. 
34 | 35 |  **innodb_rollback_segments**: 选择可以随时设置为1~128之间的任何值。可通过启动时读取的配置文件,或直接在命令行中传递参数,或者启动后在线执行命令‘SET GLOBAL INNODB_ROLLBACK_SEGMENTS=n’。 36 | 37 | This setting used to be the number of rollback segments that the whole server could support. It is now the number of rollback segments in each undo tablespace, allowing a greater number of rollback segments to be used by concurrent transactions. The default value is still 128. 38 | 39 | 这个选项曾是用于整个服务器可以支持的回滚段数。现在为每一个UNDO表空间的回滚段数,允许并发事务使用更多的回滚段数,该选项默认值仍为128。 40 | 41 | **innodb_undo_logs**: This setting was introduced in 5.6 as an alternate or alias of innodb_rollback_segments. It was a little confusing in terminology since in InnoDB, ‘Undo Logs’ are stored in Rollback Segments, which are file segments of an Undo Tablespace. In v8.0.2, we are dropping the use of this setting and requiring Innodb_rollback_segments to be used instead. The latest released version 5.7.19 contains deprecation warnings if it is used. 42 | 43 |  **innodb_undo_logs**:该选项在5.6中作为innodb_rollback_segments的替代或者别名所引入。在InnoDB中术语有一点儿混乱,‘Undo Logs’存储在回滚段中,这是UNDO表空间的文件段。在8.0.2版本,我们打算弃用该选项改用Innodb_rollback_segments选项。在最新的发布的5.7.19版本中将增加这个不建议使用的warnings提示。 44 | 45 | Undo Tablespace Name and Location: Undo tablespaces are located in the directory specified by the setting innodb_undo_directory. If that setting is not used, they are created in the ‘datadir’ location. Previously they had names like ‘undo001’, ‘undo002’, etc. In v8.0.2 DMR they have names like ‘undo_001’, ‘undo_002’, etc. The reason for the name change is that these newer undo tablespaces contain a new header page that maps the locations of each rollback segment it contains. In version 5.6 when separate undo tablespaces were introduced, their rollback segment header page numbers were tracked in the system tablespace which limited the number of rollback segments for the whole instance to 128. 46 | 47 | UNDO表空间命名和位置:UNDO表空间位于innodb_undo_directory所指定的目录中。如果该选项没有被使用,则放在‘datadir’中。以前,他们被命名为‘undo001’, ‘undo002’。在8.0.2 DMR版本,他们被称作‘undo_001’, ‘undo_002’等。改名的原因是在新的UNDO表空间中包含了一个新的头页面,其映射了每一个回滚段的位置。而在5.6版本中,当使用独立UNDO表空间时,其回滚段头页面号由系统表空间所跟踪,并且限制整个实例的回滚段数为128。 48 | 49 | Since each undo tablespace can now track its own rollback segments with this new page, these are really new types of undo tablespaces and need to have a different naming convention. This is also the reason that innodb_rollback_segments now defines the number of rollback segments per undo tablespace instead of the number for the whole MySQL instance. 50 | 51 | 由于每个UNDO表空间可以使用这个新页来跟踪自己的回滚段,这些是真正新的UNDO表空间类型,并需要有不同的命名约定。这也是innodb_rollback_segments现在定义每个UNDO表空间回滚段数量而不是整个MySQL实例数量的原因。 52 | 53 | **Automatic Upgrade**: Before this change, the system tablespace tracked all rollback segments whether they were in the system tablespace or in undo tablespaces. If you start MySQL 8.0.2 on an existing database that uses the system tablespace to track rollback segments, at least 2 new undo tablespaces will be generated automatically. This will be common since the previous default value for innodb_undo_tablespaces was 0. Mysql 5.7 databases will go through an upgrade process which among other things will create the new DD from the old FRM files. As part of this process, at least 2 new undo tablespaces will be created as well. InnoDB can still use the existing rollback segments and undo tablespaces defined in the system tablespace if they have undo logs in them at startup. But it will not assign these old rollback segments to any new transactions. 
So once undo recovery is finished and these undo logs are not needed anymore, the old undo tablespaces are deleted. 54 | 55 | **自动升级**:在此更变之前,系统表空间会跟踪所有回滚段,无论他们位于系统表空间还是UNDO表空间。如果现在在一个使用系统表空间跟踪回滚段的已经存在了的数据库上启动MySQL 8.0.2。将自动生成至少两个新的UNDO表空间。这将是很正常的,因为innodb_undo_tablespaces之前的默认值为0。MySQL 5.7数据库将直接通过升级进程,其中包括从旧的FRM文件创建新的DD。作为此进程的一部分,还将创建至少两个新的UNDO表空间。已经存在于系统表空间中的回滚段和undo表空间,如果尚有未清除的undo log,则它们仍然会被InnoDB识别并使用。但是InnoDB不会将这些老的回滚段分配给任何新的事务。所以一旦UNDO恢复完成,并不再需要这些UNDO日志,旧的UNDO表空间也将被删除。 56 | 57 | **Advantages and Benefits**: This change allows you to dynamically add more undo tablespaces and rollback segments as a database installation grows. With more undo tablespaces, it is easier to use undo tablespace truncation to minimize the disk space dedicated to rollback segments. Also, more rollback segments mean that concurrent transactions are more likely to use separate rollback segments for their undo logs which results in less contention for the same resources. 58 | 59 | **更多好处**:此新特性允许你在数据库规模增长时,动态地添加更多的UNDO表空间和回滚段。使用更多的UNDO表空间,可以更轻松地通过UNDO表空间清除来减少用于存放回滚段的磁盘空间消耗。此外,更多的回滚段意味着并发事务可尽可能的使用单独的回滚段,以减少相同资源的争用。 60 | 61 | Thanks for using MySQL! 62 | -------------------------------------------------------------------------------- /mysql/22-How-Network-Bandwidth-Affects-MySQL-Performance.md: -------------------------------------------------------------------------------- 1 | # 网络带宽是如何影响MySQL性能的 2 | 3 | 作者:[Vadim Tkachenko](https://www.percona.com/blog/author/vadim/) 4 | 5 | 发布时间:2018-10-22 6 | 7 | 标签:[network], [network performance] 8 | 9 | 文章原文:[How Network Bandwidth Affects MySQL Performance](https://www.percona.com/blog/2019/02/19/how-network-bandwidth-affects-mysql-performance/) 10 | 11 | *Network is a major part of a database infrastructure. However, often performance benchmarks are done on a local machine, where a client and a server are collocated – I am guilty myself. This is done to simplify the setup and to exclude one more variable (the networking part), but with this we also miss looking at how network affects performance.* 12 | 13 | 网络是数据库基础架构的主要部分。但是,通常性能基准测试是在本地计算机上完成的,客户端和服务器同一台机器上。这样做是为了简化设置并排除网络部分的因素,但是我们也错过了查看网络如何影响MySQL性能 14 | 15 | *The network is even more important for clustering products like [Percona XtraDB Cluster](https://www.percona.com/software/mysql-database/percona-xtradb-cluster) and [MySQL Group Replication](https://dev.mysql.com/doc/refman/8.0/en/group-replication.html). Also, we are working on our [Percona XtraDB Cluster Operator](https://www.percona.com/blog/2019/01/18/percona-xtradb-cluster-operator-early-access-0-2-0-release-is-now-available/) for Kubernetes and OpenShift, where network performance is critical for overall performance.* 16 | 17 | 对于像[Percona XtraDB Cluster](https://www.percona.com/software/mysql-database/percona-xtradb-cluster)和[MySQL Group Replication](https://dev.mysql.com/doc/refman/8.0/en/group-replication.html)这样的集群产品来说,网络更为重要。此外,我们正在为Kubernetes和OpenShift开发pxc operator,其中网络性能对整体性能至关重要。 18 | 19 | *In this post, I will look into networking setups. These are simple and trivial, but are a building block towards understanding networking effects for more complex setups.* 20 | 21 | 在这篇文章中,我将深入探究网络设置。这些都是简单而微不足道的,但它们是了解更复杂设置的网络影响的基石。 22 | 23 | ## 设置 24 | 25 | *I will use two bare-metal servers, connected via a dedicated 10Gb network. 
I will emulate a 1Gb network by changing the network interface speed with ethtool -s eth1 speed 1000 duplex full autoneg off command* 26 | 27 | 我将使用两台裸机服务器,通过专用的10Gb的网络连接。我将通过`ethtool -s eth1 speed 1000 duplex full autoneg off`命令来模拟1Gb的网络 28 | 29 | ![network test topology](https://www.percona.com/blog/wp-content/uploads/2019/02/network-test-topology.png) 30 | 31 | *I will run a simple benchmark:* 32 | 33 | 我将会运行一个简单的压测 34 | 35 | `sysbench oltp_read_only --mysql-ssl=on --mysql-host=172.16.0.1 --tables=20 --table-size=10000000 --mysql-user=sbtest --mysql-password=sbtest --threads=$i --time=300 --report-interval=1 --rand-type=pareto` 36 | 37 | *This is run with the number of threads varied from 1 to 2048. All data fits into memory – innodb_buffer_pool_size is big enough – so the workload is CPU-intensive in memory: there is no IO overhead.* 38 | 39 | 压测线程数将会从1增长到2048。所有的数据全部都存在内存当中-innodb_buffer_pool_size足够大-所以工作的负载时在cpu密集型:没有IO的开销 40 | 41 | *Operating System: Ubuntu 16.04* 42 | 43 | ### Benchmark N1. Network bandwidth 44 | 45 | *In the first experiment I will compare 1Gb network vs 10Gb network.* 46 | 47 | 在第一个实验当中,我将对比1Gb网络和10Gb网络 48 | 49 | ![1gb vs 10gb network](https://www.percona.com/blog/wp-content/uploads/2019/02/1gb-vs-10gb-network.png) 50 | 51 | | **threads/throughput** | **1Gb network** | **10Gb network** | 52 | | ---------------------- | --------------- | ---------------- | 53 | | 1 | 326.13 | 394.4 | 54 | | 4 | 1143.36 | 1544.73 | 55 | | 16 | 2400.19 | 5647.73 | 56 | | 32 | 2665.61 | 10256.11 | 57 | | 64 | 2838.47 | 15762.59 | 58 | | 96 | 2865.22 | 17626.77 | 59 | | 128 | 2867.46 | 18525.91 | 60 | | 256 | 2867.47 | 18529.4 | 61 | | 512 | 2867.27 | 17901.67 | 62 | | 1024 | 2865.4 | 16953.76 | 63 | | 2048 | 2761.78 | 16393.84 | 64 | 65 | *Obviously the 1Gb network performance is a bottleneck here, and we can improve our results significantly if we move to the 10Gb network.* 66 | 67 | 很显然,1Gb网络性能是这里的瓶颈,如果我们迁移到10Gb网络,我们可以显着改善我们的结果。 68 | 69 | *To see that 1Gb network is bottleneck we can check the network traffic chart in PMM:* 70 | 71 | 我们可以从PMM的网络流量图中可以看到1Gb的网络是瓶颈 72 | 73 | ![network traffic in PMM](https://www.percona.com/blog/wp-content/uploads/2019/02/network-traffic-in-PMM.png) 74 | 75 | *We can see we achieved 116MiB/sec (or 928Mb/sec) in throughput, which is very close to the network bandwidth.* 76 | 77 | 我们可以看到outbound的网络流量已经达到了116MB/s(或者928Mb/s),已经非常接近网络带宽了 78 | 79 | *But what we can do if the our network infrastructure is limited to 1Gb?* 80 | 81 | 但是网络设施在1Gb的限制下,我们还能做什么呢? 82 | 83 | ### Benchmark N2. 
Protocol compression 84 | 85 | *There is a feature in MySQL protocol whereby you can see the compression for the network exchange between client and server: --mysql-compression=on for sysbench.* 86 | 87 | 在MySQL协议中有一个特性,你可以通过压缩客户端和服务端的网络交换,sysbench可以打开`--mysql-compression` 88 | 89 | *Let’s see how it will affect our results.* 90 | 91 | 让我们来看下它会不会影响我们的结果 92 | 93 | ![1gb network with compression protocol](https://www.percona.com/blog/wp-content/uploads/2019/02/1gb-network-with-compression-protocol.png) 94 | 95 | | threads/throughput | 1Gb network | 1Gb with compression protocol | 96 | | ------------------ | ----------- | ----------------------------- | 97 | | 1 | 326.13 | 198.33 | 98 | | 4 | 1143.36 | 771.59 | 99 | | 16 | 2400.19 | 2714 | 100 | | 32 | 2665.61 | 3939.73 | 101 | | 64 | 2838.47 | 4454.87 | 102 | | 96 | 2865.22 | 4770.83 | 103 | | 128 | 2867.46 | 5030.78 | 104 | | 256 | 2867.47 | 5134.57 | 105 | | 512 | 2867.27 | 5133.94 | 106 | | 1024 | 2865.4 | 5129.24 | 107 | | 2048 | 2761.78 | 5100.46 | 108 | 109 | *Here is an interesting result. When we use all available network bandwidth, the protocol compression actually helps to improve the result.* 110 | 111 | 实验结果很有意思,当我们用尽了所有的网络带宽的时候,压缩协议实际上是帮我们提高了吞吐量 112 | 113 | ![10g network with compression protocol](https://www.percona.com/blog/wp-content/uploads/2019/02/10g-network-with-compression-protocol.png) 114 | 115 | | threads/throughput | 10Gb | 10Gb with compression | 116 | | ------------------ | -------- | --------------------- | 117 | | 1 | 394.4 | 216.25 | 118 | | 4 | 1544.73 | 857.93 | 119 | | 16 | 5647.73 | 3202.2 | 120 | | 32 | 10256.11 | 5855.03 | 121 | | 64 | 15762.59 | 8973.23 | 122 | | 96 | 17626.77 | 9682.44 | 123 | | 128 | 18525.91 | 10006.91 | 124 | | 256 | 18529.4 | 9899.97 | 125 | | 512 | 17901.67 | 9612.34 | 126 | | 1024 | 16953.76 | 9270.27 | 127 | | 2048 | 16393.84 | 9123.84 | 128 | 129 | *But this is not the case with the 10Gb network. The CPU resources needed for compression/decompression are a limiting factor, and with compression the throughput actually only reach half of what we have without compression.* 130 | 131 | 但是在10Gb网络下并没有起到作用,压缩和解压缩需要的cpu资源变成了瓶颈,压缩下的性能实际上只有未压缩的一半 132 | 133 | *Now let’s talk about protocol encryption, and how using SSL affects our results.* 134 | 135 | 现在,我们来讨论下协议加密和使用SSL是否影响我们的结果 136 | 137 | ### Benchmark N3. 
Network encryption 138 | 139 | ![1gb network and 1gb with SSL](https://www.percona.com/blog/wp-content/uploads/2019/02/1gb-network-and-1gb-with-SSL.png) 140 | 141 | | threads/throughput | 1Gb network | 1Gb SSL | 142 | | ------------------ | ----------- | ------- | 143 | | 1 | 326.13 | 295.19 | 144 | | 4 | 1143.36 | 1070 | 145 | | 16 | 2400.19 | 2351.81 | 146 | | 32 | 2665.61 | 2630.53 | 147 | | 64 | 2838.47 | 2822.34 | 148 | | 96 | 2865.22 | 2837.04 | 149 | | 128 | 2867.46 | 2837.21 | 150 | | 256 | 2867.47 | 2837.12 | 151 | | 512 | 2867.27 | 2836.28 | 152 | | 1024 | 2865.4 | 1830.11 | 153 | | 2048 | 2761.78 | 1019.23 | 154 | 155 | ![10gb network and 10gb with SSL](https://www.percona.com/blog/wp-content/uploads/2019/02/10gb-network-and-10gb-with-SSL.png) 156 | 157 | | **threads/throughput** | **10Gb** | **10Gb SSL** | 158 | | ---------------------- | -------- | ------------ | 159 | | 1 | 394.4 | 359.8 | 160 | | 4 | 1544.73 | 1417.93 | 161 | | 16 | 5647.73 | 5235.1 | 162 | | 32 | 10256.11 | 9131.34 | 163 | | 64 | 15762.59 | 8248.6 | 164 | | 96 | 17626.77 | 7801.6 | 165 | | 128 | 18525.91 | 7107.31 | 166 | | 256 | 18529.4 | 4726.5 | 167 | | 512 | 17901.67 | 3067.55 | 168 | | 1024 | 16953.76 | 1812.83 | 169 | | 2048 | 16393.84 | 1013.22 | 170 | 171 | *For the 1Gb network, SSL encryption shows some penalty – about 10% for the single thread – but otherwise we hit the bandwidth limit again. We also see some scalability hit on a high amount of threads, which is more visible in the 10Gb network case.* 172 | 173 | 对于1Gb网络,SSL加密显示了一些损失 - 单线程约为10% - 但是其他情况下我们再次达到带宽限制。在10Gb网络中,我们看到在大量线程下性能损失更加明显。 174 | 175 | *With 10Gb, the SSL protocol does not scale after 32 threads. Actually, it appears to be a scalability problem in OpenSSL 1.0, which MySQL currently uses.* 176 | 177 | 在10Gb网络下,使用SSL协议在32线程后,吞吐量并没有增加。实际上,它似乎是MySQL目前使用的OpenSSL 1.0中的可伸缩性问题。 178 | 179 | *In our experiments, we saw that OpenSSL 1.1.1 provides much better scalability, but you need to have a special build of MySQL from source code linked to OpenSSL 1.1.1 to achieve this. I don’t show them here, as we do not have production binaries.* 180 | 181 | 在我们的实验中,我们看到OpenSSL 1.1.1提供了更好的可伸缩性,但是您需要从链接到OpenSSL 1.1.1的源代码中获得特殊的MySQL构建来实现这一点。我没有在这里展示它们,因为我们没有生产版本的二进制文件。 182 | 183 | ## Conclusions 184 | 185 | 1. Network performance and utilization will affect the general application throughput. 186 | 2. Check if you are hitting network bandwidth limits 187 | 3. Protocol compression can improve the results if you are limited by network bandwidth, but also can make things worse if you are not 188 | 4. SSL encryption has some penalty (~10%) with a low amount of threads, but it does not scale for high concurrency workloads. 189 | 190 | ## 结论 191 | 192 | 1. 网络性能和利用率将影响一般应用程序吞吐量 193 | 2. 检查您是否达到了网络带宽限制 194 | 3. 如果受到网络带宽的限制,协议压缩可以改善结果,但如果不是,则会使事情变得更糟 195 | 4. SSL加密在线程数量较少的情况下会有一些损失(约10%),但对于高并发工作负载,性能并不会有所增长 -------------------------------------------------------------------------------- /mysql/24-Replace MariaDB 10.3 by MySQL 8.0.md: -------------------------------------------------------------------------------- 1 | > 原文 https://lefred.be/content/replace-mariadb-10-3-by-mysql-8-0/ 2 | > 作者:[lefred](https://lefred.be/content/author/lefred/) 3 | > 翻译:无名队 4 | 5 | # 为什么要迁移到MySQL8.0? 6 | 7 | *MySQL 8.0 brings a lot of new features. 
These features make MySQL database much more secure (like new authentication, secure password policies and management, …) and fault tolerant (new data dictionary), more powerful (new redo log design, less contention, extreme scale out of InnoDB, …), better operation management (SQL Roles, instant add columns), many (but really many!) replication enhancements and native group replication… and finally many cool stuff like the new Document Store, the new MySQL Shell and MySQL InnoDB Cluster that you should already know if you follow this blog (see these [TOP 10 for features for developers](https://lefred.be/content/top-10-mysql-8-0-features-for-developers/) and this [TOP 10 for DBAs & OPS](https://lefred.be/content/top-10-mysql-8-0-features-for-dbas-ops/)).* 8 | 9 | MySQL8.0带来了很多新特性。这些新特性使得MySQL数据库更加安全(例如新的认证方式,安全的密码策略和管理方式,...)和容错(新的数据字典)功能更强大(新的redo设计,争用更少,极度扩展InnoDB,…),更好的操作管理(SQL角色,即时添加列 ),很多(其实真的很多)复制增强和本地组复制...最后还有很多很酷的东西例如文档存储,全新的MySQL Shell和MySQL InnoDB cluster,如果你看过以下这些博客的话你应该已经知道了([TOP 10 for features for developers](https://lefred.be/content/top-10-mysql-8-0-features-for-developers/) 和[TOP 10 for DBAs & OPS](https://lefred.be/content/top-10-mysql-8-0-features-for-dbas-ops/))) 10 | 11 | ## 不再是替代品 12 | 13 | *We saw in this previous post how to migrate from MariaDB 5.5 (default on CentOS/RedHat 7) to MySQL. This was a straight forward migration as at the time MariaDB was a drop in replacement for MySQL…**but this is not the case anymore since MariaDB 10.x !*** 14 | 15 | 我们在上一篇文章中看到了如何从MariaDB 5.5(在CentOS/RedHat7上默认)迁移到MySQL。这是一个直接的迁移,因为当时MariDB是MySQL的替代品…但是从MariaDB 10.x开始情况就不一样了。 16 | 17 | 让我们开始迁移到MySQL8.0 18 | 19 | ## 选项 20 | 21 | *Two possibilities are available to us:* 22 | 23 | 1. *Use logical dump for schemes and data* 24 | 2. *Use logical dump for schemes and transportable InnoDB tablespaces for the data* 25 | 26 | 我们有两种方式: 27 | 28 | - 对schema和数据逻辑导出 29 | - 对schema逻辑导出,使用InnoDB表空间交换处理数据 30 | 31 | ## 准备迁移 32 | 33 | **方式1-全部逻辑导出** 34 | 35 | *It’s recommended to avoid to have to deal with `mysql.*` tables are they won’t be compatible, I recommend you to save all that information and import the required entries like users manually. It’s maybe the best time to do some cleanup.* 36 | 37 | 最好不要迁移`mysql.*`这些表,因为它们不兼容,我建议你保存所有的信息并且手动导入需要的条目例如用户表。这可能是做一些清理的最佳时机。 38 | 39 | *As we are still using our WordPress site to illustrate this migration. I will dump the `wp` database:* 40 | 41 | 我们仍然使用我们的WordPress网站来演示迁移。我将导出`wp`数据库: 42 | 43 | ```mysql 44 | mysqldump -B wp> wp.sql 45 | ``` 46 | 47 | > *MariaDB doesn’t provide* `mysqlpump`*, so I used the good old* `mysqldump`*. 
There was a nice article this morning about MySQL logical dump solutions,* [see it here](https://mydbops.wordpress.com/2019/03/26/mysqldump%E2%80%8B-vs-mysqlpump-vs-mydumper/)*.* 48 | > 49 | > MariaDB没有提供`mysqlpump`,所以我们使用了`mysqldump`。这里有一篇很好的关于MySQL逻辑导出解决方案的文章,[请看这里](https://mydbops.wordpress.com/2019/03/26/mysqldump%E2%80%8B-vs-mysqlpump-vs-mydumper/) 50 | 51 | **方式2**-表结构导出 & InnoDB表传输 52 | 53 | *First we take a dump of our database without the data (`-d`):* 54 | 55 | 首先我们只导出数据库结构 56 | 57 | ```mysql 58 | mysqldump -d -B wp > wp_nodata.sql 59 | ``` 60 | 61 | *Then we export the first table space:* 62 | 63 | 然后我们导出第一个表空间 64 | 65 | ```mysql 66 | [wp]> flush tables wp_comments for export; 67 | Query OK, 0 rows affected (0.008 sec) 68 | ``` 69 | 70 | *We copy it to the desired location (the `.ibd` and the `.cfg`):* 71 | 72 | 我们将其拷贝到所需的位置(`.ibd`和`.cfg`) 73 | 74 | ```shell 75 | cp wp/wp_comments.ibd ~/wp_innodb/ 76 | cp wp/wp_comments.cfg ~/wp_innodb/ 77 | ``` 78 | 79 | *And finally we unlock the table:* 80 | 81 | 最后,我们解锁表 82 | 83 | ```mysql 84 | [wp]> unlock tables; 85 | ``` 86 | 87 | *These operation above need to be repeated for all the tables ! If you have a large amount of table I encourage you to script all these operations.* 88 | 89 | 以上这些操作需要为每个表都重复做一次!如果你有很多表,我建议你使用脚本来做这些操作 90 | 91 | ## 替换二进制文件/安装MySQL 8.0 92 | 93 | *Unlike previous version, if we install MySQL from the Community Repo as seen on this post, MySQL 8.0 won’t be seen as a conflicting replacement for MariaDB 10.x. To avoid any conflict and installation failure, we will replace the MariaDB packages by the MySQL ones using the **swap** command of `yum`:* 94 | 95 | 与以前的版本不同,如果我们按照本文的方式从社区仓库(Community Repo)安装MySQL,MySQL8.0将不会被视为与MariaDB 10.x冲突的替代包。为了避免任何冲突和安装失败,我们将使用yum的swap命令,用MySQL的软件包替换MariaDB的软件包 96 | 97 | ```shell 98 | yum swap -- install mysql-community-server mysql-community-libs-compat -- \ 99 | remove MariaDB-server MariaDB-client MariaDB-common MariaDB-compat 100 | ``` 101 | 102 | > *This new yum command is very useful, and allow other dependencies like php-mysql or postfix for example to stay installed without breaking some dependencies* 103 | > 104 | > 这个新的yum命令非常有用,并且允许其他依赖项(如php-mysql或postfix)保持安装而不会破坏某些依赖项 105 | 106 | *The result of the command will be something similar to:* 107 | 108 | 这个命令的结果类似于 109 | 110 | ```shell 111 | Removed: 112 | MariaDB-client.x86_64 0:10.3.13-1.el7.centos 113 | MariaDB-common.x86_64 0:10.3.13-1.el7.centos 114 | MariaDB-compat.x86_64 0:10.3.13-1.el7.centos 115 | MariaDB-server.x86_64 0:10.3.13-1.el7.centos 116 | Installed: 117 | mysql-community-libs-compat.x86_64 0:8.0.15-1.el7 118 | mysql-community-server.x86_64 0:8.0.15-1.el7 119 | Dependency Installed: 120 | mysql-community-client.x86_64 0:8.0.15-1.el7 121 | mysql-community-common.x86_64 0:8.0.15-1.el7 122 | mysql-community-libs.x86_64 0:8.0.15-1.el7 123 | ``` 124 | 125 | *Now the best is to empty the datadir and start `mysqld`:* 126 | 127 | 现在最好清空datadir然后启动`mysqld`: 128 | 129 | ```shell 130 | rm -rf /var/lib/mysql/* 131 | systemctl start mysql 132 | ``` 133 | 134 | *This will start the initialize process and start MySQL.* 135 | 136 | *As you may know, by default MySQL is now more secure and a new password has been generated to the `root` user. 
You can find it in the error log (`/var/log/mysqld.log`):* 137 | 138 | 这将会开始初始化进程然后启动MySQL 139 | 140 | 你可能知道,默认情况下,MySQL现在更加安全,并且已为`root`用户生成密码。你可以在错误日志(/var/log/mysqld.log)中找到它: 141 | 142 | ```mysql 143 | 2019-03-26T12:32:14.475236Z 5 [Note] [MY-010454] [Server] 144 | A temporary password is generated for root@localhost: S/vfafkpD9a 145 | ``` 146 | 147 | *At first login with the `root` user, the password must be changed:* 148 | 149 | 第一次使用`root`用户登录,必须更改密码: 150 | 151 | ```mysql 152 | mysql -u root -p 153 | mysql> set password='Complicate1#' 154 | ``` 155 | 156 | ## 添加凭据 157 | 158 | *Now we need to create our database (`wp`), our user and its credentials.* 159 | 160 | 现在我们需要创建我们的数据库(wp),以及用户和相应的凭据 161 | 162 | > Please, note that the PHP version used by default in CentOS might not be yet compatible with the new default secure authentication plugin, therefore we will have to create our user with the older authentication plugin, `mysql_native_password`. For more info see these posts: 163 | > 164 | > 请注意,CentOS中默认使用的PHP版本可能尚未兼容新的默认安全认证插件,因此我们必须使用旧的认证插件`mysql_native_password`来创建我们的用户。有关更多信息,请参阅以下帖子 165 | > 166 | > \- [在不破坏旧应用程序的情况下迁移到MySQL 8.0](https://lefred.be/content/migrating-to-mysql-8-0-without-breaking-old-application/) 167 | > 168 | > \- [Drupal和MySQL 8.0.11 - 我们到了吗?](https://lefred.be/content/drupal-and-mysql-8-0-11-are-we-there-yet/) 169 | > 170 | > \- [Joomla!和MySQL 8.0.12](https://lefred.be/content/joomla-and-mysql-8-0-12/) 171 | > 172 | > \- [PHP 7.2.8和MySQL 8.0](https://lefred.be/content/php-7-2-8-mysql-8-0/) 173 | 174 | ```mysql 175 | mysql> create user 'wp'@'127.0.0.1' identified with 176 | 'mysql_native_password' by 'fred'; 177 | ``` 178 | 179 | > by default, this password (*fred*) won’t be allowed with the default [password policy](https://dev.mysql.com/doc/refman/8.0/en/validate-password-options-variables.html#sysvar_validate_password.policy). 
180 | > 181 | > To not have to change our application, it’s possible to override the policy like this: 182 | > 183 | > 默认情况下,这个密码(fred)不会被默认的密码策略通过。为了不修改我们的程序,可以通过下面的命令来覆盖策略: 184 | > 185 | > ```mysql 186 | > mysql> set global validate_password.policy=LOW; 187 | > mysql> set global validate_password.length=4 188 | > ``` 189 | 190 | *It’s possible to see the user and its authentication plugin easily using the following query:* 191 | 192 | 可以通过如下sql很轻松地查看用户及相应的认证插件 193 | 194 | ```mysql 195 | mysql> select Host, User, plugin,authentication_string from mysql.user where User='wp'; 196 | +-----------+------+-----------------------+-------------------------------------------+ 197 | | Host | User | plugin | authentication_string | 198 | +-----------+------+-----------------------+-------------------------------------------+ 199 | | 127.0.0.1 | wp | mysql_native_password | *6C69D17939B2C1D04E17A96F9B29B284832979B7 | 200 | +-----------+------+-----------------------+-------------------------------------------+ 201 | ``` 202 | 203 | *We can now create the database and grant the privileges to our user:* 204 | 205 | 现在我们可以创建数据库并授权给我们的用户: 206 | 207 | ```mysql 208 | mysql> create database wp; 209 | Query OK, 1 row affected (0.00 sec) 210 | mysql> grant all privileges on wp.* to 'wp'@'127.0.0.1'; 211 | Query OK, 0 rows affected (0.01 sec) 212 | ``` 213 | 214 | ## 恢复数据 215 | 216 | *This process is also defined by the options chosen earlier.* 217 | 218 | 此过程也由前面的选择而定 219 | 220 | 方式1 221 | 222 | *This option, is the most straight forward, one restore and our site is back online:* 223 | 224 | 这个方式最直接,一次还原然后我们的网站重新上线: 225 | 226 | ```mysql 227 | mysql -u wp -pfred wp <~/wp.sql 228 | ``` 229 | 230 | 方式2 231 | 232 | *This operation is more complicated as it requires more steps.* 233 | 234 | *First we will have to restore all the schema with no data:* 235 | 236 | 这个方式相对来说更复杂因为它需要更多步骤 237 | 238 | 首先我们需要先恢复schema结构 239 | 240 | ```mysql 241 | mysql -u wp -pfred wp <~/wp_nodata.sql 242 | ``` 243 | 244 | *And now for every tables we need to perform the following operations:* 245 | 246 | 然后,对于每张表我们需要进行如下操作 247 | 248 | ```mysql 249 | mysql> alter table wp_posts discard tablespace; 250 | 251 | cp ~/wp_innodb/wp_posts.ibd /var/lib/mysql/wp/ 252 | cp ~/wp_innodb/wp_posts.cfg /var/lib/mysql/wp/ 253 | chown mysql. 
/var/lib/mysql/wp/wp_posts.* 254 | 255 | mysql> alter table wp_posts import tablespace 256 | ``` 257 | 258 | *Yes, this is required for all tables, this is why I encourage you to script it if you choose this option.* 259 | 260 | 是的,所有的表都需要这么操作,所以这也是为什么我建议你使用脚本来跑如果你选择了这种方式 261 | 262 | ## 结论 263 | 264 | *So as you could see, it’s still possible to migrate from MariaDB to MySQL but since 10.x, this is not a drop in replacement anymore and requires several steps including logical backup.* 265 | 266 | 正如你看到的,仍然可以从MariaDB迁移到MySQL,但是从10.x开始,这不再是替代品,需要几个步骤,包括逻辑备份。 -------------------------------------------------------------------------------- /mysql/25-MySQL InnoDB Cluster – consistency levels.md: -------------------------------------------------------------------------------- 1 | > 原文 : 2 | > 3 | > 作者:[lefred](https://lefred.be/content/author/lefred/) 4 | > 5 | > 翻译:无名队 6 | 7 | *Consistency during reads have been a small concern from the adopters of MySQL InnoDB Cluster (see [this post](https://lefred.be/content/mysql-group-replication-read-your-own-write-across-the-group/) and [this one](https://proxysql.com/blog/proxysql-gtid-causal-reads)).* 8 | 9 | *This is why MySQL supports now (since 8.0.14) a new consistency model to avoid such situation when needed.* 10 | 11 | *[Nuno Carvalho](https://twitter.com/rekconk) and Aníbal Pinto already posted a blog series I highly encourage you to read:* 12 | 13 | - *[Group Replication – Consistency Levels](https://mysqlhighavailability.com/group-replication-consistency-levels/)* 14 | - *[Group Replication: Preventing stale reads on primary fail-over!](https://mysqlhighavailability.com/group-replication-preventing-stale-reads-on-primary-fail-over/) (you can also check [this post](https://lefred.be/content/mysql-innodb-cluster-8-0-12-abort_server/))* 15 | - *[Group Replication – Consistent Reads](https://mysqlhighavailability.com/group-replication-consistent-reads)* 16 | - *[Group Replication – Consistent Reads Deep Dive](https://mysqlhighavailability.com/group-replication-consistent-reads-deep-dive/)* 17 | 18 | *After those great articles, let’s check how that does work with some examples.* 19 | 20 | 一致性读取已经成为MySQL InnoDB Cluster使用者的关心的话题(看[这篇文章](https://lefred.be/content/mysql-group-replication-read-your-own-write-across-the-group/)和[这篇文章](https://proxysql.com/blog/proxysql-gtid-causal-reads)) 21 | 22 | 这就是为什么MySQL 8.0.14开始,当我们需要避免这些情况时,支持使用新的一致性模型。 23 | 24 | 我强烈建议你阅读Nuno Carvalho和Anibal Pinto的系列文章 25 | 26 | - [Group Replication – Consistency Levels](https://mysqlhighavailability.com/group-replication-consistency-levels/) 27 | - [Group Replication: Preventing stale reads on primary fail-over!](https://mysqlhighavailability.com/group-replication-preventing-stale-reads-on-primary-fail-over/) (you can also check [this post](https://lefred.be/content/mysql-innodb-cluster-8-0-12-abort_server/)) 28 | - [Group Replication – Consistent Reads](https://mysqlhighavailability.com/group-replication-consistent-reads) 29 | - [Group Replication – Consistent Reads Deep Dive](https://mysqlhighavailability.com/group-replication-consistent-reads-deep-dive/) 30 | 31 | 在看过这些优秀文章后,让我们通过一些例子来探究它是如何工作的。 32 | 33 | ## 环境 34 | 35 | *This is how the environment is setup:* 36 | 37 | - *3 members: `mysql1`, `mysql2` & `mysql3`* 38 | - *the cluster runs in Single-Primay mode* 39 | - *`mysql1` is the Primary Master* 40 | - *some [extra sys views](https://gist.github.com/lefred/153448f7ea0341d6d0daa2738db6fcd8) are installed* 41 | 42 | 以下是环境设置: 43 | 44 | - 3个成员:`mysql1`,`mysql2` & `mysql3` 
45 | - 集群运行模式为Single-Primary模式 46 | - `mysql1`为Primary Master 47 | - 预装了某些额外的sys views 48 | 49 | ## 案例1-EVENTUAL 50 | 51 | *This is the default behavior (`group_replication_consistency='EVENTUAL'`). The scenario is the following:* 52 | 53 | - *we display the default value of the session variable controlling the Group Replication Consistency on the Primary and on one Secondary* 54 | - *we lock a table on a Secondary master (`mysql3`) to block the apply of the transaction coming from the Primary* 55 | - *we demonstrate that even if we commit a new transaction on `mysql1`, we can read the table on `mysql3` and the new record is missing (the write could not happen due to the lock)* 56 | - *once unlocked, the transaction is applied and the record is visible on the Secondary master (`mysql3`) too* 57 | 58 | group_replication_consistency='EVENTUAL'`是默认的参数设置。以下是方案: 59 | 60 | - 我们演示了控制Primary和其中一个Secondary间组复制一致性的session级变量的默认值 61 | - 我们在Secondary Master(`mysql3`)锁定了一张表来阻塞应用来自Primay的事务 62 | - 我们演示了即使我们在`mysql1`上提交了一个新的事务,我们在`mysql3`上读取表发现新的记录丢失了(由于锁的存在导致写入无法发生) 63 | - 一旦释放锁,事务被应用并且在Secondary Master(`mysql3`)也变得可见了 64 | 65 | ## 案例2-BEFORE 66 | 67 | *In this example, we will illustrate how we can avoid inconsistent reads on a Secondary master:* 68 | 69 | 在这个例子中,我们将会说明我们如何在Secondary Master上避免非一致性读取。 70 | 71 | *As you could notice, once we have set the session variable controlling the consistency, operations on the table (the server is READ-ONLY) are waiting for the Apply Queue to be empty before returning the result set.* 72 | 73 | 正如你所看到的,一旦我们设置了session变量控制一致性,在返回结果前,表上的操作(server设置了READ_ONLY)都会等待应用队列清空。 74 | 75 | *We could also notice that the wait time (timeout) for this read operation is very long (8 hours by default) and can be modified to a shorter period:* 76 | 77 | 我们同样也可以注意到读取操作的等待时间(超时时间)非常长(默认8小时)并且可以修改到更短的时间: 78 | 79 | *We used `SET wait_timeout=10` to define it to 10 seconds.* 80 | 81 | 我们设置了`wait_timeout=10`来定义到10秒 82 | 83 | *When the timeout is reached, the following error is returned:* 84 | 85 | 当达到超时时间后,将会报出如下错误: 86 | 87 | ```mysql 88 | ERROR: 3797: Error while waiting for group transactions commit on group_replication_consistency= 'BEFORE' 89 | ``` 90 | 91 | ## 案例3-AFTER 92 | 93 | *It’s also possible to return from commit on the **writer** only when all members applied the change too. Let’s check this in action too:* 94 | 95 | 只有当所有成员也应用了更改时,写入的机器才能接收到commit的返回。让我们来检查下这个动作: 96 | 97 | *This can be considered as synchronous writes as the return from commit happens only when all members have applied it. However you could also notice that in this consistency level, `wait_timeout` has not effect on the write. In fact `wait_timeout` has only effect on **read operations** when the consistency level is different than `EVENTUAL`.* 98 | 99 | 这可以被视为同步写入,只有当所有的成员已经应用了,commit才能发生。 然而你可以发现在该一致性级别下,`wait_timeout`在写入中并没有影响。事实上,`wait_timeout`只会影响读取操作并且一致性级别不是`EVENTUAL`的情况下。 100 | 101 | *This means that this can lead to several issues if you lock a table for any reason. 
If the DBA needs to perform some maintenance operations and requires to lock a table for a long time, it’s mandatory to not operate queries in `AFTER` or ` BEFORE_AND_AFTER`while in such maintenance.* 102 | 103 | 如果你不管任何原因锁定了一张表,这意味着这将引发几个问题。如果DBA需要做一些维护操作并且需要长时间锁定一张表,在维护过程中,必须避免在`AFTER`或`BEFORE_AND_AFTER`中查询。 104 | 105 | ## 案例4-Scope 106 | 107 | *In the following video, I just want to show you the “scope” of these “waits” for transactions that are in the applying queue.* 108 | 109 | 在下面的视频中,我只想向您演示应用队列中"等待"事务的"范围" 110 | 111 | *We will lock again `t1` but on a Secondary master, we will perform a `SELECT` from table `t2`, the first time we will keep the default value of `group_replication_consistency`(`EVENTUAL`) and the second time we will change the consistency level to `BEFORE` :* 112 | 113 | 我们将会在一个Secondary master上再次锁定`t1`表,然后在`t2`表上执行一个select,第一次测试我们会设置`group_replication_consistency`为默认值(`EVENTUAL`),第二次测试我们会调整一致性级别为`BEFORE`。 114 | 115 | *We could see that as soon as they are transactions in the apply queue, if you change the consistency level to something `BEFORE`, it needs to wait for the previous transactions in the queue to be applied even if those events are related or not to the same table(s) or record(s). It doesn’t matter* 116 | 117 | 我们可以看到,只要它们是应用队列中的事务,如果你将一致性级别修改为`BEFORE`,它将队列中先前的事务被应用,即使这些表或记录是相关或不相关的,这都无所谓。 118 | 119 | ## 案例5-Observability 120 | 121 | *Of course it’s possible to check what’s going on and if queries are waiting for something.* 122 | 123 | 当然,我们需要检查发生了什么以及是否有查询在等待。 124 | 125 | BEFORE 126 | 127 | *When `group_replication_consistency` is set to **BEFORE** (or includes it), while a transaction is waiting for the applying queue to be committed, it’s possible to track those waiting transactions by running the following query:* 128 | 129 | 当`group_replication_consistency`设置为BEFORE(或者包含它),当一个事务正在等待应用队列被提交,可以通过以下sql来查询这些等待的事务: 130 | 131 | ```mysql 132 | SELECT * FROM information_schema.processlist 133 | WHERE state='Executing hook on transaction begin.'; 134 | ``` 135 | 136 | AFTER 137 | 138 | *When `group_replication_consistency` is set to **AFTER** (or includes it), while a transaction is waiting for the transaction to be committed on the other members too, it’s possible to track those waiting transactions by running the following query:* 139 | 140 | 当`group_replication_consistency`设置为AFTER(或者包含它),当一个事务正在等待在其他成员节点上被提交,可以通过以下sql来查询这些等待的事务: 141 | 142 | ```mysql 143 | SELECT * FROM information_schema.processlist 144 | WHERE state='waiting for handler commit'; 145 | ``` 146 | 147 | *It’s also possible to have even more information joining the processlist and InnoDB Trx tables:* 148 | 149 | 通过关联processlist和InnoDB Trx表能够得到更多信息 150 | 151 | ```mysql 152 | SELECT *, TIME_TO_SEC(TIMEDIFF(now(),trx_started)) lock_time_sec 153 | FROM information_schema.innodb_trx JOIN information_schema.processlist 154 | ON processlist.ID=innodb_trx.trx_mysql_thread_id 155 | WHERE state='waiting for handler commit' ORDER BY trx_started\G 156 | ``` 157 | 158 | ## 结论 159 | 160 | *This consistency level is a wonderful feature but it could become dangerous if abused without full control of your environment.* 161 | 162 | 一致性级别是个非常好的特性,但如果在没有完全控制环境的情况下滥用,它可能会变得危险。 163 | 164 | *I would avoid to set anything `AFTER` globally if you don’t control completely your environment. Table locks, DDLs, logical backups, snapshots could all delay the commits and transactions could start pilling up on the Primary Master. 
But if you control your environment, you have now the complete freedom to control completely the consistency you need on your MySQL InnoDB Cluster.* 165 | 166 | 如果你没有完全控制你的环境,请避免在全局设置任何`AFTER`。表锁、DDL、逻辑备份、快照等都可能延迟提交,导致事务在Primary Master上不断堆积。但是,如果你能控制自己的环境,你现在就可以完全自由地控制MySQL InnoDB Cluster所需的一致性级别。 -------------------------------------------------------------------------------- /mysql/26-MySQL Memory Management, Memory Allocators and Operating System.md: -------------------------------------------------------------------------------- 1 | # MySQL内存管理,内存分配器和操作系统 2 | > 原文 :[MySQL Memory Management, Memory Allocators and Operating System](https://www.percona.com/blog/2019/05/02/mysql-memory-management-memory-allocators-and-operating-system/) 3 | > 4 | > 作者:[Sveta Smirnova](https://www.percona.com/blog/author/sveta-smirnova/) 5 | > 6 | > 翻译:郑志江 7 | > 8 | > 校对:徐晨亮 9 | 10 | *When users experience memory usage issues with any software, including MySQL®, their first response is to think that it’s a symptom of a memory leak. As this story will show, this is not always the case.* 11 | 12 | *This story is about a bug* 13 | 14 | 当用户在使用任何软件(包括MySQL)时碰到内存使用问题,第一反应往往是内存泄漏。但正如这篇文章所示,情况并不总是这样。 15 | 16 | 这篇文章阐述一个关于内存的bug。 17 | 18 | *All Percona Support customers are eligible for bug fixes, but their options vary. For example, [Advanced+](https://www.percona.com/services/support/support-tiers-mysql) customers are offered a HotFix build prior to the public release of software with the patch. [Premium](https://www.percona.com/services/support/support-tiers-mysql) customers do not even have to use Percona software: we may port our patches to upstream for them. But for Percona products all Support levels have the right to have a fix.* 19 | 20 | 所有Percona支持客户都有获得bug修复的资格,但他们的选择有所不同。比如,超级客户(Advanced+)在带补丁的软件正式发布之前就可以获得HotFix版本,优质客户(Premium)甚至不需要使用Percona的软件,我们也可以为他们把补丁推到上游。但对于Percona产品来说,所有支持等级都有权得到bug修复。 21 | 22 | *Even so, this does not mean we will fix every unexpected behavior, even if we accept that behavior to be a valid bug. One of the reasons for such a decision might be that while the behavior is clearly wrong for Percona products, this is still a feature request.* 23 | 24 | 即便如此,这并不意味着我们会修复所有的意外行为,即使我们承认这种行为是一个有效的bug。做出这种决定的原因之一可能是:虽然对Percona产品来说这个行为明显是错的,但修复它仍然属于一个功能需求(feature request)。 25 | 26 | ## 作为学习案例的一个bug 27 | 28 | *A good recent example of such a case is [PS-5312](https://jira.percona.com/browse/PS-5312) – the bug is repeatable with upstream and reported at [bugs.mysql.com/95065*](https://bugs.mysql.com/bug.php?id=95065) 29 | 30 | 最近一个很好的案例是 [PS-5312](https://jira.percona.com/browse/PS-5312)——这个bug可在上游复现并被记录在[bugs.mysql.com/95065](https://bugs.mysql.com/bug.php?id=95065)。 31 | 32 | *This reports a situation whereby access to InnoDB fulltext indexes leads to growth in memory usage. It starts when someone queries a fulltext index, grows until a maximum, and is not freed for quite a long time.* 33 | 34 | 这个报告阐述了一种情况,当访问InnoDB的全文索引的时候会导致内存使用量增长。这种情况从查询全文索引时开始出现,内存会持续增长直到达到最大值,并且很长时间不会释放。 35 | 36 | *[Yura Sorokin](https://jira.percona.com/secure/ViewProfile.jspa?name=yura.sorokin) from the Percona Engineering Team investigated if this is a memory leak and found that it is not.* 37 | 38 | 来自Percona工程团队的Yura Sorokin研究表明,这种情况并不属于内存泄漏范畴。 39 | 40 | *When InnoDB resolves a fulltext query, it creates a memory heap in the function fts_query_phrase_search This heap may grow up to 80MB. 
Additionally, it has a big number of blocks ( mem_block_t ) which are not always used continuously and this, in turn, leads to memory fragmentation.* 41 | 42 | 当InnoDB解析一个全文查询时,它会创建在`fts_query_phrase_search`函数中创建一个内存堆,这个堆可能增长到80M。另外,这个过程还会使用到大量非连续块(`mem_block_t`)进而产生的内存碎片。 43 | 44 | *In the function exit , the memory heap is freed. InnoDB does this for each of the allocated blocks. At the end of the function, it calls free() which belongs to one of the memory allocator libraries, such as malloc or [jemalloc](http://jemalloc.net/). From the mysqld point of view, everything is done correctly: there is no memory leak.* 45 | 46 | 在函数出口,这些内存堆会被释放。InnoDB会为其分配的每一个块做这个操作。在函数结束时,会调用一个内存分配器库中的`free()`操作,比如`malloc`或者`jemalloc`。从MySQL本身来看,这都是没问题的,不存在内存泄漏。 47 | 48 | *However while free() should release memory when called, it is not required to return it back to the operating system. If the memory allocator decides that the same memory blocks will be required soon, it may still keep them for the mysqld process. This explains why you might see that mysqld still uses a lot of memory after the job is finished and all de-allocations are done.* 49 | 50 | 然而,调用`free()`确实时是应该释放内存,但不需要将其返回给操作系统。如果内存分配器发现这些内存块马上还需要被用到,则会将他们保留住继续用于mysqld进程。这就解释了为什么mysqld在job完成且所有取消资源分配都结束后还会占用大量内存。 51 | 52 | *This in practice is not a big issue and should not cause any harm. But if you need the memory to be returned to the operating system quicker, you could try alternative memory allocators, such as [jemalloc](http://jemalloc.net/). The latter was proven to solve the issue with [PS-5312](https://jira.percona.com/browse/PS-5312).* 53 | 54 | 这个在实际生产中并不是一个大问题,按道理不应该造成任何事故。但是如果你需要更快地将内存返回给操作系统,你可以尝试非传统的内存分配器,类似`jemallolc`。它被证明可以解决[PS-5312](https://jira.percona.com/browse/PS-5312)的问题。 55 | 56 | *Another factor which improves memory management is the number of CPU cores: the more we used for the test, the faster the memory was returned to the operating system. This, probably, can be explained by the fact that if you have multiple CPUs, then the memory allocator can dedicate one of them just for releasing memory to the operating system.* 57 | 58 | 另一个改善内存管理的因素是cpu内核数量:在测试中,cpu核数越多,内存返回给操作系统的速度会越快。这可能是你拥有多个CPU,则其中一个可专门用作内存分配器释放内存到操作系统。 59 | 60 | *The very first implementation of InnoDB full text indexes introduced this flaw. As our engineer Yura Sorokin [found](https://jira.percona.com/browse/PS-5312?focusedCommentId=236644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-236644):* 61 | 62 | > - The very first 5.6 commit which introduces Full Text Search Functionality for InnoDB WL#5538: InnoDB Full-Text Search Support – 63 | > - Implement WL #5538 InnoDB Full-Text Search Support, merge – [https://github.com/mysql/mysql-server/commit/b6169e2d944 ](https://github.com/mysql/mysql-server/commit/b6169e2d944)– also has this problem. 64 | 65 | 正如我们的工程师`Yura Sorokin`所发现的一样,下面两点阐述了InnoDB全文索引的早期实现引入了这个缺陷: 66 | 67 | - 5.6版本MySQL最早对InnoDB WL全文索引功能引入的介绍:#5538: InnoDB全文搜索支持 – https://dev.mysql.com/worklog/task/?id=5538 68 | - 实现WL #5538 InnoDB全文搜索支持与合并 - - 也存在同样的问题问题 69 | 70 | ## 修复方法 71 | 72 | *We have a few options to fix this:* 73 | 74 | 1. *Change implementation of InnoDB fulltext index* 75 | 2. 
*Use custom memory library like jemalloc* 76 | 77 | *Both have their advantages and disadvantages.* 78 | 79 | 我们有两种方法来修复这个问题: 80 | 81 |  1.修改InnoDB全文索引的实现 82 | 83 |  2.使用自定义内存库,例如`jemalloc` 84 | 85 | 这两种方法都有各自的优缺点。 86 | 87 | **Option 1** *means we are introducing an incompatibility with upstream, which may lead to strange bugs in future versions. This also means a full rewrite of the InnoDB fulltext code which is always risky in GA versions, used by our customers.* 88 | 89 | **方法1** 意味着我们引入了与软件上游不兼容的风险,这可能会导致未来版本中出现奇怪的bug。也意味着彻底重写InnoDB全文索引部分代码,这在用户们使用的GA版本中始终是有风险的。 90 | 91 | **Option 2** *means we may hit flaws in the [jemalloc](https://jira.percona.com/browse/PS-5312) library which is designed for performance and not for the safest memory allocation.* 92 | 93 | **方法2** 则意味着我们可能会踩到jemalloc库中的一些缺陷,因为它的设计更偏向性能,而不是最安全的内存分配。 94 | 95 | *So we have to choose between these two not ideal solutions.* 96 | 97 | *Since **option 1** may lead to a situation when Percona Server will be incompatible with upstream, we prefer **option 2** and look forward for the upstream fix of this bug.* 98 | 99 | 因此我们不得不在这两个并不完美的方法中选择一个。 100 | 101 | 鉴于方法一可能导致Percona Server与上游不兼容,我们更倾向于用方法二来解决问题,并期待着上游修复这个bug。 102 | 103 | ## 结论 104 | 105 | *If you are seeing a high memory usage by the mysqld process, it is not always a symptom of a memory leak. You can use memory instrumentation in Performance Schema to find out how allocated memory is used. Try alternative memory libraries for better processing of allocations and freeing of memory. Search the user manual for LD_PRELOAD to find out how to set it up at these pages [here](https://dev.mysql.com/doc/refman/8.0/en/mysqld-safe.html) and [here](https://dev.mysql.com/doc/mysql-installation-excerpt/8.0/en/using-systemd.html).* 106 | 107 | 如果发现mysqld进程占用内存很高,并不代表一定是内存泄漏。我们可以在Performance Schema中使用内存检测来了解进程是如何使用已分配的内存,也可以尝试替换内存库来更好地处理内存分配与释放。关于LD_PRELOAD如何配置,请查阅MySQL用户手册对应页面 [mysqld-safe](https://dev.mysql.com/doc/refman/8.0/en/mysqld-safe.html)和[using-systemd](https://dev.mysql.com/doc/mysql-installation-excerpt/8.0/en/using-systemd.html)。 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | -------------------------------------------------------------------------------- /mysql/28-mysql-got-an-error-reading-communication-packet-errors.md: -------------------------------------------------------------------------------- 1 | # MySQL “Got an error reading communication packet” 2 | 3 | > 作者:[Muhammad Irfan](https://www.percona.com/blog/author/mirfan/) 4 | > 5 | > 原文地址:https://www.percona.com/blog/2016/05/16/mysql-got-an-error-reading-communication-packet-errors/ 6 | > 7 | > 翻译:徐晨亮 8 | 9 | 10 | 11 | In this blog post, we’ll discuss the possible reasons for MySQL “Got an error reading communication packet” errors and how to address them. 12 | 13 | 在这篇文章中,我们将会讨论MySQL“Got an error reading communication packet”错误的原因以及如何定位它们。 14 | 15 | In Percona’s managed services, we often receive customer questions on communication failure errors. So let’s discuss possible reasons for this error and how to remedy it. 
16 | 17 | 在Percona的托管服务中,我们经常收到关于通信失败错误的客户咨询。因此让我们一起来讨论下该错误可能的原因以及如何来规避。 18 | 19 | ##MySQL Communication Errors 20 | 21 | First of all, whenever a communication error occurs, it increments the status counter for either [Aborted_clients](http://dev.mysql.com/doc/refman/5.6/en/server-status-variables.html#statvar_Aborted_clients) or [Aborted_connects](http://dev.mysql.com/doc/refman/5.6/en/server-status-variables.html#statvar_Aborted_connects), which describe the number of connections that were aborted because the client died without closing the connection properly and the number of failed attempts to connect to MySQL server (respectively). The possible reasons for both errors are numerous (see the Aborted_clients increments or Aborted_connects increments sections in the MySQL [manual](http://dev.mysql.com/doc/refman/5.6/en/communication-errors.html)). 22 | 23 | 首先,不论何时通信错误发生,状态计数器`Aborted clients`或者`Aborted connects`都会增加,表示由于客户端未正确关闭连接而断开的次数,以及连接到MySQL服务器失败的尝试次数。这两个错误的可能原因有很多(参见MySQL [手册](http://dev.mysql.com/doc/refman/5.6/en/communication-errors.html)中的Aborted_clients increments或aborted_connections increments部分) 24 | 25 | In the case of [log_warnings](http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_log_warnings), MySQL also writes this information to the error log (shown below): 26 | 27 | 在[log_warnings](http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_log_warnings)>1的情况下,MySQL同样将该信息写入到error log(如下所示): 28 | 29 | ```mysql 30 | [Warning] Aborted connection 305628 to db: 'db' user: 'dbuser' host: 'hostname' (Got an error reading communication packets) 31 | [Warning] Aborted connection 305627 to db: 'db' user: 'dbuser' host: 'hostname' (Got an error reading c 32 | ``` 33 | 34 | In this case, MySQL increments the status counter for Aborted_clients, which could mean: 35 | 36 | - The client connected successfully but terminated improperly (and may relate to not closing the connection properly) 37 | - The client slept for longer than the defined [wait_timeout](http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_wait_timeout) or [interactive_timeout](http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_interactive_timeout) seconds (which ends up causing the connection to sleep for wait_timeout seconds and then the connection gets forcibly closed by the MySQL server) 38 | - The client terminated abnormally or exceeded the [max_allowed_packet](https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_max_allowed_packet) for queries 39 | 40 | The above is not an all-inclusive list. Now, let’s identify what is causing this problem and how to remedy it. 41 | 42 | 43 | 44 | 以下情况下, MySQL会增加Aborted_clients状态变量的计数器,也就意味着: 45 | 46 | - 客户端已经成功连接,但是异常终止了(可能与未正确关闭连接有关系) 47 | 48 | - 客户端sleep时间超过了变量[wait_timeout](http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_wait_timeout)或 [interactive_timeout](http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_interactive_timeout)定义的秒数(最终导致连接休眠的时间超过系统变量wait_timeout的值,然后被MySQL强行关闭) 49 | - 客户端异常中断或查询超出了 [max_allowed_packet](https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_max_allowed_packet)值 50 | 51 | 以上是一个非全部包含的原因列表。现在让我们找出造成该问题的原因并且如何规避。 52 | 53 | 54 | 55 | ## Fixing MySQL Communication Errors 56 | 57 | To be honest, aborted connection errors are not easy to diagnose. But in my experience, it’s related to network/firewall issues most of the time. 
We usually investigate those issues with the help of Percona toolkit scripts, i.e. [pt-summary](https://www.percona.com/doc/percona-toolkit/2.2/pt-summary.html) / [pt-mysql-summary](https://www.percona.com/doc/percona-toolkit/2.2/pt-mysql-summary.html) / [pt-stalk](https://www.percona.com/doc/percona-toolkit/2.2/pt-stalk.html). The outputs from those scripts can be very helpful. 58 | 59 | 说实话,aborted connections错误并不容易诊断。但是在我的经验中,大多数时间是网络或者防火墙的原因。我们通常使用Percona toolkit脚本,例如 [pt-summary](https://www.percona.com/doc/percona-toolkit/2.2/pt-summary.html) / [pt-mysql-summary](https://www.percona.com/doc/percona-toolkit/2.2/pt-mysql-summary.html) / [pt-stalk](https://www.percona.com/doc/percona-toolkit/2.2/pt-stalk.html)来调查这些问题。这些脚本的输出非常有用。 60 | 61 | Some of the reasons for aborted connection errors can be: 62 | 63 | - A high rate of connections sleeping inside MySQL for hundred of seconds is one of the symptoms that applications aren’t closing connections after doing work, and instead relying on the wait_timeout to close them. I strongly recommend changing the application logic to properly close connections at the end of an operation. 64 | 65 | - Check to make sure the value of max_allowed_packet is high enough, and that your clients are not receiving a “packet too large” message. This situation aborts the connection without properly closing it. 66 | 67 | - Another possibility is TIME_WAIT. I’ve noticed many TIME_WAIT notifications from the netstat, so I would recommend confirming the connections are well managed to close on the application side. 68 | 69 | - Make sure the transactions are committed (begin and commit) properly so that once the application is “done” with the connection it is left in a clean state. 70 | 71 | - You should ensure that client applications do not abort connections. For example, if PHP has option max_execution_time set to 5 seconds, increasing [connect_timeout](https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_connect_timeout) would not help because PHP will kill the script. Other programming languages and environments can have similar safety options. 72 | 73 | - Another cause for delay in connections is DNS problems. Check if you have [skip-name-resolve](http://dev.mysql.com/doc/refman/5.6/en/server-options.html#option_mysqld_skip-name-resolve) enabled and if hosts are authenticated against their IP address instead of their hostname. 74 | 75 | - One way to find out where your application is misbehaving is to add some logging to your code that will save the application actions along with the MySQL connection ID. With that, you can correlate it to the connection number from the error lines. Enable the Audit log plugin, which logs connections and query activity, and check the [Percona Audit Log Plugin](https://www.percona.com/doc/percona-server/5.6/management/audit_log_plugin.html) as soon as you hit a connection abort error. You can check for the audit log to identify which query is the culprit. If you can’t use the Audit plugin for some reason, you can consider using the MySQL general log – however, this can be risky on a loaded server. You should enable the [general log](http://dev.mysql.com/doc/refman/5.6/en/query-log.html) for at least a few minutes. While it puts a heavy burden on the server, errors happen fairly often so you should be able to collect the data before the log grows too large. I recommend enabling the general log with an -f tail, then disable the general log when you see the next warning in the log. 
Once you find the query from the aborted connection, identify which piece of your application issues that query and co-relate the queries with portions of your application. 76 | 77 | - Try increasing the [net_read_timeout](https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_net_read_timeout) and [net_write_timeout](https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_net_write_timeout) values for MySQL and see if that reduces the number of errors. net_read_timeout is rarely the problem unless you have an extremely poor network. Try tweaking those values, however, because in most cases a query is generated and sent as a single packet to the server, and applications can’t switch to doing something else while leaving the server with a partially received query. There is a very detailed [blog post](https://www.percona.com/blog/2007/07/08/mysql-net_write_timeout-vs-wait_timeout-and-protocol-notes/) on this topic from our CEO, Peter Zaitsev. 78 | 79 | 80 | 81 | 以下是造成aborted connection错误的可能原因: 82 | 83 | - 在MySQL内部,MySQL内部处于休眠了几百秒的状态的连接中很大比例是应用程序在做完工作后没有关闭连接造成的,而是依靠wait_tiemout系统变量来关闭连接。 我强烈建议修改应用程序逻辑,在操作结束后正确关闭连接 84 | - 检查以确保`max_allowed_packet`足够大,你的客户端不会收到"packet too large"的消息。这种情况下的连接断开属于由于没有正确关闭连接 85 | - 另外一种可能是`TIME_WAIT`。我曾经多次从netstat注意到`TIME_WAIT`提示,所以我建议在应用端确认很好地管理来关闭连接 86 | - 确保事务提交(begin和commit)都正确提交以保证一旦应用程序完成以后留下的连接是处于干净的状态 87 | - 你应该确保客户端程序不会断开连接。例如,如果PHP设置了`max_execution_time`为5秒,增加`connect_timeout`并不会起到作用,因为PHP会kill脚本。其他程序语言和环境也有类似的安全选项 88 | - 连接延迟的另外一个原因是DNS问题。检查参数`skip-name-resolve`是否打开,以及是否根据主机的IP地址而不是主机名对主机进行身份验证 89 | - 发现你的应用程序故障的一种办法是添加一些日志到你的代码中来保存包含连接ID的应用程序行为。有了它,你能够将连接数字与错误行数对应起来了。打开审计日志插件,记录了连接和查询操作,一旦触发到了连接断开的错误,你应该检查Percona审计日志。你可以通过检查审计日志找出哪个查询是根本原因。如果由于某些原因你不能使用审计日志,你可以考虑使用MySQL的常规日志-然而对于高负载的服务器来说这样是有风险的。再不济,你可以打开常规日志几分钟。打开常规日志会给服务器增加巨大负担,并且经常会发生错误,因此你应该在日志增长太大之前就收集完数据。我建议来打开常规日志并使用-f tail,然后当你在日志中看到下一个警告时关闭。一旦从断开的连接中找到查询,请确定查询的应用程序问题的哪一部分,并将查询与应用程序的某些部分关联起来。 90 | - 尝试增加MySQL的`net_read_timeout`和`net_write_timeout`的参数值然后观察是否减少错误数。`net_read_timeout`一般很少出问题,除非你的网络真的很糟糕。但是,尝试调整这些值,因为在大多数情况下,生成一个查询并将其作为一个包发送到服务器,而应用程序不能在将部分接收到的查询留给服务器的同时去做其他事情 91 | 92 | 93 | 94 | Aborted connections happen because a connection was not closed properly. The server can’t cause aborted connections unless there is a networking problem between the server and the client (like the server is half duplex, and the client is full duplex) – but that is the network causing the problem, not the server. In any case, such problems should show up as errors on the networking interface. To be extra sure, check the ifconfig -a output on the MySQL server to check if there are errors. 95 | 96 | 发生连接断开的原因是因为连接没有正确关闭。服务器并不能造成连接断开,除非服务器和客户端之间有网络问题(例如服务器是单工而客户端是双工的)-但是这是网络造成的问题,而不是服务器。在任何情况下,这些问题都应该在网络接口上显示为错误。另外,请检查MySQL服务器上的`ifconfig -a`输出是否有错误。 97 | 98 | Another way to troubleshoot this problem is via tcpdump. You can refer to this blog post on [how to track down the source of aborted connections](https://www.percona.com/blog/2008/08/23/how-to-track-down-the-source-of-aborted_connects/). Look for potential network issues, timeouts and resource issues with MySQL. 99 | 100 | 另外一种定位该问题的方法是通过tcpdump。你可以参考这篇文章[how to track down the source of aborted connections](https://www.percona.com/blog/2008/08/23/how-to-track-down-the-source-of-aborted_connects/)找到MySQL的潜在网络、超时和资源问题。 101 | 102 | I found this [blog post](https://www.percona.com/blog/2011/04/18/how-to-use-tcpdump-on-very-busy-hosts/) useful in explaining how to use tcpdump on busy hosts. 
It provides help for tracking down the TCP exchange sequence that led to the aborted connection, which can help you figure out why the connection broke. 103 | 104 | 我发现了 [这篇](https://www.percona.com/blog/2011/04/18/how-to-use-tcpdump-on-very-busy-hosts/)关于解释如何在一台繁忙的机器上使用tcpdump的文章非常有用。它提供了跟踪导致断开连接的TCP交换序列的帮助,能够帮你找出连接中断的原因。 105 | 106 | For network issues, use a ping to calculate the round trip time (RTT) between a machine where mysqld is located and the machine from where the application makes requests. Send a large file (1GB or more) to and from client and server machines, watch the process using tcpdump, then check if an error occurred during transfer. Repeat this test a few times. I also found this from my colleague Marco Tusa useful: [Effective way to check network connection](http://www.tusacentral.net/joomla/index.php/mysql-blogs/164-effective-way-to-check-the-network-connection-when-in-need-of-a-geographic-distribution-replication-.html). 107 | 108 | 对于网络问题,使用`ping`来计算从发起请求的应用服务器到mysqld服务器间的往返时间(RTT)。从客户端发送一个大文件(1GB或者更大)到服务端,使用tcpdump观察进程,并检查传输期间是否有错误发生。重复测试数次。我还发现我同事Marco Tusa的文章也非常有用[Effective way to check network connection](http://www.tusacentral.net/joomla/index.php/mysql-blogs/164-effective-way-to-check-the-network-connection-when-in-need-of-a-geographic-distribution-replication-.html) 109 | 110 | One other idea I can think of is to capture the netstat -s output along with a timestamp after every N seconds (e.g., 10 seconds so you can relate netstat -s output of BEFORE and AFTER an aborted connection error from the MySQL error log). With the aborted connection error timestamp, you can co-relate it with the netstat sample captured as per a timestamp of netstat, and watch which error counters increased under the TcpExt section of netstat -s. 111 | 112 | 我能想到的另外一种思路是每隔N秒抓取`netstat -s`加上时间戳的输出(例如隔10秒,你可以将`netstat -s`的输出与MySQL的错误日志中的连接断开错误前后联系起来)。通过断开连接错误时间戳,你可以将它与捕捉到的带时间戳的netstat示例关联起来,并观察在netstat -s的TcpExt部分中哪些错误计数器增加了。 113 | 114 | Along with that, you should also check the network infrastructure sitting between the client and the server for proxies, load balancers, and firewalls that could be causing a problem. 115 | 116 | 与此同时,你还应该检查客户机和服务器之间的网络基础设施,以查找可能导致问题的代理、负载均衡和防火墙。 117 | 118 | **Conclusion:**In addition to diagnosing communication failure errors, you also need to take into account faulty ethernets, hubs, switches, cables, and so forth which can cause this issue as well. You must replace the hardware itself to properly diagnose these issues. 119 | 120 | **结论:** 除了诊断通信故障错误之外,还需要考虑网卡、hub、交换机、电缆等因为这些都有可能导致故障。必须更换硬件才能正确诊断这些问题。 121 | -------------------------------------------------------------------------------- /mysql/3-zh-mysql-8-0-2-replication-new-feature.md: -------------------------------------------------------------------------------- 1 | # [MySQL 8.0.2复制新特性](http://mysqlhighavailability.com/replication-features-in-mysql-8-0-2/) 2 | MySQL 8 正在变得原来越好,而且这也在我们MySQL复制研发团队引起了一阵热潮。我们一直致力于全面提升MySQL复制,通过引入新的和一些有趣的功能。此外,我们还听取了社区的建议和反馈。因此,我们很荣幸能够与你一同见证最新版本(MySQL 8.0.2)的里程碑式的发布,为此我们总结了其中的一些值得注意的变化。跟随我们下面的博客,我们将会分享这些新功能的一些见解。 3 | >MySQL 8 is shaping up quite nicely. And we are having a blast in the MySQL replication team while this is happening. We are continuously improving all-things-replication by introducing new and interesting features. In addition, we have been listening to our community and addressing their feedback. As such, we would like to celebrate with you all the latest development milestone release, MySQL 8.0.2, by summarizing the noteworthy changes that are in it. 
Follow up blog posts shall provide insights on these new enhancements. 4 | ## 我们对MySQL 组复制进行了加强,主要有以下几个方面: 5 | 6 | * 不允许对离开组的成员进行更改:每当组成员离开群组,离开的成员将会自动设置super_read_only,这可以防止DBA,用户或路由层/代理中间件/负载均衡端等带来的的意外更改。除了默认离开组复制的成员不能够进行修改以外,也可以从刚加入开始就开始禁止写入,我们也可以在服务器启动时设置super_read_only参数并启动组复制插件。一旦组复制动成功,他会自动调整super_read_only的值。在多主模式下,所有的节点都将不会设置super_read_only参数 ;在单主的模式下,除了主节点以外,其他的节点都会设置super_read_only为ON 。如果很不幸,你的组复制启动失败了的话,super_read_only =1 设置会继续保持,将不能进行任何写入操作。这些最新的变化同样适用于MySQL 5.7.19和MySQL 8.0.2。所有的这些,有很大部分是因为我们听取了社区的反馈(BUG#84728, BUG#84795, BUG#84733)然后进行开发和加强。--在此 感谢Kenny Gryp 7 | >Disallow changes to members that have left the group. Whenever a member leaves the group voluntarily, it will automatically set super_read_only. This prevents accidental changes from the DBA, user, or router/proxy/load balancers. In addition to disallowing changes on members that have left the group by default, it is also possible to protect the writes from the beginning. I.e., it is possible to set super_read_only and start the group replication plugin while the server starts. Once Group Replication starts successfully, it will adjust super_read_only’s value properly. On multi-primary mode, the super_read_only will be unset on all joining members; On single-primary, only the primary will have super_read_only unset. In the unlikely event that Group Replication fails to start, the super_read_only is preserved, thence no writes are allowed. This last change is actually in both, MySQL 5.7.19 and MySQL 8.0.2. All of these were highly influenced by feedback we got from our community (BUG#84728, BUG#84795, BUG#84733) – Thank you Kenny Gryp! 8 | 9 | * 可以在Performance Schema 中查看更多信息:在Performance Schema现存的表中,对相关的统计信息的可读性进行了加强。“replication_group_members” 和 “replication_group_member_stats” 表也做了相关拓展,现在可以清楚的看到组成员的角色信息,组成员版本和事物计数器(本地/远程) 10 | >More information on Performance Schema tables. More observability enhancements were added to existing performance schema tables. “replication_group_members” and “replication_group_member_stats” tables have been extended and now display information such as members roles, members versions and counters for the different types of transactions (local/remote). 11 | 12 | * 通过分配权重来指定主库的选举:用户可以通过指定组成员的权重来控制主库的选举,当现有的主节点退出组复制,权重最高的节点就会被提升为主节点。 13 | >Influence primary election by assigning promotion weights. The user is able to influence the primary election by determining member weights. When the existing primary goes away, the member with the highest weight is the one to be elected on fail-over. 14 | 15 | * 流量控制机制加了一些微调项:用户现在可以更精细的调节流量控制组件。可以定义每个成员的最小配额,整个组的最小提交配额,流程控制窗口等等。 16 | >Fine tuning options for the flow control mechanism. Users can now do more fine tuning of the flow control component. They can define the minimum quota per member, minimum commit quota for the entire group, the flow control window, and more! 17 | 18 | MySQL 8.0.1 已经在MySQL复制核心框架添加了很多引人注目的功能。而MySQL 8.0.2在此基础上又有很大的提升,主要如下: 19 | * 增强对接收(IO)线程的管理,即使磁盘已满:此功能提高了接收线程和其他线程之间的内部协调效率,减少彼此的争用。对于终端用户来说,这意味着在磁盘变满并且接收线程阻塞的情况下,它不再阻塞监视操作,例如SHOW SLAVE STATUS。它还引入了一个新的线程状态,即接收线程正在等待磁盘空间资源状态。此外,当磁盘已满的时候,且不能通过释放磁盘空间使接收线程继续没有完成的工作时,可以手动强制停掉它,一般情况下不会有什么问题。如果一个event只写了部分,那它会被清掉以确保relay的一致性。当接收线程轮转刷新relay log且需等待磁盘空间释放时,这种情况下可能就要当心了。 20 | >Enhanced management of the receiver (IO) thread even when disk is full. This feature improves the internal coordination between receiver and other threads, removing some contention. 
To the end user, this means that in the event that the disk becomes full and the receiver thread blocks, it no longer blocks monitoring operations, such as SHOW SLAVE STATUS. A new thread state is also introduced stating that the receiver thread is waiting for disk space. Moreover, when the disk is full and you cannot free disk space up to allow the receiver thread to continue its activity, you can forcefully stop it, generally without side effects. If there is an event that is partially written this is cleared and relay log is left in a consistent state. Some extra care needs to be taken when receiver thread is rotating the relay log and is waiting for disk space to become available. 21 | 22 | * binary log中记录更多的元数据信息:将事物长度添加到全局事务日志事件。这可以对我们未来的优化工作有很大的帮助,而且也提高了binary log的可读性。 23 | >More metadata into the binary log. Added the length of the transaction to the global transaction log event. This enables potential future optimizations to be implemented as well as better observability support into the binary log. 24 | 25 | 如果你在研究MySQL复制的内部机制与原理,我们将很高兴与你一起分享我们做了一些清理工作,并为我们的基础组件添加了一个有趣的服务: 26 | >If you are into MySQL replication internals, we are happy to let you know that we have done some clean ups and added one interesting service to our components infrastructure: 27 | 28 | * 组成员事件可以传播到内部其他组件。通过利用新的基础服务架构,组复制插件现在可以通知服务器中的其他组件关于成员关联的事件。例如,告知某个组成员角色改变导致仲裁节点丢失等。其他的组件可以对这个信息作出反馈,并且用户也可以自己开发组件用来注册和监听这些事件。 29 | >Group membership events are propagated to other components internally. By taking advantage of the new services infrastructure, the group replication plugin can now notify other components in the server about membership related events. For instance, notify that membership has changed and that quorum was lost. Other components can react to this information. Users can write their own components that register and listen to these events. 30 | 31 | * 从XCom(标准的Paxos实现,能严格保证正确性)的内部结构中删除节点上的冗余信息。我们在XCom的结构中删除了一些冗余信息,这使它变得更加简单、减少错误信息,更容易监控那些节点加入或者离开集群,同时它会在系统中保留以前的信息。 32 | >Removed redundant information on nodes from internal structures in XCom. This work removed a bit of redundant information in XCom’s structures, making it simpler and less error prone to track which servers have left and rejoined, while there is still stale information in the system about previous views. 33 | 34 | * 对XCom核心和新编码风格进行了几项改进:我们已经修复了XCom的几个BUG,重新格式化了代码,使它符合Google的编码准则,如果你恰巧是一个开发人员,并且再看我们Paxos实现的源代码,你会发现改版后的代码将会更加容易阅读和理解。 35 | >Several enhancements to XCom core and new coding style. XCom has had several bug fixes and its code was reformatted to meet the google coding guidelines. If you are a developer and enjoy looking at our Paxos implementation, this will make it easier for you to read and understand the code. 36 | 37 | * 移除了一些老旧版本binary log转换的源代码:这个清理工作我们清除了一些老版本MySQL数据库产的的binary logs转化为新版本能够识别的一些代码(现在仅支持MySQL 5.0以及以上版本)。 38 | >Removed cross version conversion code for very old binary log formats. This is a clean up that removes code converting binary logs generated by very old versions of MySQL to a newer format as understood by newer versions of MySQL (versions 5.0 and higher). 39 | 40 | 还有一件有意思的事情,我们已经在MySQL 8.0.2中更改了以下复制默认值: 41 | * 复制的元数据信息默认存储在InnoDB系统表中。这将使MySQL复制功能变得更加强大,在复制崩溃并且自动恢复时候能够使用InnoDB事物的特性来保证恢复到指定位置的正确性。此外,新功能还要求将元数据以表的形式存储(比如组复制和多源复制),它与MySQL 8的新的数据字典保持一致。 42 | >Replication metadata is stored in (InnoDB) system tables by default. This makes replication more robust by default when it comes to crashing and recovering automatically the replication positions. 
Also, newer features require the metadata information to be stored in tables (e.g., group replication, mutli-source replication). It aligns with the overall direction of MySQL 8 and the new data dictionary in it. 43 | 44 | * 基于RBR时SLAVE的SQL线程的哈希扫描被默认开启:这也许并不是一个被广泛认同的做法,但是当从库有一些没有主键约束的表的时候性能会有提高。在这种情况下,使用基于RBR的复制时,此更改会最大程度降低性能损失,因为它会减少更新行数,而非像以前那样可能要全表扫描更新([slave_rows_search_algorithms](https://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html#option_mysqld_slave-rows-search-algorithms)参数默认TABLE_SCAN,INDEX_SCAN,HASH_SCAN)。 45 | >hash scans for row-based applier are enabled default. Perhaps not a popular knob, but one that helps when slaves do not have tables with primary keys. This change will minimize the performance penalty in that case, when using row-based replication, since it reduces the number of table scans required to update all rows. 46 | 47 | * transaction-write-set-extraction参数会默认开启:使用写集提取,为用户启动组复制或在主服务器上使用基于WRITESET的依赖关系对master进行跟踪。 48 | >Write set extraction is enabled by default. With write set extraction, it is one less configuration step for the user to start group replication or use WRITESET based dependency tracking on the master. 49 | 50 | * 默认开启Binary log 过期时间:expire-logs-days默认设置为30(30天) 51 | >Binary log expiration is enabled by default. The binary logs are expired after 30 days by default (expire-logs-days=30). 52 | 53 | 如你所知,我们一直很忙。事实上,[MySQL 8.0.2 Milestone Release](http://mysqlserverteam.com/the-mysql-8-0-2-milestone-release-is-available/)已经发布了。在复制方面,我们非常高兴看到许多有趣的功能被加入进来。 54 | >As you can see, we have been busy. :) In fact, MySQL 8.0.2 is showing a great feature set across the board. On the replication side, we are very happy to see so many interesting features getting released. 55 | 56 | 接下来将会有专门的博客来介绍说明这些功能。你也可以自己下载进行测试([下载地址](https://dev.mysql.com/downloads/mysql/8.0.html#downloads)),我们需要留意的是MySQL 8.0.2还是DMR版本,并没有GA,使用它需要自己承担风险。另外不要忘记,我们欢迎而且很期望得到你们的反馈。您可以通过错误报告,功能报告,复制邮件列表或仅对这个(或后续的)博文发表评论来给予我们反馈。MySQL 8将会越来越好,越来越精彩。 57 | >Stick around as there will be blog posts detailing further the features mentioned above. Actually, go and try them yourself, as you can download a MySQL 8.0.2 packages here. Note that MySQL 8.0.2 is a development milestone release (DMR), thence not declared generally available yet. Use it at your own risk. Don’t forget, feedback is always welcome and highly appreciated. You can send it to us through bug reports, feature reports, the replication mailing list or by just leaving a comment on this (or subsequent) blog posts. MySQL 8 looks rather nice! Exciting! 58 | -------------------------------------------------------------------------------- /mysql/4-zh-mysql-5-7-initial-flushing-analysis-and-why-performance-schema-data-is-incomplete.md: -------------------------------------------------------------------------------- 1 | > 原文 [MySQL 5.7: initial flushing analysis and why Performance Schema data is incomplete](https://www.percona.com/blog/2016/05/03/mysql-5-7-initial-flushing-analysis-and-why-performance-schema-data-is-incomplete/) 2 | > 作者:Laurynas Biveinis and Alexey Stroganov   3 | > 译者:天一阁 4 | 5 | In this post, we’ll examine why in an initial flushing analysis we find that Performance Schema data is incomplete. 6 | 本文我们将阐述为什么在初始化的刷新分析中P_S数据不完整的原因。 7 | 8 | [Having shown the performance impact of Percona Server 5.7 patches](https://www.percona.com/blog/2016/03/17/percona-server-5-7-performance-improvements/), we can now discuss their technical reasoning and details. 
Let’s revisit the MySQL 5.7.11 performance schema synch wait graph from the previous [post](https://www.percona.com/blog/2016/03/17/percona-server-5-7-performance-improvements/), for the case of unlimited InnoDB concurrency: 9 | 10 | [Percona Server 5.7 performance improvements](https://www.percona.com/blog/2016/03/17/percona-server-5-7-performance-improvements/) 文中已经表明了Percona Server 5.7补丁对于性能的影响,我们现在可以讨论它们的技术原理和细节。让我们从上文中回顾一下MySQL 5.7.11 performance schema synch wait曲线图,在这个测试中不限制InnoDB并发线程数: 11 | 12 | ![TOP 5 performance schema synch waits](https://www.percona.com/blog/wp-content/uploads/2016/03/5711.blog_.n6.v1.png) 13 | 14 | First of all, this graph is a little “nicer” than reality, which limits its diagnostic value. There are two reasons for this. The first one is that page cleaner worker threads are invisible to Performance Schema (see [bug 79894](http://bugs.mysql.com/bug.php?id=79894)). This alone limits PFS value in 5.7 if, for example, one tries to select only the events in the page cleaner threads or monitors low concurrency where the cleaner thread count is non-negligible part of the total threads. 15 | 16 | 首先,这个曲线图看起来要比实际情况“好一些”,这使得它的诊断价值有限。这有两个原因。第一个是因为page cleaner线程在Performance Schema中不可见(见[bug 79894](http://bugs.mysql.com/bug.php?id=79894))。这个仅限于5.7中的PFS值,如果只查page cleanner线程中的事件,或者监视低并发性,其中cleaner线程数是整个线程的不可忽视的部分。 17 | 18 | To understand the second reason, let’s look into PMP for the same setting. Note that selected intermediate stack frames were removed for clarity, especially in the InnoDB mutex implementation. 19 | 20 | 为了理解第二个原因,让我们来看看相同设置的PMP。 请注意,为了清楚起见,移除了所选的中间堆栈帧,尤其是在InnoDB互斥实现中。 21 | 22 | ``` 23 | 660 pthread_cond_wait,enter(ib0mutex.h:850),buf_dblwr_write_single_page(ib0mutex.h:850),buf_flush_write_block_low(buf0flu.cc:1096),buf_flush_page(buf0flu.cc:1096),buf_flush_single_page_from_LRU(buf0flu.cc:2217),buf_LRU_get_free_block(buf0lru.cc:1401),... 24 | 631 pthread_cond_wait,buf_dblwr_write_single_page(buf0dblwr.cc:1213),buf_flush_write_block_low(buf0flu.cc:1096),buf_flush_page(buf0flu.cc:1096),buf_flush_single_page_from_LRU(buf0flu.cc:2217),buf_LRU_get_free_block(buf0lru.cc:1401),... 25 | 337 pthread_cond_wait,PolicyMutex(ut0mutex.ic:89),get_next_redo_rseg(trx0trx.cc:1185),trx_assign_rseg_low(trx0trx.cc:1278),trx_set_rw_mode(trx0trx.cc:1278),lock_table(lock0lock.cc:4076),... 26 | 324 libaio::??(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2448),LinuxAIOHandler::poll(os0file.cc:2594),... 27 | 241 pthread_cond_wait,PolicyMutex(ut0mutex.ic:89),trx_write_serialisation_history(trx0trx.cc:1578),trx_commit_low(trx0trx.cc:2135),... 28 | 147 pthread_cond_wait,enter(ib0mutex.h:850),trx_undo_assign_undo(ib0mutex.h:850),trx_undo_report_row_operation(trx0rec.cc:1918),... 29 | 112 pthread_cod_wait,mtr_t::s_lock(sync0rw.ic:433),btr_cur_search_to_nth_level(btr0cur.cc:1008),... 30 | 83 poll(libc.so.6),Protocol_classic::get_command(protocol_classic.cc:965),do_command(sql_parse.cc:935),handle_connection(connection_handler_per_thread.cc:301),... 31 | 64 pthread_cond_wait,Per_thread_connection_handler::block_until_new_connection(thr_cond.h:136),... 32 | ``` 33 | 34 | The top wait in both PMP and the graph is the 660 samples of enter mutex inbuf_dblwr_write_single_pages, which is the doublewrite mutex. Now try to find the nearly as hot 631 samples of event wait inbuf_dblwr_write_single_page in the PFS output. You won’t find it because InnoDB OS event waits are not annotated in Performance Schema. 
In most cases this is correct, as OS event waits tend to be used when there is no work to do. The thread waits for work to appear, or for time to pass. But in the report above, the waiting thread is blocked from proceeding with useful work (see [bug 80979](http://bugs.mysql.com/bug.php?id=80979)). 35 | 36 | 在PMP和图表中最多的等待事件是660个inbuf_dblwr_write_single_pages事件,这是doublewrite mutex。现在尝试在PFS中找到同样也很高的的631个inbuf_dblwr_write_single_page事件. 但却无法找到,因为InnoDB OS事件等待并不再PFS中记录。这在大多数情况下是没问题的,因为当InnoDB内部没事做时,则进入InnoDB OS wait事件状态。该线程要么等着出现工作时间,要么随着时间消逝。但是在上面的报告中,等待线程被阻止无法处理必要的工作(见[bug 80979](http://bugs.mysql.com/bug.php?id=80979))。 37 | 38 | Now that we’ve shown the two reasons why PFS data is not telling the whole server story, let’s take PMP data instead and consider how to proceed. Those top two PMP waits suggest 1) the server is performing a lot of single page flushes, and 2) those single page flushes have their concurrency limited by the eight doublewrite single-page flush slots available, and that the wait for a free slot to appear is significant. 39 | 40 | 现在我们已经解释完了PFS数据中并没体现mysql全部状态的两个原因,那么让我们转而考虑PMP数据,并考虑如何进行。上面的两个PMP等待意味着,1)服务器执行大量的single page flush,2)这些single page flush的并发性受限于doublewrite中8个可用的single-page flush slot,并且很明显是在等待空闲的slot出现。 41 | 42 | Two options become apparent at this point: either make the single-page flush doublewrite more parallel or reduce the single-page flushing in the first place. We’re big fans of the latter option since version 5.6 performance work, where we configured Percona Server to not perform single-page flushes at all by introducing the [innodb_empty_free_list_algorithm](https://www.percona.com/doc/percona-server/5.6/performance/xtradb_performance_improvements_for_io-bound_highly-concurrent_workloads.html) option, with the “backoff” default. 43 | 44 | 这点上显然有两种选择:要么使single-page flush doublewrite更加并行,要么减少single-page flush。自5.6版本的性能优化开始,我们一直坚定后一种选择,在这里,我们通过引入 [innodb_empty_free_list_algorithm](https://www.percona.com/doc/percona-server/5.6/performance/xtradb_performance_improvements_for_io-bound_highly-concurrent_workloads.html)选项来配置Percona Server,使得不执行single-page flush,默认情况下是“backoff”。 45 | 46 | The next post in the series will describe how we removed single-page flushing in 5.7. 47 | 本系列的下一篇文章将描述如何删除5.7中的single-page flush。 48 | 49 | 50 | 51 | 52 | -------------------------------------------------------------------------------- /mysql/5-zh-Percona-Server-5.7-performance-improvements.md: -------------------------------------------------------------------------------- 1 | >作者:Alexey Stroganov, Laurynas Biveinis and Vadim Tkachenko 2 | >发布时间:2016.3.17 3 | >文章关键字:Benchmarks,MySQL,Percona Server 4 | >原文:[Percona Server 5.7 performance improvements](https://www.percona.com/blog/2016/03/17/percona-server-5-7-performance-improvements/) 5 | 6 | In this blog post, we’ll be discussing Percona Server 5.7 performance improvements. 7 | 8 | 在这篇文章中,我们将讨论Percona Server 5.7有哪些性能提升。 9 | 10 | Starting from the Percona Server 5.6 release, we’ve introduced several significant changes that help address performance problems for highly-concurrent I/O-bound workloads. Some of our research and improvements were re-implemented for MySQL 5.7 – one of the best MySQL releases. But even though MySQL 5.7 showed progress in various aspects of scalability and performance, we’ve found that it’s possible to push I/O bound workload limits even further. 
11 | 12 | 从Percona Server 5.6发布以来,我们引入了几个重要的更新,有助于高并发I/O负载场景下的性能瓶颈定位。我们(在性能方面的)某些研究和提升在目前最好的MySQL版本5.7下被重新实现了。但即使MySQL 5.7在扩展性和性能等方便都有所提升,我们还是发现了可以增进I/O工作负载性能的一些地方。 13 | 14 | Percona Server 5.7.11 currently has two major performance features in this area: 15 | 16 | Percona Server 5.7.11 有两个主要的性能方面的特性: 17 | 18 | **Multi-threaded LRU flusher**. In a limited form, this feature exists in Percona Server 5.6. We split the LRU flusher thread out of the existing page cleaner thread, and it is now solely tasked with flushing the flush list. Along with several other important changes, this notably improved I/O bound workload performance. MySQL 5.7 has also made a step forward by introducing a pool of page cleaner threads that should help improve parallelism in flushing. However, we believe that the current approach is not good enough – especially for LRU flushing. In one of our next Percona Server 5.7 performance improvements posts, we’re going to describe aspects of MT flushing, and why it’s especially important to have an independent MT LRU flusher. 19 | 20 | **多线程LRU刷新**,这个特性在Percona Server 5.6就存在了,但效果有限。我们把LRU刷新线程从page cleaner线程中分离出来,现在完全只做flush list的刷新。除此外还有其他几个重要的变化,这些都显著提升了高I/O负载为主时的性能。MySQL 5.7更进一步引入了多个page cleaner线程机制,这将有助于提高flush的并行性。但我们还是认为目前的做法还不够好 —— 尤其是LRU list刷新方面。在我们下一篇介绍Percona Server 5.7性能改进的文章中,我们将讲述MT刷新(Multi-threaded)方面的内容,以及为什么独立的MT LRU 刷新特别重要。   21 | 22 | **Parallel doublewrite buffer**. For ages, MySQL has had only one doublewrite buffer for flushing data pages. So even if you had several threads for flushing you couldn’t efficiently use them – doublewrite quickly became a bottleneck. We’ve changed that by attaching two doublewrite buffers to each buffer pool instance: one for each type of page flushing (LRU and flush list). This completely avoids any doublewrite contention, regardless of the flusher thread count. We’ve also moved the doublewrite buffer out of the system tablespace so you can now configure its location.   23 | 24 | **并行doublewrite buffer**,长期以来,MySQL只有一个doublewrite buffer来刷新数据页。所以,即使你有多个线程进行flush,也不能有效地使用他们 —— doublewrite会很快成为瓶颈。我们则在每个buffer pool instance中实现两个doublewrite buffer,每个分别负责不同类型的页面刷新(LRU和flush list)。这完全避免了doublewrite的争用,无需考虑flush的线程数大小。我们还将doublewrite buffer从系统表空间中分离出来,可以自行配置其路径。 25 | 26 | Now let’s review the results of a sysbench OLTP_RW, I/O-bound scenario. Below are the key settings that we used in our test: 27 | 28 | 现在我们看下sysbench OLTP_RW,I/O负载为主场景下的测试结果。下面是测试中使用的关键设置:   29 | 30 | 数据集大小:100GB   31 | ``` 32 | innodb_buffer_pool_size=25GB 33 | innodb_doublewrite=1 34 | innodb_flush_log_at_trx_commit=1 35 | ``` 36 | 37 | ![Sysbench OLTP_RW](https://www.percona.com/blog/wp-content/uploads/2016/03/5711.blog_.n1.v1.png) 38 | 39 | While evaluating MySQL 5.7 RC we observed a performance drop in I/O-bound workloads, and it looked very similar to MySQL 5.6 behavior. The reason for the drop is the lack of free pages in the buffer pool. Page cleaner threads are unable to perform enough LRU flushing to keep up with the demand, and the query threads resort to performing single page flushes. This results in increased contention between all the of the flushing structures (especially the doublewrite buffer). 40 | 41 | 当 [评估MySQL 5.7 RC](https://www.percona.com/blog/2015/10/26/state-percona-server-5-6-mysql-5-6-mysql-5-7rc/ )时,我们观察到I/O负载为主场景中的性能下降,而且看起来类似MySQL 5.6的表现。性能下降的原因是buffer pool没有可用的空闲页。page cleaner 线程不能及时进行完成 LRU flush,而且查询线程又触发了单页刷新。这导致所有flushing structures(尤其是doublewrite buffer)之间的争用增加。 42 | 43 | For ages (Vadim discussed this ten years ago!) 
InnoDB has had a universal workaround for most scalability issues: the innodb_thread_concurrency system variable. It allows you to limit the number of active threads within InnoDB and reduce shared resource contention. However, it comes with a trade-off in that the maximum possible performance is also limited. 44 | 45 | 很长时间(Vadim十年前讨论过),InnoDB大多数扩展性问题都有统一的解决方案:`innodb_thread_concurrency` 选项。它可以限制InnoDB内部活跃线程的数量并减少共享资源争用。但是,but,它的缺陷也很明显,限制了可以发挥的最大性能。 46 | 47 | To understand the effect, we ran the test two times with two different InnoDB concurrency settings: 48 | 49 | 想了解其效果如何,我们在两种不同的InnoDB并发设置下进行对比测试: 50 | 51 | **innodb_thread_concurrency=0**: with this default value Percona Server 5.7 shows the best results, while MySQL 5.7 shows sharply decreasing performance with more than 64 concurrent clients. 52 | 53 | **innodb_thread_concurrency=0**,这是默认值,Percona Server 5.7 表现最好,而MySQL 5.7 显示超过64并发之后性能就急剧下降。 54 | 55 | **innodb_thread_concurrency=64**: limiting the number of threads inside InnoDB affects throughput for Percona Server slightly (with a small drop from the default setting), but for MySQL that setting change is a huge help. There were no drops in performance after 64 threads, and it’s able to maintain this performance level up to 4k threads (with some variance). 56 | 57 | **innodb_thread_concurrency=64**,限制了InnoDB内部并发数,略微影响Percona Server的吞吐量(和默认设置模式略有下降),但是对于MySQL这个改变有很大帮助。在64个并发之后,性能并没有下降,而是在达到4k并发时依然能保持这个性能表现(当然会有些波动)。 58 | 59 | To understand the details better, let’s zoom into the test run with 512 threads: 60 | 61 | 为了更好的了解细节,我们用512线程并发查看运行中的细节: 62 | 63 | ![Sysbench OLTP_RW,512线程](https://www.percona.com/blog/wp-content/uploads/2016/03/5711.blog_.n2.v4.png) 64 | 65 | The charts above show that contentions significantly affect unrestricted concurrency throughput, but affect latency even worse. Limiting concurrency helps to address contentions, but even with this setting Percona Server shows 15-25% better. 66 | 67 | 上图显示,争用会严重影响不设限制的并发吞吐量,但对响应延迟影响更大。限制并发有助于解决争用,但即使这样,Percona Server 还是有 15% ~ 25%的优势。 68 | 69 | Below you can see the contention situation for each of the above runs. The graphs show total accumulated waiting time across all threads per synchronization object (per second). For example, the absolute hottest object across all graphs is the doublewrite mutex in MySQL-5.7.11 (without thread concurrency limitation). It has about 17 seconds of wait time across 512 client threads for each second of run time. 
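如果想在自己的环境中得到类似的统计,可以直接查询performance_schema里的等待事件汇总表,前提是打开了`wait/synch/%`相关的instrument(参见下文的server设置)。下面是一个简单的查询示意,按累计等待时间排序,SUM_TIMER_WAIT的单位是皮秒:

```mysql
-- 查看累计等待时间最长的同步对象(需开启wait/synch/%相关instrument)
SELECT EVENT_NAME,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1000000000000 AS wait_seconds
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/synch/%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 5;
```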
70 | 71 | 以下你可以看到上述每次测试的争用情况。图中显示了所有线程每次同步对象(每秒)的累计等待时间。从所有图中可见,最热的对象是MySQL 5.7.11的doublewrite mutex(innodb_thread_concurrency=0),每秒并行512线程时共有大概有17s的等待时间。 72 | 73 | ![Top 5 performance schema synch waits](https://www.percona.com/blog/wp-content/uploads/2016/03/5711.blog_.n4.v6.png) 74 | 75 | **mysql server 设置** 76 | ``` 77 | innodb_log_file_size=10G 78 | innodb_doublewrite=1 79 | innodb_flush_log_at_trx_commit=1 80 | innodb_buffer_pool_instances=8 81 | innodb_change_buffering=none 82 | innodb_adaptive_hash_index=OFF 83 | innodb_flush_method=O_DIRECT 84 | innodb_flush_neighbors=0 85 | innodb_read_io_threads=8 86 | innodb_write_io_threads=8 87 | innodb_lru_scan_depth=8192 88 | innodb_io_capacity=15000 89 | innodb_io_capacity_max=25000 90 | loose-innodb-page-cleaners=4 91 | table_open_cache_instances=64 92 | table_open_cache=5000 93 | loose-innodb-log_checksum-algorithm=crc32 94 | loose-innodb-checksum-algorithm=strict_crc32 95 | max_connections=50000 96 | skip_name_resolve=ON 97 | loose-performance_schema=ON 98 | loose-performance-schema-instrument='wait/synch/%=ON', 99 | ``` 100 | ### **总结** 101 | If you are already testing 5.7, consider giving Percona Server a spin – especially if your workload is I/O bound. We’ve worked hard on Percona Server 5.7 performance improvements. In upcoming posts, we will delve into the technical details of our LRU flushing and doublewrite buffer changes. 102 | 103 | 如果你测试过MySQL 5.7,不妨再测试下Percona Server ——尤其在I/O负载为主的场景。我们在Percona Server 5.7的性能改进上煞费苦心。在后续发布的文章中,我们将深入了解LRU flushing和doublewrite buffer上的一些变化等技术细节。 104 | -------------------------------------------------------------------------------- /mysql/6-zh-Percona-Server-5.7: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhishutech/tech-blog-en2zh/0fb994692c4eae19daf5f6b525515871d098a26a/mysql/6-zh-Percona-Server-5.7 -------------------------------------------------------------------------------- /mysql/7-zh-Percona-Server-5.7-parallel-doublewrite.md: -------------------------------------------------------------------------------- 1 | >作者:Laurynas Biveinis and Alexey Stroganov 2 | >发布时间:2016.5.9 3 | >文章关键字:MySQL 4 | >原文:[Percona Server 5.7 parallel doublewrite](https://www.percona.com/blog/2016/05/09/percona-server-5-7-parallel-doublewrite/) 5 | 6 | In this blog post, we’ll discuss the ins and outs of Percona Server 5.7 parallel doublewrite. 7 | 8 | 在这篇文章中,我们将由里及外讨论Percona Server 5.7的并行doublewrite。 9 | 10 | After implementing parallel LRU flushing as described in the previous post, we went back to benchmarking. At first, we tested with the doublewrite buffer turned off. We wanted to isolate the effect of the parallel LRU flusher, and the results validated the design. Then we turned the doublewrite buffer back on and saw very little, if any, gain from the parallel LRU flusher. What happened? Let’s take a look at the data: 11 | 12 | 在[上篇文章中](https://www.percona.com/blog/2016/05/05/percona-server-5-7-multi-threaded-lru-flushing/) ,我们描述了多线程LRU刷新线程的实现,现在让我们回到基准测试。首先,在doublewrite buffer关闭状态下进行测试。我们想要隔离并行LRU刷新带来的影响,结果验证了设想。然后,重新开启doublewrite,发现从并行LRU刷新获得的好处很小。到底发生了什么?我们先来看看数据: 13 | 14 | ![TOP performance schema synch waits](https://www.percona.com/blog/wp-content/uploads/2016/03/5710.3.pfs_.all_.png) 15 | 16 | We see that the doublewrite buffer mutex is gone as expected and that the top waiters are the rseg mutexes and the index lock (shouldn’t this be fixed in 5.7?). 
Then we checked PMP: 17 | 18 | 如上图,我们看到doublewrite buffer互斥量如预期一样消失了,最高的等待是rseg 互斥量和index锁(这不应该在5.7修复了么?)。接着,我们检查下PMP: 19 | ``` 20 | 2678 nanosleep(libpthread.so.0),...,buf_LRU_get_free_block(buf0lru.cc:1435),... 21 | 867 pthread_cond_wait,...,log_write_up_to(log0log.cc:1293),... 22 | 396 pthread_cond_wait,...,mtr_t::s_lock(sync0rw.ic:433),btr_cur_search_to_nth_level(btr0cur.cc:1022),... 23 | 337 libaio::??(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2325),... 24 | 240 poll(libc.so.6),...,Protocol_classic::read_packet(protocol_classic.cc:810),... 25 | ``` 26 | Again we see that PFS is not telling the whole story, this time due to a missing annotation in XtraDB. Whereas the PFS results might lead us to leave the flushing analysis and focus on the rseg/undo/purge or check the index lock, PMP clearly shows that a lack of free pages is the biggest source of waits. Turning on the doublewrite buffer makes LRU flushing inadequate again. This data, however, doesn’t tell us why that is. 27 | 28 | 我们再次看到,PFS并没有显示所有的内容,这里是因为[XtraDB的缺陷](https://bugs.launchpad.net/percona-server/+bug/1561945)。而PFS的结果让我们忽略刷新方面的分析,转而聚焦rseg/undo/purge或者索引锁的检查上。PMP清晰地展现缺少空闲也是最大等待的源头。打开doublewriter buffer又会导致LRU刷新不足。然而,这些数据并没有告诉我们为什么会这样。 29 | 30 | To see how enabling the doublewrite buffer makes LRU flushing perform worse, we collect PFS and PMP data only for the server flusher (cleaner coordinator, cleaner worker, and LRU flusher) threads and I/O completion threads: 31 | 32 | 为了了解为何开启了doublewrite buffer 会使LRU刷新变得糟糕,我们收集了PFS和PMP数据,这些数据只包含刷新相关(cleaner coordinator,cleaner worker,以及LRU flusher)线程和I/O相关线程: 33 | 34 | ![TOP performance schema synch waits](https://www.percona.com/blog/wp-content/uploads/2016/04/5710.3.flushers.only_.png) 35 | 36 | If we zoom in from the whole server to the flushers only, the doublewrite mutex is back. Since we removed its contention for the single page flushes, it must be the batch doublewrite buffer usage by the flusher threads that causes it to reappear. The doublewrite buffer has a single area for 120 pages that is shared and filled by flusher threads. The page add to the batch action is protected by the doublewrite mutex, serialising the adds, and results in the following picture: 37 | 38 | 如果我们从整个服务放大到刷新线程,就又能看到douoblewrite mutex了。由于我们移除了单页刷新之间的争用,所以它会在刷新线程批量使用doublewrite buffer时重新出现。doublewrite buffer有一个有120个page的单独区域,刷新线程负责填充并共享使用。将页添加到批处理操作由doublewrite mutex保护,持续添加之后的结果如下图: 39 | 40 | ![Shared Double Write Buffer](https://www.percona.com/blog/wp-content/uploads/2016/04/dblw_mysql_1.png) 41 | 42 | By now we should be wary of reviewing PFS data without checking its results against PMP. Here it is: 43 | 44 | 现在我们应该更谨慎地评估PFS数据,并与PMP进行对比。PMP结果如下: 45 | ``` 46 | 139 libaio::??(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2448),LinuxAIOHandler::poll(os0file.cc:2594),... 47 | 56 pthread_cond_wait,...,os_event_wait_low(os0event.cc:534),buf_dblwr_add_to_batch(buf0dblwr.cc:1111),...,buf_flush_LRU_list_batch(buf0flu.cc:1555),...,buf_lru_manager(buf0flu.cc:2334),... 48 | 25 pthread_cond_wait,...,os_event_wait_low(os0event.cc:534),buf_flush_page_cleaner_worker(buf0flu.cc:3482),... 49 | 21 pthread_cond_wait,...,PolicyMutex(ut0mutex.ic:89),buf_page_io_complete(buf0buf.cc:5966),fil_aio_wait(fil0fil.cc:5754),io_handler_thread(srv0start.cc:330),... 50 | 8 pthread_cond_timedwait,...,buf_flush_page_cleaner_coordinator(buf0flu.cc:2726),... 
51 | ``` 52 | As with the single-page flush doublewrite contention and the wait to get a free page in the previous posts, here we have an unannotated-for-Performance Schema doublewrite OS event wait (same bug 80979): 53 | 54 | 与之前文章中提到的单页刷新doublewrite争用,等待一个空闲的页的情景一样,这里我们有一个在Performance Schema中未被注解的doublewrite OS 事件(见[bug80979](http://bugs.mysql.com/bug.php?id=80979))。 55 | 56 | ``` 57 | if (buf_dblwr->batch_running) { 58 | /* This not nearly as bad as it looks. There is only 59 | page_cleaner thread which does background flushing 60 | in batches therefore it is unlikely to be a contention 61 | point. The only exception is when a user thread is 62 | forced to do a flush batch because of a sync 63 | checkpoint. */ 64 | int64_t sig_count = os_event_reset(buf_dblwr->b_event); 65 | mutex_exit(&buf_dblwr->mutex); 66 | os_event_wait_low(buf_dblwr->b_event, sig_count); 67 | goto try_again; 68 | } 69 | ``` 70 | This is as bad as it looks (the comment is outdated). A running doublewrite flush blocks any doublewrite page add attempts from all the other flusher threads for the duration of the flush (up to 120 data pages written twice to storage): 71 | 72 | 这看起来很糟糕(里面的注释可以不用关注,已经过时)。活跃的doublewrite刷新时会阻塞所有其他flush线程任何的doublewrite page添加(多达120个页写入两次存储)。 73 | 74 | ![Shared Double Write Buffer](https://www.percona.com/blog/wp-content/uploads/2016/04/dblw_ms_2-2.png) 75 | 76 | The issue also occurs with MySQL 5.7 multi-threaded flusher but becomes more acute with the PS 5.7 multi-threaded LRU flusher. There is no inherent reason why all the parallel flusher threads must share the single doublewrite buffer. Each thread can have its own private buffer, and doing so allows us to add to the buffers and flush them independently. This means a lot of synchronisation simply disappears. Adding pages to parallel buffers is fully asynchronous: 77 | 78 | 使用MySQL 5.7多线程flush也会出现此问题,但Percona Server 5.7的多线程LRU 刷新尤为突出。但并发flush线程并非必须共享单个doublewrite buffer。每个线程都可以有自己的私有buffer,这样可以允许添加到buffer并单独刷新它们。这意味着大量的同步会消失。将页面添加到并行buffer完全是异步的。   79 | 80 | ![Parallel Double Write Buffers](https://www.percona.com/blog/wp-content/uploads/2016/04/dblw_ps_1.png) 81 | 82 | And so is flushing them: 83 | 84 | 变成了下面的刷新模式: 85 | 86 | ![Multiple double write buffers](https://www.percona.com/blog/wp-content/uploads/2016/04/dblw_ps_2.png) 87 | 88 | This behavior is what we shipped in the 5.7.11-4 release, and the performance results were shown in a previous post. To see how the private doublewrite buffer affects flusher threads, let’s look at isolated data for those threads again. 89 | 90 | 这个特性是我们在5.7.11-4版本添加的,其性能提升效果在[之前的文章](https://www.percona.com/blog/2016/03/17/percona-server-5-7-performance-improvements/)中已经展示。想知道私有doublewrite buffer对flush线程的影响,让我们再看下这些线程的隔离数据:   91 | 92 | Performance Schema: 93 | 94 | ![TOP performance schema synch waits](https://www.percona.com/blog/wp-content/uploads/2016/04/5711.flusher.only_.png) 95 | 96 | It shows the redo log mutex as the current top contention source from the PFS point of view, which is not caused directly by flushing. 97 | 98 | 从PFS的角度看,redo log互斥量是当前使用量最多的争用来源,这不是直接由flush引起的。 99 | 100 | PMP data looks better too: 101 | 102 | PMP的数据看起来好点: 103 | ``` 104 | 112 libaio::??(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2455),...,io_handler_thread(srv0start.cc:330),... 105 | 54 pthread_cond_wait,...,buf_dblwr_flush_buffered_writes(buf0dblwr.cc:1287),...,buf_flush_LRU_list(buf0flu.cc:2341),buf_lru_manager(buf0flu.cc:2341),... 
106 | 35 pthread_cond_wait,...,PolicyMutex(ut0mutex.ic:89),buf_page_io_complete(buf0buf.cc:5986),...,io_handler_thread(srv0start.cc:330),... 107 | 27 pthread_cond_wait,...,buf_flush_page_cleaner_worker(buf0flu.cc:3489),... 108 | 10 pthread_cond_wait,...,enter(ib0mutex.h:845),buf_LRU_block_free_non_file_page(ib0mutex.h:845),buf_LRU_bloc 109 | ``` 110 | The buf_dblwr_flush_buffered_writes now waits for its own thread I/O to complete and doesn’t block other threads from proceeding. The other top mutex waits belong to the LRU list mutex, which is again not caused directly by flushing. 111 | 112 | buf_dblwr_flush_buffered_writes 现在等待自己的线程I/O完成,并不阻塞其他线程运行。其他较高的互斥量等待属于LRU list mutex,这也不是由flush引起的。 113 | 114 | This concludes the description of the current flushing implementation in Percona Server. To sum up, in these post series we took you through the road to the current XtraDB 5.7 flushing implementation: 115 | 116 | * Under high concurrency I/O-bound workloads, the server has a high demand for free buffer pages. This demand can be satisfied by either LRU batch flushing, either single page flushing. 117 | * Single page flushes cause a lot of doublewrite buffer contention and are bad even without the doublewrite. 118 | * Same as in XtraDB 5.6, we removed the single page flushing altogether. 119 | * Existing cleaner LRU flushing could not satisfy free page demand. 120 | * Multi-threaded LRU flushing design addresses this issue – if the doublewrite buffer is disabled. 121 | * If the doublewrite buffer is enabled, MT LRU flushing contends on it, negating its improvements. 122 | * Parallel doublewrite buffers address this bottleneck. 123 | 124 | 以下是对当前Percona Server的flush实现描述。总结一下,在这些的系列文章中,我们重现了XtraDB 5.7刷新实现的风雨历程: 125 | * 在I/O密集的工作负载下,server对空闲buffer页面的需求量很大,这要求通过批量的LRU刷新,或者单个页面刷新来满足需要。 126 | * 单个页面flush会导致大量的doublewrite buffer争用,即使没开启doublewrite也是糟糕的。 127 | * 与XtraDB 5.6 相同,我们一并移除了单页flush。 128 | * 现有的LRU刷新机制不能满足空闲页的需求。 129 | * 多线程LRU刷新解决了这个问题——如果doublewrite buffer关闭掉。 130 | * 如果开启了doublewrite,则多线程的LRU flush会争用它,也会使得性能下降。 131 | * 并行的doublewrite buffer解决了这个问题。 132 | 133 | -------------------------------------------------------------------------------- /mysql/9-How to Choose the MySQL innodb_log_file_size.md: -------------------------------------------------------------------------------- 1 | ++++++++ ++++How to Choose the MySQL innodb_log_file_size 2 | 3 | Peter Zaitsev | October 18, 2017 | Posted In: InnoDB, Insight for DBAs, MySQL 4 | 5 | 在这篇博客里,我将介绍怎么设置MySQL innodb_log_file_size大小。 6 | 7 | In this blog post, I’ll provide some guidance on how to choose the MySQL innodb_log_file_size. 8 | 9 | 当使用默认的innodb存储引擎时,MySQL也像很多数据库管理系统那样使用日志持久性归档数据。这将确保一个事务提交后,数据不会在崩溃或者断电事件中丢失。 10 | 11 | Like many database management systems, MySQL uses logs to achieve data durability (when using the default InnoDB storage engine). This ensures that when a transaction is committed, data is not lost in the event of crash or power loss. 12 | 13 | MySQL的Innodb存储引擎使用一个固定的循环使用的redo log空间。 14 | 它的大小由innodb_log_file_size 和 innodb_log_files_in_group (默认为2)这两个参数控制。你可以统计两个参数的值相乘得到redo log实际可以用的空间大小。从技术上,你通过改变innodb_log_file_size或者innodb_log_files_in_group 参数来控制redo空间大小是没用的。大部分人只使用innodb_log_file_size,只留下innodb_log_files_in_group。(尽管从技术上讲,你可以通过改变innodb log file_size或innodb_log_file_in_group参数来控制redo空间的大小,但大多数人只是使用innodb_log_file_size参数,而不使用innodb_log_file_in_group。) 15 | 16 | MySQL’s InnoDB storage engine uses a fixed size (circular) Redo log space. 
The size is controlled by innodb_log_file_size and innodb_log_files_in_group (default 2). You multiply those values and get the Redo log space that available to use. ++While technically it shouldn’t matter whether you change either the innodb_log_file_size or innodb_log_files_in_group variable to control the Redo space size, most people just work with the innodb_log_file_size and leave innodb_log_files_in_group alone.++ 17 | 18 | 对于主要以写为主的工作环境Innodb存储引擎的redo空间是一项很重要的配置选项。然而,它是有代价的。你配置的越大,写IO优化越好,但是增加redo空间也意味着当系统断电或其它原因崩溃时恢复的时间也会更长。 19 | 20 | Configuring InnoDB’s Redo space size is one of the most important configuration options for write-intensive workloads. However, it comes with trade-offs. The more Redo space you have configured, the better InnoDB can optimize write IO. However, increasing the Redo space also means longer recovery times when the system loses power or crashes for other reasons. 21 | 22 | 给定 innodb_log_file_size值去估计一次系统故障恢复任务花费的时间是不容易也不明确的。它取决于硬件,MySQL版本及工作负载。它变化范围比较大。(10倍或更大变化,取决于具体情况)。然而,五分钟左右生成一个 1G的 innodb_log_feile_size 是一个比较合适的值。如果这个参数对你的环境真的非常重要,我建议模拟在全负载(在数据库已经完全预热)下系统崩溃来测试它。 23 | 24 | It is not easy or straightforward to predict how much time a system crash recovery takes for a specific innodb_log_file_size value – it depends on the hardware, MySQL version and workload. It can vary widely (10 times difference or more, depending on the circumstances). However, around five minutes per 1GB of innodb_log_file_size is a decent ballpark number. If this is really important for your environment, I would recommend testing it by a simulating system crash under full load (after the database has completely warmed up). 25 | 26 | 使用恢复时间作为参考来限定Innodb 重做日志的值的同时,也可以通过一些其它方法看到这个数字。尤其是你已经安装了PMM. 27 | 28 | While recovery time can be a guideline for the limit of the InnoDB Log File size, there are a couple of other ways you can look at this number – especially if you have Percona Monitoring and Management installed. 29 | 30 | 查看PMM中的""MySQL Innodb 指标"仪表盘,如果你看到如下图: 31 | 32 | Check Percona Monitoring and Management’s “MySQL InnoDB Metrics” Dashboard. If you see a graph like this: 33 | 34 | ![image](https://www.percona.com/blog/wp-content/uploads/2017/10/innodb_log_file_size.png) 35 | 36 | 可以看到Uncheckpointed Bytes值已经非常接近Max Checkpoint Age,你可以肯定当前的innodb_log_file_size值已经限制了系统的性能。增大它可以带来实质的性能提升。 37 | 38 | where Uncheckpointed Bytes is pushing very close to the Max Checkpoint Age, you can almost be sure your current innodb_log_file_size is limiting your system’s performance. Increasing it can provide substantial performance improvements. 39 | 40 | 如果你看到的是这样的: 41 | If you see something like this instead: 42 | 43 | ![image](https://www.percona.com/blog/wp-content/uploads/2017/10/innodb_log_file_size-2.png) 44 | 45 | 46 | 当Uncheckpointed Bytes值远低于Max Checkpoint Age值,增家日志文件的大小不会带来明显的性能提升。 47 | 48 | where the number of Uncheckpointed Bytes is well below the Max Checkpoint Age, then increasing the log file size won’t give you a significant improvement. 49 | 50 | 注意:很多MySQL配置参数是关联的。一个特定的日志文件大小对于小的innodb_buffer_pool_size来说是足够的,大的innodb_buffer_pool值可能需要更大的日志文件来提升性能。 51 | 52 | Note: many MySQL settings are interconnected. While a specific log file size might be good enough for smaller innodb_buffer_pool_size, larger InnoDB Buffer Pool values might warrant larger log files for optimal performance. 
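如果没有部署PMM,也可以用更直接的方式粗略观察redo log的使用情况:通过`SHOW ENGINE INNODB STATUS`的LOG段估算当前的checkpoint age(大致对应上文的Uncheckpointed Bytes),再对状态变量`Innodb_os_log_written`做间隔采样来估算写入速率。下面只是一个示意性的做法,采样间隔等数字可按需调整:

```mysql
-- LOG段中"Log sequence number"减去"Last checkpoint at"即当前未做checkpoint的redo量
SHOW ENGINE INNODB STATUS\G

-- Innodb_os_log_written是写入redo log的累计字节数,间隔采样两次即可估算写入速率
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
SELECT SLEEP(60);
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';

-- 将两次采样的差值换算成每小时写入量,再与总的redo空间对比
SHOW GLOBAL VARIABLES LIKE 'innodb_log_file%';
```

如果每小时写入量远大于`innodb_log_file_size * innodb_log_files_in_group`(比如下文例子中2GB的redo空间对应每小时约12GB的写入),说明日志循环得很快,适当增大redo空间通常会有帮助。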
53 | 54 | 另外需要注意的是:我们前面提到的恢复时间实际上取决于Uncheckpointed Bytes,而不是日志文件的总大小。如果你没有看到恢复时间随着更大的innodb_log_file_size值而增加,请查看InnoDB Checkpoint Age图:可能是在你的工作负载和配置下,根本无法充分利用这么大的日志文件。 55 | 56 | Another thing to keep in mind: the recovery time we spoke about early really depends on the Uncheckpointed Bytes rather than total log file size. If you do not see recovery time increasing with a larger innodb_log_file_size, check out InnoDB Checkpoint Age graph – it might be you just can't fully utilize large log files with your workload and configuration. 57 | 58 | 查看日志文件大小的另一种方法是结合日志空间的使用情况: 59 | 60 | Another way to look at the log file size is in context of log space usage: 61 | 62 | ![image](https://www.percona.com/blog/wp-content/uploads/2017/10/innodb_log_file_size-3.png) 63 | 64 | 这张图展示了每小时写入InnoDB日志文件的数据量,以及InnoDB日志文件的总大小。在上图中,我们可以看到日志空间大小为2GB,而每小时写入日志文件的数据量约为12GB。这意味着日志每10分钟就循环使用一次。 65 | 66 | This graph shows the amount of Data Written to the InnoDB log files per hour, as well as the total size of the InnoDB log files. In the graph above, we have 2GB of log space and some 12GB written to the Log files per hour. This means we cycle through logs every ten minutes. 67 | 68 | InnoDB在每个日志循环周期中,至少要把缓冲池中的每个脏页刷新一次。 69 | 70 | InnoDB has to flush every dirty page in the buffer pool at least once per log file cycle time. 71 | 72 | InnoDB在刷新频率较低时能获得更好的性能,SSD设备的损耗也更小。我希望这个循环周期不低于15分钟,1个小时则更好。 73 | 74 | InnoDB gets better performance when it does that less frequently, and there is less wear and tear on SSD devices. I like to see this number at no less than 15 minutes. One hour is even better. 75 | 76 | 总结 77 | 78 | Summary 79 | 80 | 选定合适的innodb_log_file_size,对于在较快的崩溃恢复时间和良好的系统性能之间取得平衡非常重要。记住,你的恢复时间目标并不像你想象的那么简单。我希望这篇博客中描述的方法可以帮助你找到适合自己情况的最佳值! 81 | 82 | Getting the innodb_log_file_size right is important to achieve the balance between reasonably fast crash recovery time and good system performance. Remember, your recovery time objective it is not as trivial as you might imagine. I hope the techniques described in this post help you to find the optimal value for your situation! 83 | --------------------------------------------------------------------------------