├── .gitignore ├── Setup ├── .DS_Store ├── 部署 │ ├── .DS_Store │ ├── images │ │ ├── .DS_Store │ │ ├── 2xZfM-jvkPjEpqXASZPIf6I6O60hkOMSF7BHOckE0j8.png │ │ ├── FRo9X-wvrhPyA5-rFJ_PEv2i63pmDgHHqjdwjvq_1Dg.png │ │ ├── Gl-NJECOnWKQBz5y_3I_RIUzWsye0U_M0gNrkzkiih4.png │ │ ├── HuIf-L441loi0GgS4f5fxDsGdo_K2uVc2Uax3BrRsac.png │ │ ├── IN2I79t-Cw8BgFvmfyyl5_T3SoaqtWPF2PfLy93AXbQ.png │ │ ├── PHGaWHq8bNVaqspDGa7Oo1722vM3oHhME2jq2DJCNII.png │ │ └── bHw5VkhfbgEnaerjTGLn4y0CandtPoE5BxBJGFW_2Sk.png │ ├── 安装-Installation.md │ ├── orchestrator raft vs. synchronous replication setup.md │ └── Orchestrator raft, consensus cluster.md └── 配置 │ ├── images │ ├── iShot2022-04-12 22.13.11.png │ ├── iShot2022-04-12 22.15.07.png │ ├── 1kxuGV1iI1U7rZK3yQhPkXCEZpZzW7TaTN_CwqAyNpA.png │ ├── 25MkezPTNCkyXdnpbJaKDWHOxhdf3VgyzaX4k4mZNvM.png │ ├── 48Ltk1zlaW22ugGODvp2u7ZAqMsf6yGGT5CuKuGMSfo.png │ ├── 4TCirhAfMrpkTeG2ZrKUJUVQsfD1n5qA83KcSRw54Rg.png │ ├── 4i4hx9fYyV6E-0PeU0F3Mk2m1dom08aLixKb0vsMipg.png │ ├── 5K7nberFBWpLa5v-VVjiQiPyOeUrFYhZ0dZJRwB_zr4.png │ ├── 5yy7_oU2QnMdikFRE3J_00TcJtfZjqxfSf1MNHxMVP4.png │ ├── 7-RjIA4bx4-FOq_sDJn_WRMtFRoxkX_A5CahtNqiXzI.png │ ├── GFJhBu2Z7O-s6_wT3hLnP81fMWEl85fSo0M8yYAMh-U.png │ ├── KI52M2ewCzgomON9vdDSYCu_vaEEPeCRJTqQpZif7y8.png │ ├── LYYOvAdu7jcvWRuFA7bh5fvd12szDLlkMjYAThaAXqg.png │ ├── NDwHjAdiNzpqSnWjTdy7DxHXhOLHodjVgJveNj_CCsw.png │ ├── NxnswFBqrS8I4ku3Rtnpz3WQcY6V380al-vDOXCcCGQ.png │ ├── Pm6GaFheUuPzQ8n8Qrxan_xkVTgj-jNLaFQ2zcCdEZo.png │ ├── RDnuegqaH_88rPJFc1DmZmiI32X25yPsYm1jDWr9OCE.png │ ├── W6s7vgvkDg350Yjdpc_m3xXQw4LWjCHNqdysd0r8ZRk.png │ ├── YF2K3BPZGSG6349x-hR6lEaNVzrjRAUtdsBm_W502uM.png │ ├── Yfs-1grFK4J78irF4EEx0JPgLN4QezxsG_I9CS6QCOU.png │ ├── bZ1uMPhudDv7STif4jkbPxFrznFEwTi-kIorLS_xNfY.png │ ├── ebmjSBe8JpCvqd8v5Dj-D3wN6AFudIjgGQUg6hVE7DE.png │ ├── gr_I5szl3Efa0ADCb3mCNKcz-8uaHu3BGxFtqTPjFO8.png │ ├── i_T7qFQkEbRSpu9x0TkAFWOuIscduU1GRTZ7VrL5obM.png │ ├── j9pi_DAbcAfjZ2eV2FfKwoh-al37bdRRhnbUqHYHF4M.png │ ├── 
jEeWlhmpmyvvYJi12GCiMD8iKNSAfQLWMYVylAMbD3M.png │ ├── ovnF7dCxAl5qyBtiZkGnKUkJFJqRLUzWgT2-8JMcqSo.png │ ├── rha3Rxq4HLjfM1Wruh1Z3yqu0tjaOqtgz2NlZhEBfvA.png │ ├── rmBMJfgtm1-ZIYfCVM8h-Qe0T3GoIrGOtkBSsMBOqYo.png │ ├── wtSBLVWbYevYpRJGs5tNgy7El1_98SXXzFc1ptSwA2E.png │ ├── x5t8cRCxj_qLm2sxsoF_lVvRU-SernBbL5EV8uOaAbI.png │ └── xOvHLpI8rM3eqWelrjshg2bbBVRAwvfiDKs_Ljv6LyM.png │ ├── Configuration Topology control.md │ ├── Configuration Discovery, name resolving.md │ ├── Orchestrator configuration in larger environments.md │ ├── Configuration Basic Discovery.md │ ├── Configuration Key-Value stores.md │ ├── Configuration Backend.md │ ├── Configuration Failure detection.md │ ├── Configuration Discovery, Pseudo-GTID.md │ ├── Configuration.md │ ├── Configuration Raft.md │ ├── Configuration Discovery, classifying servers.md │ ├── 示例配置文件.md │ └── Configuration Recovery.md ├── Meta ├── Bugs.md ├── Risk.md ├── Who uses Orchestrator .md ├── Gotchas.md ├── Presentations.md └── Supported Topologies and Versions.md ├── Use ├── images │ ├── AMoy_xel0fmOGGie-Yovh22wfFE1HMvILpjBlPTdG10.png │ ├── M7imHf5DWca_GwUDPYla03IrZbhHmR70clfWgGWmXEo.png │ ├── RiTdDpAptLqkCXB_QEB8_XgHDots9ZHUR0ml66qqS4o.png │ ├── Um6tPmjf6hlkCm40jqC04F3t9dRW3So0LejypEaDqRk.png │ ├── WD9ZaVcXFXm-7PL1VL_Vldi41C0dtoK8H33GyrT9tu8.png │ ├── YBOmFgVN1lZNFBacxg8J-0gO7A6rPB19tWMKRbu5xjQ.png │ ├── ccKLzksN_qAQCjBYluO_EddYqLi9xOLKblrT_leE96Q.png │ ├── eJMLBMBa6c9nrx7pEClTcZH3v_fK0atJZ5T7uqea5NI.png │ ├── kIu0Bg6jvv5_I9XCueAbIXwn4TRcF7NiDLr1N4iDuyw.png │ ├── r8ax_PwFzQ27WFzNrKfqCUs1KxnsTNgXeKmR5unT16E.png │ ├── tJsg29aJqEXSv3HW8ghNgZ-T0SYyC_9Hd8BqHnFUXdI.png │ └── x-ogK7jdiwx7WHFKvU3g4DNBeof9eGaM2Ii-9RXEnYA.png ├── Execution.md ├── orchestrator-client.md ├── Using the Web interface.md ├── Executing via command line.md ├── Using the web API.md └── Scripting samples.md ├── Various ├── images │ └── oNJrGmL9ujg6O1bMUvXmoXIZ_ef2u_OjI2SRl2EUMQg.png ├── Security.md ├── Agents.md ├── Pseudo GTID.md └── SSL and TLS.md ├── Deployment ├── 
images │ ├── 50Axq5Tb-HSk4xdvGP1b5GUj-LK_FObapENI33iofeE.png │ ├── BIjP2MvFIWGIWGT1Q2uOFgsnJ7l4_-SisU_M-nNfXAI.png │ ├── ENBunJzMa15CJC0xwniCzZQ-VRvV-WZ2IOUQKhFGVvw.png │ ├── NQ-ePOZ7dpey41yHy6LfSGOZ98AkOX1mxhM-wTSv7ms.png │ ├── RKcild_9LBYlOFCLhgSLRMIZpTFTrTBdTkaxeEQsGJA.png │ ├── dlnTL2c8qw-wWm2A91Y7zuMz0j-RLHGMH9FWPQlPJ9s.png │ └── p9yrEDChIfZ9JcH01eVqYMbOK3JXJR4iue-knBqMxQU.png ├── shard backend模式部署.md ├── Orchestrator高可用.md ├── raft模式部署.md └── 在生产环境中部署Orchestrator.md ├── Introduction ├── images │ └── tJsg29aJqEXSv3HW8ghNgZ-T0SYyC_9Hd8BqHnFUXdI.png ├── Requirements.md ├── License.md ├── Download.md └── About.md ├── Developers ├── Contributions.md ├── Developers.md ├── Understanding CI.md ├── System test environment.md ├── Building and testing.md └── Docker.md ├── Operation ├── Status Checks.md └── Tags.md ├── Failure detection & recovery ├── Key-Value stores.md └── Failure detection.md ├── Quick guides ├── FAQ.md └── First Steps.md ├── TOC.md ├── README.md └── LICENSE.md /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | .DS_Store 3 | -------------------------------------------------------------------------------- /Setup/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/.DS_Store -------------------------------------------------------------------------------- /Setup/部署/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/部署/.DS_Store -------------------------------------------------------------------------------- /Meta/Bugs.md: -------------------------------------------------------------------------------- 1 | # Bugs 2 | Please report bugs on the [Issues page](https://github.com/openark/orchestrator/issues) -------------------------------------------------------------------------------- 
/Setup/部署/images/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/部署/images/.DS_Store -------------------------------------------------------------------------------- /Setup/配置/images/iShot2022-04-12 22.13.11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/iShot2022-04-12 22.13.11.png -------------------------------------------------------------------------------- /Setup/配置/images/iShot2022-04-12 22.15.07.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/iShot2022-04-12 22.15.07.png -------------------------------------------------------------------------------- /Use/images/AMoy_xel0fmOGGie-Yovh22wfFE1HMvILpjBlPTdG10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/AMoy_xel0fmOGGie-Yovh22wfFE1HMvILpjBlPTdG10.png -------------------------------------------------------------------------------- /Use/images/M7imHf5DWca_GwUDPYla03IrZbhHmR70clfWgGWmXEo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/M7imHf5DWca_GwUDPYla03IrZbhHmR70clfWgGWmXEo.png -------------------------------------------------------------------------------- /Use/images/RiTdDpAptLqkCXB_QEB8_XgHDots9ZHUR0ml66qqS4o.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/RiTdDpAptLqkCXB_QEB8_XgHDots9ZHUR0ml66qqS4o.png -------------------------------------------------------------------------------- 
/Use/images/Um6tPmjf6hlkCm40jqC04F3t9dRW3So0LejypEaDqRk.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/Um6tPmjf6hlkCm40jqC04F3t9dRW3So0LejypEaDqRk.png -------------------------------------------------------------------------------- /Use/images/WD9ZaVcXFXm-7PL1VL_Vldi41C0dtoK8H33GyrT9tu8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/WD9ZaVcXFXm-7PL1VL_Vldi41C0dtoK8H33GyrT9tu8.png -------------------------------------------------------------------------------- /Use/images/YBOmFgVN1lZNFBacxg8J-0gO7A6rPB19tWMKRbu5xjQ.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/YBOmFgVN1lZNFBacxg8J-0gO7A6rPB19tWMKRbu5xjQ.png -------------------------------------------------------------------------------- /Use/images/ccKLzksN_qAQCjBYluO_EddYqLi9xOLKblrT_leE96Q.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/ccKLzksN_qAQCjBYluO_EddYqLi9xOLKblrT_leE96Q.png -------------------------------------------------------------------------------- /Use/images/eJMLBMBa6c9nrx7pEClTcZH3v_fK0atJZ5T7uqea5NI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/eJMLBMBa6c9nrx7pEClTcZH3v_fK0atJZ5T7uqea5NI.png -------------------------------------------------------------------------------- /Use/images/kIu0Bg6jvv5_I9XCueAbIXwn4TRcF7NiDLr1N4iDuyw.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/kIu0Bg6jvv5_I9XCueAbIXwn4TRcF7NiDLr1N4iDuyw.png -------------------------------------------------------------------------------- /Use/images/r8ax_PwFzQ27WFzNrKfqCUs1KxnsTNgXeKmR5unT16E.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/r8ax_PwFzQ27WFzNrKfqCUs1KxnsTNgXeKmR5unT16E.png -------------------------------------------------------------------------------- /Use/images/tJsg29aJqEXSv3HW8ghNgZ-T0SYyC_9Hd8BqHnFUXdI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/tJsg29aJqEXSv3HW8ghNgZ-T0SYyC_9Hd8BqHnFUXdI.png -------------------------------------------------------------------------------- /Use/images/x-ogK7jdiwx7WHFKvU3g4DNBeof9eGaM2Ii-9RXEnYA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Use/images/x-ogK7jdiwx7WHFKvU3g4DNBeof9eGaM2Ii-9RXEnYA.png -------------------------------------------------------------------------------- /Various/images/oNJrGmL9ujg6O1bMUvXmoXIZ_ef2u_OjI2SRl2EUMQg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Various/images/oNJrGmL9ujg6O1bMUvXmoXIZ_ef2u_OjI2SRl2EUMQg.png -------------------------------------------------------------------------------- /Deployment/images/50Axq5Tb-HSk4xdvGP1b5GUj-LK_FObapENI33iofeE.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Deployment/images/50Axq5Tb-HSk4xdvGP1b5GUj-LK_FObapENI33iofeE.png -------------------------------------------------------------------------------- 
/Deployment/images/BIjP2MvFIWGIWGT1Q2uOFgsnJ7l4_-SisU_M-nNfXAI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Deployment/images/BIjP2MvFIWGIWGT1Q2uOFgsnJ7l4_-SisU_M-nNfXAI.png -------------------------------------------------------------------------------- /Deployment/images/ENBunJzMa15CJC0xwniCzZQ-VRvV-WZ2IOUQKhFGVvw.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Deployment/images/ENBunJzMa15CJC0xwniCzZQ-VRvV-WZ2IOUQKhFGVvw.png -------------------------------------------------------------------------------- /Deployment/images/NQ-ePOZ7dpey41yHy6LfSGOZ98AkOX1mxhM-wTSv7ms.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Deployment/images/NQ-ePOZ7dpey41yHy6LfSGOZ98AkOX1mxhM-wTSv7ms.png -------------------------------------------------------------------------------- /Deployment/images/RKcild_9LBYlOFCLhgSLRMIZpTFTrTBdTkaxeEQsGJA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Deployment/images/RKcild_9LBYlOFCLhgSLRMIZpTFTrTBdTkaxeEQsGJA.png -------------------------------------------------------------------------------- /Deployment/images/dlnTL2c8qw-wWm2A91Y7zuMz0j-RLHGMH9FWPQlPJ9s.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Deployment/images/dlnTL2c8qw-wWm2A91Y7zuMz0j-RLHGMH9FWPQlPJ9s.png -------------------------------------------------------------------------------- /Deployment/images/p9yrEDChIfZ9JcH01eVqYMbOK3JXJR4iue-knBqMxQU.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Deployment/images/p9yrEDChIfZ9JcH01eVqYMbOK3JXJR4iue-knBqMxQU.png -------------------------------------------------------------------------------- /Setup/部署/images/2xZfM-jvkPjEpqXASZPIf6I6O60hkOMSF7BHOckE0j8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/部署/images/2xZfM-jvkPjEpqXASZPIf6I6O60hkOMSF7BHOckE0j8.png -------------------------------------------------------------------------------- /Setup/部署/images/FRo9X-wvrhPyA5-rFJ_PEv2i63pmDgHHqjdwjvq_1Dg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/部署/images/FRo9X-wvrhPyA5-rFJ_PEv2i63pmDgHHqjdwjvq_1Dg.png -------------------------------------------------------------------------------- /Setup/部署/images/Gl-NJECOnWKQBz5y_3I_RIUzWsye0U_M0gNrkzkiih4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/部署/images/Gl-NJECOnWKQBz5y_3I_RIUzWsye0U_M0gNrkzkiih4.png -------------------------------------------------------------------------------- /Setup/部署/images/HuIf-L441loi0GgS4f5fxDsGdo_K2uVc2Uax3BrRsac.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/部署/images/HuIf-L441loi0GgS4f5fxDsGdo_K2uVc2Uax3BrRsac.png -------------------------------------------------------------------------------- /Setup/部署/images/IN2I79t-Cw8BgFvmfyyl5_T3SoaqtWPF2PfLy93AXbQ.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/部署/images/IN2I79t-Cw8BgFvmfyyl5_T3SoaqtWPF2PfLy93AXbQ.png 
-------------------------------------------------------------------------------- /Setup/部署/images/PHGaWHq8bNVaqspDGa7Oo1722vM3oHhME2jq2DJCNII.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/部署/images/PHGaWHq8bNVaqspDGa7Oo1722vM3oHhME2jq2DJCNII.png -------------------------------------------------------------------------------- /Setup/部署/images/bHw5VkhfbgEnaerjTGLn4y0CandtPoE5BxBJGFW_2Sk.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/部署/images/bHw5VkhfbgEnaerjTGLn4y0CandtPoE5BxBJGFW_2Sk.png -------------------------------------------------------------------------------- /Setup/配置/images/1kxuGV1iI1U7rZK3yQhPkXCEZpZzW7TaTN_CwqAyNpA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/1kxuGV1iI1U7rZK3yQhPkXCEZpZzW7TaTN_CwqAyNpA.png -------------------------------------------------------------------------------- /Setup/配置/images/25MkezPTNCkyXdnpbJaKDWHOxhdf3VgyzaX4k4mZNvM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/25MkezPTNCkyXdnpbJaKDWHOxhdf3VgyzaX4k4mZNvM.png -------------------------------------------------------------------------------- /Setup/配置/images/48Ltk1zlaW22ugGODvp2u7ZAqMsf6yGGT5CuKuGMSfo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/48Ltk1zlaW22ugGODvp2u7ZAqMsf6yGGT5CuKuGMSfo.png -------------------------------------------------------------------------------- /Setup/配置/images/4TCirhAfMrpkTeG2ZrKUJUVQsfD1n5qA83KcSRw54Rg.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/4TCirhAfMrpkTeG2ZrKUJUVQsfD1n5qA83KcSRw54Rg.png -------------------------------------------------------------------------------- /Setup/配置/images/4i4hx9fYyV6E-0PeU0F3Mk2m1dom08aLixKb0vsMipg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/4i4hx9fYyV6E-0PeU0F3Mk2m1dom08aLixKb0vsMipg.png -------------------------------------------------------------------------------- /Setup/配置/images/5K7nberFBWpLa5v-VVjiQiPyOeUrFYhZ0dZJRwB_zr4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/5K7nberFBWpLa5v-VVjiQiPyOeUrFYhZ0dZJRwB_zr4.png -------------------------------------------------------------------------------- /Setup/配置/images/5yy7_oU2QnMdikFRE3J_00TcJtfZjqxfSf1MNHxMVP4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/5yy7_oU2QnMdikFRE3J_00TcJtfZjqxfSf1MNHxMVP4.png -------------------------------------------------------------------------------- /Setup/配置/images/7-RjIA4bx4-FOq_sDJn_WRMtFRoxkX_A5CahtNqiXzI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/7-RjIA4bx4-FOq_sDJn_WRMtFRoxkX_A5CahtNqiXzI.png -------------------------------------------------------------------------------- /Setup/配置/images/GFJhBu2Z7O-s6_wT3hLnP81fMWEl85fSo0M8yYAMh-U.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/GFJhBu2Z7O-s6_wT3hLnP81fMWEl85fSo0M8yYAMh-U.png -------------------------------------------------------------------------------- /Setup/配置/images/KI52M2ewCzgomON9vdDSYCu_vaEEPeCRJTqQpZif7y8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/KI52M2ewCzgomON9vdDSYCu_vaEEPeCRJTqQpZif7y8.png -------------------------------------------------------------------------------- /Setup/配置/images/LYYOvAdu7jcvWRuFA7bh5fvd12szDLlkMjYAThaAXqg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/LYYOvAdu7jcvWRuFA7bh5fvd12szDLlkMjYAThaAXqg.png -------------------------------------------------------------------------------- /Setup/配置/images/NDwHjAdiNzpqSnWjTdy7DxHXhOLHodjVgJveNj_CCsw.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/NDwHjAdiNzpqSnWjTdy7DxHXhOLHodjVgJveNj_CCsw.png -------------------------------------------------------------------------------- /Setup/配置/images/NxnswFBqrS8I4ku3Rtnpz3WQcY6V380al-vDOXCcCGQ.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/NxnswFBqrS8I4ku3Rtnpz3WQcY6V380al-vDOXCcCGQ.png -------------------------------------------------------------------------------- /Setup/配置/images/Pm6GaFheUuPzQ8n8Qrxan_xkVTgj-jNLaFQ2zcCdEZo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/Pm6GaFheUuPzQ8n8Qrxan_xkVTgj-jNLaFQ2zcCdEZo.png 
-------------------------------------------------------------------------------- /Setup/配置/images/RDnuegqaH_88rPJFc1DmZmiI32X25yPsYm1jDWr9OCE.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/RDnuegqaH_88rPJFc1DmZmiI32X25yPsYm1jDWr9OCE.png -------------------------------------------------------------------------------- /Setup/配置/images/W6s7vgvkDg350Yjdpc_m3xXQw4LWjCHNqdysd0r8ZRk.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/W6s7vgvkDg350Yjdpc_m3xXQw4LWjCHNqdysd0r8ZRk.png -------------------------------------------------------------------------------- /Setup/配置/images/YF2K3BPZGSG6349x-hR6lEaNVzrjRAUtdsBm_W502uM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/YF2K3BPZGSG6349x-hR6lEaNVzrjRAUtdsBm_W502uM.png -------------------------------------------------------------------------------- /Setup/配置/images/Yfs-1grFK4J78irF4EEx0JPgLN4QezxsG_I9CS6QCOU.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/Yfs-1grFK4J78irF4EEx0JPgLN4QezxsG_I9CS6QCOU.png -------------------------------------------------------------------------------- /Setup/配置/images/bZ1uMPhudDv7STif4jkbPxFrznFEwTi-kIorLS_xNfY.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/bZ1uMPhudDv7STif4jkbPxFrznFEwTi-kIorLS_xNfY.png -------------------------------------------------------------------------------- /Setup/配置/images/ebmjSBe8JpCvqd8v5Dj-D3wN6AFudIjgGQUg6hVE7DE.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/ebmjSBe8JpCvqd8v5Dj-D3wN6AFudIjgGQUg6hVE7DE.png -------------------------------------------------------------------------------- /Setup/配置/images/gr_I5szl3Efa0ADCb3mCNKcz-8uaHu3BGxFtqTPjFO8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/gr_I5szl3Efa0ADCb3mCNKcz-8uaHu3BGxFtqTPjFO8.png -------------------------------------------------------------------------------- /Setup/配置/images/i_T7qFQkEbRSpu9x0TkAFWOuIscduU1GRTZ7VrL5obM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/i_T7qFQkEbRSpu9x0TkAFWOuIscduU1GRTZ7VrL5obM.png -------------------------------------------------------------------------------- /Setup/配置/images/j9pi_DAbcAfjZ2eV2FfKwoh-al37bdRRhnbUqHYHF4M.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/j9pi_DAbcAfjZ2eV2FfKwoh-al37bdRRhnbUqHYHF4M.png -------------------------------------------------------------------------------- /Setup/配置/images/jEeWlhmpmyvvYJi12GCiMD8iKNSAfQLWMYVylAMbD3M.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/jEeWlhmpmyvvYJi12GCiMD8iKNSAfQLWMYVylAMbD3M.png -------------------------------------------------------------------------------- /Setup/配置/images/ovnF7dCxAl5qyBtiZkGnKUkJFJqRLUzWgT2-8JMcqSo.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/ovnF7dCxAl5qyBtiZkGnKUkJFJqRLUzWgT2-8JMcqSo.png -------------------------------------------------------------------------------- /Setup/配置/images/rha3Rxq4HLjfM1Wruh1Z3yqu0tjaOqtgz2NlZhEBfvA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/rha3Rxq4HLjfM1Wruh1Z3yqu0tjaOqtgz2NlZhEBfvA.png -------------------------------------------------------------------------------- /Setup/配置/images/rmBMJfgtm1-ZIYfCVM8h-Qe0T3GoIrGOtkBSsMBOqYo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/rmBMJfgtm1-ZIYfCVM8h-Qe0T3GoIrGOtkBSsMBOqYo.png -------------------------------------------------------------------------------- /Setup/配置/images/wtSBLVWbYevYpRJGs5tNgy7El1_98SXXzFc1ptSwA2E.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/wtSBLVWbYevYpRJGs5tNgy7El1_98SXXzFc1ptSwA2E.png -------------------------------------------------------------------------------- /Setup/配置/images/x5t8cRCxj_qLm2sxsoF_lVvRU-SernBbL5EV8uOaAbI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/x5t8cRCxj_qLm2sxsoF_lVvRU-SernBbL5EV8uOaAbI.png -------------------------------------------------------------------------------- /Setup/配置/images/xOvHLpI8rM3eqWelrjshg2bbBVRAwvfiDKs_Ljv6LyM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Setup/配置/images/xOvHLpI8rM3eqWelrjshg2bbBVRAwvfiDKs_Ljv6LyM.png 
-------------------------------------------------------------------------------- /Introduction/images/tJsg29aJqEXSv3HW8ghNgZ-T0SYyC_9Hd8BqHnFUXdI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Fanduzi/orchestrator-zh-doc/HEAD/Introduction/images/tJsg29aJqEXSv3HW8ghNgZ-T0SYyC_9Hd8BqHnFUXdI.png -------------------------------------------------------------------------------- /Introduction/Requirements.md: -------------------------------------------------------------------------------- 1 | # Requirements 2 | # [Requirements](https://github.com/openark/orchestrator/blob/master/docs/requirements.md) 3 | `orchestrator` is a standalone application. When configured to run with a MySQL backend, it requires `MySQL` to be installed. When configured to run with a `SQLite` backend, it has no further dependencies. 4 | 5 | `orchestrator` is built and tested on 64-bit Linux and Mac OS/X. Official binaries are available for Linux only. -------------------------------------------------------------------------------- /Introduction/License.md: -------------------------------------------------------------------------------- 1 | # License 2 | `orchestrator` is released under the Apache License 2.0. 3 | 4 | See [LICENSE](https://github.com/openark/orchestrator/blob/master/LICENSE) 5 | 6 | This repository includes 3rd party libraries which have their own licenses. These libraries are found under [https://github.com/openark/orchestrator/tree/master/vendor](https://github.com/openark/orchestrator/tree/master/vendor) -------------------------------------------------------------------------------- /Developers/Contributions.md: -------------------------------------------------------------------------------- 1 | # Contributions 2 | # Contributions 3 | The repository is open for pull requests. Please note: 4 | 5 | * Please first consult with the maintainers by creating an issue to discuss. 6 | * All contributions must be licensed under the [Apache License 2.0](https://github.com/openark/orchestrator/blob/master/docs/license.md) or compatible.
7 | * Please understand that the maintainers may not have the resources to review all pull requests in a timely fashion. 8 | * Please understand if the maintainer believes accepting your pull request may introduce excessive maintenance toll. 9 | 10 | -------------------------------------------------------------------------------- /Setup/配置/Configuration Topology control.md: -------------------------------------------------------------------------------- 1 | # Configuration: Topology control 2 | # [Configuration: topology control](https://github.com/openark/orchestrator/blob/master/docs/configuration-topology-control.md) 3 | The following configuration affects how `orchestrator` applies changes to topology servers: 4 | 5 | `orchestrator` will figure out the name of the cluster, data center, and more. 6 | 7 | ```json 8 | { 9 | "UseSuperReadOnly": false 10 | } 11 | ``` 12 | ### UseSuperReadOnly 13 | Defaults to `false`. When `true`, whenever `orchestrator` is asked to set/clear `read_only`, it also applies the change to `super_read_only`. `super_read_only` is only available on Oracle MySQL and Percona Server. -------------------------------------------------------------------------------- /Introduction/Download.md: -------------------------------------------------------------------------------- 1 | # Download 2 | `orchestrator` is released as open source and is available at [GitHub](https://github.com/openark/orchestrator). Find official releases in [https://github.com/openark/orchestrator/releases](https://github.com/openark/orchestrator/releases) 3 | 4 | `orchestrator` packages can be found in [https://packagecloud.io/github/orchestrator](https://packagecloud.io/github/orchestrator) 5 | 6 | For developers: `orchestrator` is go-gettable. Issue: 7 | 8 | ```shell 9 | go get github.com/openark/orchestrator/...
10 | ``` 11 | See [Developers](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Developers.md) 12 | -------------------------------------------------------------------------------- /Developers/Developers.md: -------------------------------------------------------------------------------- 1 | # Developers 2 | # [Developers](https://github.com/openark/orchestrator/blob/master/docs/developers.md) 3 | To build, test and contribute to `orchestrator`, please refer to the following pages: 4 | 5 | * [Understanding CI](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Understanding%20CI.md) 6 | * [Building and testing](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Building%20and%20testing.md) 7 | * [System test environment](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/System%20test%20environment.md) 8 | * [Docker](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Docker.md) 9 | * [Contributions](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Contributions.md) 10 | -------------------------------------------------------------------------------- /Meta/Risk.md: -------------------------------------------------------------------------------- 1 | # Risk 2 | # [Risks](https://github.com/openark/orchestrator/blob/master/docs/risks.md) 3 | Most of the time, `orchestrator` only reads state from your topologies. The default configuration polls each instance once per minute. 4 | 5 | `orchestrator` connects to your topology servers and limits the number of concurrent connections. 6 | 7 | You can use `orchestrator` to refactor your topologies: move replicas around, change the replication tree. `orchestrator` does its best to: 8 | 9 | 1. Make sure you only move an instance to a valid location from which it can replicate (e.g. you don't put a 5.5 server under a 5.6 server) 10 | 2. Make sure it moves an instance at the right time (i.e. the instance and the affected servers are not badly lagging, so the operation can complete in a timely manner) 11 | 3. Do the math correctly: stop the replica at the right time, roll it forward to the right position, `CHANGE MASTER` to the correct location & position. 12 | 13 | The above is well tested. 14 | 15 | You can use `orchestrator` to fail over your topologies. You will want to know that: 16 | 17 | * `orchestrator` does not fail over when there is no need to. 18 | * `orchestrator` does fail over when there is a need to. 19 | * A failover doesn't lose too many servers.
20 | > A failover doesn't lose too many servers 21 | * 故障转移以一致的拓扑结构结束. 22 | > Failover ends with a consistent topology 23 | 24 | 故障转移是通过钩子与你的部署内在地联系在一起的 25 | 26 | > Failovers are inherently tied to your deployments through hooks. 27 | 28 | 故障转移总是有风险的. 请确保对其进行测试. 29 | 30 | Please make sure to read the [LICENSE](https://github.com/openark/orchestrator/blob/master/LICENSE), and especially the "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND" part. 31 | 32 | -------------------------------------------------------------------------------- /Setup/配置/Configuration Discovery, name resolving.md: -------------------------------------------------------------------------------- 1 | # Configuration: Discovery, name resolving 2 | # [Configuration: discovery, name resolving](https://github.com/openark/orchestrator/blob/master/docs/configuration-discovery-resolve.md) 3 | 让orchestrator知道如何解析主机名(hostname). 4 | 5 | Most people will want this: 6 | 7 | ```yaml 8 | { 9 | "HostnameResolveMethod": "default", 10 | "MySQLHostnameResolveMethod": "@@hostname", 11 | } 12 | ``` 13 | 您的主机可以通过IP地址、和/或short names、和/或fqdn, 和/或VIP相互指代. 14 | 15 | `orchestrator`需要唯一和一致地识别一个主机. 它通过解析目标主机名来做到这一点. 16 | 17 | `"MySQLHostnameResolveMethod": "@@hostname"` 是最简单的方式. 可选项有: 18 | 19 | * `"HostnameResolveMethod": "cname"`: do CNAME resolving on hostname 20 | 21 | * `"HostnameResolveMethod": "default"`: no special resolving via net protocols 22 | 23 | * `"MySQLHostnameResolveMethod": "@@hostname"`: issue a `select @@hostname` 24 | 25 | * `"MySQLHostnameResolveMethod": "@@report_host"`: issue a `select @@report_host`, requires `report_host` to be configured 26 | 27 | * `"HostnameResolveMethod": "none"` and `"MySQLHostnameResolveMethod": ""`: do nothing. Never resolve. This may appeal to setups where everything uses IP addresses at all times. 
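For IP-only environments, the last option can be spelled out as a complete configuration fragment; the values below are taken directly from the option list above:

```json
{
  "HostnameResolveMethod": "none",
  "MySQLHostnameResolveMethod": ""
}
```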
-------------------------------------------------------------------------------- /Meta/Who uses Orchestrator .md: --------------------------------------------------------------------------------

# Who uses Orchestrator?
# [Who uses Orchestrator?](https://github.com/openark/orchestrator/blob/master/docs/users.md)
Question: Is orchestrator used by many people? Answer: Right now there is no full list of people who are actively using it, so it seemed like a good idea to generate a page of current users. If your name or your company's name is missing from this list and you would like to be added, please send a pull request to github.com/openark/orchestrator/pulls

Please keep this list in alphabetical order and include the company name, URL, and a short description of usage.

* [Booking.com](http://www.booking.com/)
* [Box](http://www.box.com/)
* [Citymobil](https://www.city-mobil.ru/)
* [GitHub Inc](http://www.github.com/)
* [HubSpot](https://www.hubspot.com/)
* [MessageBird](https://www.messagebird.com/)
* [Outbrain](http://www.outbrain.com/)
* [PagerDuty](http://www.pagerduty.com/)
* [Rentalcars.com](http://www.rentalcars.com/)
* [SendGrid](https://sendgrid.com/)
* [Slack](https://slackhq.com/)
* [Square](http://squareup.com/)
* [SurveyMonkey](http://www.surveymonkey.com/)
* [Vitess framework](http://vitess.io/)
* [Wix](http://www.wix.com/)

-------------------------------------------------------------------------------- /Operation/Status Checks.md: --------------------------------------------------------------------------------

# Status Checks
# [Status Checks](https://github.com/openark/orchestrator/blob/master/docs/status-checks.md)
There is a status endpoint located at `/api/status`. It runs a health check of the system and returns HTTP status code 200 if all is well; otherwise it returns HTTP status code 500.
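A load-balancer-style probe against this endpoint can be sketched as follows. This is only an illustration of the 200/500 contract described above; the URL assumes a local default deployment:

```shell
# Sketch of a probe against orchestrator's status endpoint.
# The URL below is an assumed local deployment; adjust host/port to taste.
STATUS_URL="http://127.0.0.1:3000/api/status"

# In a real probe you would fetch the HTTP code with:
#   code=$(curl -s -o /dev/null -w '%{http_code}' "$STATUS_URL")
# The decision logic is then simply:
probe() {
  code="$1"
  if [ "$code" = "200" ]; then
    echo "orchestrator: healthy"
  else
    echo "orchestrator: unhealthy (HTTP $code)"
  fi
}

probe 200
probe 500
```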
#### Custom Status Checks
Since companies may have various standards for their status check endpoints, you can customize yours with the setting:

```json
{
  "StatusEndpoint": "/my_status"
}
```
or whatever endpoint you like.

#### Lightweight Health Check
This status check is a very lightweight check, because we assume your load balancer or some other frequent monitoring might be hitting it often. If you want a richer check that actually makes changes to the database, you can set that with:

```json
{
  "StatusSimpleHealth": false
}
```
#### SSL Verification
Lastly, if you run with SSL/TLS, we *don't* require the status check to present a valid OU or client cert. If you're using the richer check and would like to have verification turned on, you can set:

```json
{
  "StatusOUVerify": true
}
```

-------------------------------------------------------------------------------- /Use/Execution.md: --------------------------------------------------------------------------------

# Execution
# [Execution](https://github.com/openark/orchestrator/blob/master/docs/execution.md)
#### Executing as web/API service
Assuming you've installed `orchestrator` under `/usr/local/orchestrator`:

```bash
cd /usr/local/orchestrator && ./orchestrator http
```
`orchestrator` will start listening on port `3000`. Point your browser to `http://your.host:3000/` and you're good to go. You may skip to the next section.

If you prefer debug messages:

```bash
cd /usr/local/orchestrator && ./orchestrator --debug http
```
Or, to get more verbose information when errors occur:

```bash
cd /usr/local/orchestrator && ./orchestrator --debug --stack http
```
The above looks for configuration in `/etc/orchestrator.conf.json`, `conf/orchestrator.conf.json`, `orchestrator.conf.json`, in that order. The common practice is to put the configuration in `/etc/orchestrator.conf.json`. Since it contains credentials to your MySQL servers, you may wish to restrict access to that file. You may choose a different location for the configuration file, in which case execute:

```bash
cd /usr/local/orchestrator && ./orchestrator --debug --config=/path/to/config.file http
```
By default, the web/API service issues a continuous, never-ending polling of all known servers. This keeps `orchestrator`'s data up to date. You typically want this behavior, but you may disable it, making `orchestrator` serve API/web but never update instance state:

```bash
cd /usr/local/orchestrator && ./orchestrator --discovery=false http
```
The above is useful for development and testing purposes. You probably wish to keep to the defaults.

-------------------------------------------------------------------------------- /Setup/配置/Orchestrator configuration in larger environments.md: --------------------------------------------------------------------------------

# Orchestrator configuration in larger environments
# [Orchestrator configuration in larger environments](https://github.com/openark/orchestrator/blob/master/docs/configuration-large.md)
When monitoring a large number of servers, the backend database can become a bottleneck. The following refers to using MySQL as the orchestrator backend.

A few configuration options allow you to control the throughput. These settings are:

* `BufferInstanceWrites`
* `InstanceWriteBufferSize`
* `InstanceFlushIntervalMilliseconds`
* `DiscoveryMaxConcurrency`

Use `DiscoveryMaxConcurrency` to limit the number of concurrent discoveries orchestrator performs, and make sure the backend server's `max_connections` setting is high enough to allow orchestrator to make as many connections as it needs.
By setting `BufferInstanceWrites: True`, results are buffered as polling completes, until `InstanceFlushIntervalMilliseconds` has passed or `InstanceWriteBufferSize` writes have accumulated, whichever comes first; the buffer is then flushed.

Buffered writes are ordered by write time and applied with a single `insert ... on duplicate key update ...` call. If the same host appears twice, only the last write reaches the database.
`InstanceFlushIntervalMilliseconds` should be kept well below `InstancePollSeconds`, as too high a value means data is not being written to the orchestrator backend database. This can lead to `not recently checked` problems. Also, various health checks run against the backend database state, so infrequent updates may prevent orchestrator from correctly detecting certain failure scenarios.

For larger orchestrator environments, suggested initial values might be:

```yaml
...
23 | "BufferInstanceWrites": true, 24 | "InstanceWriteBufferSize": 1000, 25 | "InstanceFlushIntervalMilliseconds": 50, 26 | "DiscoveryMaxConcurrency": 1000, 27 | ... 28 | ``` 29 | -------------------------------------------------------------------------------- /Setup/配置/Configuration Basic Discovery.md: -------------------------------------------------------------------------------- 1 | # Configuration: Basic Discovery 2 | # [Configuration: basic discovery](https://github.com/openark/orchestrator/blob/master/docs/configuration-discovery-basic.md) 3 | 让`orchestrator`知道如何查询MySQL拓扑结构, 提取哪些信息. 4 | 5 | ```yaml 6 | { 7 | "MySQLTopologyCredentialsConfigFile": "/etc/mysql/orchestrator-topology.cnf", 8 | "InstancePollSeconds": 5, 9 | "DiscoverByShowSlaveHosts": false, 10 | } 11 | ``` 12 | `MySQLTopologyCredentialsConfigFile`遵循与`MySQLOrchestratorCredentialsConfigFile`类似的规则. 你可以选择使用纯文本凭证. 13 | 14 | ```Plain Text 15 | [client] 16 | user=orchestrator 17 | password=orc_topology_password 18 | ``` 19 | 或者, 你可以选择使用纯文本凭证 20 | 21 | ```yaml 22 | { 23 | "MySQLTopologyUser": "orchestrator", 24 | "MySQLTopologyPassword": "orc_topology_password", 25 | } 26 | ``` 27 | `orchestrator`将每隔`InstancePollSeconds`秒进行一次探测. 
28 | 29 | 请在所有的MySQL节点, 授予以下权限: 30 | 31 | ```sql 32 | CREATE USER 'orchestrator'@'orc_host' IDENTIFIED BY 'orc_topology_password'; 33 | GRANT SUPER, PROCESS, REPLICATION SLAVE, REPLICATION CLIENT, RELOAD ON *.* TO 'orchestrator'@'orc_host'; 34 | GRANT SELECT ON meta.* TO 'orchestrator'@'orc_host'; 35 | GRANT SELECT ON ndbinfo.processes TO 'orchestrator'@'orc_host'; -- Only for NDB Cluster 36 | GRANT SELECT ON performance_schema.replication_group_members TO 'orchestrator'@'orc_host'; -- Only for Group Replication / InnoDB cluster 37 | ``` 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | -------------------------------------------------------------------------------- /Meta/Gotchas.md: -------------------------------------------------------------------------------- 1 | # Gotchas 2 | # [Gotchas](https://github.com/openark/orchestrator/blob/master/docs/gotchas.md) 3 | * 默认情况下, `orchestrator` 每分钟仅轮询一次服务器(可通过 `InstancePollSeconds` 进行配置). 这意味着您看到的任何状态本质上都是一种估计. 不同的实例在不同的时间被轮询. 例如, 您在集群页面上看到的状态不一定反映给定的时间点, 而是最后一分钟(或您使用的任何轮询间隔)中不同时间点的组合 4 | 5 | The `problems` drop down to the right也是异步执行的. 因此, 您可能会在两个不同状态的两个位置(一次在集群页面中, 一次在问题下拉列表中)看到相同的实例. 6 | 7 | > The `problems` drop down to the right is also executed asynchronously. You may therefore see the same instance in two places (once in the `cluster` page, once in the `problems` drop down) in two different states. 8 | 9 | 如果您想获得最新的实例状态, 请使用实例设置对话框中的“刷新”按钮 10 | 11 | * `orchestrator`可能需要几分钟的时间来完全检测一个集群的拓扑结构. 这个时间取决于拓扑结构的深度(如果你有复制的复制, 时间会增加). 这是由于`orchestrator`独立地轮询实例, 对拓扑结构的了解必须在下次轮询时从主站传播到副本. 12 | * 具体来说, 如果你故障切换到一个新的主站, 你可能会发现在几分钟内, 拓扑结构看起来是空的. 这可能发生, 因为实例曾经识别自己属于某个拓扑结构, 而这个拓扑结构现在正在被破坏. 这是自愈. 刷新并查看`Clusters`菜单, 查看随着时间推移新创建的集群(名称以新主机的名称命名). 13 | * Specifically, if you fail over to a new master, you may find that for a couple minutes the topologies seem empty. This may happen because instances used to identify themselves as belonging to a certain topology that is now being destroyed. 
This is self-healing. Refresh and look at the `Clusters` menu to review the newly created cluster (named after the new master) over time.
* Don't restart `orchestrator` while you're running a seed (only applies to working with *orchestrator-agent*)

Otherwise `orchestrator` is non-intrusive and self-healing. You can restart it whenever you like.

-------------------------------------------------------------------------------- /Meta/Presentations.md: --------------------------------------------------------------------------------

# Presentations
# [Orchestrator Presentations](https://github.com/openark/orchestrator/blob/master/docs/presentations.md)
This page lists presentations about Orchestrator made by different people. If you have something to add here, please provide your name, the date and event/location of the presentation, and a link to it, and add details in reverse chronological order.

|Date (yyyy-mm-dd)|Author|Presentation|Venue/Location|
| ----- | ----- | ----- | ----- |
|2017-10-03|Simon Mudd|[How to setup orchestrator to manage thousands of MySQL servers](https://www.slideshare.net/sjmudd/how-to-set-up-orchestrator-to-manage-thousands-of-mysql-servers)| |
|2017-05-17|Simon Mudd|[MySQL Failover and Orchestrator](https://www.slideshare.net/sjmudd/mmug18-mysql-failover-and-orchestrator)|Madrid MySQL Users Group|
|2017-04-25|Shlomi Noach|[Practical Orchestrator](https://www.percona.com/live/17/sites/default/files/slides/practical-orchestrator-pl-2017.pdf)|Percona Live 2017|
|2017-04-06|Shlomi Noach|["MySQL High Availability tools" followup, the missing piece: orchestrator](http://code.openark.org/blog/mysql/mysql-high-availability-tools-followup-the-missing-piece-orchestrator)|Blog post|
|2017-02-13|Simon Mudd|[Thoughts on setting up Orchestrator in a Production environment](http://blog.wl0.org/2017/02/thoughts-on-setting-up-orchestrator-in-a-production-environment/)|Blog post|
|2016-03-08|Tibor Korocz and Kenny Gryp|[Orchestrator: MySQL Replication Topology Manager](https://www.percona.com/blog/2016/03/08/orchestrator-mysql-replication-topology-manager/)|Blog post| 13 | 14 | -------------------------------------------------------------------------------- /Setup/配置/Configuration Key-Value stores.md: -------------------------------------------------------------------------------- 1 | # Configuration: Key-Value stores 2 | # [Configuration: Key-Value stores](https://github.com/openark/orchestrator/blob/master/docs/configuration-kv.md) 3 | `orchestrator` 支持以下key-value存储: 4 | 5 | * 一个基于关系表的内部存储 6 | 7 | * [Consul](https://github.com/hashicorp/consul) 8 | * [ZooKeeper](https://zookeeper.apache.org/) 9 | `orchestrator` 通过将集群的 master 存储在 KV 中来支持 master 发现. 10 | 11 | ```yaml 12 | "KVClusterMasterPrefix": "mysql/master", 13 | "ConsulAddress": "127.0.0.1:8500", 14 | "ZkAddress": "srv-a,srv-b:12181,srv-c", 15 | "ConsulCrossDataCenterDistribution": true, 16 | ``` 17 | `KVClusterMasterPrefix` is the prefix to use for master discovery entries. 例如, 您的集群别名是 `mycluster` 并且主库主机名是 `some.host-17.com` 那么您会看到一个条目, 其中: 18 | 19 | * Key为`mysql/master/mycluster` 20 | * Value为`some.host-17.com:3306` 21 | 22 | 注意: 在 ZooKeeper 上, 键将自动以 `/` 为前缀. 23 | 24 | #### Breakdown entries 25 | 除了上述之外, `orchestrator` 还分解了主条目并添加了以下内容(通过上面的示例进行说明): 26 | 27 | * `mysql/master/mycluster/hostname`, value is `some.host-17.com` 28 | * `mysql/master/mycluster/port`, value is `3306` 29 | * `mysql/master/mycluster/ipv4`, value is `192.168.0.1` 30 | * `mysql/master/mycluster/ipv6`, value is `` 31 | 32 | `/hostname`、`/port`、`/ipv4` 和 `/ipv6` 扩展会自动添加到任何master entry. 33 | 34 | ### Stores 35 | 如果指定, `ConsulAddress`表示一个可以使用Consul HTTP服务的地址. 如果没有指定, 则不会尝试访问Consul. 36 | 37 | 如果指定, `ZkAddress`表示要连接的一个或多个ZooKeeper服务器. 每个服务器的默认端口是2181. 
以下都是等效的: 38 | 39 | * srv-a,srv-b:12181,srv-c 40 | 41 | * srv-a,srv-b:12181,srv-c:2181 42 | * srv-a:2181,srv-b:12181,srv-c:2181 43 | 44 | ### Consul specific 45 | 关于Consul的具体设置, 见[Key-Value stores](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Key-Value%20stores.md) . 46 | -------------------------------------------------------------------------------- /Setup/配置/Configuration Backend.md: -------------------------------------------------------------------------------- 1 | # Configuration: Backend 2 | # [Configuration: backend](https://github.com/openark/orchestrator/blob/master/docs/configuration-backend.md) 3 | 让orchestrator知道在哪里可以找到后端数据库. 在此设置中, `orchestrator`将在端口:3000上提供HTTP服务 4 | 5 | ```yaml 6 | { 7 | "Debug": false, 8 | "ListenAddress": ":3000", 9 | } 10 | ``` 11 | 你可以选择`MySQL`后端或`SQLite`后端. 参见[Orchestrator高可用](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/Orchestrator高可用.md), 了解使用这两种方式的场景、可能性和原因。 12 | 13 | ## MySQL backend 14 | 你需要设置提供给`orchestrator` 使用的库名和用户密码: 15 | 16 | ```yaml 17 | { 18 | "MySQLOrchestratorHost": "orchestrator.backend.master.com", 19 | "MySQLOrchestratorPort": 3306, 20 | "MySQLOrchestratorDatabase": "orchestrator", 21 | "MySQLOrchestratorCredentialsConfigFile": "/etc/mysql/orchestrator-backend.cnf", 22 | } 23 | ``` 24 | `MySQLOrchestratorCredentialsConfigFile` 示例: 25 | 26 | ```Plain Text 27 | [client] 28 | user=orchestrator_srv 29 | password=${ORCHESTRATOR_PASSWORD} 30 | ``` 31 | 其中`user`或`password`可以是明文, 也可以是环境变量. 32 | 33 | 另外, 你可以选择在配置文件中使用纯文本形式的用户名和密码. 34 | 35 | ```yaml 36 | { 37 | "MySQLOrchestratorUser": "orchestrator_srv", 38 | "MySQLOrchestratorPassword": "orc_server_password", 39 | } 40 | ``` 41 | #### MySQL backend DB setup 42 | 对于MySQL后端数据库, 你将需要授予必要的权限. 
43 | 44 | ```sql 45 | CREATE USER 'orchestrator_srv'@'orc_host' IDENTIFIED BY 'orc_server_password'; 46 | GRANT ALL ON orchestrator.* TO 'orchestrator_srv'@'orc_host'; 47 | ``` 48 | ## SQLite backend 49 | 默认使用的后端数据库是`MySQL`. 要设置使用`SQLite`, 请设置: 50 | 51 | ```yaml 52 | { 53 | "BackendDB": "sqlite", 54 | "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.db", 55 | } 56 | ``` 57 | `SQLite`被嵌入到`orchestrator`中. 58 | 59 | > `SQLite` is embedded within `orchestrator`. 60 | 61 | 如果`SQLite3DataFile`参数指定的文件不存在, `orchestrator`将创建它. `orchestrator`需要对给定的路径/文件有写权限. 62 | -------------------------------------------------------------------------------- /Setup/配置/Configuration Failure detection.md: -------------------------------------------------------------------------------- 1 | # Configuration: Failure detection 2 | # [Configuration: failure detection](https://github.com/openark/orchestrator/blob/master/docs/configuration-failure-detection.md) 3 | `orchestrator`将始终检测您的拓扑结构的故障. 作为一个配置问题, 您可以设置轮询频率和具体方式, 以便`orchestrator`在检测到故障时通知您. 4 | 5 | >  [Failure detection](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Failure%20detection.md) 6 | 7 | 恢复将在[Configuration: Recovery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Recovery.md)中讨论 8 | 9 | ```yaml 10 | { 11 | "FailureDetectionPeriodBlockMinutes": 60, 12 | } 13 | ``` 14 | `orchestrator`每秒钟运行一次检测 15 | 16 | `FailureDetectionPeriodBlockMinutes`是一种反垃圾邮件机制, 它可以阻止`orchestrator`一次又一次地通知同一检测. 17 | 18 | ### Hooks 19 | 配置`orchestrator`以在发现时采取行动: 20 | 21 | ```yaml 22 | { 23 | "OnFailureDetectionProcesses": [ 24 | "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countReplicas}' >> /tmp/recovery.log" 25 | ], 26 | } 27 | ``` 28 | 有许多神奇的变量(如上面的`{failureCluster}`), 你可以发送给你的外部钩子. 
See [Topology recovery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Topology%20recovery.md) for the complete list.

### MySQL configuration
Since failure detection uses the MySQL topology itself as its source of information, it is advisable to set up your MySQL replication such that errors either show up clearly or are mitigated quickly:

* `set global slave_net_timeout = 4`, see the [documentation](https://dev.mysql.com/doc/refman/5.7/en/replication-options-replica.html#sysvar_slave_net_timeout). This sets a short (2 second) heartbeat interval between a replica and its master, and will make replicas recognize failures quickly. Without it, detection can in some cases take up to a minute.
>  `MASTER_HEARTBEAT_PERIOD`: the heartbeat interval defaults to half the value of the `slave_net_timeout` system variable
* `CHANGE MASTER TO MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400`. In the event of replication failure, this makes the replica attempt to reconnect every 1 second (the default is 60 seconds). Under transient network issues this setting attempts a quick replication recovery which, if successful, saves `orchestrator` from running failure/recovery operations.

-------------------------------------------------------------------------------- /Various/Security.md: --------------------------------------------------------------------------------

# Security
# [Security](https://github.com/openark/orchestrator/blob/master/docs/security.md)
When operating in HTTP mode (API or web), access to `orchestrator` can be restricted in the following ways:

## basic authentication
Add the following to the `orchestrator` configuration file:

```yaml
"AuthenticationMethod": "basic",
"HTTPAuthUser": "dba_team",
"HTTPAuthPassword": "time_for_dinner"
```
With `basic` authentication there's just one single credential, and no roles.

`orchestrator`'s configuration file contains credentials to your MySQL servers as well as the basic authentication credentials specified above. Keep it safe (e.g. `chmod 600`).

## basic authentication, extended
Add the following to the `orchestrator` configuration file:

```yaml
"AuthenticationMethod": "multi",
"HTTPAuthUser": "dba_team",
"HTTPAuthPassword": "time_for_dinner"
```
`multi` authentication works like *basic* authentication, but also accepts connections from a `readonly` user with any password. The `readonly` user is allowed to view everything, but may not perform write operations via the API (such as stopping replication, repointing replicas, discovering new instances, and so on).
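Under either basic method, API calls must carry an `Authorization` header. A sketch of what a client sends, using the sample credentials from the configuration above (the endpoint address is an assumption for illustration):

```shell
# Sketch: calling the orchestrator API with HTTP basic auth.
# dba_team / time_for_dinner are the sample credentials from the config above;
# the endpoint address is an assumed example.
ORCH_API="http://orchestrator.example.com:3000/api"
AUTH_USER="dba_team"
AUTH_PASS="time_for_dinner"

# curl builds the Authorization header for you:
#   curl -s -u "$AUTH_USER:$AUTH_PASS" "$ORCH_API/clusters"
# which is equivalent to sending this header:
TOKEN=$(printf '%s:%s' "$AUTH_USER" "$AUTH_PASS" | base64)
echo "Authorization: Basic $TOKEN"
```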
28 | 29 | ## Headers authentication 30 | 通过反向代理转发的header进行身份验证(例如,Apache2 将请求中继到协调器). 需要配置: 31 | 32 | ```yaml 33 | "AuthenticationMethod": "proxy", 34 | "AuthUserHeader": "X-Forwarded-User", 35 | ``` 36 | 你需要配置你的反向代理, 通过HTTP header发送认证用户的名字, 并使用与`AuthUserHeader`配置的相同的头名称. 37 | 38 | For example, an Apache2 setup may look like the following: 39 | 40 | ```yaml 41 | RequestHeader unset X-Forwarded-User 42 | RewriteEngine On 43 | RewriteCond %{LA-U:REMOTE_USER} (.+) 44 | RewriteRule .* - [E=RU:%1,NS] 45 | RequestHeader set X-Forwarded-User %{RU}e 46 | ``` 47 | 48 | 49 | `proxy` authentication允许角色. 一些用户是高级用户(*Power users*), 其余的只是普通用户. 高级用户可以更改拓扑, 而普通用户处于只读模式. 要指定已知 DBA 的列表, 请使用: 50 | 51 | ```yaml 52 | "PowerAuthUsers": [ 53 | "wallace", "gromit", "shaun" 54 | ], 55 | ``` 56 | 57 | 58 | 或者, 无论如何, 您可以通过以下方式将整个`orchestrator`进程变为只读: 59 | 60 | ```yaml 61 | "ReadOnly": "true", 62 | ``` 63 | 你可以将`ReadOnly`与你喜欢的任何认证方法结合起来. 64 | -------------------------------------------------------------------------------- /Various/Agents.md: -------------------------------------------------------------------------------- 1 | # Agents 2 | # [Agents](https://github.com/openark/orchestrator/blob/master/docs/agents.md) 3 | 你可以选择在你的MySQL主机上安装[orchestrator-agent](https://github.com/openark/orchestrator-agent). `orchestrator-agent`是一个服务, 它在你的`orchestrator`服务器上注册, 并通过Web API接受`orchestrator`的请求. 4 | 5 | 支持的请求与一般的、操作系统和LVM操作有关, 如: 6 | 7 | > Supported requests relate to general, OS and LVM operations, such as: 8 | 9 | * 在主机上关闭/启动MySQL. 10 | > Stopping/starting MySQL service on host 11 | * 获取MySQL操作系统信息, 如数据目录、端口、磁盘空间使用情况等. 12 | > Getting MySQL OS info such as data directory, port, disk space usage 13 | * 执行各种LVM操作, 如寻找LVM快照, 挂载/卸载快照. 14 | > Performing various LVM operations such as finding LVM snapshots, mounting/unmounting a snapshot 15 | * 在主机之间传输数据(如通过netcat). 16 | > Transferring data between hosts (e.g. via `netcat`) 17 | 18 | `orchestrator-agent`旨在持续努力解决主机的特定操作. 它最初是为了克服克隆和恢复问题而开发的, 后来又扩展到其他领域. 
19 | 20 | 由`orchestrator-agent`暴露给`orchestrator`的信息和API允许`orchestrator`通过从最新的可用快照中获取数据来coordinate and operate seeding of new or corrupted machines. 此外, 它允许`orchestrator`自动为给定的MySQL机器建议数据来源, 方法是查找那些实际上有最近快照可用的主机, 最好是在同一数据中心. 21 | 22 | > The information and API exposed by `orchestrator-agent` to `orchestrator` allow `orchestrator` to coordinate and operate seeding of new or corrupted machines by getting data from freshly available snapshots. Moreover, it allows `orchestrator` to automatically suggest the source of data for a given MySQL machine, by looking up such hosts that actually have a recent snapshot available, preferably in the same datacenter. 23 | 24 | 为了安全起见, 除了最简单的请求, `orchestrator-agent`需要一个token来执行操作. 这个token由`orchestrator-agent`随机生成, 并与`orchestrator`协商. `orchestrator`不会暴露代理的令牌(现在需要做一些工作来掩盖错误信息中的令牌) 25 | 26 | > For security measures, an agent requires a token to operate all but the simplest requests. This token is randomly generated by the agent and negotiated with `orchestrator`. `orchestrator` does not expose the agent's token (right now some work needs to be done on obscuring the token on error messages). -------------------------------------------------------------------------------- /Meta/Supported Topologies and Versions.md: -------------------------------------------------------------------------------- 1 | # Supported Topologies and Versions 2 | # [Supported Topologies and Versions](https://github.com/openark/orchestrator/blob/master/docs/supported-topologies-and-versions.md) 3 | # Supported Topologies and Versions 4 | The following setups are supported by `orchestrator`: 5 | 6 | * Plain-old MySQL replication; the *classic* one, based on log file + position 7 | * GTID replication. Both Oracle GTID and MariaDB GTID are supported. 
* Statement based replication (SBR)
* Row based replication (RBR)
* Semi-sync replication
* Single master (aka standard) replication
* Master-Master (two nodes in a circle) replication
* 5.7 Parallel replication
  * When using GTID there are no further constraints.
  * When using Pseudo-GTID, in-order replication must be enabled (see [slave\_preserve\_commit\_order](http://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#sysvar_slave_preserve_commit_order)).

The following setups are *unsupported*:

* Master-master...-master (circular) replication with 3 or more nodes in the ring.
* 5.6 Parallel (thread per schema) replication
* Multi master replication (one replica replicating from multiple masters)
* Tungsten replicator

Also note:

Master-master (ring) replication is supported for two master nodes. Topologies of three master nodes or more in a ring are unsupported.

Galera/XtraDB Cluster replication is not strictly supported: `orchestrator` will not recognize that co-masters in a Galera topology are related. Each such master would appear to `orchestrator` to be the head of its own distinct topology.

Replication topologies with multiple MySQL instances on the same host are supported. For example, `orchestrator`'s test environment consists of four instances, all running on the same machine, courtesy of MySQLSandbox. However, MySQL's lack of information sharing between replicas and master makes it impossible for `orchestrator` to analyze the topology from the top down, since the master does not know on which ports its replicas are listening. The default assumption is that a replica listens on the same port as its master, which cannot hold when multiple instances run on one machine (and on the same network). In that case you must configure your MySQL instances' `report_host` and `report_port` parameters ([The importance of report\_host & report\_port](http://code.openark.org/blog/mysql/the-importance-of-report_host-report_port)) and set `orchestrator`'s configuration parameter `DiscoverByShowSlaveHosts` to `true`.
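For the multiple-instances-per-host case above, the pieces involved can be sketched as follows; the hostname and port are made-up example values:

```Plain Text
# my.cnf on each co-hosted instance (example values):
[mysqld]
report_host = db-sandbox.example.com
report_port = 22988
```

and in the `orchestrator` configuration file, set `"DiscoverByShowSlaveHosts": true`.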
31 | -------------------------------------------------------------------------------- /Developers/Understanding CI.md: -------------------------------------------------------------------------------- 1 | # Understanding CI 2 | # [CI tests](https://github.com/openark/orchestrator/blob/master/docs/ci.md) 3 | `orchestrator` 使用 [GitHub Actions](https://help.github.com/en/actions) 来运行以下CI tests: 4 | 5 | * Build (main): build, unit tests, integration tests, docs tests 6 | * Upgrade: test a successful upgrade path 7 | * System: run system tests backed by actual MySQL topology 8 | 9 | ## Build 10 | Running on pull requests, the [main CI](https://github.com/openark/orchestrator/blob/master/.github/workflows/main.yml) job validates the following: 11 | 12 | * 验证源代码是否使用`gofmt`格式化 13 | * Build passes 14 | * Unit tests pass 15 | * Integration tests pass 16 | * Using `SQLite` backend 17 | * Using `MySQL` backend Documentation tests pass: ensuring pages and links are not orphaned. 18 | 19 | The [action](https://github.com/openark/orchestrator/actions?query=workflow%3ACI) completes by producing an artifact: an `orchestrator` binary for Linux `amd64`. The artifact is kept for a couple months per GitHub Actions policy. 20 | 21 | ## Upgrade 22 | [Upgrade](https://github.com/openark/orchestrator/blob/master/.github/workflows/upgrade.yml) runs on pull requests, tests a successful upgrade path from previous version (ie `master`) to PR's branch. This mainly tests internal database structure changes. The test: 23 | 24 | * Checks out `master` and run `orchestrator`, once using `SQLite`, once using `MySQL` 25 | * Checks out `HEAD` (PR's branch) and run `orchestrator` using pre-existing `SQLite` and `MySQL` backends. Expect no error. 26 | 27 | ## System 28 | [System tests](https://github.com/openark/orchestrator/blob/master/.github/workflows/system.yml) run as a scheduled job. 
A system test:

* Sets up a [CI environment](https://github.com/openark/orchestrator-ci-env) which includes:
  * A replication topology via [DBDeployer](https://www.dbdeployer.com/), with heartbeat injection
  * [HAProxy](http://www.haproxy.org/)
  * [Consul](https://www.consul.io/)
  * [consul-template](https://github.com/hashicorp/consul-template)
* Deploys `orchestrator` as a service, along with `orchestrator-client`
* Runs a series of tests where `orchestrator` operates on the topology, e.g. refactors or fails over.

-------------------------------------------------------------------------------- /Introduction/About.md: --------------------------------------------------------------------------------

# About
# [About Orchestrator](https://github.com/openark/orchestrator/blob/master/docs/about.md)
`orchestrator` is a MySQL replication topology HA, management and visualization tool, providing:

#### Discovery
`orchestrator` actively crawls through your topologies and maps them. It reads basic MySQL info such as replication status and configuration.

It provides you with a slick visualization of your topologies, including replication problems, even in the face of failures.

#### Refactoring
`orchestrator` understands replication rules. It knows about binlog file:position, GTID, Pseudo GTID, Binlog Servers.

Refactoring replication topologies can be a matter of drag & drop a replica under another master. Moving replicas around becomes safe: `orchestrator` will reject an illegal refactoring attempt.

Fine-grained control is achieved by various command line options.

#### Recovery
`orchestrator` uses a holistic approach to detect master and intermediate master failures. Based on information gained from the topology itself, it recognizes a variety of failure scenarios.

Configurable, it may choose to perform automated recovery (or allow the user to choose type of manual recovery). Intermediate master recovery achieved internally to `orchestrator`. Master failover supported by pre/post failure hooks.

The recovery process utilizes `orchestrator`'s understanding of the topology and its ability to perform refactoring. It is based on *state* as opposed to *configuration*: `orchestrator` picks the best recovery method by investigating/evaluating the topology at the time of recovery itself.

![image](images/tJsg29aJqEXSv3HW8ghNgZ-T0SYyC_9Hd8BqHnFUXdI.png)

#### Credits, attributions
Authored by [Shlomi Noach](https://github.com/shlomi-noach)

This project was originally initiated at [Outbrain](http://outbrain.com/), who were kind enough to release it as open source from its very beginning. We wish to recognize Outbrain for their support of open source software and for their further willingness to collaborate to this particular project's success.

The project was later developed at [Booking.com](http://booking.com/) and the company was gracious enough to release further changes into the open source.

At this time the project is being developed at [GitHub](http://github.com/). We will continue to keep it open and supported.

The project accepts pull-requests and encourages them. Thank you for any improvement/assistance!

Additional collaborators & contributors to this Wiki:

* [grierj](https://github.com/grierj)
* Other awesome people

-------------------------------------------------------------------------------- /Setup/配置/Configuration Discovery, Pseudo-GTID.md: --------------------------------------------------------------------------------

# Configuration: Discovery, Pseudo-GTID
# [Configuration: discovery, Pseudo-GTID](https://github.com/openark/orchestrator/blob/master/docs/configuration-discovery-pseudo-gtid.md#automated-pseudo-gtid-injection)
`orchestrator` will identify Pseudo-GTID magic hints in the binary logs, making it able to manipulate a non-GTID topology as if it had GTID, including relocation of replicas, smart failovers and more.

Also see [Pseudo GTID](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Pseudo%20GTID.md)

### Automated Pseudo-GTID injection
`orchestrator` can inject Pseudo-GTID entries for you, and save you the hassle:

```json
{
  "AutoPseudoGTID": true,
}
```
You may ignore any other Pseudo-GTID related configuration (it will all be implicitly overridden by `orchestrator`).

You will further need to grant the following on your MySQL servers:

```sql
GRANT DROP ON _pseudo_gtid_.* to 'orchestrator'@'orch_host';
```
**NOTE**: there is no need to create the `_pseudo_gtid_` schema. `orchestrator` merely injects Pseudo-GTID by issuing statements of the form:

```sql
drop view if exists `_pseudo_gtid_`.`_asc:5a64a70e:00000001:c7b8154ff5c3c6d8`
```
These statements do nothing, yet serve as magic markers in the binary logs.

`orchestrator` will only attempt to inject Pseudo-GTID where it is allowed to. If you want to limit Pseudo-GTID injection to specific clusters, you may do so by issuing the grant only on those clusters where you wish `orchestrator` to inject Pseudo-GTID. You can disable Pseudo-GTID injection on specific clusters via:

```sql
REVOKE DROP ON _pseudo_gtid_.* FROM 'orchestrator'@'orch_host';
```
Automated Pseudo-GTID injection is a more recent development, and replaces the need to run your own Pseudo-GTID injection.

If you are enabling automated Pseudo-GTID injection after having run manual Pseudo-GTID injection, you will be happy to note that:

* You will no longer need to manage the Pseudo-GTID service/event scheduler.
* In particular, on master failover, you will not need to disable/enable Pseudo-GTID on the old/promoted masters.

### Manual Pseudo-GTID injection
[Automated Pseudo-GTID injection](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Discovery%2C%20Pseudo-GTID.md#automated-pseudo-gtid-injection) is the recommended approach.
If you wish to inject your own Pseudo-GTID, we suggest you configure as follows:

```yaml
{
  "PseudoGTIDPattern": "drop view if exists `meta`.`_pseudo_gtid_hint__asc:",
  "PseudoGTIDPatternIsFixedSubstring": true,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "select count(*) as pseudo_gtid_exists from meta.pseudo_gtid_status where anchor = 1 and time_generated > now() - interval 2 hour",
}
```
The above assumes:

* A `meta` schema exists on the MySQL clusters managed by `orchestrator`.
* You inject Pseudo-GTID entries via [this sample script](https://github.com/openark/orchestrator/tree/master/resources/pseudo-gtid).

--------------------------------------------------------------------------------
/Use/orchestrator-client.md:
--------------------------------------------------------------------------------
# orchestrator-client
# [orchestrator-client](https://github.com/openark/orchestrator/blob/master/docs/orchestrator-client.md)
[orchestrator-client](https://github.com/openark/orchestrator/blob/master/resources/bin/orchestrator-client) is a script that wraps API calls with a convenient command line interface.

It can automatically determine the leader of an `orchestrator` cluster, in which case it forwards all requests to the leader.

It closely mimics the `orchestrator` command line interface.

With `orchestrator-client`, you:

* Do not need to install the `orchestrator` binary everywhere; only on the hosts where the service runs.
* Do not need to deploy `orchestrator` configuration files everywhere; only on the hosts where the service runs.
* Do not need access to the backend database.
* Do need access to the HTTP API.
* Do need to set the `ORCHESTRATOR_API` environment variable:
* Either provide a single endpoint, such as a proxy, e.g.

```bash
export ORCHESTRATOR_API=https://orchestrator.myservice.com:3000/api
```
* Or provide all `orchestrator` endpoints, and `orchestrator-client` will pick the leader automatically (no proxy needed), e.g.

```bash
export ORCHESTRATOR_API="https://orchestrator.host1:3000/api https://orchestrator.host2:3000/api https://orchestrator.host3:3000/api"
```
* You may set up the environment in `/etc/profile.d/orchestrator-client.sh`.
If this file exists, it will be inlined (sourced) by `orchestrator-client`.

### Sample usage
Show the currently known clusters (replication topologies):

```bash
orchestrator-client -c clusters
```
Discover, forget an instance:

```bash
orchestrator-client -c discover -i 127.0.0.1:22987
orchestrator-client -c forget -i 127.0.0.1:22987
```
Print an ASCII tree of a topology's instances. Pass a cluster name via `-i` (see the `clusters` command above):

```bash
orchestrator-client -c topology -i 127.0.0.1:22987
```
> Sample output:

```
127.0.0.1:22987
+ 127.0.0.1:22989
  + 127.0.0.1:22988
+ 127.0.0.1:22990
```
Relocate a replica within the topology:

```bash
orchestrator-client -c relocate -i 127.0.0.1:22988 -d 127.0.0.1:22987
```
> Resulting topology:

```
127.0.0.1:22987
+ 127.0.0.1:22989
+ 127.0.0.1:22988
+ 127.0.0.1:22990
```
And so on.

### Behind the scenes
The command line interface is a convenience wrapper around API calls, whose JSON output is converted to text.

For example, the command:

```bash
orchestrator-client -c discover -i 127.0.0.1:22987
```
translates to (simplified here for convenience):

```bash
curl "$ORCHESTRATOR_API/discover/127.0.0.1/22987" | jq '.Details | .Key'
```
### Meta commands
* `orchestrator-client -c help`: list all available commands.
* `orchestrator-client -c which-api`: output the API endpoint that `orchestrator-client` uses to invoke commands. Useful when multiple endpoints are provided via `$ORCHESTRATOR_API`.
* `orchestrator-client -c api -path clusters`: invoke a generic HTTP API call (in this case `clusters`) and return the raw JSON response.
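The mapping from `-i host:port` to an API path can be sketched in a few lines of shell. This is purely illustrative (the endpoint URL is a placeholder, and the real script handles many more cases):

```shell
#!/bin/sh
# Illustrative sketch only: split "host:port" into the /host/port form used
# by API paths. The real orchestrator-client script is the authority here.
ORCHESTRATOR_API="https://orchestrator.myservice.com:3000/api"  # placeholder
instance="127.0.0.1:22987"
hostname_part="${instance%:*}"   # everything before the last ':'
port_part="${instance##*:}"      # everything after the last ':'
api_call="$ORCHESTRATOR_API/discover/$hostname_part/$port_part"
echo "$api_call"
```

The same parameter-expansion trick works for any `host:port` argument, which is why the wrapper needs no external parsing tools.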
--------------------------------------------------------------------------------
/Setup/配置/Configuration.md:
--------------------------------------------------------------------------------
# Configuration
# [Configuration](https://github.com/openark/orchestrator/blob/master/docs/configuration.md)
Documenting and explaining all configuration variables has become a daunting task, much like explaining code with words. Work is in progress to prune and simplify the configuration.

The de-facto configuration list is located in [config.go](https://github.com/openark/orchestrator/blob/master/go/config/config.go).

You will undoubtedly be interested in configuring a few basic components: the backend database and host discovery. You may choose to use Pseudo-GTID. You may wish `orchestrator` to send notifications on failure, or you may wish to run full blown automated recovery.

Use the following small steps to configure `orchestrator`:

* [Configuration: Backend](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Backend.md)
* [Configuration: Basic Discovery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Basic%20Discovery.md)
* [Configuration: Discovery, name resolving](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Discovery%2C%20name%20resolving.md)
* [Configuration: Discovery, classifying servers](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Discovery%2C%20classifying%20servers.md)
* [Configuration: Discovery, Pseudo-GTID](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Discovery%2C%20Pseudo-GTID.md)
* [Configuration: Topology control](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Topology%20control.md)
* [Configuration: Failure detection](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Failure%20detection.md)
* 
[Configuration: Recovery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Recovery.md)
* [Configuration: Raft](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Raft.md): configure an [Orchestrator/raft, consensus cluster](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%83%A8%E7%BD%B2/Orchestrator%20raft%2C%20consensus%20cluster.md) for high availability
* Security: see the [Security](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Security.md) section.
* [Configuration: Key-Value stores](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Key-Value%20stores.md): configure and use key-value stores for master discovery.
* [Orchestrator configuration in larger environments](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Orchestrator%20configuration%20in%20larger%20environments.md)

### Configuration sample file
For convenience, this [sample configuration file](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/%E7%A4%BA%E4%BE%8B%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6.md) is the `orchestrator` configuration used in GitHub's production environment.
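To make the first steps concrete, a minimal starting configuration covering just the backend and topology credentials might look like the following. All values are placeholders you must replace; consult config.go for the authoritative variable list:

```yaml
{
  "MySQLTopologyUser": "orchestrator",
  "MySQLTopologyPassword": "topology_password",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator",
  "MySQLOrchestratorPassword": "orch_backend_password"
}
```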
--------------------------------------------------------------------------------
/Setup/配置/Configuration Raft.md:
--------------------------------------------------------------------------------
# Configuration: Raft
# [Configuration: raft](https://github.com/openark/orchestrator/blob/master/docs/configuration-raft.md)
This document explains how to configure an [Orchestrator/raft, consensus cluster](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%83%A8%E7%BD%B2/Orchestrator%20raft%2C%20consensus%20cluster.md).

Assuming you will be running a `3`-node `orchestrator/raft` cluster, you will need the following configuration on each node:

```yaml
"RaftEnabled": true,
"RaftDataDir": "<directory>",
"RaftBind": "<ip.or.hostname.of.this.node>",
"DefaultRaftPort": 10008,
"RaftNodes": [
  "<ip.or.hostname.of.node1>",
  "<ip.or.hostname.of.node2>",
  "<ip.or.hostname.of.node3>"
],
```
In detail:

* `RaftEnabled` must be set to `true`; otherwise `orchestrator` runs in shared-backend mode.
* `RaftDataDir` must be set to a directory to which `orchestrator` has write access. `orchestrator` will attempt to create the directory if it does not exist.
* `RaftBind` must be set; use the IP address or full hostname of the local host. This IP or hostname will also be listed as one of the `RaftNodes` values.
* `DefaultRaftPort` can be set to any port, but must be consistent across all deployments.
* `RaftNodes` should list all nodes of the raft cluster. The list consists of IP addresses or hostnames, and includes this host's own value as given in `RaftBind`.

For example, the following could be a production environment configuration:

```yaml
"RaftEnabled": true,
"RaftDataDir": "/var/lib/orchestrator",
"RaftBind": "10.0.0.2",
"DefaultRaftPort": 10008,
"RaftNodes": [
  "10.0.0.1",
  "10.0.0.2",
  "10.0.0.3"
],
```
as could this:

```yaml
"RaftEnabled": true,
"RaftDataDir": "/var/lib/orchestrator",
"RaftBind": "node-full-hostname-2.here.com",
"DefaultRaftPort": 10008,
"RaftNodes": [
  "node-full-hostname-1.here.com",
  "node-full-hostname-2.here.com",
  "node-full-hostname-3.here.com"
],
```
### NAT, firewalls, routing
If your orchestrator/raft nodes need to communicate through a NAT gateway, you may additionally set:

* `"RaftAdvertise": "<ip.or.hostname.reachable.by.other.nodes>"`

to the IP or hostname that the other nodes should contact. Otherwise the other nodes would attempt to communicate with the `RaftBind` address, and fail.
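As an illustration (all addresses here are made-up documentation addresses), a node behind NAT might bind to its private address while advertising the public one, assuming each peer likewise advertises a publicly reachable address:

```yaml
"RaftEnabled": true,
"RaftDataDir": "/var/lib/orchestrator",
"RaftBind": "10.0.0.2",
"RaftAdvertise": "203.0.113.2",
"DefaultRaftPort": 10008,
"RaftNodes": [
  "203.0.113.1",
  "203.0.113.2",
  "203.0.113.3"
],
```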
The raft nodes reverse-proxy HTTP requests to the leader, and `orchestrator` attempts to heuristically compute the leader's URL so as to redirect requests. If behind NAT, with rerouted ports etc., `orchestrator` may be unable to compute that URL. You may configure:

* `"HTTPAdvertise": "scheme://hostname:port"`

to explicitly indicate where this node (assuming it is the leader) is reachable over the HTTP API. For example, you may configure: `"HTTPAdvertise": "http://my.public.hostname:3000"`

### Backend DB
Raft mode supports either `MySQL` or `SQLite` as the backend database. See [Configuration: Backend](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Backend.md) for details. Read [orchestrator high availability](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/Orchestrator高可用.md) for scenarios, possibilities and reasons to use either.

### Single raft node setups
In production you will want to run multiple raft nodes, e.g. `3` or `5`.

In a testing environment, you may run an orchestrator/raft cluster consisting of a single node. That node implicitly becomes the leader and publishes raft messages to itself.

To run a single-node `orchestrator/raft` setup, configure an empty `RaftNodes`:

```yaml
"RaftNodes": [],
```
or, alternatively, specify a single node, which is identical to `RaftBind` or `RaftAdvertise`:

```yaml
"RaftEnabled": true,
"RaftBind": "127.0.0.1",
"DefaultRaftPort": 10008,
"RaftNodes": [
  "127.0.0.1"
],
```
--------------------------------------------------------------------------------
/Developers/System test environment.md:
--------------------------------------------------------------------------------
# System test environment
# [CI environment](https://github.com/openark/orchestrator/blob/master/docs/ci-env.md)
An ancillary project, [orchestrator-ci-env](https://github.com/openark/orchestrator-ci-env), provides a MySQL replication environment with which one may evaluate/test `orchestrator`. Use cases:

* You want to check `orchestrator`'s behavior in a testing environment.
* You want to test failover and master discovery.
* You want to develop changes to `orchestrator` and require a reproducible environment.
You may do all of the above if you already have some staging environment with a MySQL replication topology, or a [dbdeployer](https://www.dbdeployer.com/) setup, but `orchestrator-ci-env` offers a Docker-based environment, reproducible and dependency-free.

`orchestrator-ci-env` is the same environment used in [system CI](https://github.com/openark/orchestrator/blob/master/docs/ci.md#system) and in [Dockerfile.system](https://github.com/openark/orchestrator/blob/master/docs/docker.md#run-full-ci-environment).

# Setup
Clone `orchestrator-ci-env` via SSH or HTTPS:

```bash
$ git clone git@github.com:openark/orchestrator-ci-env.git
```
or

```bash
$ git clone https://github.com/openark/orchestrator-ci-env.git
```
# Run environment
Requirement: Docker.

```bash
$ cd orchestrator-ci-env
$ script/dock
```
This will build and run an environment which consists of:

* A replication topology via [DBDeployer](https://www.dbdeployer.com/), with heartbeat injection
* [HAProxy](http://www.haproxy.org/)
* [Consul](https://www.consul.io/)
* [consul-template](https://github.com/hashicorp/consul-template)

Docker will expose these ports:

* `10111`, `10112`, `10113`, `10114`: MySQL hosts in a replication topology. Initially, `10111` is the master.
* `13306`: exposed by HAProxy and routed to the current MySQL topology master.

# Run orchestrator with environment
Assuming `orchestrator` is built into `bin/orchestrator` (`./script/build` if not):

```bash
$ bin/orchestrator --config=conf/orchestrator-ci-env.conf.json --debug http
```
`conf/orchestrator-ci-env.conf.json` is designed to work with `orchestrator-ci-env`.

You may choose to change the value of `SQLite3DataFile`, which is by default on `/tmp`.
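For instance (the path below is an arbitrary example), you might point the SQLite backend at a persistent location by overriding that one key in the config file:

```json
{
  "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.sqlite3"
}
```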
# Running system tests with environment
While `orchestrator` is running (see above), open another terminal in `orchestrator`'s repo path.

Run:

```bash
$ ./tests/system/test.sh
```
for all tests, or

```bash
$ ./tests/system/test.sh <test-name>
```
for a specific test, e.g. `./tests/system/test.sh relocate-single`

Destructive tests (e.g. a failover) require a full rebuild of the replication topology. The system tests CI runs both orchestrator and the ci-env together, and the tests can instruct the ci-env to rebuild replication. However, if you run ci-env on a local docker, your tests cannot instruct a replication rebuild. You will need to manually run `./script/deploy-replication` on your ci-env container at the end of a destructive test.
--------------------------------------------------------------------------------
/Developers/Building and testing.md:
--------------------------------------------------------------------------------
# Building and testing
# [Building and testing](https://github.com/openark/orchestrator/blob/master/docs/build.md)
Developers have multiple ways to build and test `orchestrator`:

* Using GitHub's CI, no development environment needed
* Using Docker
* Building locally on a dev machine

## Build and test via GitHub CI
`orchestrator`'s [CI Build](https://github.com/openark/orchestrator/blob/master/docs/ci.md) will:

* build
* test (unit, integration)
* upload an artifact: an `orchestrator` binary compatible with Linux `amd64`

The artifact is attached to the build's output, and valid for a couple of months per GitHub Actions policy.

This way, a developer only needs to `git checkout/commit/push` and does not require any development environment on their computer. Once CI completes (successfully), the developer may download the binary artifact to test on a Linux environment.
## Build and test via Docker
Requirements: a docker installation.

`orchestrator` provides [various docker builds](https://github.com/openark/orchestrator/blob/master/docs/docker.md). For developers:

* run `script/dock alpine` to build and run the `orchestrator` service
* run `script/dock test` to build `orchestrator`, run unit tests, integration tests, documentation tests
* run `script/dock pkg` to build `orchestrator` and create distribution packages (`.deb`/`.rpm`/`.tgz`)
* run `script/dock system` to build and launch a full CI environment which includes a MySQL topology, HAProxy, Consul, consul-template and `orchestrator` running as a service.

## Build and test on dev machine
Requirements:

* a `go` development setup (at this time `go1.12` or above is required)
* `git`
* `gcc` (required to build `SQLite` as part of the `orchestrator` binary)
* Linux, BSD or MacOS

Run:

```bash
git clone git@github.com:openark/orchestrator.git
cd orchestrator
```
### Build
Build via:

```bash
./script/build
```
This takes care of `GOPATH` and various other considerations.

Alternatively, if you like and if your Go environment is set up, you may run:

```bash
go build -o bin/orchestrator -i go/cmd/orchestrator/main.go
```
### Run
Find artifacts under the `bin/` directory, and e.g. run:

```bash
bin/orchestrator --debug http
```
### Setup backend DB
If running with a SQLite backend, no DB setup is needed. The rest of this section assumes you have a MySQL backend.

For `orchestrator` to detect your replication topologies, it must also have an account on each and every topology. At this stage this has to be the same account (same user, same password) for all topologies.
On each of your masters, issue the following:

```sql
CREATE USER 'orc_user'@'%' IDENTIFIED BY 'orc_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orc_user'@'%';
```
Replace `%` with a specific hostname/`127.0.0.1`/subnet. Choose your password wisely. Edit `orchestrator.conf.json` to match:

```json
"MySQLTopologyUser": "orc_user",
"MySQLTopologyPassword": "orc_password",
```
--------------------------------------------------------------------------------
/Use/Using the Web interface.md:
--------------------------------------------------------------------------------
# Using the Web interface
# [Using the Web interface](https://github.com/openark/orchestrator/blob/master/docs/using-the-web-interface.md)
The following assumes you have `orchestrator` executing as a web/API service. Open a browser and direct it at `http://your.host:3000`. If all went well, you should see the following welcome page:

![image](images/r8ax_PwFzQ27WFzNrKfqCUs1KxnsTNgXeKmR5unT16E.png)

If this is your first time using orchestrator, you should begin by teaching it. `orchestrator` needs to know what replication topologies you have. The web interface provides this via the `discover` page.

For each replication topology, pick a server (either master or replica) and let `orchestrator` know the hostname & port this server listens on. `orchestrator` will recursively drill up and down the replication chain to map the entire topology. This may take a few minutes, during which `orchestrator` connects the servers it encounters into sub-topologies, eventually forming the complete topology.

You may manually enter as many servers as you like (within or outside a topology). The first time `orchestrator` investigates, it can only reach those replicas that are currently replicating. So if you know some replicas are temporarily down, you may either add them manually or, if you enjoy watching automation at work, just wait for them to come up, at which point `orchestrator` will find them automatically.

> Once `orchestrator` is familiar with a server, it does not care whether the server lags, does not replicate, or is inaccessible: the server remains part of the topology it was last seen in. There is a timeout for this: if a server has not been seen for `UnseenInstanceForgetHours` hours, it is automatically forgotten (presumed dead). Likewise, should it suddenly revive and connect to a known topology, it is automatically rediscovered.

`orchestrator` resolves the `CNAME` of every input it gets, either from the user or from the replication topology itself. This is for avoiding ambiguities or implicit duplicates.
![image](images/ccKLzksN_qAQCjBYluO_EddYqLi9xOLKblrT_leE96Q.png)

Once `orchestrator` is familiar with a topology, you can view and manipulate it via the cluster page. Click the clusters dropdown on the navigation bar to see the available clusters.

> Each topology is associated with a cluster name, which is (currently) named after the topology's master.

The cluster page is where the fun is. `orchestrator` presents the cluster in an easy to grasp, D3-based tree infographic. Sub-trees are collapsible.

Each node in the tree presents a single MySQL instance, listing its fully qualified name, its version, binary log format and replication lag.

![image](images/tJsg29aJqEXSv3HW8ghNgZ-T0SYyC_9Hd8BqHnFUXdI.png)

Note that each server has a settings icon to its right. Clicking this icon opens a modal with some additional information on the server, as well as operations to perform.

The modal allows you to begin/end maintenance mode on an instance; perform an immediate refresh (by default instances are polled once per minute -- this is configurable); stop/start replication; and forget the instance (which may be rediscovered a minute later if still connected to the topology).

![image](images/WD9ZaVcXFXm-7PL1VL_Vldi41C0dtoK8H33GyrT9tu8.png)

Topologies can be refactored: replicas can be moved around via drag & drop. Start dragging an instance: all possible drop targets are immediately colored green. You may turn your instance into a replica of any of those targets.

Master-master topologies can be created by dragging a master onto one of its replicas, making both nodes co-masters.

Complex refactoring is done by performing multiple such steps. You may need to drag and drop your instance three or four times to put it in a "remote" location.

`orchestrator` keeps you safe by disallowing a drop when the instance or its target has problems (too much lag, not replicating, etc.). It may allow the drag, yet still abort the operation if it finds a deeper obstruction, such as the target not having binary logs.
Begin dragging: possible targets are colored green

![image](images/AMoy_xel0fmOGGie-Yovh22wfFE1HMvILpjBlPTdG10.png)

Move over your target and drop:

![image](images/RiTdDpAptLqkCXB_QEB8_XgHDots9ZHUR0ml66qqS4o.png)

The topology gets refactored:

![image](images/eJMLBMBa6c9nrx7pEClTcZH3v_fK0atJZ5T7uqea5NI.png)

Dragging a master onto its replica yields a co-masters topology:

![image](images/YBOmFgVN1lZNFBacxg8J-0gO7A6rPB19tWMKRbu5xjQ.png)

Co-masters topology:

![image](images/M7imHf5DWca_GwUDPYla03IrZbhHmR70clfWgGWmXEo.png)

`orchestrator` visualizes replication and accessibility related problems: replication lag, replication not working, instance not accessed for a long time, instance access failure, instance under maintenance.

![image](images/Um6tPmjf6hlkCm40jqC04F3t9dRW3So0LejypEaDqRk.png)

The *Problems* dropdown is available on all pages and shows all currently known problems across all topologies:

![image](images/kIu0Bg6jvv5_I9XCueAbIXwn4TRcF7NiDLr1N4iDuyw.png)

The `Audit` page shows all actions taken via `orchestrator`: replica moves, detections, maintenance, etc. (`START SLAVE` and `STOP SLAVE` are currently not audited.)

![image](images/x-ogK7jdiwx7WHFKvU3g4DNBeof9eGaM2Ii-9RXEnYA.png)

--------------------------------------------------------------------------------
/Various/Pseudo GTID.md:
--------------------------------------------------------------------------------
# Pseudo GTID
# [Pseudo GTID](https://github.com/openark/orchestrator/blob/master/docs/pseudo-gtid.md)
Pseudo GTID is the method of injecting unique entries into the binary logs, such that they can be used to match/sync replicas without direct connection, or replicas whose master is corrupted/dead.

Pseudo-GTID is attractive to users not using GTID. Pseudo-GTID has most of GTID's benefits, but without making the commitment GTID requires. With Pseudo-GTID you can keep your existing topologies, whichever version of MySQL you're running.

### Advantages of Pseudo-GTID
* Enable master failovers.
* Enable intermediate master failovers.
* Arbitrary refactoring, relocating replicas from one place to another (even those replicas that don't have binary logging).
* Vendor neutral; works on both Oracle and MariaDB, even both combined.
* No configuration changes. Your replication setup remains as it is.
* No commitment. You can choose to move away from Pseudo-GTID at any time; just stop writing P-GTID entries.
* Pseudo-GTID implies crash-safe replication for replicas running with:

```sql
log-slave-updates
```
```sql
sync_binlog=1
```
* As opposed to GTID on MySQL `5.6`, servers don't *have to* run with `log-slave-updates`, though `log-slave-updates` is recommended.

### Automated Pseudo-GTID injection
`orchestrator` can inject Pseudo-GTID entries for you. See [Automated Pseudo-GTID](https://github.com/openark/orchestrator/blob/master/docs/configuration-discovery-pseudo-gtid.md#automated-pseudo-gtid-injection)

### Manual Pseudo-GTID injection
Automated Pseudo-GTID is a later addition which supersedes the need for manual Pseudo-GTID injection, and is recommended. However, you may still choose to inject your own Pseudo-GTID.

See [Manual Pseudo-GTID injection](https://github.com/openark/orchestrator/blob/master/docs/pseudo-gtid-manual-injection.md)

### Limitations
* Active-Active master-master replication is not supported.
* Active-passive master-master replication, where Pseudo-GTID is injected on the active master only, *is supported*.
* Replicas that don't run `log-slave-updates` are synced via relay logs. MySQL's default aggressive purging of relay logs implies that if a crash happens on a master, and a replica's relay logs have just been rotated (i.e. immediately also purged), then there's no Pseudo-GTID info in the relay logs to use for healing the topology.
  * Frequent injections of P-GTID mitigate this problem.
We inject P-GTID every `5sec`.
* When a replica reads Statement Based Replication relay logs and relays Row Based Replication binary logs (i.e. the master has `binlog_format=STATEMENT` and the replica has `binlog_format=ROW`), `orchestrator` matches Pseudo-GTID via relay logs. See the above bullet for limitations on relay logs.
* You cannot match two servers where one is fully RBR (receives and writes Row Based Replication logs) and the other is fully SBR. Such a scenario can happen when migrating from an SBR based topology to an RBR topology.
* An edge case scenario is known when replicating from `5.6` to `5.7`: `5.7` adds `ANONYMOUS` statements to the binary logs, which `orchestrator` knows how to skip. However, if `5.6`->`5.7` replication breaks (e.g. dead master) and an `ANONYMOUS` statement is the last statement in the binary log, `orchestrator` is unable at this time to align the servers.

### Deploying Pseudo-GTID
Please follow [deployment, Pseudo-GTID](https://github.com/openark/orchestrator/blob/master/docs/deployment.md#pseudo-gtid).

#### Using Pseudo GTID
Via web:

![image](images/oNJrGmL9ujg6O1bMUvXmoXIZ_ef2u_OjI2SRl2EUMQg.png)

Via command line:

```bash
orchestrator-client -c relocate -i some.server.to.relocate -d under.some.other.server
```
The `relocate` command will auto-identify that Pseudo-GTID is enabled.
--------------------------------------------------------------------------------
/Developers/Docker.md:
--------------------------------------------------------------------------------
# Docker
# [Docker](https://github.com/openark/orchestrator/blob/master/docs/docker.md)
Multiple Dockerfiles are available, to:

* Build and test `orchestrator`
* Create distribution files
* Run a minimal `orchestrator` daemon
* Run a 3-node raft setup
* Run a full blown CI environment

`script/dock` is a convenience script to build/spawn each of these docker images.

## Build and test
First, it should be noted that you can let GitHub Actions do all the work for you: it will build, put through testing, and generate an artifact, an `orchestrator` binary for Linux `amd64`, all from GitHub's platform. You do not strictly need Docker nor a development environment on your computer. See [CI](ci.md).

If you wish to build and test on your host, but do not want to set up a development environment, use:

```bash
$ script/dock test
```
This will use `docker/Dockerfile.test` to build, unit test, integration test and run doc validation on your behalf.

## Build and run
Run this command:

```bash
$ script/dock alpine
```
which uses `docker/Dockerfile` to build `orchestrator` on Alpine Linux, and run the service. Docker will map port `:3000` onto your machine; you may browse to `http://127.0.0.1:3000` to access the orchestrator web interface.
The following environment variables are available, and take effect if no config file is bind-mounted into the container at `/etc/orchestrator.conf.json`:

* `ORC_TOPOLOGY_USER`: defaults to `orchestrator`
* `ORC_TOPOLOGY_PASSWORD`: defaults to `orchestrator`
* `ORC_DB_HOST`: defaults to `db`
* `ORC_DB_PORT`: defaults to `3306`
* `ORC_DB_NAME`: defaults to `orchestrator`
* `ORC_USER`: defaults to `orc_server_user`
* `ORC_PASSWORD`: defaults to `orc_server_password`

To set these variables, you can add them to an environment file as `key=value` pairs (one pair per line), then pass this environment file to docker by adding `--env-file=path/to/env-file` to the `docker run` command.

## Create package files
Run this command:

```bash
$ script/dock pkg
```
to create (via `fpm`) release packages:

* `.deb`
* `.rpm`
* `.tgz`

for Linux `amd64`, with `Systemd` or `SysVinit`, all binaries or just client scripts. It uses the same methods as used for [official releases](https://github.com/openark/orchestrator/releases).

Uses `Dockerfile.packaging`

## Run full CI environment
Execute:

```bash
$ script/dock system
```
to run a full blown environment (see [ci-env.md](ci-env.md)), consisting of:

* MySQL replication topology (via `dbdeployer`) with heartbeat injection
* `orchestrator` as a service
* `HAProxy`
* `Consul`
* `consul-template`

All wired to work together. It's a good playground for testing `orchestrator`'s functionality.

Tips:

* port `13306` routes to the current topology master
* the MySQL topology is available on ports `10111, 10112, 10113, 10114`
* Connect to MySQL with user: `ci`, password: `ci`.
e.g.: `mysqladmin -uci -pci -h 127.0.0.1 --port 13306 processlist`
* Use `redeploy-ci-env` to re-create the MySQL topology, and recreate and restart the heartbeat, consul, consul-template and haproxy services. This resets the services to their original state.

Uses `Dockerfile.system`

## Run a raft setup
Execute:

```bash
$ script/dock raft
```
This will spin up three `orchestrator` services:

1. Listens on `http://127.0.0.1:3007`, advertising raft on `127.0.0.1:10007`
2. Listens on `http://127.0.0.1:3008`, advertising raft on `127.0.0.1:10008`
3. Listens on `http://127.0.0.1:3009`, advertising raft on `127.0.0.1:10009`

`orchestrator-client` is configured to connect to any of the nodes.
--------------------------------------------------------------------------------
/Various/SSL and TLS.md:
--------------------------------------------------------------------------------
# SSL and TLS
# [SSL and TLS](https://github.com/openark/orchestrator/blob/master/docs/ssl-and-tls.md)
Orchestrator supports SSL/TLS for the web interface as HTTPS. This can be standard server-side certificates, or you can configure Orchestrator to validate and filter client-provided certificates with Mutual TLS.

Orchestrator also allows the use of certificates to authenticate with MySQL.

If MySQL is using SSL encryption for replication, Orchestrator will attempt to configure replication with SSL during recovery.

#### HTTPS for the Web/API interface
You can set up SSL/TLS protection like so:

```json
{
  "UseSSL": true,
  "SSLPrivateKeyFile": "PATH_TO_CERT/orchestrator.key",
  "SSLCertFile": "PATH_TO_CERT/orchestrator.crt",
  "SSLCAFile": "PATH_TO_CERT/ca.pem",
}
```
`SSLCAFile` is optional if you don't need to specify your certificate authority.
This will enable SSL via the web interface (and API) so that communications are encrypted, like a normal HTTPS web page.

You can, similarly, set this up for the Agent API if you're using the `Orchestrator Agent`:

```json
{
  "AgentsUseSSL": true,
  "AgentSSLPrivateKeyFile": "PATH_TO_CERT/orchestrator.key",
  "AgentSSLCertFile": "PATH_TO_CERT/orchestrator.crt",
  "AgentSSLCAFile": "PATH_TO_CERT/ca.pem",
}
```
This can be the same SSL certificate, but it doesn't have to be.

#### Mutual TLS
Orchestrator also supports Mutual TLS: certificates must be presented and valid for the client as well as the server. This is frequently used to protect service-to-service communication in an internal network, with certificates commonly signed by an internal root certificate.

In this case the certificates must 1) be valid and 2) be for the correct service. The correct service is dictated by filtering on the Organizational Unit (OU) of the client certificate.

*Setting up a private root CA is not a trivial task. It is beyond the scope of these documents to instruct how to successfully accomplish it.*

With that in mind, you can set up Mutual TLS by setting up SSL as above, and also adding the following directives:

```json
{
  "UseMutualTLS": true,
  "SSLValidOUs": [ "service1", "service2" ],
}
```
This will turn on client certificate verification and start filtering clients based on their OU. OU filtering is mandatory, as it is pointless to use Mutual TLS without it. In this case, `service1` and `service2` would be able to connect to Orchestrator, assuming their certificate was valid and carried an OU with that exact service name.

#### MySQL Authentication
You can also use client certificates to authenticate, or just encrypt, your MySQL connections.
You can encrypt the connection to the MySQL server `Orchestrator` uses with: 53 | 54 | ```json 55 | { 56 | "MySQLOrchestratorUseMutualTLS": true, 57 | "MySQLOrchestratorSSLSkipVerify": true, 58 | "MySQLOrchestratorSSLPrivateKeyFile": "PATH_TO_CERT/orchestrator-database.key", 59 | "MySQLOrchestratorSSLCertFile": "PATH_TO_CERT/orchestrator-database.crt", 60 | "MySQLOrchestratorSSLCAFile": "PATH_TO_CERT/ca.pem", 61 | } 62 | ``` 63 | Similarly the connections to the topology databases can be encrypted with: 64 | 65 | ```json 66 | { 67 | "MySQLTopologyUseMutualTLS": true, 68 | "MySQLTopologySSLSkipVerify": true, 69 | "MySQLTopologySSLPrivateKeyFile": "PATH_TO_CERT/orchestrator-database.key", 70 | "MySQLTopologySSLCertFile": "PATH_TO_CERT/orchestrator-database.crt", 71 | "MySQLTopologySSLCAFile": "PATH_TO_CERT/ca.pem", 72 | } 73 | ``` 74 | In this case all of your topology servers must respond to the certificates provided. There's no current method to have TLS enabled only for some servers. 75 | 76 | #### MySQL SSL Replication 77 | If Orchestrator is able to configure the failed Source to replicate to the newly promoted Source during recovery, it will attempt to configure `Master_SSL=1` if the newly promoted Source was configured that way. 78 | 79 | Orchestrator currently does not handle configuring Source SSL certificates for replication during recovery. -------------------------------------------------------------------------------- /Setup/部署/安装-Installation.md: -------------------------------------------------------------------------------- 1 | # 安装-Installation 2 | # [Installation](https://github.com/openark/orchestrator/blob/master/docs/install.md) 3 | 关于生产环境部署, 见[在生产环境中部署Orchestrator](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/在生产环境中部署Orchestrator.md). 下面的文字将引导你通过手动方式安装和必要的配置来使其工作. 4 | 5 | 以下内容假设您将使用同一台机器来运行`orchestrator`和后端MySQL数据库. 如果不是, 请用适当的主机名替换`127.0.0.1`. 将`orch_backend_password`替换为您自己的密码. 
6 | 7 | #### Extract orchestrator binary and files 8 | * Extract from tarball 9 | 从[https://github.com/openark/orchestrator/releases](https://github.com/openark/orchestrator/releases) 下载tar包. 例如, 假设你想在`/usr/local/orchestrator`下安装`orchestrator` : 10 | 11 | ```bash 12 | sudo mkdir -p /usr/local 13 | cd /usr/local 14 | sudo tar xzvf orchestrator-1.0.tar.gz 15 | ``` 16 | * Install from `RPM` 17 | 会安装到`/usr/local/orchestrator` 18 | 19 | ```bash 20 | sudo rpm -i orchestrator-1.0-1.x86_64.rpm 21 | ``` 22 | * Install from `DEB` 23 | 会安装到`/usr/local/orchestrator` 24 | 25 | ```bash 26 | sudo dpkg -i orchestrator_1.0_amd64.deb 27 | ``` 28 | * Install from repository 29 | `orchestrator` packages can be found in [https://packagecloud.io/github/orchestrator](https://packagecloud.io/github/orchestrator) 30 | 31 | 32 | 33 | #### Setup backend MySQL server 34 | 设置一个MySQL作为后端, 执行以下命令: 35 | 36 | ```sql 37 | CREATE DATABASE IF NOT EXISTS orchestrator; 38 | CREATE USER 'orchestrator'@'127.0.0.1' IDENTIFIED BY 'orch_backend_password'; 39 | GRANT ALL PRIVILEGES ON `orchestrator`.* TO 'orchestrator'@'127.0.0.1'; 40 | ``` 41 | `Orchestrator`使用一个配置文件, 位于`/etc/orchestrator.conf.json`或`/${orchestrator软件路径}/conf/orchestrator.conf.json` 或 `./orchestrator.conf.json`. 42 | > 除非显式指定-config参数, 否则orchestrator会按顺序在以上位置读取配置文件: `config.Read("/etc/orchestrator.conf.json", "conf/orchestrator.conf.json", "orchestrator.conf.json")` . 会读取所有存在的文件, 后面的配置文件会覆盖前面配置文件中的参数配置 43 | 44 | 提示: 安装的软件包包括一个名为`orchestrator.conf.json.sample`的文件, 其中有一些基本设置, 你可以将其作为`orchestrator.conf.json`的基线. 它可以在`/usr/local/orchestrator/orchestrator-sample.conf.json`中找到, 你也可以找到`/usr/local/orchestrator/orchestrator-sample-sqlite.conf.json`, 它有一个面向SQLite的配置. 这些样本文件也可以在[orchestrator资源库](https://github.com/openark/orchestrator/tree/master/conf)中找到. 45 | 46 | Edit `orchestrator.conf.json` to match the above as follows: 47 | 48 | ```yaml 49 | ...
50 | "MySQLOrchestratorHost": "127.0.0.1", 51 | "MySQLOrchestratorPort": 3306, 52 | "MySQLOrchestratorDatabase": "orchestrator", 53 | "MySQLOrchestratorUser": "orchestrator", 54 | "MySQLOrchestratorPassword": "orch_backend_password", 55 | ... 56 | ``` 57 | #### Grant access to orchestrator on all your MySQL servers 58 | 为了使`orchestrator`能够检测到你的复制拓扑结构, 还必须在每个拓扑结构上拥有一个账户. 所有的拓扑结构都必须是同一个账户(同一个用户,同一个密码). 在你的每个主库上, 发布以下内容 59 | 60 | ```sql 61 | CREATE USER 'orchestrator'@'orch_host' IDENTIFIED BY 'orch_topology_password'; 62 | GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'orch_host'; 63 | GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'orch_host'; 64 | GRANT SELECT ON ndbinfo.processes TO 'orchestrator'@'orch_host'; -- Only for NDB Cluster 65 | ``` 66 | > `REPLICATION SLAVE` 对于执行`SHOW SLAVE HOSTS` 和 扫描二进制日志以支持[Pseudo GTID](Various/Pseudo%20GTID.md)是必须的, 67 | 68 | > `RELOAD` 对于执行`RESET SLAVE` 是必须的 69 | 70 | > `PROCESS` 对于执行`SHOW PROCESSLIST` 是必须的(5.6和以上版本). 如果`master_info_repository = 'TABLE'` 请给orchestrator访问`mysql.slave_master_info` 的权限. 这将允许orchestrator在需要时抓取复制凭证. 71 | 72 | Replace `orch_host` with hostname or orchestrator machine (or do your wildcards thing). Choose your password wisely. Edit `orchestrator.conf.json` to match: 73 | 74 | ```yaml 75 | "MySQLTopologyUser": "orchestrator", 76 | "MySQLTopologyPassword": "orch_topology_password", 77 | ``` 78 | Consider moving `conf/orchestrator.conf.json` to `/etc/orchestrator.conf.json` (both locations are valid) 79 | 80 | 要在命令行模式或仅在HTTP API中执行`orchestrator`, 你所需要的只是`orchestrator`二进制文件. 要享受丰富的网络界面, 包括拓扑可视化和拖放拓扑变化, 你将需要`resources`目录和它下面的所有内容. 如果你不确定, 不要碰, 东西已经到位了. 81 | 82 | > To execute `orchestrator` in command line mode or in HTTP API only, all you need is the `orchestrator` binary. To enjoy the rich web interface, including topology visualizations and drag-and-drop topology changes, you will need the `resources` directory and all that is underneath it. 
If you're unsure, don't touch; things are already in place. 83 | -------------------------------------------------------------------------------- /Use/Executing via command line.md: -------------------------------------------------------------------------------- 1 | # Executing via command line 2 | # [Executing via command line](https://github.com/openark/orchestrator/blob/master/docs/executing-via-command-line.md) 3 | 另请参阅[First Steps with Orchestrator](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Quick%20guides/First%20Steps.md). 4 | 5 | `orchestrator`支持两种从命令行运行操作的方式: 6 | 7 | * 使用`orchestrator` 二进制文件(即orchestrator命令, 也是本文的主题) 8 | * 你将在运维/应用服务器上部署`orchestrator` (命令), 但不将其作为服务运行 9 | * 您将为`orchestrator`二进制文件部署配置文件, 以便能够连接到(orchestrator)后端数据库. 10 | * 使用[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md)脚本. 11 | * 你只需要在运维/应用服务器上部署`orchestrator-client` 脚本即可. 12 | * 你即不需要配置文件, 也不需要二进制文件 13 | >  You will not need any config file nor binaries. 14 | * 你需要设置`ORCHESTRATOR_API` 环境变量. 15 | 16 | 两者(大部分)是兼容的. 本文件讨论的是第一种选择(使用orchestrator命令). 17 | 18 | 以下是命令行示例的概要. 为简单起见, 我们假设 `orchestrator`命令路径已经配置到了`PATH` 中, 如果没有, 请用 `/path/to/orchestrator` 替换`orchestrator` . 19 | 20 | > 下面的例子使用了一个测试的mysqlsandbox拓扑结构, 其中所有的实例都在同一个主机127.0.0.1上, 并且在不同的端口. 22987是主库, 22988、22989、22990是从库. 21 | 22 | 显示当前已知的集群(复制拓扑结构): 23 | 24 | ```bash 25 | orchestrator -c clusters 26 | ``` 27 | > 以上在 `/etc/orchestrator.conf.json`、`conf/orchestrator.conf.json`、`orchestrator.conf.json` 中按顺序查找配置. 通常是将配置放在`/etc/orchestrator.conf.json` 中. 由于它包含您的 MySQL 服务器的凭据, 您可能希望限制对该文件的访问. 28 | 29 | 您可以选择为配置文件使用不同的位置, 在这种情况下执行: 30 | 31 | ```bash 32 | orchestrator -c clusters --config=/path/to/config.file 33 | ``` 34 | > `-c`代表`command`, 是必要参数. 35 | 36 | 发现一个新的实例("教"`orchestrator`了解你的拓扑结构). 
`Orchestrator`将自动递归地向上钻取主链(如果有的话)和向下钻取复制链(如果有的话)以检测整个拓扑结构. 37 | 38 | ```bash 39 | orchestrator -c discover -i 127.0.0.1:22987 40 | ``` 41 | > `-i` 代表`instance` , 必须是`hostname:port` 的形式. 42 | 43 | 与上面的命令相同, 但是包含更多信息: 44 | 45 | ```bash 46 | orchestrator -c discover -i 127.0.0.1:22987 --debug 47 | orchestrator -c discover -i 127.0.0.1:22987 --debug --stack 48 | ``` 49 | >  `--debug`在所有操作中都很有用. `--stack`在(大多数)错误上打印代码堆栈跟踪, 对于开发和测试目的或提交错误报告很有用 50 | 51 | 忘记一个实例(一个实例可以通过上面的discover命令手动或自动重新发现) 52 | 53 | ```bash 54 | orchestrator -c forget -i 127.0.0.1:22987 55 | ``` 56 | 打印拓扑实例的ASCII树. 通过`-i`传递一个集群名称(见上面的`clusters`命令) 57 | 58 | ```bash 59 | orchestrator -c topology -i 127.0.0.1:22987 60 | ``` 61 | > Sample output: 62 | 63 | ```Plain Text 64 | 127.0.0.1:22987 65 | + 127.0.0.1:22989 66 | + 127.0.0.1:22988 67 | + 127.0.0.1:22990 68 | ``` 69 | 在拓扑结构中移动副本: 70 | 71 | ```bash 72 | orchestrator -c relocate -i 127.0.0.1:22988 -d 127.0.0.1:22987 73 | ``` 74 | > Resulting topology: 75 | 76 | ```Plain Text 77 | 127.0.0.1:22987 78 | + 127.0.0.1:22989 79 | + 127.0.0.1:22988 80 | + 127.0.0.1:22990 81 | ``` 82 | 上面的情况是将副本向上移动了一级. 然而, `relocate`命令接受任何有效的destination. `relocate`找出移动副本的最佳方式. 如果GTID被启用, 使用它. 如果Pseudo-GTID可用, 就使用它. 如果涉及binlog server, 则使用它. 如果`orchestrator`对所涉及的具体坐标有进一步的洞察力, 就使用它. 否则就使用普通的基于binlog file:pos的方式. 83 | 84 | >  If `orchestrator` has further insight into the specific coordinates involved, use it. 85 | 86 | 与`relocate`类似, 你可以通过`relocate-replicas`移动多个副本. 这将把一个实例的副本移动到另一个服务器下面 87 | 88 | > 假设: 89 | 90 | ```Plain Text 91 | 10.0.0.1:3306 92 | + 10.0.0.2:3306 93 | + 10.0.0.3:3306 94 | + 10.0.0.4:3306 95 | + 10.0.0.5:3306 96 | + 10.0.0.6:3306 97 | ``` 98 | ```bash 99 | orchestrator -c relocate-replicas -i 10.0.0.2:3306 -d 10.0.0.6 100 | ``` 101 | > 结果: 102 | 103 | ```Plain Text 104 | 10.0.0.1:3306 105 | + 10.0.0.2:3306 106 | + 10.0.0.6:3306 107 | + 10.0.0.3:3306 108 | + 10.0.0.4:3306 109 | + 10.0.0.5:3306 110 | ``` 111 | > 你可以使用`--pattern` 来匹配所需的副本. 112 | 113 | 其他命令让你对服务器的重新定位方式有更精细的控制.
考虑一下经典的基于file:pos的方式来重新指定副本. 114 | 115 | 将一个副本在拓扑结构中向上移动(使其成为其主库的同级, 即其"祖父"的直接副本). 116 | 117 | ```bash 118 | orchestrator -c move-up -i 127.0.0.1:22988 119 | ``` 120 | 上述命令只有在实例有祖先, 并且没有复制滞后等问题时才会成功 121 | 122 | 将副本移动到其同级下方: 123 | 124 | ```bash 125 | orchestrator -c move-below -i 127.0.0.1:22988 -d 127.0.0.1:22990 --debug 126 | ``` 127 | > 上面的命令只有在 127.0.0.1:22988 和 127.0.0.1:22990 是兄弟姐妹(同一个 master 的副本)时才会成功, 它们都没有问题(例如副本滞后), 并且兄弟姐妹可以成为一个新的 master(即有二进制日志, 有 log\_slave\_updates, 没有版本冲突等) 128 | 129 | 让一个实例只读或可写: 130 | 131 | ```bash 132 | orchestrator -c set-read-only -i 127.0.0.1:22988 133 | orchestrator -c set-writeable -i 127.0.0.1:22988 134 | ``` 135 | -------------------------------------------------------------------------------- /Failure detection & recovery/Key-Value stores.md: -------------------------------------------------------------------------------- 1 | # Key-Value stores 2 | # [Key-Value stores](https://github.com/openark/orchestrator/blob/master/docs/kv.md) 3 | `orchestrator` 支持以下key-value存储: 4 | 5 | * 一个基于关系表的内部存储 6 | 7 | * [Consul](https://github.com/hashicorp/consul) 8 | * [ZooKeeper](https://zookeeper.apache.org/) 9 | 10 | 更多信息请参阅[Configuration: Key-Value stores](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Key-Value%20stores.md) 11 | 12 | ### Key-Value usage 13 | At this time Key-Value (aka KV) stores are used for: 14 | 15 | * Master discoveries 16 | 17 | ### Master discoveries, key-values and failovers 18 | 其目的是, 诸如基于`Consul`或`ZooKeeper`的服务发现将能够提供master discovery和/或根据集群的master身份和变化采取行动. 19 | 20 | 最常见的场景是更新代理以将集群的写入流量定向到特定的主节点. 例如, 可以通过 `consul-template` 设置 `HAProxy`, 这样 `consul-template` 就能根据由 orchestrator 写入的键值存储填充一个 single-host master pool. 21 | 22 | orchestrator 在 master failover时更新所有 KV 存储. 23 | 24 | #### Populating master entries 25 | Clusters' master entries在以下情况下被填入: 26 | 27 | * 遇到一个新的集群, 或遇到一个没有KV entry的master. 这个检查是自动和定期进行的 28 | * 定期检查首先咨询`orchestrator`的内部KV存储.
只有在内部存储没有master entries的情况下, 它才会尝试填充外部存储(`Consul`, `Zookeeper`). 由此可见, 定期检查将只注入一次外部KV. 29 | * An actual failover: `orchestrator`用new master的身份覆盖现有条目 30 | * 手工的entry填充请求: 31 | * `orchestrator-client -c submit-masters-to-kv-stores` 提交所有集群的master到KV, 或 32 | * `orchestrator-client -c submit-masters-to-kv-stores -alias mycluster` 提交`mycluster` 集群的master到KV 33 | 另见[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md). 也可以使用orchestrator命令行. 34 | 或者你可以直接访问API: 35 | * `/api/submit-masters-to-kv-stores` 36 | * /api/submit-masters-to-kv-stores/:alias 37 | 38 | 实际的故障转移和手动请求都将覆盖任何现有的内部和外部KV entries. 39 | 40 | ### KV and orchestrator/raft 41 | 在[Orchestrator/raft, consensus cluster](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%83%A8%E7%BD%B2/Orchestrator%20raft%2C%20consensus%20cluster.md) 部署模式中, 所有KV写入都要通过`raft` 协议. 因此, 一旦领导者确定需要对KV存储进行写入, 它就会向所有`raft`节点发布请求. 每个节点将根据自己的配置, 独立地应用写入. 42 | 43 | #### Implications 44 | 举例来说, 假设你在一个3个数据中心的设置中运行`orchestrator/raft`, 每个DC一个节点. 另外, 我们假设你在这些DC上都部署了Consul. Consul的设置通常是在DC之间, 可能还有跨DC的异步复制. 45 | 46 | 在主站故障切换时, 每个`orchestrator`节点都会用new master的身份更新Consul. 47 | 48 | 如果你的Consul运行跨DC复制, 那么同一个KV更新有可能运行两次: 一次通过Consul复制, 一次通过本地`orchestrator`节点. 这两次更新是相同的、一致的, 因此可以安全运行 49 | 50 | 如果你的Consul没有相互复制, 那么`orchestrator`是使你的master discovery在你的Consul集群中一致的唯一手段. 你可以得到raft带来的所有好的特性: 如果一个DC被网络分区, 该DC中的`orchestrator`节点将不会收到KV更新事件, 并且在一段时间内,Consul 集群也不会收到. 然而, 一旦网络访问被恢复 `orchestrator`将赶上事件日志, 并将KV更新应用到本地Consul集群. The setup is eventual-consistent(最终一致). 51 | 52 | 在主站failover后不久, `orchestrator`就会生成一个raft快照. 这不是严格的要求的, 但却是一个有用的操作: 在`orchestrator`节点重新启动的情况下, 快照可以防止协调器重放KV写入. 这在 failover-and-failback的情况下特别有趣, 像consul这样的远程KV可能会得到同一个集群的两个更新. 快照可以缓解此类事件. 53 | 54 | ### Consul specific 55 | Optionally, you may configure: 56 | 57 | ```yaml 58 | "ConsulCrossDataCenterDistribution": true, 59 | ``` 60 | ...which can (and will) take place in addition to the flow illustrated above. 
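无论通过上述哪条路径分发, orchestrator写入Consul的master条目大致如下(示意; 此处假设集群别名为 `mycluster`, 且使用默认的 `mysql/master` key前缀, 主机名与端口均为假设值, 实际key由配置决定):

```Plain Text
$ consul kv get mysql/master/mycluster
my-cluster-master.example.com:3306
$ consul kv get mysql/master/mycluster/hostname
my-cluster-master.example.com
$ consul kv get mysql/master/mycluster/port
3306
```

`consul-template` 等工具即可监听这些key的变化, 在failover后重新生成代理配置, 把写流量指向新主库.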
61 | 62 | 通过`ConsulCrossDataCenterDistribution`, `orchestrator`会对一个扩展的Consul集群列表运行额外的定期更新. 63 | 64 | 每分钟一次, `orchestrator`领导节点查询其配置的Consul服务器, 以获取[known datacenters](https://www.consul.io/api/catalog.html#list-datacenters)的列表. 然后, 它遍历这些数据中心集群, 并以主站的当前身份更新每一个集群. 65 | 66 | 如果你的Consul数据中心不止orchestrator节点本地的那些, 就需要这个功能. 我们在上面说明了在`orchestrator/raft`部署模式中, 每个节点如何更新其本地Consul集群. 然而, 不属于任何`orchestrator`节点本地的Consul集群不受这种方法的影响. `ConsulCrossDataCenterDistribution`是覆盖所有这些其他DC的方式. 67 | 68 | #### Consul Transaction support 69 | Atomic [Consul Transaction](https://www.consul.io/api-docs/txn) support is enabled by configuring: 70 | 71 | ```yaml 72 | "ConsulKVStoreProvider": "consul-txn", 73 | ``` 74 | *Note: this feature requires Consul version 0.7 or greater.* 75 | 76 | This causes Orchestrator to use a [Consul Transaction](https://www.consul.io/api-docs/txn) when distributing one or more Consul KVs. KVs are read from the server in one transaction and any necessary updates are performed in a second transaction. 77 | 78 | Orchestrator groups KV updates by key-prefix into groups of 5 to 64 operations *(default 5)*. This grouping ensures updates to a single cluster *(5 x KVs)* happen atomically. Increasing the `ConsulMaxKVsPerTransaction` configuration setting from `5` *(default)* to a max of `64` *(Consul Transaction API limit)* allows more operations to be grouped into fewer transactions, but more can fail at once. 79 | -------------------------------------------------------------------------------- /Deployment/shard backend模式部署.md: -------------------------------------------------------------------------------- 1 | # shared backend模式部署 2 | # [Orchestrator deployment: shared backend](https://github.com/openark/orchestrator/blob/master/docs/deployment-shared-backend.md) 3 | 本文描述了shared backend数据库的部署方法. 有关各种后端DB设置,请参阅[Orchestrator高可用](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/Orchestrator高可用.md).
4 | 5 | 这篇文章完善了[在生产环境中部署Orchestrator](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/在生产环境中部署Orchestrator.md). 6 | 7 | ### Shared backend 8 | 你将需要创建一个shared backend database. 这可以是同步复制(Galera/XtraDB Cluster/InnoDB Cluster), 以实现高可用性, 也可以是主从复制. 9 | 10 | 后端数据库具有你的拓扑结构的状态. `orchestrator`本身几乎是无状态的, 并且信任后端数据库中的数据. 11 | 12 | > The backend database has the *state* of your topologies. `orchestrator` itself is almost stateless, and trusts the data in the backend database. 13 | 14 | 在shared backend模式部署中, 所有`orchestrator` 服务将全部与同一个后端通信. 15 | 16 | * 对于同步复制(**synchronous replication**), 建议是: 17 | * 配置多写模式(数据库集群的每个节点都是可写的) 18 | * `orchestrator`和MySQL节点之间是1:1映射的: 每个`orchestrator`服务与自己的后端数据库节点对话 19 | * 对于主从复制(异步 & 半同步), 请执行以下操作: 20 | * 配置所有`orchestrator`节点访问同一个后端数据库节点(即, 主库) 21 | * 你可以通过代理(proxysql)将流量引导到主库, 在这种情况下, 配置所有`orchestrator`节点访问代理即可. 22 | 23 | ### MySQL backend setup and high availability 24 | 设置后端数据库是你的责任. 另外, `orchestrator` doesn't eat its own dog food, 也不能恢复自己后端数据库的故障. 你将需要处理, 例如, 添加Galera节点的问题, 或管理你的代理健康检查等. 25 | 26 | ### What to deploy: service 27 | * 将`orchestrator`服务部署到服务盒上. 部署多少个服务盒将取决于你的可用性需求. 28 | >  Deploy the `orchestrator` service onto service boxes. The decision of how many service boxes to deploy will depend on your [Orchestrator高可用](Deployment/Orchestrator高可用.md) 29 | * In a synchronous replication shared backend setup, these may well be the very MySQL boxes, in a `1:1` mapping. 如果后端数据库采用同步复制(MGR/PXC), 那么`orchestrator` 服务数量和后端数据库集群节点数应该是1:1的. `orchestrator` 应该可以直接部署在后端数据库服务器上. 30 | * 考虑在服务盒(service boxes)之上增加一个代理(proxy); 代理最好能将所有流量重定向到leader node(这里指的是`orchestrator` 服务leader节点). 有一个而且只有一个领导者节点, 状态检查的端点是`/api/leader-check` . 可以将流量导向任何健康的(`orchestrator`)服务. 由于所有`orchestrator`节点都与相同的共享后端数据库通信, 因此可以从一个服务节点执行一些操作, 从另一个服务节点执行其他操作. 内部锁是为了避免运行相互矛盾或干扰的命令.
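作为示意, 一个只把流量导向leader的代理配置大致如下(以HAProxy为例; 其中的主机名与端口均为假设值, 仅演示基于 `/api/leader-check` 的健康检查, 请按实际环境调整):

```Plain Text
listen orchestrator
  bind 0.0.0.0:80
  mode http
  option httpchk GET /api/leader-check
  balance first
  server orch1 orch1.example.com:3000 check port 3000
  server orch2 orch2.example.com:3000 check port 3000
  server orch3 orch3.example.com:3000 check port 3000
```

由于任一时刻只有leader节点对 `/api/leader-check` 返回 `200`, 代理最终只会把流量转发给leader.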
31 | 32 | ### What to deploy: client 33 | 为了通过shell/automation/scripts与`orchestrator`进行交互, 你可以选择: 34 | 35 | * 直接与HTTP API交互 36 | * 使用[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md)脚本([orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md)本质是一个shell脚本). 37 | * 将`orchestrator-client`部署在你希望与`orchestrator`交互的任何盒子上. 38 | * Create and edit `/etc/profile.d/orchestrator-client.sh` on those boxes to read: 39 | 40 | ```bash 41 | ORCHESTRATOR_API="http://your.orchestrator.service.proxy:80/api" 42 | # 代理 43 | ``` 44 | or 45 | 46 | ```bash 47 | ORCHESTRATOR_API="http://your.orchestrator.service.host1:3000/api http://your.orchestrator.service.host2:3000/api http://your.orchestrator.service.host3:3000/api" 48 | # 指定所有orchestrator 49 | ``` 50 | 在后一种情况下, 你将提供所有`orchestrator`节点的列表, 而`orchestrator-client`脚本将自动计算出哪个是leader. 通过这种设置, 你的自动化将不需要代理(尽管你可能仍然希望为Web界面用户使用代理). 51 | 52 | 确保 chef/puppet/whatever 的 `ORCHESTRATOR_API` 值能够适应环境的变化. 53 | 54 | * [orchestrator命令](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Executing%20via%20command%20line.md) 55 | * 将`orchestrator`二进制文件(你可以使用`orchestrator-cli` distributed package)部署在你希望与`orchestrator`互动的任何盒子上. 56 | * 在这些盒子上创建`/etc/orchestrator.conf.json`, 填入凭据(populate with credentials). 该文件一般应与`orchestrator`服务盒的文件相同. 如果你不确定, 请使用完全相同的文件内容. 57 | * `orchestrator`命令将访问共享的后端数据库. 请确保给予它访问权. 通常情况下, 这将是3306端口. 58 | 59 | 即使在`orchestrator`服务运行时, 运行orchestrator CLI也是可以的, 因为它们都将在相同的后端DB上进行协调. 60 | 61 | ### Orchestrator service 62 | In a shared-backend deployment, 你可以根据你的需求部署所需数量的`orchestrator` 节点. 63 | 64 | 然而, 如前所述, 只有一个`orchestrator`节点将被选为领导者. 只有领导者会: 65 | 66 | * 发现(探测)你的MySQL拓扑结构 67 | * 运行故障检测 68 | * 运行故障恢复 69 | 70 | 所有节点都: 71 | 72 | * 提供HTTP请求 73 | 74 | * Register their own health check 注册自己的健康检查 75 | 76 | 所有节点都可以: 77 | 78 | * Run arbitrary command (e.g. `relocate`, `begin-downtime`) 运行任意命令(例如 `relocate`、`begin-downtime`)
79 | 80 | * Run recoveries per human request. (按人的要求运行恢复) 81 | 有关部署多个节点的详细信息,请阅读有关[Orchestrator高可用](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/Orchestrator高可用.md)的内容. 82 | 83 | ### Orchestrator CLI 84 | CLI执行以完成特定的操作. 它可以选择探测一些服务器, 这取决于操作(e.g. `relocate`), 也可以根本不探测服务器, 只从后端数据库读取数据. 85 | 86 | ### A visual example 87 | ![image](images/p9yrEDChIfZ9JcH01eVqYMbOK3JXJR4iue-knBqMxQU.png) 88 | 89 | 在上面图中, 有三个`orchestrator`节点运行在3个节点的同步复制设置之上. 每个`orchestrator`节点都与不同的MySQL后端通信, 但是这些节点都是同步复制的, 并且都共享相同的数据(有一定的延迟). 90 | 91 | 一个`orchestrator`节点被选为leader, 并且只有这个节点探测MySQL拓扑. 它会探测所有已知的服务器(上面的图片只显示了部分探测,以避免意大利面. (他意思应该是图上画太多线了, 向意大利面一样乱)) 92 | -------------------------------------------------------------------------------- /Deployment/Orchestrator高可用.md: -------------------------------------------------------------------------------- 1 | # Orchestrator高可用 2 | # [Orchestrator High Availability](https://github.com/openark/orchestrator/blob/master/docs/high-availability.md) 3 | `orchestrator`作为一个高可用服务运行. 本文列出了为`orchestrator`实现高可用的各种方法, 以及一些less/not高可用的设置. 4 | 5 | ### TL;DR 实现高可用的方式 6 | HA是通过选择以下其中之一实现的: 7 | 8 | * `orchestrator/raft` 模式, `orchestrator`节点间通过raft共识算法通信. 每个`orchestrator`节点都有一个私有的数据库, `MySQL` 或`sqlite` . 另见 [Orchestrator/raft, consensus cluster](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/部署/Orchestrator%20raft%2C%20consensus%20cluster.md) 9 | * 共享存储(Shared backend)模式. 多个`orchestrator` 节点使用一个后端数据库, 可能是Galera/XtraDB Cluster/InnoDB Cluster/NDB Cluster. (orchestrator的)数据是在数据库层级实现同步的. 10 | 11 | 另见 [orchestrator/raft vs. synchronous replication setup](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/部署/orchestrator%20raft%20vs.%20synchronous%20replication%20setup.md) 12 | 13 | ### 高可用类型 14 | 你可以根据你的需求,选择不同的可用性类型. 15 | 16 | * 无高可用: 最容易、最简单的部署方式, 适合于测试或开发环境. 可以使用`MySQL` 或`sqlite` 17 | * 半高可用(Semi HA): 后端数据库是基于普通主从复制实现的. `orchestrator` does not eat its own dog food, 也不能故障转移自己的后端数据库. 18 | * 高可用: 不会出现单点故障. 
不同的解决方案在资源利用、支持的软件、客户端访问类型方面有不同的权衡. 19 | 20 | 21 | 22 | ### No high availability 23 | ![image](images/RKcild_9LBYlOFCLhgSLRMIZpTFTrTBdTkaxeEQsGJA.png) 24 | 25 | 这种部署方式适合CI testing, 本地开发环境或其他实验场景. 本质是一个单一的`orchestrator` 节点和一个单一的数据库节点. 26 | 27 | 后端数据库可以是`MySQL` 或`sqlite` 数据库, 与`orchestrator` 捆绑在一起(没有依赖性, 不需要额外的软件) 28 | 29 | 30 | 31 | ### Semi HA 32 | ![image](images/BIjP2MvFIWGIWGT1Q2uOFgsnJ7l4_-SisU_M-nNfXAI.png) 33 | 34 | 这种部署方式为`orchestrator` 提供了semi HA能力. 有两种变体可供选择: 35 | 36 | * 多个`orchestrator`节点与同一后端数据库对话. (借此方式)`orchestrator` 服务的HA已经实现. 然而, 后端数据库的HA却没有实现. 后端数据库可能是一个带有从库的`主库`, 但`orchestrator`does not eat its own dog food, 也不能故障转移自己的后端数据库. 37 | 如果后端主库发生故障,需要有人或其他东西将`orchestrator`服务故障转移到后端数据库的"新主库"上. 38 | * 多个`orchestrator`节点都与proxy对话, proxy实现对后端一套基于`STATEMENT` 格式binlog的双主集群的负载均衡. 39 | * proxy总是路由到同一个数据库(例如, `HAProxy` 的`first` 算法)除非这个库挂了. 40 | * active master的宕机导致`orchestrator`与另一个maser对话, 而这个master可能有些落后(复制延迟). `orchestrator` 通常会根据其持续发现的性质自行重新应用缺失的变化. 41 | * `orchestrator` 保证了基于`STATEMENT` 的复制不会造成duplicate errors,后端(master-master)双主数据库将始终实现一致性. 42 | * 即使在运行恢复(指对所管理的MySQL集群运行恢复. 如: Failover)的过程中, `orchestrator` 也能从后端主库宕机中恢复(恢复将在备主上重新启动). 43 | * **脑裂是可能出现的**. 根据你的设置、物理位置、代理类型, 可能有不同的`orchestrator`服务节点与不同的后端MySQL服务器通话. 这种情况可能导致出现两个认为自己是 "活动 "的`orchestrator`服务, 这两个服务将独立运行故障转移, 这将导致拓扑结构损坏. 44 | 45 | 要访问您的orchestrator服务, 您可以对任何正常的节点进行通信. 46 | 47 | 众所周知, 这几种变体都能在生产环境中运行, 适用于非常大的环境. 48 | 49 | 50 | 51 | ### HA via shared backend 52 | ![image](images/NQ-ePOZ7dpey41yHy6LfSGOZ98AkOX1mxhM-wTSv7ms.png) 53 | 54 | HA是通过后端数据库的高可用实现的. 现有的解决方案是: 55 | 56 | * Galera 57 | * XtraDB Cluster 58 | * InnoDB Cluster 59 | * NDB Cluster 60 | 61 | 在上述所有情况下,MySQL节点都运行"同步"复制. 62 | 63 | > 译者注: 无论Galera还是Group Replication, 都做不到真正的同步. 64 | 65 | 存在两种变体: 66 | 67 | * 后端 Galera/XtraDB Cluster/InnoDB Cluster 数据库集群使用单主(单点写入)模式. 多个`orchestrator` 节点与writer DB对话, 通常是通过proxy. 如果writer DB出现故障, 后端数据库集群自行完成故障转移, 提升一个新主库作为writer; `orchestrator` 通过proxy识别出来的新主继续提供服务. 
68 | * 后端 Galera/XtraDB Cluster/InnoDB Cluster 数据库集群使用多主(多点写入)模式. 一个好的方式是将每个`orchestrator`节点与一个数据库实例结合起来(可能部署在同一个服务器上). 由于复制是同步的, 所以不存在脑裂的情况. 只有一个`orchestrator`节点可以成为leader, 而且这个leader只会在DB节点的共识下发言. 69 | 70 | 在这种设置中, MySQL节点之间可能会有大量的流量. 在跨DC的设置中, 这可能意味着更大的提交延迟(每个提交可能需要跨越DC). 71 | 72 | 要访问你的`orchestrator`服务, 你可以与任何健康的节点对话. 建议你只通过代理与leader对话(使用`/api/leader-check`作为代理的HTTP健康检查). 73 | 74 | 这种部署方式通常适用于非常大的环境中, 通常会部署3-5个`orchestrator` 节点. 75 | 76 | ### HA via raft 77 | ![image](images/50Axq5Tb-HSk4xdvGP1b5GUj-LK_FObapENI33iofeE.png) 78 | 79 | `orchestrator` 节点将直接通过`raft`共识算法进行通信. 每个`orchestrator`节点都有自己的私有后端数据库. 这可以是`MySQL`或`sqlite` . 80 | 81 | 只有一个`orchestrator`节点承担领导责任, 并且始终是共识的一部分. 然而, 所有其他节点都是独立活动的, 并且正在对你的拓扑结构进行投票. 82 | 83 | 在这种模式中: 84 | 85 | * DB节点之间没有通信 86 | * `orchestrator`之间的沟通极少. 87 | * `*n倍的`与MySQL拓扑结构节点的通信. 3个节点的设置意味着每个拓扑结构的MySQL服务器都由3个不同的协调器节点独立探测. 88 | 89 | 建议部署3个或5个节点的集群 90 | 91 | `sqlite`被嵌入到`orchestrator`中, 不需要外部依赖. 在负载较高的场景下, `MySQL`的性能优于`sqlite.` 92 | 93 | 要访问你的`orchestrator`服务, 你只能与leader节点对话. 94 | 95 | * Use `/api/leader-check` as HTTP health check for your proxy. 96 | 97 | * 或者对多个`orchestrator`后端使用[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md); `orchestrator-client`将找出leader并将请求发送给它. 98 | 99 | ![image](images/dlnTL2c8qw-wWm2A91Y7zuMz0j-RLHGMH9FWPQlPJ9s.png) 100 | 101 | `orchestrator/raft`是一个较新的开发项目, 目前正在生产中进行测试. 请阅读[Orchestrator/raft, consensus cluster](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/部署/Orchestrator%20raft%2C%20consensus%20cluster.md)以获取更多信息. 102 | 103 | > 译者注: 上面这句话是5年前写的了, 可以查看这篇文档的提交记录. 104 | -------------------------------------------------------------------------------- /Setup/部署/orchestrator raft vs. synchronous replication setup.md: -------------------------------------------------------------------------------- 1 | # orchestrator/raft vs. synchronous replication setup 2 | # [orchestrator/raft vs. 
synchronous replication setup](https://github.com/openark/orchestrator/blob/master/docs/raft-vs-sync-repl.md) 3 | 这篇文章比较了两种高可用性部署方法的部署、行为、限制和好处: `orchestrator/raft` vs `orchestrator/[galera|xtradb cluster|innodb cluster]` 4 | 5 | 我们将假设和比较: 6 | 7 | * `3`个数据中心的部署设置(一个可用区可以算作一个数据中心) 8 | * `3` 节点`orchestrator/raft` setup 9 | * `3` 节点`orchestrator` 在多主模式`galera|xtradb cluster|innodb cluster`上(集群中的每个MySQL都可以接受写) 10 | * 一个能够运行`HTTP`或`mysql`健康检查的代理 11 | >  A proxy able to run `HTTP` or `mysql` health checks 12 | * `MySQL`、`MariaDB`、`Percona Server`都被认为是`MySQL` . 13 | 14 | ![image](images/FRo9X-wvrhPyA5-rFJ_PEv2i63pmDgHHqjdwjvq_1Dg.png) 15 | 16 | |**Compare**|**orchestrator/raft**|**synchronous replication backend**| 17 | | ----- | ----- | ----- | 18 | |General wiring
(总体布线)|每个`orchestrator` 节点都有一个私有后端数据库;
`orchestrator` 节点间通过`raft` 协议通信|每个`orchestrator` 节点连接到一个"同步复制集群"的不同`MySQL` 成员上.
`orchestrator` 节点之间不进行通信| 19 | |Backend DB
(后端数据库)|`MySQL` 或 `SQLite`|MySQL| 20 | |Backend DB dependency
(后端数据库依赖性)|如果不能访问自己的私有后端数据库, 服务就会panic|如果不能访问自己的私有后端数据库, 则服务处于*unhealthy状态*| 21 | |DB data
(数据库数据)|Independent across DB backends.
May vary, but on a stable system converges to same overall picture
每个私有数据库的数据都是独立的.
可能会有所不同, 但在一个稳定的系统上各个私有数据库的数据会收敛到近乎相同.|单一数据集,因为是"同步复制集群"| 22 | |DB access
(数据库访问)|请永远不要直接写入数据.
只有`raft`节点在协调/合作时访问后端数据库.
否则会引入不一致的情况.
读取是可以的.|可以直接访问和写入;
所有`orchestrator`节点/clients看到的是完全相同的数据| 23 | |Leader and actions|Single leader.
只有leader运行故障恢复功能.
> Only the leader runs recoveries
所有节点都运行发现(探测)和自我分析
> All nodes run discoveries (probing) and self-analysis|Single leader.
只有领导者负责发现(探测)、分析和恢复工作.
> Only the leader runs discoveries (probing), analysis and recoveries.| 24 | |HTTP Access|必须只访问leader(可以通过proxy或`orchestrator-client`来执行)
> can be enforced by proxy or `orchestrator-client`|可以访问任何健康的节点(可以通过proxy强制执行).
为了保证读取的一致性, 最好只与leader对话(可由proxy或`orchestrator-client`来执行)| 25 | |Command line|HTTP/API access (e.g. `curl`, `jq`) or
`orchestrator-client` script which wraps common HTTP /API calls with familiar command line interface|HTTP/API, and/or `orchestrator-client` script, or `orchestrator ...` command line invocation.| 26 | |Install|`orchestrator` service on service nodes only.
`orchestrator-client` script anywhere (requires access to HTTP/API).|`orchestrator` service on service nodes.
`orchestrator-client` script anywhere (requires access to HTTP/API).
`orchestrator` client anywhere (requires access to backend DBs)| 27 | |Proxy|HTTP. Must only direct traffic to the leader (`/api/leader-check`)|HTTP. Must only direct traffic to healthy nodes (`/api/status`) ;
best to only direct traffic to leader node (`/api/leader-check`)| 28 | |No proxy|Use `orchestrator-client` with all `orchestrator` backends.
`orchestrator-client` will direct traffic to leader.|Use `orchestrator-client` with all `orchestrator` backends.
`orchestrator-client` will direct traffic to leader.| 29 | |Cross DC
(跨数据中心部署)|每个`orchestrator` 节点(以及私有后端数据库)可以在不同DC上运行.
节点之间的通信不多, 流量也很低.|每个`orchestrator` 节点(以及其连接的后端数据库节点)可以在不同DC上运行.
`orchestrator` 节点间不直接通信.
`MySQL` group replication is chatty. Amount of traffic mostly linear by size of topologies and by polling rate. Write latencies.| 30 | |Probing
(探查, 数据库服务发现)|Each topology server probed by all `orchestrator` nodes
每个拓扑服务器由所有`orchestrator`节点探查|Each topology server probed by the single active node
每个拓扑服务器由单个活动节点探查| 31 | |Failure analysis|由所有节点独立执行|仅由leader执行(数据库是共享的, 因此所有节点无论如何都会看到完全相同的数据).| 32 | |Failover|仅由leader节点执行|仅由leader节点执行| 33 | |Resiliency to failure|3节点集群容忍1个节点宕机
5节点集群容忍2个节点宕机|3节点集群容忍1个节点宕机
5节点集群容忍2个节点宕机| 34 | |Node back from short failure
(节点从短时故障中恢复)|Node rejoins cluster, gets updated with changes.
节点重新加入集群.|DB node rejoins cluster, gets updated with changes.
节点重新加入集群.| 35 | |Node back from long outage
(节点从长期中断中恢复)|DB must be cloned from healthy node.
数据库必须从健康节点克隆.|Depends on your MySQL backend implementation. Potentially SST/restore from backup.
取决于你的MySQL后端实现. 可能通过SST或从备份中恢复.| 36 | 37 | 38 | 39 | ### 注意事项 40 | 以下是在这两种部署方法中选择的考虑因素: 41 | 42 | * 你只有一个单一的数据中心(DC): 选择shared DB(MGR/PXC)或甚至更简单的部署方式(参考[Orchestrator高可用](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/Orchestrator高可用.md)) 43 | * 你对Galera/XtraDB Cluster/InnoDB Cluster很熟悉, 并且有自动化部署和维护它们的能力: 挑选shared DB. 44 | * 你有高延迟的跨DC网络: 选择`orchestrator/raft`. 45 | * 你不想为`orchestrator` 使用MySQL: choose `orchestrator/raft` with `SQLite` backend. 46 | * 你要管理数千个MySQL集群, 选择raft或shard DB都行. 但是后端数据库请选择MySQL, 而不是SQLite, 因为前者性能更好. 47 | 48 | > 译者注: raft和shared DB 各有优缺点, 都可以. 如果要将orchestrator跨云部署(部署在华为和阿里), 那么选raft. 否则如果只是跨AZ部署, 以华为AZ间的网络延迟来看, raft意义不大. 49 | 50 | ### Notes 51 | * Another synchronous replication setup is that of a single writer. This would require an additional proxy between the `orchestrator` nodes and the underlying cluster, and is not considered above. 52 | `orchestrator` -> proxysql -> MySQL集群(如主从复制) 53 | -------------------------------------------------------------------------------- /Quick guides/FAQ.md: -------------------------------------------------------------------------------- 1 | # FAQ 2 | # [Frequently Asked Questions](https://github.com/openark/orchestrator/blob/master/docs/faq.md) 3 | ### 谁应该使用orchestrator? 4 | 面对不仅仅是一主一从集群架构的DBA和运维人员. 5 | 6 | ### Orchestrator能为我做什么? 7 | `Orchestrator` 分析你的复制拓扑结构, 并提供信息和行动: 你将能够可视化&操作这些拓扑结构(refactoring replication paths). `Orchestrator`可以监控和恢复拓扑结构的故障, 包括主库宕机. 8 | 9 | ### 这又是一个监控工具吗? 10 | 不. 严格来说, `Orchestrator`不是一个监控工具. 它并不打算这样做; 没有警报或电子邮件. 但它确实提供了拓扑状态的在线可视化, 并需要一些自己的阈值来管理拓扑. 11 | 12 | ### Orchestrator支持什么类型的复制? 13 | `Orchestrator`支持 "plain-old-MySQL-replication", 即使用二进制日志文件和位置的那种. 如果你不知道你在用什么, 可能就是这个. 14 | 15 | ### Orchestrator支持Row Based Replication吗? 16 | 支持. Statement Based Replication和Row Based Replication都被支持(而这种区别实际上与`orchestrator`无关) 17 | 18 | ### Orchestrator是否支持半同步复制? 19 | 支持. 20 | 21 | ### Orchestrator是否支持Master-Master (ring) Replication? 
22 | 支持, for a ring of two masters (active-active, active-passive). 23 | 24 | Master-Master-Master\[-Master...\] 拓扑结构, 即由3个或更多的master组成的环不被支持, 也没有测试. 而且不鼓励使用. 并且是一种可憎的行为. 25 | 26 | ### Orchestrator支持Galera Replication吗? 27 | 是的, 也不是. `Orchestrator`不知道Galera复制的情况. 如果你有三个Galera master, 每个master下有不同的复制拓扑结构, 那么`Orchestrator`会把这些看作是三个不同的拓扑结构. 28 | 29 | ### Orchestrator支持基于GTID的复制吗? 30 | 支持. Oracle GTID和MariaDB GTID都被支持. 31 | 32 | ### Orchestrator支持5.6版本的并行复制吗(thread per schema)? 33 | 不支持. 这是因为`START SLAVE UNTIL`在并行复制中不被支持, 而且`SHOW SLAVE STATUS`的输出是不完整的. 在这方面没有预期的工作. 34 | 35 | > No. This is because `START SLAVE UNTIL` is not supported in parallel replication, and output of `SHOW SLAVE STATUS` is incomplete. There is no expected work on this. 36 | 37 | ### Orchestrator支持5.7版本的并行复制吗? 38 | 支持. 只要使用GTID就可以. 当使用Pseudo-GTID时, 你必须启用in-order-replication(set [slave\_preserve\_commit\_order](http://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#sysvar_slave_preserve_commit_order)). 39 | 40 | ### Does orchestrator support Multi-Master Replication? 41 | 这应该指的是多源复制吧. 不支持 42 | 43 | No. Multi Master Replication (e.g. as in MariaDB 10.0) is not supported. 44 | 45 | ### Does orchestrator support Tungsten Replication? 46 | 不支持. 47 | 48 | > [Tungsten-Replicator ](https://www.continuent.com/products/tungsten-replicator)是第三方的MySQL数据复制引擎,是个商业产品,同时提供开源版本。类似于MySQL 自身的replication,基于日志复制模式,不同的是 Tungsten 通过Extractor控件读取mysql主库的binlog 解析成自己的日志格式--THL(Transaction History Log), 在从库上通过Applier控件写入数据库。 49 | 50 | > Tungsten-Replicator 具有以下特性: 51 | 52 | > * 支持高版本MySQL向低版本复制,如:MySQL5.1 --> MySQL5.0; 53 | > * 支持跨数据库系统的复制,如:MySQL --> PostgreSQL 54 | > * 支持多主库向单台Slave 的复制,Multi-Master --> Slave 55 | > * Ganji-Replicator提取数据的更新记录,写到MySQL 队列表 Queue;基于这个队列,可以为其他应用服务提供便利,如检索系统数据更新,跨机房半同步。 MySQL --> Queue 56 | 57 | ### Orchestrator支持MySQL Group Replication吗? 58 | 部分支持. 在MySQL8.0版本支持single primary mode的组复制. 
支持的范围是: 59 | 60 | * `Orchestrator` 了解所有组成员都是同一集群的一部分, 检索复制组信息作为实例发现的一部分, 将其存储在其数据库中, 并通过 API 公开 61 | > Orchestrator understands that all group members are part of the same cluster, retrieves replication group information as part of instance discovery, stores it in its database, and exposes it via the API. 62 | * Orchestrator Web UI 显示single primary group members. 它们显示如下: 63 | > The orchestrator web UI displays single primary group members. They are shown like this: 64 | * 所有secondary成员都是从primary复制的 65 | > All secondary group members as replicating from the primary. 66 | * 所有组成员都有一个图标, 显示他们是组成员(相对于传统的异步/半同步复制). 67 | > All group members have an icon that shows they are group members (as opposed to traditional async/semi-sync replicas). 68 | * 将鼠标悬停在上述图标上可提供有关数据库实例在组中的状态和角色的信息. 对于组成员来说, 一些重新定位操作是被禁止的. 特别是, orchestrator将拒绝重新定位一个secondary节点, 因为根据定义, 它总是从primary节点进行复制. 它也会拒绝在同一组的secondary节点下重新定位一个primary节点的尝试 69 | > Hovering over the icon mentioned above provides information about the state and role of the DB instance in the group.\* Some relocation operations are forbidden for group members. In particular, orchestrator will refuse to relocate a secondary group member, as it, by definition, replicates always from the group primary. It will also reject an attempt to relocate a group primary under a secondary of the same group. 70 | * 来自失败组成员的传统异步/半同步副本将被重新定位到MGR集群的其他组成员上. 71 | > Traditional async/semisync replicas from failed group members are relocated to a different group member. 72 | 73 | ### Does orchestrator support Yet Another Type of Replication? 74 | No. 75 | 76 | ### Does orchestrator support... 77 | No. 78 | 79 | ### Is orchestrator open source? 80 | Yes. `Orchestrator` is released as open source under the Apache License 2.0 and is available at: [https://github.com/openark/orchestrator](https://github.com/openark/orchestrator) 81 | 82 | ### Who develops orchestrator and why? 
83 | `Orchestrator` is developed by [Shlomi Noach](https://github.com/shlomi-noach) at [GitHub](http://github.com/) (previously at [Booking.com](http://booking.com/) and [Outbrain](http://outbrain.com/)) to assist in managing multiple large replication topologies; time and human errors saved so far are almost priceless. 84 | 85 | ### Orchestrator是否可以与包含多个MySQL版本的集群一起使用? 86 | 部分支持. 这种场景经常出现在不停机升级集群时: 每个副本先下线, 升级到新的主版本, 再重新加入集群, 直到所有副本都完成升级. `Orchestrator`知道MySQL的主版本, 允许将主版本较高的副本移动到主版本较低的主库或中间主库之下, 但不允许反过来, 因为后者通常不被上游供应商支持, 即使它实际上可能可以工作. 在大多数情况下, `orchestrator`会做正确的事情: 只要可能, 它就允许在拓扑结构中安全地移动这些副本. 这在MySQL 5.5/5.6中被广泛使用, 在5.6/5.7中也有使用, 但在MariaDB 10中用得不多. 如果你发现可能与此有关的问题, 请报告它们. 87 | 88 | -------------------------------------------------------------------------------- /Operation/Tags.md: -------------------------------------------------------------------------------- 1 | # Tags 2 | # [Tags](https://github.com/openark/orchestrator/blob/master/docs/tags.md) 3 | `orchestrator`支持对实例打标签并通过标签进行搜索. 4 | 5 | 标签是作为一种服务提供给用户的, 并不被`orchestrator`内部使用. 6 | 7 | ### Tag commands 8 | 支持以下命令.
细目如下: 9 | 10 | * `orchestrator-client -c tag -i some.instance --tag name=value` 11 | * `orchestrator-client -c tag -i some.instance --tag name` 12 | * `orchestrator-client -c untag -i some.instance -t name` 13 | * `orchestrator-client -c untag-all -t name=value` 14 | * `orchestrator-client -c tags -i some.instance` 15 | * `orchestrator-client -c tag-value -i some.instance -t name` 16 | * `orchestrator-client -c tagged -t name` 17 | * `orchestrator-client -c tagged -t name=value` 18 | * `orchestrator-client -c tagged -t name=` 19 | 20 | and these API endpoints: 21 | 22 | * `api/tag/:host/:port/:tagName/:tagValue` 23 | * `api/tag/:host/:port?tag=name` 24 | * `api/tag/:host/:port?tag=name%3Dvalue` 25 | * `api/untag/:host/:port/:tagName` 26 | * `api/untag/:host/:port?tag=name` 27 | * `api/untag-all/:tagName/:tagValue` 28 | * `api/untag-all?tag=name%3Dvalue` 29 | * `api/tags/:host/:port` 30 | * `api/tag-value/:host/:port/:tagName` 31 | * `api/tag-value/:host/:port?tag=name` 32 | * `api/tagged?tag=name` 33 | * `api/tagged?tag=name%3Dvalue` 34 | * `api/tagged?tag=name%3D` 35 | 36 | ### Tags, general 37 | 一个标签可以是`name=value`的形式, 也可以是`name`的形式, 在后一种情况下, 值被隐含地设置为空字符串. `name`可以是以下格式: 38 | 39 | * `word` 40 | * `some-other-word` 41 | * `some_word_word_yet` 42 | 43 | 虽然没有严格限制, 但要避免使用特殊字符/标点符号. 44 | 45 | ### Tagging 46 | `-c tag`或`api/tag`向实例添加标签或替换现有的标签. `orchestrator`不会指示该标签是否事先存在, 也不会提供以前的值(如果有). 47 | 48 | Example: 49 | 50 | ```bash 51 | $ orchestrator-client -c tag -i db-host-01:3306 --tag vttablet_alias=dc1-0123456789 52 | ``` 53 | 在上面的例子中, 我们选择了创建一个名为`vttablet_alias`的标签, 并带有一个值. 54 | 55 | 标签是打在实例上的. 实例本身不受此操作的影响. `orchestrator`将标签作为元数据进行维护. 打标签时实例不需要可用. 56 | 57 | ### Untagging: single instance 58 | `-c untag`或`api/untag`从一个给定的实例中删除一个标签(如果存在的话). 如果标签确实存在, `orchestrator`会输出实例名称, 如果标签不存在, 则输出空.
59 | 60 | You may untag: 61 | 62 | * 指定标签名称和标签值: 标签只有在等于该值时才被删除 63 | * 只指定标签名称: 标签被删除, 而不考虑其值 64 | 65 | Example: 66 | 67 | ```bash 68 | $ orchestrator-client -c untag -i db-host-01:3306 --tag vttablet_alias 69 | ``` 70 | ### Untagging: multiple instances 71 | `-c untag-all` 或 `api/untag-all` 会从值匹配的所有实例中删除一个标签. 注意, 必须提供标签值. 72 | 73 | Example: 74 | 75 | ```bash 76 | $ orchestrator-client -c untag-all --tag vttablet_alias=dc1-0123456789 77 | ``` 78 | ### Listing instance tags 79 | 对于一个给定的实例, `-c tags` 或 `api/tags`列出所有已知的标签. 80 | 81 | Example: 82 | 83 | ```bash 84 | $ orchestrator-client -c tag -i db-host-01:3306 --tag vttablet_alias=dc1-0123456789 85 | $ orchestrator-client -c tag -i db-host-01:3306 --tag old-hardware 86 | 87 | $ orchestrator-client -c tags -i db-host-01:3306 88 | old-hardware= 89 | vttablet_alias=dc1-0123456789 90 | ``` 91 | 列出的标签是按名称排序的. 请注意, 我们添加了不带值的`old-hardware`标签. 它以`old-hardware=`的形式呈现, 并隐含空值. 92 | 93 | ### Listing instance tags for a cluster 94 | 对于一个给定的实例或集群别名, `-c topology-tags`或`api/topology-tags`列出了集群拓扑结构与每个实例的所有已知标签.
95 | 96 | Example: 97 | 98 | ```bash 99 | $ orchestrator-client -c tag -i db-host-01:3306 --tag vttablet_alias=dc1-0123456789 100 | $ orchestrator-client -c tag -i db-host-01:3306 --tag old-hardware 101 | 102 | $ orchestrator-client -c topology-tags -alias mycluster 103 | db-host-01:3306 [0s,ok,5.7.23-log,rw,ROW,>>,GTID,P-GTID] [vttablet_alias=dc1-0123456789, old-hardware] 104 | + db-host-02:3306 [0s,ok,5.7.23-log,ro,ROW,>>,GTID,P-GTID] [] 105 | 106 | $ orchestrator-client -c topology-tags -i db-host-01:3306 107 | db-host-01:3306 [0s,ok,5.7.23-log,rw,ROW,>>,GTID,P-GTID] [vttablet_alias=dc1-0123456789, old-hardware] 108 | + db-host-02:3306 [0s,ok,5.7.23-log,ro,ROW,>>,GTID,P-GTID] [] 109 | ``` 110 | ### Getting the value of a specific tag 111 | `-c tag-value` 或 `api/tag-value` 返回一个实例上特定标签的值 112 | 113 | Example: 114 | 115 | ```bash 116 | $ orchestrator-client -c tag -i db-host-01:3306 --tag vttablet_alias=dc1-0123456789 117 | $ orchestrator-client -c tag -i db-host-01:3306 --tag old-hardware 118 | $ orchestrator-client -c tag-value -i db-host-01:3306 --tag vttablet_alias 119 | dc1-0123456789 120 | $ orchestrator-client -c tag-value -i db-host-01:3306 --tag old-hardware 121 | 122 | # 123 | $ orchestrator-client -c tag-value -i db-host-01:3306 --tag no-such-tag 124 | tag no-such-tag not found for db-host-01:3306 125 | # in stderr 126 | ``` 127 | ### Searching instances by tags 128 | `-c tagged` 或 `api/tagged` 按标签列出实例, 如下所示. 129 | 130 | * `-c tagged -tag name=value`: 列出`name`存在且等于`value`的实例. 131 | * `-c tagged -tag name`: 列出存在这个`name` 的实例, 无论`value`是什么. 132 | * `-c tagged -tag name=`: 列出`name`存在且值为空的实例. 133 | * `-c tagged -tag name,role=backup`: 列出包含`name`标签(无论其值如何), 并且也用 `role=backup` 标记的实例. 134 | * `-c tagged -tag !name`: 列出不存在名为 `name` 的标签的实例, 无论其值如何 135 | * `-c tagged -tag ~name`: `~` 是 `!` 的同义词. 136 | * `-c tagged -tag name,~role`: 列出包含`name`标签(无论其值如何), 并且不包含`role`标签(无论其值如何)的实例. 137 | * `-c tagged -tag ~role=backup`: 列出包含`role` 标签, 但`value` 不是`backup` 的实例. 
请注意这与 `-c tagged -tag ~role` 有何不同. 后者将首先列出没有`role`标签的实例. 138 | 139 | ### Tags, internal 140 | 标签与实例关联, 但关联是记录在`orchestrator`内部的, 不会影响实际的 MySQL 实例. 标签内部存储在(orchestrator的)后端数据库表中. 标签由 `raft` 协议发布; 由`raft` leader执行的标记操作(`tag`, `untag`)将被应用于`raft` followers. 141 | 142 | ### Use cases 143 | 对标签的需求来自不同用户的不同使用情况. 144 | 145 | * [Vitess](http://github.com/vitess.io/vitess)用户的一个常见用例是需要将一个实例与`vttablet` alias联系起来. 146 | * 用户可能希望根据标签来应用`promotion`逻辑. 虽然`orchestrator`在任何决策中都不使用内部标记, 但用户可以根据标记设置`promotion-rule`, 或应用不同的failover操作. 147 | 148 | 149 | 150 | 151 | 152 | -------------------------------------------------------------------------------- /TOC.md: -------------------------------------------------------------------------------- 1 | # TOC 2 | # [Table of Contents](https://github.com/openark/orchestrator/tree/master/docs#introduction) 3 | #### Introduction 4 | * [About](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Introduction/About.md) 5 | * [License](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Introduction/License.md) 6 | * [Download](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Introduction/Download.md) 7 | * [Requirements](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Introduction/Requirements.md) 8 | 9 | #### Setup 10 | * [安装-Installation](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%83%A8%E7%BD%B2/%E5%AE%89%E8%A3%85-Installation.md): installing the service/binary 11 | * [Configuration](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration.md): breakdown of major configuration variables by topic. 12 | 13 | #### Use 14 | * [Execution](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Execution.md): running the `orchestrator` service. 
15 | * [Executing via command line](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Executing%20via%20command%20line.md) 16 | * [Using the Web interface](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Using%20the%20Web%20interface.md) 17 | * [Using the web API](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Using%20the%20web%20API.md): 通过HTTP GET请求实现自动化 18 | * Using [orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md): a no binary/config needed script that wraps API calls 19 | * [Scripting samples](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Scripting%20samples.md) 20 | 21 | #### Deployment 22 | * [Orchestrator高可用](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/Orchestrator高可用.md): making `orchestrator` highly available 23 | * [在生产环境中部署Orchestrator](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/在生产环境中部署Orchestrator.md) 24 | * [shard backend模式部署](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/shard%20backend模式部署.md) 25 | * [raft模式部署](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/raft模式部署.md) 26 | 27 | #### Failure detection & recovery 28 | * [Failure detection](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Failure%20detection.md): how `orchestrator` detects failure, types of failures it can handle 29 | * [Topology recovery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Topology%20recovery.md): recovery process, promotion and hooks. 
30 | * [Key-Value stores](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Key-Value%20stores.md): master discovery for your apps 31 | 32 | #### Operation 33 | * [Status Checks](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Operation/Status%20Checks.md) 34 | * [Tags](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Operation/Tags.md) 35 | 36 | #### Developers 37 | * [Understanding CI](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Understanding%20CI.md) 38 | * [Building and testing](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Building%20and%20testing.md) 39 | * [System test environment](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/System%20test%20environment.md) 40 | * [Docker](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Docker.md) 41 | * [Contributions](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Contributions.md) 42 | 43 | #### Various 44 | * [Security](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Security.md) 45 | * [SSL and TLS](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/SSL%20and%20TLS.md) 46 | * [Pseudo GTID](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Pseudo%20GTID.md): refactoring and high availability without using GTID. 
47 | * [Agents](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Agents.md) 48 | 49 | #### Meta 50 | * [Risk](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Risk.md) 51 | * [Gotchas](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Gotchas.md) 52 | * [Supported Topologies and Versions](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Supported%20Topologies%20and%20Versions.md) 53 | * [Bugs](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Bugs.md) 54 | * [Who uses Orchestrator?](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Who%20uses%20Orchestrator%20.md) 55 | * [Presentations](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Presentations.md) 56 | 57 | #### Quick guides 58 | * [FAQ](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Quick%20guides/FAQ.md) 59 | * [First Steps](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Quick%20guides/First%20Steps.md), a quick introduction to `orchestrator` 60 | 61 | #### 配置文件参数详解 62 | * [配置参数详解-Ⅰ](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/配置参数详解-Ⅰ.md) 63 | * [配置参数详解-Ⅱ](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/配置参数详解-Ⅱ.md) 64 | * [配置参数详解-Ⅲ](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/配置参数详解-Ⅲ.md) 65 | 66 | #### 命令详解 67 | * [relocate](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/commands/relocate.md) 68 | 69 | #### 源码分析 70 | * [Orchestrator Failover过程源码分析-I](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/Orchestrator%20Failover%E8%BF%87%E7%A8%8B%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90-I.md) 71 | * [Orchestrator Failover过程源码分析-II](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/Orchestrator%20Failover%E8%BF%87%E7%A8%8B%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90-II.md) 72 | * [Orchestrator 
Failover过程源码分析-III](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/Orchestrator%20Failover%E8%BF%87%E7%A8%8B%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90-III.md) -------------------------------------------------------------------------------- /Deployment/raft模式部署.md: -------------------------------------------------------------------------------- 1 | # raft模式部署 2 | # [Orchestrator deployment: raft](https://github.com/openark/orchestrator/blob/master/docs/deployment-raft.md) 3 | 本文描述了部署[Orchestrator/raft, consensus cluster](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/部署/Orchestrator%20raft%2C%20consensus%20cluster.md)的方法. 4 | 5 | 这篇文档是对[在生产环境中部署Orchestrator](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/在生产环境中部署Orchestrator.md)的补充. 6 | 7 | ### Backend DB 8 | 你可以选择使用MySQL或SQLite. See [configuration-backend](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Backend.md) 9 | 10 | * For MySQL: 11 | * 后端数据库将是独立的. No replication setup. 每个`orchestrator`节点将与自己专用的后端数据库交互. 12 | * You *must* have a `1:1` mapping `orchestrator:MySQL`. 13 | * 建议: 请在同一台机器上运行`orchestrator`及其专用的MySQL数据库. 14 | * 确保在每个后端节点上为`orchestrator`用户`GRANT`相应的权限. 15 | * For SQLite: 16 | * SQLite与`orchestrator`捆绑在一起. 17 | * 确保`orchestrator`用户对`SQLite3DataFile` 有写入权限. 18 | 19 | ### High availability 20 | 利用`raft`实现了`orchestrator`的高可用性. 您不需要考虑后端数据库的高可用性. 21 | 22 | ### What to deploy: service 23 | * 将`orchestrator`服务部署到service boxes上. 正如所建议的, 您可能希望将`orchestrator`服务和`MySQL`服务放在同一个机器上. If using `SQLite` there's nothing else to do. 24 | * 考虑在服务盒(service boxes)之上增加一个代理(proxy); 代理将把所有流量重定向到leader node(这里指的是`orchestrator` 服务leader节点). 有一个而且只有一个leader节点, 状态检查的端点是`/api/leader-check` . 25 | * 客户端将只与健康的raft节点交互. 26 | * 最简单的方法就是只与leader互动. 设置代理proxy是确保这一点的一种方法.
See [Proxy: leader](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%83%A8%E7%BD%B2/Orchestrator%20raft%2C%20consensus%20cluster.md#proxy-leader) . 27 | * 否则,所有健康的raft节点将反向代理您的请求到leader. See [Proxy: healthy raft nodes](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%83%A8%E7%BD%B2/Orchestrator%20raft%2C%20consensus%20cluster.md#proxy-healthy-raft-nodes) . 28 | * 任何东西都不应该直接与后端DB交互. 只有leader有能力与其他raft节点协调对数据的更改. 29 | * `orchestrator`节点之间通过`DefaultRaftPort`进行通信. 该端口应该对所有`orchestrator`节点开放, 其他任何人都不需要访问该端口. 30 | 31 | ### What to deploy: client 32 | 为了从shell/automation/scripts与`orchestrator`进行交互, 你可以选择: 33 | 34 | * 直接与HTTP API交互 35 | * 你只能和leader互动. 实现这一点的一个好方法是使用代理. 36 | * 使用[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md)脚本([orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md)本质是一个shell脚本). 37 | * 将`orchestrator-client`部署在你希望与`orchestrator`交互的任何盒子上. 38 | * Create and edit `/etc/profile.d/orchestrator-client.sh` on those boxes to read: 39 | 40 | ```bash 41 | ORCHESTRATOR_API="http://your.orchestrator.service.proxy:80/api" 42 | ``` 43 | or 44 | 45 | ```bash 46 | ORCHESTRATOR_API="http://your.orchestrator.service.host1:3000/api http://your.orchestrator.service.host2:3000/api http://your.orchestrator.service.host3:3000/api" 47 | ``` 48 | 在后一种情况下, 你将提供所有`orchestrator`节点的列表, 而`orchetsrator-client`脚本将自动计算出哪个是leader. 通过这种设置, 你的自动化将不需要代理(尽管你可能仍然希望为Web界面用户使用代理). 49 | 50 | 确保 chef/puppet/whatever 的 `ORCHESTRATOR_API` 值能够适应环境的变化. 51 | 52 | * `orchestrator`cli将拒绝运行给定的raft设置, 因为它直接与底层数据库交互, 不参与raft的共识, 因此不能确保所有raft成员都能看到它的变化 53 | * 幸运的是, `orchestrator-client`提供了一个与命令行客户端(指的是orchestrator命令)几乎相同的界面(provides an almost identical interface as the command line client). 54 | * 你可以通过`--ignore-raft-setup` 强制`orchestrator`cli运行(raft设置). 前提是"你知道你在做什么", 清楚这样操作的风险. 如果你确定要这样做, 那么连接到leader的后端DB更有意义. 
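上文提到, 给 `orchestrator-client` 提供所有节点的 API 列表后, 它会自动计算出哪个是 leader. 其判断逻辑大致如下面的 shell 草图所示(仅为示意, 假设每个节点只有在自己是 leader 时才对 `/api/leader-check` 返回 HTTP 200; `find_leader` 与 `$PROBE` 都是示意用的假想名称, 并非 orchestrator 自带工具):

```shell
#!/bin/bash
# 示意: 从空格分隔的 ORCHESTRATOR_API 列表中找出 raft leader.
# $PROBE 代表实际的探测命令, 例如:
#   curl -s -o /dev/null -w '%{http_code}'
find_leader() {
  local api
  for api in $1; do
    if [ "$($PROBE "$api/leader-check")" = "200" ]; then
      echo "$api"    # 只有 leader 对 leader-check 返回 200
      return 0
    fi
  done
  return 1           # 列表中没有可用的 leader
}
```

生产环境中请直接使用 `orchestrator-client`, 它内部做的事情与此类似, 并且已经处理好了 URL 编码等细节.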
55 | 56 | ### Orchestrator service 57 | 如前所述, 只有一个`orchestrator`节点将被选为领导者. 只有领导者会: 58 | 59 | * 运行故障恢复 60 | 61 | 所有节点都: 62 | * 发现(探测)你的MySQL拓扑结构 63 | * 运行故障检测 64 | 65 | * Register their own health check 注册自己的健康检查 66 | 67 | 非leader节点一定不能: 68 | 69 | * Run arbitrary command (e.g. `relocate`, `begin-downtime`) (运行任意命令?) 70 | 71 | * Run recoveries per human request. (按人的要求运行恢复) 72 | 73 | * 提供HTTP请求(but some endpoints, such as load-balancer and health checks, are valid). 74 | 75 | ### A visual example 76 | ![image](images/ENBunJzMa15CJC0xwniCzZQ-VRvV-WZ2IOUQKhFGVvw.png) 77 | 78 | 如上图所示, 三个`orchestrator` 组成一个raft cluster, 每个`orchestrator` 节点使用自己的专用数据库(`MySQL`或`SQLite`) 79 | 80 | `orchestrator` 节点之间会进行通信. 81 | 82 | 只有一个`orchestrator` 节点会成为leader. 83 | 84 | 所有`orchestrator`节点探测整个`MySQL`舰队. 每个`MySQL`server都被每个raft成员探测. 85 | 86 | ### orchestrator/raft operational scenarios 操作场景 87 | ##### A node crashes 一个节点崩溃(如何恢复): 88 | 启动节点, 启动MySQL服务, 启动`orchestrator`服务. `orchestrator`服务应该加入`raft`组, 获得最近的快照, 赶上raft复制日志并继续正常运行. 89 | 90 | > Start the node, start the `MySQL` service if applicable, start the `orchestrator` service. The `orchestrator` service should join the `raft` group, get a recent snapshot, catch up with `raft` replication log and continue as normal. 91 | 92 | ##### A new node is provisioned / a node is re-provisioned 分配一个新节点/重新分配一个节点: 93 | Such that the backend database is completely empty/missing. 分配/重新分配时后端数据库其实是没有数据或者丢失数据的 94 | 95 | * 如果后端数据库是`SQLite`, 则什么也不需要做. 该节点会加入raft group, 从一个active node获得snapshot, 追raft log然后正常运行. 96 | * 如果后端数据库是`MySQL` , 也会尝试像上面那样做. 然而, `orchestrator` 需要有权限操作`MySQL` ,详见[MySQL backend DB setup](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Backend.md#mysql-backend-db-setup) . 因此, 如果这是一个全新的服务器, 这些权限可能不存在. 例如, 我们的`puppet`设置会定期确保在我们的MySQL服务器上设置权限. 因此, 当新服务器被配置时, 下一次`puppet`运行会为`orchestrator`设置权限. `puppet`还确保`orchestrator`服务正在运行, 因此, 在一段时间内, `orchestrator`可以自动加入组. 
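上面两种场景(节点崩溃后恢复、分配/重新分配节点)的最终判定标准都是一样的: 节点重新加入后, 集群中有且只有一个 leader. 可以用类似下面的草图做一次核对(沿用上文的 `/api/leader-check` 端点; `count_leaders` 与 `$PROBE` 是示意用的假想名称, 并非 orchestrator 自带工具):

```shell
#!/bin/bash
# 示意: 统计对 /api/leader-check 返回 HTTP 200 的节点个数.
# 期望值恒为 1: 大于 1 说明出现了多个 leader, 等于 0 说明暂时没有 leader.
# $PROBE 代表实际的探测命令, 例如 curl -s -o /dev/null -w '%{http_code}'
count_leaders() {
  local url n=0
  for url in "$@"; do
    if [ "$($PROBE "$url/api/leader-check")" = "200" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```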
97 | 98 | ##### Cloning is valid 99 | 如果你选择这样做, 你也可以通过使用你喜欢的备份/恢复或转储/加载方法克隆你现有的后台数据库来配置新的盒子. 100 | 101 | 这是完全有效的, 尽管不是必须的. 102 | 103 | * 如果后端数据库是`MySQL` , 运行备份/恢复, 逻辑物理都行. 104 | * 如果后端数据库是`SQLite` , 运行`.dump` + 恢复, see [10. Converting An Entire Database To An ASCII Text File](https://sqlite.org/cli.html). 105 | * 启动`orchestrator` 服务. 它应该赶上`raft`复制日志并加入`raft`集群. 106 | 107 | ##### Replacing a node 108 | 假设 `RaftNodes: ["node1", "node2", "node3"]` , 你希望使用`nodeX` 替换`node3` . 109 | 110 | * 你可以关闭`node3` , 只要`node1` 和`node2` 是正常的, raft cluster就能正常运行. 111 | * 创建`nodeX` , 生成后端数据库数据(见上文). 112 | * 在`node1` , `node2` 和`nodeX` 重新配置`RaftNodes: ["node1", "node2", "nodeX"]` . 113 | * 在`nodeX` 启动`orchestrator` . 它将被拒绝, 不会加入集群, 因为`node1`和`node2`还没有意识到这个变化. 114 | * 在`node1` 重启`orchestrator` . 115 | * 在`node2` 重启`orchestrator` . 116 | * 这时, 所有三个节点应该形成一个快乐的集群 117 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # orchestrator-zh-doc 2 | 英语不好, 语文不好, 凑合看吧. 3 | 4 | 允许转载, 但请标明出处. 
本项目所有文档遵循CC BY 4.0协议 5 | 6 | For more information, please see 7 | 8 | 9 | 10 | 11 | 12 | 13 | # TOC 14 | # [Table of Contents](https://github.com/Fanduzi/orchestrator-zh-doc#introduction) 15 | #### Introduction 16 | * [About](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Introduction/About.md) 17 | * [License](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Introduction/License.md) 18 | * [Download](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Introduction/Download.md) 19 | * [Requirements](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Introduction/Requirements.md) 20 | 21 | #### Setup 22 | * [安装-Installation](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%83%A8%E7%BD%B2/%E5%AE%89%E8%A3%85-Installation.md): installing the service/binary 23 | * [Configuration](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration.md): breakdown of major configuration variables by topic. 24 | 25 | #### Use 26 | * [Execution](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Execution.md): running the `orchestrator` service. 
27 | * [Executing via command line](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Executing%20via%20command%20line.md) 28 | * [Using the Web interface](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Using%20the%20Web%20interface.md) 29 | * [Using the web API](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Using%20the%20web%20API.md): 通过HTTP GET请求实现自动化 30 | * Using [orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md): a no binary/config needed script that wraps API calls 31 | * [Scripting samples](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/Scripting%20samples.md) 32 | 33 | #### Deployment 34 | * [Orchestrator高可用](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/Orchestrator高可用.md): making `orchestrator` highly available 35 | * [在生产环境中部署Orchestrator](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/在生产环境中部署Orchestrator.md) 36 | * [shard backend模式部署](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/shard%20backend模式部署.md) 37 | * [raft模式部署](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/raft模式部署.md) 38 | 39 | #### Failure detection & recovery 40 | * [Failure detection](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Failure%20detection.md): how `orchestrator` detects failure, types of failures it can handle 41 | * [Topology recovery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Topology%20recovery.md): recovery process, promotion and hooks. 
42 | * [Key-Value stores](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Key-Value%20stores.md): master discovery for your apps 43 | 44 | #### Operation 45 | * [Status Checks](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Operation/Status%20Checks.md) 46 | * [Tags](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Operation/Tags.md) 47 | 48 | #### Developers 49 | * [Understanding CI](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Understanding%20CI.md) 50 | * [Building and testing](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Building%20and%20testing.md) 51 | * [System test environment](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/System%20test%20environment.md) 52 | * [Docker](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Docker.md) 53 | * [Contributions](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Developers/Contributions.md) 54 | 55 | #### Various 56 | * [Security](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Security.md) 57 | * [SSL and TLS](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/SSL%20and%20TLS.md) 58 | * [Pseudo GTID](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Pseudo%20GTID.md): refactoring and high availability without using GTID. 
59 | * [Agents](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Agents.md) 60 | 61 | #### Meta 62 | * [Risk](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Risk.md) 63 | * [Gotchas](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Gotchas.md) 64 | * [Supported Topologies and Versions](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Supported%20Topologies%20and%20Versions.md) 65 | * [Bugs](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Bugs.md) 66 | * [Who uses Orchestrator?](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Who%20uses%20Orchestrator%20.md) 67 | * [Presentations](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Meta/Presentations.md) 68 | 69 | #### Quick guides 70 | * [FAQ](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Quick%20guides/FAQ.md) 71 | * [First Steps](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Quick%20guides/First%20Steps.md), a quick introduction to `orchestrator` 72 | 73 | 74 | #### 配置文件参数详解 75 | * [配置参数详解-Ⅰ](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/配置参数详解-Ⅰ.md) 76 | * [配置参数详解-Ⅱ](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/配置参数详解-Ⅱ.md) 77 | * [配置参数详解-Ⅲ](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/配置参数详解-Ⅲ.md) 78 | 79 | 80 | #### 命令详解 81 | * [relocate](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/commands/relocate.md) 82 | 83 | #### 源码分析 84 | * [Orchestrator Failover过程源码分析-I](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/Orchestrator%20Failover%E8%BF%87%E7%A8%8B%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90-I.md) 85 | * [Orchestrator Failover过程源码分析-II](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/Orchestrator%20Failover%E8%BF%87%E7%A8%8B%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90-II.md) 86 | * [Orchestrator 
Failover过程源码分析-III](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/Orchestrator%20Failover%E8%BF%87%E7%A8%8B%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90-III.md) 87 | -------------------------------------------------------------------------------- /Deployment/在生产环境中部署Orchestrator.md: -------------------------------------------------------------------------------- 1 | # 在生产环境中部署Orchestrator 2 | # [Orchestrator deployment in production](https://github.com/openark/orchestrator/blob/master/docs/deployment.md) 3 | `orchestrator`的部署是什么样子的?您需要在`puppet/chef`中设置什么?哪些服务需要运行, 在哪里运行? 4 | 5 | ## 部署服务和客户端 6 | 你首先要决定是在一个共享的后端数据库上运行`orchestrator` , 还是用raft协议. 7 | 8 | > You will first decide whether you want to run `orchestrator` on a shared backend DB or with a `raft` setup 9 | 10 | 参见[Orchestrator高可用](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/Orchestrator高可用.md),以及[orchestrator/raft vs. synchronous replication setup](Setup/部署/orchestrator%20raft%20vs.%20synchronous%20replication%20setup.md)的比较和讨论. 11 | 12 | 遵循这些部署指南: 13 | 14 | * 在共享后端数据库上部署`orchestrator` : [shard backend模式部署](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/shard%20backend模式部署.md) 15 | * 通过raft共识部署`orchestrator` : [raft模式部署](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/raft模式部署.md) 16 | 17 | ## Next steps 18 | `orchestrator`在动态环境中工作得很好, 并适应inventory、配置和拓扑的变化. 它的动态特性表明, 环境也应该在动态特性中与它相互作用. 与硬编码的配置不同, `orchestrator`乐于接受动态提示和请求, 这些提示和请求可以更改其对拓扑结构的看法. 部署好`orchestrator`服务和客户端后, 请考虑执行以下操作以充分利用这一点. 19 | 20 | ### Discovering topologies 拓扑发现 21 | `orchestrator`自动发现加入拓扑结构的实例. 如果一个新的副本加入了一个由`orchestrator`监控的现有集群, 那么当`orchestrator`下一次探测集群主库时就会发现它. 22 | 23 | 那么, `orchestrator` 是如何发现全新的拓扑结构的呢? 24 | 25 | * 你可以要求`orchestrator`发现(探测)这样一个拓扑中的任何一个服务器, 然后从那里开始, 它将在整个拓扑中爬行. 
26 | > 译者注: 连接这个节点后, 通过processlist和show slave status, 顺藤摸瓜, 发现整个拓扑结构 27 | * Or you may choose to just let `orchestrator` know about any single production server you have, routinely. Set up a cronjob on any production `MySQL` server to read: 28 | 29 | ```bash 30 | 0 0 * * * root "/usr/bin/perl -le 'sleep rand 600' && /usr/bin/orchestrator-client -c discover -i this.hostname.com" 31 | ``` 32 | 在上述情况下, 每个主机每天让`orchestrator`了解自己一次; 新启动的主机在第二天午夜被发现. 引入睡眠是为了避免所有服务器在同一时间对`orchestrator`造成冲击. 33 | 34 | 以上使用的是[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md), 但如果是使用[shard backend模式部署](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Deployment/shard%20backend模式部署.md), 你可以使用orchestrator cli. 35 | 36 | ### Adding promotion rules 37 | 在发生故障转移时, 有些server更适合提升为leader. 有些server则不是好选择. 例子: 38 | 39 | * 硬件配置较差的server. 你不希望他被选为new leader. 40 | * 在另一个数据中心的server. 你不希望他被选为new leader. 41 | * 作为备份节点的server: 如会运行LVM快照备份, mysqldump, xtrabackup等等. 你不希望他被选为new leader. 42 | * 一个配置较好的server, 是理想的候选人. 你希望他被选为new leader. 43 | * 任何一个状态正常的server, 你没有特别的选择意见. 44 | 45 | 您将以下列方式向`orchestrator`协调者宣布您对某一特定服务器的偏爱: 46 | 47 | ```bash 48 | orchestrator -c register-candidate -i ${::fqdn} --promotion-rule ${promotion_rule} 49 | ``` 50 | 支持的promotion rules为: 51 | 52 | * `prefer` 53 | * `neutral` 中立 54 | * `prefer_not` 55 | * `must_not` 56 | 57 | Promotion rules在一小时后失效. That's the dynamic nature of `orchestrator`. 你需要设置一个cron作业来宣布服务器的promotion rule: 58 | 59 | ```bash 60 | */2 * * * * root "/usr/bin/perl -le 'sleep rand 10' && /usr/bin/orchestrator-client -c register-candidate -i this.hostname.com --promotion-rule prefer" 61 | ``` 62 | 此设置来自生产环境. cron entries通过`puppet` 更新来设置新的`promotion_rule` . 
一个server可能现在是`prefer`的, 但5分钟后就是`prefer_not` 63 | 64 | > 这取决于你们公司自己的选主逻辑, 如, prefer的服务器要是是例行维护了, 那么就要更改`promotion_rule` 65 | 66 | 你可以整合你自己的服务发现方法、你自己的脚本, 以提供你最新的`promotion_rule` 67 | 68 | 69 | 70 | ### Downtiming 71 | 当一个server出现问题, 它: 72 | 73 | * 将在web页面中的`problems` 下拉菜单中展示. 74 | 75 | * May be considered for recovery (example: server is dead and all of it's replicas are now broken). 76 | 77 | 你可以主动`下线` 一个server: 78 | 79 | * 它不会出现在`problems` 下拉菜单中 80 | 81 | * It will not be considered for recovery. 82 | 83 | 84 | Downtiming takes place via: 85 | 86 | ```bash 87 | orchestrator-client -c begin-downtime -duration 30m -reason "testing" -owner myself 88 | ``` 89 | 有些server可能经常被破坏; 例如, auto-restore servers; dev boxes; testing boxes. 对于这样的服务器, 您可能希望持续停机(*continuous* downtime). 实现这一点的一种方法是设置一个很大的`-duration 240000h` . 但是,如果有什么变化,你需要记住结束停机时间(`end-downtime`). Continuing the dynamic approach, consider: 90 | 91 | ```bash 92 | */2 * * * * root "/usr/bin/perl -le 'sleep rand 10' && /data/orchestrator/current/bin/orchestrator -c begin-downtime -i ${::fqdn} --duration=5m --owner=cron --reason=continuous_downtime" 93 | ``` 94 | 每`2`分钟,停机`5`分钟; 这意味着, 当我们取消cronjob时, 停机时间将在`5`分钟内到期. 95 | 96 | 上面展示的是`orchestrator-client` 和`orchestrator` cli的用法. 为了完整起见, 下面展示如何通过直接调用API来完成相同的操作: 97 | 98 | ```bash 99 | $ curl -s "http://my.orchestrator.service:80/api/begin-downtime/my.hostname/3306/wallace/experimenting+failover/45m" 100 | ``` 101 | `orchestrator-client`脚本正是运行这个API调用, 将其包装起来并对URL路径进行编码. 它还可以自动检测leader, 以防你不想通过代理运行. 102 | 103 | ### Pseudo-GTID 104 | 如果你没有使用GTID, 你会很高兴知道`orchestrator`可以利用Pseudo-GTID来实现与GTID类似的好处, 例如将两个不相关的服务器关联起来, 使一个从另一个复制. This implies master and intermediate master failovers. 105 | 106 | 更多信息请参阅[Pseudo GTID](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Pseudo%20GTID.md). 107 | 108 | `orchestrator` 可以为你注入 Pseudo-GTID条目. 你的集群将神奇地拥有类似GTID的超能力. 
请参照[Automated Pseudo-GTID injection](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Discovery%2C%20Pseudo-GTID.md#automated-pseudo-gtid-injection)进行配置 109 | 110 | ### Populating meta data 填充元数据 111 | `orchestrator` 会从server中提取一些元数据: 112 | 113 | * 这个实例所属的集群的别名是什么? 114 | * 这个服务器所属的数据中心是什么? 115 | * 这个server开启半同步了吗? 116 | 117 | 这些细节是通过下列查询获取的: 118 | 119 | * `DetectClusterAliasQuery` 120 | * `DetectClusterDomainQuery` 121 | * `DetectDataCenterQuery` 122 | * `DetectSemiSyncEnforcedQuery` 123 | 或通过正则表达式作用于主机名: 124 | * `DataCenterPattern` 125 | * `PhysicalEnvironmentPattern` 126 | 127 | 这些查询可以通过向主库上的元数据表注入数据来满足. 例如, 你可以: 128 | 129 | ```sql 130 | CREATE TABLE IF NOT EXISTS cluster ( 131 | anchor TINYINT NOT NULL, 132 | cluster_name VARCHAR(128) CHARSET ascii NOT NULL DEFAULT '', 133 | PRIMARY KEY (anchor) 134 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 135 | ``` 136 | 然后用类似"1, my\_cluster\_name"的数据填充该表, 并配合类似下面的查询来获取元数据: 137 | 138 | ```json 139 | { 140 | "DetectClusterAliasQuery": "select cluster_name from meta.cluster where anchor=1" 141 | } 142 | ``` 143 | 请注意`orchestrator`不会创建这样的表, 也不会填充它们. 你需要创建表, 填充它们, 并让`orchestrator`知道如何查询数据. 144 | 145 | ### Tagging 标签 146 | `orchestrator`支持对实例打标签, 以及通过标签搜索实例. 参见 [Tags](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Operation/Tags.md) 147 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | ### 知识共享 (Creative Commons) 署名 4.0公共许可协议国际版 2 | 3 | 通过行使本协议所授予的权利(定义如下),您接受并同意受到知识共享(Creative Commons)署名 4.0国际公共许可协议(以下简称“本公共许可协议”)的约束。从合同解释的角度来看,您获得授权的对价是接受本协议的条款,许可人授予您这些权利的对价是可以通过采用本协议条款发布授权作品(material)而获得利益。 4 | 5 | **第一条 定义** 6 | 7 | 1. 
**演绎作品(Adapted Material):** 指受到著作权与类似权利保护的,基于授权作品(Licensed Material)而创作的作品(material),例如对授权作品(Licensed Material)的翻译、改编、编排、改写或其他依据著作权与类似权利需要获得所有人许可的修改。为本公共许可协议之目的,当授权作品(Licensed Material)为音乐作品、表演或录音时,将其依时间序列关系与动态影像配合一致而形成的作品,视为演绎作品(Adapted Material)。 8 | 2. **演绎作者的许可:** 指您依据本公共许可协议对在演绎作品(Adapted Material)中自己所贡献的部分所享有的著作权与类似权利进行授权的协议。 9 | 3. **著作权与类似权利:** 指著作权和/或与著作权紧密联系的类似权利。类似权利包括但不限于:表演者权、广播组织权、录音录像制作者权、以及数据库特别权利,而不论上述权利的定义和归类如何。为本公共许可协议之目的, [第二条b款第(1)项与第(2)项](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s2b) 所列权利不属于著作权与类似权利。 10 | 4. **有效的技术措施:** 指根据各司法管辖区遵循《世界知识产权组织版权条约》(1996年12月20日通过)第十一条或类似国际协定项下的义务所制定的法律,在没有适当的授权的情况下,禁止使用者规避的技术措施。 11 | 5. **例外与限制:** 指合理使用(Fair Dealing and Fair Use)和/或其他适用于您对授权作品(Licensed Material)的使用的著作权与类似权利的例外或限制。 12 | 6. **授权作品(Licensed Material):** 指许可人通过本公共许可协议授权的文学、艺术作品(artistic or literary work),数据库或其他作品(material)。 13 | 7. **协议所授予的权利:** 指依据本公共许可协议的条款和条件所授予您的各项权利,限于适用于您对授权作品(Licensed Material)的使用且许可人有权许可的著作权与类似权利。 14 | 8. **许可人:** 指通过本公共许可协议进行授权的个人或组织。 15 | 9. **分享:** 指以需要“协议所授予的权利”许可的任何方法或程序向公众提供作品(material),包括复制、公共展示、公开表演、发行、散布、传播、进口或提供作品(material)给公众以便其能在其选定的时间和地点接收作品(material)。 16 | 10. **数据库特别权利:** 指除了著作权之外,衍生于1996年3月11日通过的《欧洲议会与欧盟理事会关于数据库法律保护的指令》(Directive 96/9/EC)及其修改或后续版本的权利,或其他国家或地区本质上与之等同的权利。 17 | 11. **您:** 指依据本公共许可协议行使其所获得授予之权利的个人或机构。 **“您的”** 有相应的含义。 18 | 19 | **第二条 授权范围** 20 | 21 | 1. 授权 22 | 1. 根据本公共许可协议的条款,许可人授予您在全球范围内,免费的、不可再许可、非独占、不可撤销的许可,以对授权作品(Licensed Material)行使以下“协议所授予的权利”: 23 | 1. 复制和分享授权作品(Licensed Material)的全部或部分;以及 24 | 2. 创作、复制和分享演绎作品(Adapted Material)。 25 | 2. 例外和限制 为避免疑义,若著作权的例外和限制适用于您对授权作品(Licensed Material)的使用,本公共许可协议将不适用,您也无须遵守本公共许可协议之条款。 26 | 3. 期限 本公共许可协议的期限规定于[第六条 a](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s6a) 款。 27 | 4. 
媒介和形式;允许的技术修改 许可人授权您在任何媒介以任何形式(不论目前已知的或未来出现的)行使本协议授予的权利,并为之进行必要的技术修改。许可人放弃和/或同意不主张任何权利以阻止您为了行使协议项下权利进行必要的技术修改,包括为规避有效技术措施所必须的技术修改。为了本公共许可协议之目的, 基于[第二条a款第(4)项](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s2a4) 进行的技术修改不构成演绎作品(Adapted Material)。 28 | 5. 后续接受者 29 | 1. 来自许可人的要约——授权作品(Licensed Material) 本授权作品(Licensed Material)的每一个后续接受者都自动取得许可人的要约,以按照本公共许可协议的条款行使协议授予的权利。 30 | 2. 禁止下游限制 若会限制授权作品(Licensed Material)后续接受者行使本协议所授予的权利,则您不得对授权作品(Licensed Material)提出或增加任何额外的或不同的条款,或使用任何有效技术措施。 31 | 6. 并非背书 本公共许可协议不构成、或不得被解释为允许您声明或主张:您或您对授权作品(Licensed Material)的使用与许可人或 [第三条a款第(1)项(A)目(i)](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s3a1Ai)所规定要求提供署名的权利人相关联,或得到其赞助、同意或被授予正式地位。 32 | 2. **其他权利** 33 | 1. 依据本公共许可协议,著作人身权,例如保护作品完整权、形象权、隐私权或其他类似的人格权利,不在许可范围内。但是,在条件允许的情况下,许可人可以在必要范围内放弃和/或同意不主张其权利,以便您行使本协议所授予的权利。 34 | 2. 本公共许可协议不适用于任何专利权或商标权许可。 35 | 3. 在自愿的或可放弃的法定或强制许可机制下,许可人在最大可能范围内放弃对您因行使本协议所授予的权利而产生的使用费的权利,不论是直接收取或通过集体管理组织收取。在其他任何情况下,许可人明确保留收取使用费的任何权利。 36 | 37 | **第三条 授权条件** 38 | 39 | 您行使被许可的权利明确受以下条件限制: 40 | 41 | 1. **署名** 42 | 1. 若您分享本授权作品(Licensed Material)(包含修改格式),您必须: 43 | 1. 保留如下标识(如果许可人提供授权作品(Licensed Material)的同时提供如下标识): 44 | 1. 以许可人要求的任何合理方式,标识本授权作品(Licensed Material)创作者和其他被指定署名的人的身份(包括指定的笔名); 45 | 2. 著作权声明; 46 | 3. 有关本公共许可协议的声明; 47 | 4. 有关免责的声明; 48 | 5. 在合理可行情况下,本授权作品(Licensed Material)的网址(URI)或超链接; 49 | 2. 表明您是否修改本授权作品(Licensed Material)及保留任何先前修改的标记;及 50 | 3. 表明授权作品(Licensed Material)依据本公共许可协议授权,并提供本公共许可协议全文,或者本公共许可协议的网址(URI)或超链接。 51 | 2. 依据您分享本授权作品(Licensed Material)的媒介、方法及情況,您可以采用任何合理方式满足[第三条a款第(1)项](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s3a1)的条件 。 例如,提供包含所要求信息来源的网址(URI)或超链接可算是合理地满足此处的条件。 52 | 3. 如果许可人要求,您必须在合理可行的范围内移除[第三条a款第(1)项(A)目](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s3a1A) 所要求的任何信息。 53 | 4. 
如果您分享您创作的演绎作品(Adapted Material),您适用的“演绎作者的许可”协议,不得使演绎作品(Adapted Material)的接收者无法遵守本公共许可协议。 54 | 55 | **第四条 数据库特别权利** 56 | 57 | 当协议所授予的权利包含数据库特别权利,而该数据库特别权利适用于您对授权作品(Licensed Material)的使用时: 58 | 59 | 1. 为避免疑义, [第二条a款第(1) 项](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s2a1)授权您,摘录、再利用、复制和分享全部或绝大部分数据库资料; 60 | 2. 如果您将数据库资料的全部或绝大部分纳入您享有数据库特别权利的另一数据库,则您享有数据库特别权利的该数据库(而非其中的单个内容)视为演绎作品(Adapted Material); 61 | 3. 如果您分享全部或大部分该数据库的资料,您必须遵守 [第三条a款](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s3a) 规定的条件。 62 | 63 | 为避免疑义,当协议所授予的权利包含其他著作权与类似权利时,[第四条](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s4)补充且不取代本公共许可协议所规定的您的义务。 64 | 65 | **第五条 免责声明及责任限制条款** 66 | 67 | 1. **除非许可人另有保证,否则在最大可能范围内,许可人按其现状和现有之基础提供授权作品(Licensed Material),且没有就授权作品(Licensed Material)做出任何形式的陈述或保证:无论明示、默示、法定或其他形式,包括但不限于任何有关本授权作品(Licensed Material)的权属保证、可交易性、适于特定目的、未侵害他人权利、没有潜在或其他瑕疵、精确性或是否有错误,不管是否已知或可发现。当免责声明全部或部分不被允许时,此免责声明可能不适用于您。** 68 | 2. **在最大可能范围内, 对于任何因本公共许可协议或使用授权作品(Licensed Material)引起的直接的、特殊的、间接的、附随的、连带的、惩罚性的、警告性的,或其他的损失、成本、费用或损害,许可人不对您负任何法律上或其他的责任(包括但不限于过失责任)。当责任限制部分或全部不被允许时,该限制不适用于您。** 69 | 70 | 1. 前述免责及责任限制声明,应尽可能以最接近于完全排除全部责任的方式解释。 71 | 72 | **第六条 期限与终止** 73 | 74 | 1. 本公共许可协议在著作权与类似权利存续期间内有效。然而,如果您没有遵守此公共许可协议,则您依据此公共许可协议享有的权利自动终止。 75 | 76 | 2. 当您使用本授权作品(Licensed Material)的权利根据[第六条a款](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s6a)终止时,您的权利在下述情况下恢复: 77 | 78 | 1. 自违反协议的行为纠正之日起自动恢复,但须在您发现违反情形后30日内纠正;或 79 | 2. 根据许可人明示恢复权利的意思表达。 80 | 81 | 为避免疑义,本公共许可协议 82 | 83 | 第六条b款 84 | 85 | 86 | 87 | 不影响许可人就您违反本公共许可协议的行为寻求法律救济。 88 | 89 | 3. 为避免疑义,许可人也可在任何时间,以另外的条款或条件提供本授权作品(Licensed Material),或者停止传播本授权作品(Licensed Material);然而,许可人此种行为不会终止本公共许可协议。 90 | 91 | 4. 
本协议[第一](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s1)、[五](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s5)、[六](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s6)、[七](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s7)及第[八](https://creativecommons.org/licenses/by/4.0/legalcode.zh-Hans#s8)条,不因本公共许可协议终止而失效。 92 | 93 | **第七条 其他条款和条件** 94 | 95 | 1. 除非明示同意,否则许可人不受您表达的任何附加或不同条款或条件约束。 96 | 2. 本公共许可协议未提及的关于授权作品(Licensed Material)之任何安排、共识或协议,不属于且独立于本公共许可协议的条款及条件。 97 | 98 | **第八条 解释** 99 | 100 | 1. 为避免疑义,本许可协议不会也不应被解释为减少、限制、约束或施加条件于无需本公共许可协议授权即可依法行使的对授权作品(Licensed Material)的任何使用。 101 | 2. 在最大可能范围内,如果本公共许可协议的任何条款被视为无法执行,该条款在必要的最小限度内,自动调整至可以执行。如果该条款不能被调整,其应自本公共许可协议中排除适用,不影响其余条款的效力。 102 | 3. 除非许可人明示同意,本公共许可协议的任何条款或条件均不得放弃。 103 | 4. 本公共许可协议条款不构成、也不得被解释为限制或者放弃适用于许可人或您的特权或豁免,包括豁免于任何司法管辖区或行政机构的法律程序。 104 | -------------------------------------------------------------------------------- /Setup/配置/Configuration Discovery, classifying servers.md: -------------------------------------------------------------------------------- 1 | # Configuration: Discovery, classifying servers 2 | # [Configuration: discovery, classifying servers](https://github.com/openark/orchestrator/blob/master/docs/configuration-discovery-classifying.md) 3 | 配置`orchestrator` 如何获取集群名称, 数据中心等信息. 4 | 5 | ```yaml 6 | { 7 | "ReplicationLagQuery": "select absolute_lag from meta.heartbeat_view", 8 | "DetectClusterAliasQuery": "select ifnull(max(cluster_name), '') as cluster_alias from meta.cluster where anchor=1", 9 | "DetectClusterDomainQuery": "select ifnull(max(cluster_domain), '') as cluster_domain from meta.cluster where anchor=1", 10 | "DataCenterPattern": "", 11 | "DetectDataCenterQuery": "select substring_index(substring_index(@@hostname, '-',3), '-', -1) as dc", 12 | "PhysicalEnvironmentPattern": "", 13 | "DetectSemiSyncEnforcedQuery": "" 14 | } 15 | ``` 16 | ### Replication lag 17 | 默认情况下, `orchestrator`使用`SHOW SLAVE STATUS` 来监控复制延迟(粒度为秒). 
但这种延迟监控策略没有考虑链式复制的情况. 许多人使用自定义的心跳机制, 如 `pt-heartbeat`. 这种方式可以获取绝对的延迟数值, 粒度可以达到亚秒级. 18 | 19 | 通过`ReplicationLagQuery` 参数, 你可以定义获取复制延迟的查询语句. 20 | 21 | ### Cluster alias 22 | 在实际工作中, 集群名字往往是人为定义的, 例如: "Main", "Analytics", "Shard031" 等. 然而, MySQL集群本身不知道自己的集群名称是什么. 23 | 24 | `DetectClusterAliasQuery` 定义一个查询, `orchestrator` 借此查询语句获取集群名称. 25 | 26 | 这个名字很重要. 你可能会用它来告诉`orchestrator`: "请自动恢复这个集群", 或者 "这个集群中的所有参与实例是什么". 27 | 28 | 为了能够获取集群名称, 一个技巧是在`meta`库中创建一个表: 29 | 30 | ```sql 31 | CREATE TABLE IF NOT EXISTS cluster ( 32 | anchor TINYINT NOT NULL, 33 | cluster_name VARCHAR(128) CHARSET ascii NOT NULL DEFAULT '', 34 | cluster_domain VARCHAR(128) CHARSET ascii NOT NULL DEFAULT '', 35 | PRIMARY KEY (anchor) 36 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 37 | ``` 38 | ...并按如下方式填充数据(例如, 通过puppet/cron): 39 | 40 | ```sql 41 | mysql meta -e "INSERT INTO cluster (anchor, cluster_name, cluster_domain) VALUES (1, '${cluster_name}', '${cluster_domain}') ON DUPLICATE KEY UPDATE cluster_name=VALUES(cluster_name), cluster_domain=VALUES(cluster_domain)" 42 | ``` 43 | 也许你们公司MySQL服务器的主机名包含集群名(比如都是单机单实例部署, 主机名是 cluster1-xxx). 那么你也可以通过`@@hostname` 去获取集群名称. 44 | 45 | > 就是说, 集群名字, 怎么搞都行, 具体看你们公司的情况. 大不了就自己定义一个表, 让`orchestrator` 通过查询这个表获取集群名称 46 | 47 | ### Data center 48 | `orchestrator`是数据中心感知的. 它不仅会在Web界面上按数据中心给实例着色; 在运行故障转移时, 它还会把DC纳入考虑. 49 | 50 | 你可以通过以下两种方法之一配置数据中心感知: 51 | 52 | * `DataCenterPattern`: 作用于fqdn的正则表达式. 例如: `"db-.*?-.*?[.](.*?)[.].myservice[.]com"` 53 | * `DetectDataCenterQuery`: 一个返回数据中心名称的查询语句 54 | 55 | ### Cluster domain 56 | 相对次要的一点, 主要是为了可见性: `DetectClusterDomainQuery`应该返回VIP或CNAME或其他信息来指代集群主库的地址. 57 | 58 | ### Semi-sync topology 59 | 在某些环境中, 不仅要控制半同步副本的数量, 还要控制具体哪个副本是半同步、哪个是异步. `orchestrator`可以检测到不符合预期的半同步配置, 并通过切换半同步标志`rpl_semi_sync_slave_enabled`和`rpl_semi_sync_master_enabled`来纠正这种情况. 60 | 61 | #### Semi-sync master (`rpl_semi_sync_master_enabled`) 62 | 如果新主库的`DetectSemiSyncEnforcedQuery`值大于0, 则`orchestrator`在主库故障切换期间(e.g. `DeadMaster`)会打开新主库的`rpl_semi_sync_master_enabled`参数.
如果主库标志以其他方式被改变或设置错误, `orchestrator`不会触发任何恢复. 63 | 64 | >  `orchestrator` enables the semi-sync master flag during a master failover (e.g. `DeadMaster`) if `DetectSemiSyncEnforcedQuery` returns a value > 0 for the new master. `orchestrator` does not trigger any recoveries if the master flag is otherwise changed or incorrectly set. 65 | 66 | semi-sync master可能进入两种故障场景: [LockedSemiSyncMaster](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Failure%20detection.md#lockedsemisyncmaster)和[MasterWithTooManySemiSyncReplicas](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Failure%20detection.md#masterwithtoomanysemisyncreplicas). 在这两种场景的恢复过程中, `orchestrator`会关闭半同步从库(即打开了rpl\_semi\_sync\_slave\_enabled的从库)上的`rpl_semi_sync_master_enabled` 参数. 67 | 68 | #### Semi-sync replicas (`rpl_semi_sync_slave_enabled`) 69 | `orchestrator`可以检测拓扑结构中半同步副本的数量是否不符合预期([LockedSemiSyncMaster](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Failure%20detection.md#lockedsemisyncmaster)和[MasterWithTooManySemiSyncReplicas](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Failure%20detection.md#masterwithtoomanysemisyncreplicas)), 并通过启用/禁用相应副本上的半同步参数来纠正这种情况. 70 | 71 | 这种行为可以由以下选项控制: 72 | 73 | * `DetectSemiSyncEnforcedQuery` : 返回半同步优先级的查询(零表示异步复制; 数字越大表示优先级越高). 74 | * `EnforceExactSemiSyncReplicas` : 决定是否强制执行严格的半同步拓扑的标志. 如果启用, `LockedSemiSyncMaster`和`MasterWithTooManySemiSyncReplicas`的恢复将在副本上启用和禁用半同步, 以按优先级顺序完全匹配所需的拓扑结构. 75 | * `RecoverLockedSemiSyncMaster`: 决定是否从`LockedSemiSyncMaster`场景中恢复的标志. 如果启用, `LockedSemiSyncMaster`的恢复将按照优先级顺序在副本上启用(但绝不会禁用)半同步, 以匹配主库的等待计数(rpl\_semi\_sync\_master\_wait\_for\_slave\_count). 如果设置了`EnforceExactSemiSyncReplicas`, 这个选项就没有效果. 如果你只想处理半同步副本太少的情况, 而不处理太多的情况, 这个选项很有用.
76 | * `ReasonableLockedSemiSyncMasterSeconds` : 触发`LockedSemiSyncMaster`条件的秒数; 如果没有设置, 则回退到`ReasonableReplicationLagSeconds` . 77 | 78 | **Example 1**: 强制执行严格的半同步拓扑: 79 | 80 | ```yaml 81 | "DetectSemiSyncEnforcedQuery": "select priority from meta.semi_sync where cluster_member = @@hostname", 82 | "EnforceExactSemiSyncReplicas": true 83 | ``` 84 | 假设有以下拓扑结构: 85 | 86 | ```Plain Text 87 | ,- replica1 (priority = 10, rpl_semi_sync_slave_enabled = 1) 88 | master 89 | `- replica2 (priority = 20, rpl_semi_sync_slave_enabled = 1) 90 | ``` 91 | `orchestrator`会检测到[MasterWithTooManySemiSyncReplicas](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Failure%20detection.md#masterwithtoomanysemisyncreplicas)的情况, 并在replica1上禁用半同步(优先级较低). 92 | 93 | **Example 2**: 强制执行较宽松(weak)的半同步副本拓扑, 其中 94 | 95 | `rpl_semi_sync_master_wait_for_slave_count=1`: 96 | 97 | ```yaml 98 | "DetectSemiSyncEnforcedQuery": "select 2586", 99 | "DetectPromotionRuleQuery": "select promotion_rule from meta.promotion_rules where cluster_member = @@hostname", 100 | "RecoverLockedSemiSyncMaster": true 101 | ``` 102 | 假设有以下拓扑结构: 103 | 104 | ```Plain Text 105 | ,- replica1 (priority = 2586, promotion rule = prefer, rpl_semi_sync_slave_enabled = 0) 106 | master 107 | `- replica2 (priority = 2586, promotion rule = neutral, rpl_semi_sync_slave_enabled = 0) 108 | ``` 109 | `orchestrator`会检测到一个[LockedSemiSyncMaster](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Failure%20detection.md#lockedsemisyncmaster)场景, 并在replica1上启用半同步(因为replica1的promotion rule是prefer). 110 | -------------------------------------------------------------------------------- /Setup/配置/示例配置文件.md: -------------------------------------------------------------------------------- 1 | # 示例配置文件 2 | # [Configuration sample file](https://github.com/openark/orchestrator/blob/master/docs/configuration-sample.md) 3 | 下面是一个生产用的配置文件, 其中一些细节被删节了 4
| 5 | ```yaml 6 | { 7 | "Debug": true, 8 | "EnableSyslog": false, 9 | "ListenAddress": ":3000", 10 | "MySQLTopologyCredentialsConfigFile": "/etc/mysql/orchestrator.cnf", 11 | "MySQLTopologySSLPrivateKeyFile": "", 12 | "MySQLTopologySSLCertFile": "", 13 | "MySQLTopologySSLCAFile": "", 14 | "MySQLTopologySSLSkipVerify": true, 15 | "MySQLTopologyUseMutualTLS": false, 16 | "MySQLTopologyMaxPoolConnections": 3, 17 | "MySQLOrchestratorHost": "127.0.0.1", 18 | "MySQLOrchestratorPort": 3306, 19 | "MySQLOrchestratorDatabase": "orchestrator", 20 | "MySQLOrchestratorCredentialsConfigFile": "/etc/mysql/orchestrator_srv.cnf", 21 | "MySQLOrchestratorSSLPrivateKeyFile": "", 22 | "MySQLOrchestratorSSLCertFile": "", 23 | "MySQLOrchestratorSSLCAFile": "", 24 | "MySQLOrchestratorSSLSkipVerify": true, 25 | "MySQLOrchestratorUseMutualTLS": false, 26 | "MySQLConnectTimeoutSeconds": 1, 27 | "DefaultInstancePort": 3306, 28 | "ReplicationLagQuery": "select round(absolute_lag) from meta.heartbeat_view", 29 | "SlaveStartPostWaitMilliseconds": 1000, 30 | "DiscoverByShowSlaveHosts": false, 31 | "InstancePollSeconds": 5, 32 | "DiscoveryIgnoreReplicaHostnameFilters": [ 33 | "a_host_i_want_to_ignore[.]example[.]com", 34 | ".*[.]ignore_all_hosts_from_this_domain[.]example[.]com", 35 | "a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307" 36 | ], 37 | "ReadLongRunningQueries": false, 38 | "SkipMaxScaleCheck": true, 39 | "BinlogFileHistoryDays": 10, 40 | "UnseenInstanceForgetHours": 240, 41 | "SnapshotTopologiesIntervalHours": 0, 42 | "InstanceBulkOperationsWaitTimeoutSeconds": 10, 43 | "ActiveNodeExpireSeconds": 5, 44 | "HostnameResolveMethod": "default", 45 | "MySQLHostnameResolveMethod": "@@hostname", 46 | "SkipBinlogServerUnresolveCheck": true, 47 | "ExpiryHostnameResolvesMinutes": 60, 48 | "RejectHostnameResolvePattern": "", 49 | "ReasonableReplicationLagSeconds": 10, 50 | "ProblemIgnoreHostnameFilters": [ 51 | 52 | ], 53 | "VerifyReplicationFilters": false, 54 | "MaintenanceOwner": 
"orchestrator", 55 | "ReasonableMaintenanceReplicationLagSeconds": 20, 56 | "MaintenanceExpireMinutes": 10, 57 | "MaintenancePurgeDays": 365, 58 | "CandidateInstanceExpireMinutes": 60, 59 | "AuditLogFile": "", 60 | "AuditToSyslog": false, 61 | "AuditPageSize": 20, 62 | "AuditPurgeDays": 365, 63 | "RemoveTextFromHostnameDisplay": ":3306", 64 | "ReadOnly": false, 65 | "AuthenticationMethod": "", 66 | "HTTPAuthUser": "", 67 | "HTTPAuthPassword": "", 68 | "AuthUserHeader": "", 69 | "PowerAuthUsers": [ 70 | "*" 71 | ], 72 | "ClusterNameToAlias": { 73 | "127.0.0.1": "test suite" 74 | }, 75 | "AccessTokenUseExpirySeconds": 60, 76 | "AccessTokenExpiryMinutes": 1440, 77 | "DetectClusterAliasQuery": "select ifnull(max(cluster_name), '') as cluster_alias from meta.cluster where anchor=1", 78 | "DetectClusterDomainQuery": "", 79 | "DataCenterPattern": "", 80 | "DetectDataCenterQuery": "select 'redacted'", 81 | "PhysicalEnvironmentPattern": "", 82 | "PromotionIgnoreHostnameFilters": [ 83 | 84 | ], 85 | "ServeAgentsHttp": false, 86 | "UseSSL": false, 87 | "UseMutualTLS": false, 88 | "SSLSkipVerify": false, 89 | "SSLPrivateKeyFile": "", 90 | "SSLCertFile": "", 91 | "SSLCAFile": "", 92 | "SSLValidOUs": [ 93 | 94 | ], 95 | "StatusEndpoint": "/api/status", 96 | "StatusSimpleHealth": true, 97 | "StatusOUVerify": false, 98 | "HttpTimeoutSeconds": 60, 99 | "StaleSeedFailMinutes": 60, 100 | "SeedAcceptableBytesDiff": 8192, 101 | "SeedWaitSecondsBeforeSend": 2, 102 | "PseudoGTIDPattern": "drop view if exists `meta`.`_pseudo_gtid_hint__asc:", 103 | "PseudoGTIDPatternIsFixedSubstring": true, 104 | "PseudoGTIDMonotonicHint": "asc:", 105 | "DetectPseudoGTIDQuery": "select count(*) as pseudo_gtid_exists from meta.pseudo_gtid_status where anchor = 1 and time_generated > now() - interval 1 day", 106 | "BinlogEventsChunkSize": 10000, 107 | "BufferBinlogEvents": true, 108 | "SkipBinlogEventsContaining": [ 109 | "@@SESSION.GTID_NEXT= 'ANONYMOUS'" 110 | ], 111 | "ReduceReplicationAnalysisCount": 
false, 112 | "FailureDetectionPeriodBlockMinutes": 60, 113 | "RecoveryPeriodBlockSeconds": 600, 114 | "RecoveryIgnoreHostnameFilters": [ 115 | 116 | ], 117 | "RecoverMasterClusterFilters": [ 118 | "*" 119 | ], 120 | "RecoverIntermediateMasterClusterFilters": [ 121 | "*" 122 | ], 123 | "OnFailureDetectionProcesses": [ 124 | "/redacted/our-orchestrator-recovery-handler -t 'detection' -f '{failureType}' -h '{failedHost}' -C '{failureCluster}' -A '{failureClusterAlias}' -n '{countReplicas}'" 125 | ], 126 | "PreGracefulTakeoverProcesses": [ 127 | "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log" 128 | ], 129 | "PreFailoverProcesses": [ 130 | "/redacted/our-orchestrator-recovery-handler -t 'pre-failover' -f '{failureType}' -h '{failedHost}' -C '{failureCluster}' -A '{failureClusterAlias}' -n '{countReplicas}'" 131 | ], 132 | "PostFailoverProcesses": [ 133 | "/redacted/our-orchestrator-recovery-handler -t 'post-failover' -f '{failureType}' -h '{failedHost}' -H '{successorHost}' -C '{failureCluster}' -A '{failureClusterAlias}' -n '{countReplicas}' -u '{recoveryUID}'" 134 | ], 135 | "PostUnsuccessfulFailoverProcesses": [ 136 | "/redacted/our-orchestrator-recovery-handler -t 'post-unsuccessful-failover' -f '{failureType}' -h '{failedHost}' -C '{failureCluster}' -A '{failureClusterAlias}' -n '{countReplicas}' -u '{recoveryUID}'" 137 | ], 138 | "PostMasterFailoverProcesses": [ 139 | "/redacted/do-something # e.g. 
kick pt-heartbeat on promoted master" 140 | ], 141 | "PostIntermediateMasterFailoverProcesses": [ 142 | ], 143 | "PostGracefulTakeoverProcesses": [ 144 | "echo 'Planned takeover complete' >> /tmp/recovery.log" 145 | ], 146 | "CoMasterRecoveryMustPromoteOtherCoMaster": true, 147 | "DetachLostSlavesAfterMasterFailover": true, 148 | "ApplyMySQLPromotionAfterMasterFailover": true, 149 | "PreventCrossDataCenterMasterFailover": false, 150 | "PreventCrossRegionMasterFailover": false, 151 | "MasterFailoverLostInstancesDowntimeMinutes": 60, 152 | "PostponeReplicaRecoveryOnLagMinutes": 10, 153 | "OSCIgnoreHostnameFilters": [ 154 | 155 | ], 156 | "GraphitePollSeconds": 60, 157 | "GraphiteAddr": "", 158 | "GraphitePath": "", 159 | "GraphiteConvertHostnameDotsToUnderscores": true, 160 | "BackendDB": "mysql", 161 | "MySQLTopologyReadTimeoutSeconds": 3, 162 | "MySQLDiscoveryReadTimeoutSeconds": 3, 163 | "SQLite3DataFile": "/var/lib/orchestrator/orchestrator-sqlite.db", 164 | "RaftEnabled": false, 165 | "RaftBind": "redacted", 166 | "RaftDataDir": "/var/lib/orchestrator", 167 | "DefaultRaftPort": 10008, 168 | "ConsulAddress": "redacted:8500", 169 | "RaftNodes": [ 170 | "redacted", 171 | "redacted", 172 | "redacted" 173 | ] 174 | } 175 | ``` -------------------------------------------------------------------------------- /Use/Using the web API.md: -------------------------------------------------------------------------------- 1 | # Using the web API 2 | ## [Using the web API](https://github.com/openark/orchestrator/blob/master/docs/using-the-web-api.md) 3 | `orchestrator` 提供了一个精心设计的 Web API. 4 | 5 | 敏锐的网络开发者会注意到(通过Firebug或开发者工具)web interface是如何完全依赖JSON API请求的. 6 | 7 | 开发人员可以使用该API来实现自动化的目的. 8 | 9 | ### A very very brief look at a few API commands 10 | 举例来说: 11 | 12 | * `/api/instance/:host/:port` : 读取并返回一个实例的详细信息(例如`/api/instance/mysql10/3306`). 13 | * `/api/discover/:host/:port` : discover给定的实例(`orchestrator`服务将从那里获取它并递归扫描整个拓扑). 
14 | * `/api/relocate/:host/:port/:belowHost/:belowPort` : (试图)将一个实例迁移到另一个实例的下面, 由`orchestrator`选择最佳行动方案. 15 | * `/api/relocate-replicas/:host/:port/:belowHost/:belowPort` : (试图)将一个实例的副本迁移到另一个实例的下面, 由`orchestrator`选择最佳行动方案. 16 | * `/api/recover/:host/:port` : 在给定的实例上启动恢复, 前提是确实有东西需要恢复. 17 | * `/api/force-master-failover/:mycluster` : 强制在给定的集群上立即进行failover. 18 | 19 | ### Full listing 20 | 完整的接口清单请见[api.go](https://github.com/openark/orchestrator/blob/master/go/http/api.go)(向下滚动到`RegisterRequests`) 21 | 22 | 你可能还想看看[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md)([source code](https://github.com/openark/orchestrator/blob/master/resources/bin/orchestrator-client)), 看看command line interface是如何转化为API调用的. 或者, 直接使用[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md)作为你的API客户端, 这正是它的用途. 23 | 24 | ### Instance JSON breakdown 25 | 许多 API 调用返回 *instance objects* (实例对象), 用于描述单个 MySQL server. 示例之后是各字段的说明. 26 | 27 | ```yaml 28 | { 29 | 30 | "Key": { 31 | "Hostname": "mysql.02.instance.com", 32 | "Port": 3306 33 | }, 34 | "Uptime": 45, 35 | "ServerID": 101, 36 | "Version": "5.6.22-log", 37 | "ReadOnly": false, 38 | "Binlog_format": "ROW", 39 | "LogBinEnabled": true, 40 | "LogReplicationUpdatesEnabled": true, 41 | "SelfBinlogCoordinates": { 42 | "LogFile": "mysql-bin.015656", 43 | "LogPos": 15082, 44 | "Type": 0 45 | }, 46 | "MasterKey": { 47 | "Hostname": "mysql.01.instance.com", 48 | "Port": 3306 49 | }, 50 | "ReplicationSQLThreadRuning": true, 51 | "ReplicationIOThreadRuning": true, 52 | "HasReplicationFilters": false, 53 | "SupportsOracleGTID": true, 54 | "UsingOracleGTID": true, 55 | "UsingMariaDBGTID": false, 56 | "UsingPseudoGTID": false, 57 | "ReadBinlogCoordinates": { 58 | "LogFile": "mysql-bin.015993", 59 | "LogPos": 20146, 60 | "Type": 0 61 | }, 62 | "ExecBinlogCoordinates": { 63 | "LogFile": "mysql-bin.015993", 64 | "LogPos": 20146, 65 | "Type": 0 66 | }, 67 |
"RelaylogCoordinates": { 68 | "LogFile": "mysql_sandbox21088-relay-bin.000051", 69 | "LogPos": 16769, 70 | "Type": 1 71 | }, 72 | "LastSQLError": "", 73 | "LastIOError": "", 74 | "SecondsBehindMaster": { 75 | "Int64": 0, 76 | "Valid": true 77 | }, 78 | "SQLDelay": 0, 79 | "ExecutedGtidSet": "230ea8ea-81e3-11e4-972a-e25ec4bd140a:1-49", 80 | "ReplicationLagSeconds": { 81 | "Int64": 0, 82 | "Valid": true 83 | }, 84 | "Replicas": [ ], 85 | "ClusterName": "mysql.01.instance.com:3306", 86 | "DataCenter": "", 87 | "PhysicalEnvironment": "", 88 | "ReplicationDepth": 1, 89 | "IsCoMaster": false, 90 | "IsLastCheckValid": true, 91 | "IsUpToDate": true, 92 | "IsRecentlyChecked": true, 93 | "SecondsSinceLastSeen": { 94 | "Int64": 9, 95 | "Valid": true 96 | }, 97 | "CountMySQLSnapshots": 0, 98 | "IsCandidate": false, 99 | "UnresolvedHostname": "" 100 | } 101 | ``` 102 | 实例的结构在不断发展, 文档可能更新不及时. 一些关键属性是: 103 | 104 | * `Key`: 实例的唯一标识: 一个host和port的组合. 105 | * `ServerID`: MySQL `server_id` 参数值. 106 | * `Version`: MySQL 版本. 107 | * `ReadOnly`: global `read_only` 值, 布尔值. 108 | * `Binlog_format`: global `binlog_format` 参数值. 109 | * `LogBinEnabled`: 是否启用了binlog. 110 | * `LogReplicationUpdatesEnabled`: 是否启用了 `log_slave_updates` . 111 | * `SelfBinlogCoordinates`: 当前写到哪个binary log file 哪个 position (和 `SHOW MASTER STATUS`显示的一样). 112 | * `MasterKey`: 主库的hostname & port, 如果这个实例有主库的话. 113 | * `ReplicationSQLThreadRuning`: 从`SHOW SLAVE STATUS`的`Slave_SQL_Running`直接映射. 114 | * `ReplicationIOThreadRuning`: 从`SHOW SLAVE STATUS`的`Slave_IO_Running`直接映射. 115 | * `HasReplicationFilters`: 如果设置了复制过滤规则, 此值为true. 116 | * `SupportsOracleGTID`: true if cnfigured with `gtid_mode` (Oracle MySQL >= 5.6) 117 | > 有点歧义, 是说`gtid_mode` 为ON吗? 
需要看代码 118 | * `UsingOracleGTID`: true if replica replicates via Oracle GTID 119 | * `UsingMariaDBGTID`: true if replica replicates via MariaDB GTID 120 | * `UsingPseudoGTID`: true if replica known to have Pseudo-GTID coordinates (see related `DetectPseudoGTIDQuery` config) 121 | * `ReadBinlogCoordinates`: (复制时) 从主库读取到的binlog坐标 (即 `IO_THREAD` 拉取的内容) . 122 | > 应该就是Master\_Log\_File 和 Read\_Master\_Log\_Pos. 需要看代码 123 | * `ExecBinlogCoordinates`: (复制时) 现在正在应用的主库binlog坐标 (即 `SQL_THREAD` 执行到的位置). 124 | > 应该就是Relay\_Master\_Log\_File 和 Exec\_Master\_Log\_Pos. 需要看代码 125 | * `RelaylogCoordinates`: (复制时) 现在正在执行的中继日志的坐标. 126 | > 应该就是Relay\_Log\_File 和 Relay\_Log\_Pos. 需要看代码 127 | * `LastSQLError`: `SHOW SLAVE STATUS` 中的`Last_SQL_Error` . 128 | * `LastIOError`: `SHOW SLAVE STATUS` 中的`Last_IO_Error` . 129 | * `SecondsBehindMaster`: 从`SHOW SLAVE STATUS`的`Seconds_Behind_Master`的直接映射 `"Valid": false`表示`NULL` 130 | > 如sql\_thread停止后, Seconds\_Behind\_Master: NULL 131 | * `SQLDelay`: change master语句中的 `MASTER_DELAY` . 132 | * `ExecutedGtidSet`: if using Oracle GTID, the executed GTID set 133 | > 怀疑就是SHOW SLAVE STATUS中的`Executed_Gtid_Set` , 需要看代码. 134 | * `ReplicationLagSeconds`: 当提供`ReplicationLagQuery` 时, 计算出的复制延迟. 否则与`SecondsBehindMaster` 相同. 135 | * `Replicas`: 该实例的从库列表(hostname & port). 136 | * `ClusterName`: 这个实例所关联的集群的名称; 唯一标识一个集群. 137 | * `DataCenter`: (元数据) 数据中心的名称, 由DataCenterPattern配置变量推断. 138 | * `PhysicalEnvironment`: (元数据) 环境名称, 由`PhysicalEnvironmentPattern`配置变量推断出. 139 | * `ReplicationDepth`: 与master的距离 (master is `0`, direct replica is `1` and so on) 140 | * `IsCoMaster`: 当实例是双主的一部分时, 此值为true. 141 | * `IsLastCheckValid`: 上次尝试读取此实例是否成功. 142 | * `IsUpToDate`: 这些数据是否是最新的. 143 | * `IsRecentlyChecked`: 最近是否对该实例进行了读取尝试. 144 | * `SecondsSinceLastSeen`: 自上次成功访问此实例以来经过的时间. 145 | * `CountMySQLSnapshots`: 已知快照的数量(由 `orchestrator-agent` 提供的数据) 146 | * `IsCandidate`: (元数据) 当这个实例通过`register-candidate` CLI命令被标记为候选实例时为`true`. 
可以在崩溃恢复中使用, 以确定故障转移选项的优先级. 147 | * `UnresolvedHostname`: 该主机经过*unresolve*之后得到的名称, 由`register-hostname-unresolve` CLI命令指定 148 | 149 | ### Cheatsheet 150 | 下面是几个有用的API使用例子: 151 | 152 | * 获取集群的一般信息: 153 | 154 | ```bash 155 | curl -s "http://my.orchestrator.service.com/api/cluster-info/my_cluster" | jq . 156 | 157 | { 158 | "ClusterName": "my-cluster-fqdn:3306", 159 | "ClusterAlias": "my_cluster", 160 | "ClusterDomain": "my-cluster.com", 161 | "CountInstances": 10, 162 | "HeuristicLag": 0, 163 | "HasAutomatedMasterRecovery": true, 164 | "HasAutomatedIntermediateMasterRecovery": true 165 | } 166 | ``` 167 | * 找到`my_cluster`中没有启动binary log的主机: 168 | 169 | ```bash 170 | curl -s "http://my.orchestrator.service.com/api/cluster/alias/my_cluster" | jq '.[] | select(.LogBinEnabled==false) .Key.Hostname' -r 171 | ``` 172 | * 找到`my_cluster`的master的直接副本: 173 | 174 | ```bash 175 | curl -s "http://my.orchestrator.service.com/api/cluster/alias/my_cluster" | jq '.[] | select(.ReplicationDepth==1) .Key.Hostname' -r 176 | ``` 177 | 或: 178 | 179 | ```bash 180 | master=$(curl -s "http://my.orchestrator.service.com/api/cluster-info/my_cluster" | jq '.ClusterName' | tr ':' '/') 181 | curl -s "http://my.orchestrator.service.com/api/instance-replicas/${master}" | jq '.[] | .Key.Hostname' -r 182 | ``` 183 | * 找到`my_cluster`中所有的intermediate master: 184 | 185 | ```bash 186 | curl -s "http://my.orchestrator.service.com/api/cluster/alias/my_cluster" | jq '.[] | select(.MasterKey.Hostname!="") | select(.Replicas!=[]) .Key.Hostname' 187 | ``` 188 | -------------------------------------------------------------------------------- /Quick guides/First Steps.md: -------------------------------------------------------------------------------- 1 | # First Steps 2 | # [First Steps with Orchestrator](https://github.com/openark/orchestrator/blob/master/docs/first-steps.md) 3 | 你已经安装、部署和配置了Orchestrator. 你能用它做什么?
4 | 5 | 介绍一下常用命令, 主要是CLI侧的命令 6 | 7 | > A walk through of common commands, mostly on the CLI side 8 | 9 | #### Must 10 | ##### Discover 11 | You need to discover your MySQL hosts. 要么浏览你的`http://orchestrator:3000/web/discover` 页面并提交一个实例以供发现, 要么: 12 | 13 | ```bash 14 | $ orchestrator-client -c discover -i some.mysql.instance.com:3306 15 | ``` 16 | `:3306` 不是必须的, 因为`DefaultInstancePort` 默认就是`3306` . You may also: 17 | 18 | ```bash 19 | $ orchestrator-client -c discover -i some.mysql.instance.com 20 | ``` 21 | 这就发现了一个单一的实例. 但是: 你是否也有一个正在运行的`orchestrator`服务? 它将从这里开始, 询问这个实例的主站和副本, 递归进行, 直到整个拓扑结构被揭示. 22 | 23 | > This discovers a single instance. But: do you also have an `orchestrator` service running? It will pick up from there and will interrogate this instance for its master and replicas, recursively moving on until the entire topology is revealed. 24 | 25 | #### Information 26 | 我们现在假设你有被`orchestrator`知道的拓扑结构(你已经发现了它). 假设`some.mysql.instance.com`属于一个拓扑结构. `a.replica.3.instance.com`属于另一个. 你可以问以下问题. 27 | 28 | > We now assume you have topologies known to `orchestrator` (you have *discovered* it). Let's say `some.mysql.instance.com` belongs to one topology. `a.replica.3.instance.com` belongs to another. 
You may ask the following questions: 29 | 30 | ```bash 31 | $ orchestrator-client -c clusters 32 | topology1.master.instance.com:3306 33 | topology2.master.instance.com:3306 34 | 35 | $ orchestrator-client -c which-master -i some.mysql.instance.com 36 | some.master.instance.com:3306 37 | 38 | $ orchestrator-client -c which-replicas -i some.mysql.instance.com 39 | a.replica.instance.com:3306 40 | another.replica.instance.com:3306 41 | 42 | $ orchestrator-client -c which-cluster -i a.replica.3.instance.com 43 | topology2.master.instance.com:3306 44 | 45 | $ orchestrator-client -c which-cluster-instances -i a.replica.3.instance.com 46 | topology2.master.instance.com:3306 47 | a.replica.1.instance.com:3306 48 | a.replica.2.instance.com:3306 49 | a.replica.3.instance.com:3306 50 | a.replica.4.instance.com:3306 51 | a.replica.5.instance.com:3306 52 | a.replica.6.instance.com:3306 53 | a.replica.7.instance.com:3306 54 | a.replica.8.instance.com:3306 55 | 56 | $ orchestrator-client -c topology -i a.replica.3.instance.com 57 | topology2.master.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 58 | + a.replica.1.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 59 | + a.replica.2.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 60 | + a.replica.3.instance.com:3306 [OK,5.6.17-log,STATEMENT] 61 | + a.replica.4.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 62 | + a.replica.5.instance.com:3306 [OK,5.6.17-log,STATEMENT] 63 | + a.replica.6.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 64 | + a.replica.7.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 65 | + a.replica.8.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 66 | 67 | ``` 68 | #### Move stuff around 69 | 您可以使用各种命令来移动服务器. 通用的“自动解决问题”命令是 `relocate` 和 `relocate-replicas`: 70 | 71 | > You may move servers around using various commands. 
The generic "figure things out automatically" commands are `relocate` and `relocate-replicas`: 72 | 73 | ```Plain Text 74 | # Move a.replica.3.instance.com to replicate from a.replica.4.instance.com 75 | 76 | $ orchestrator-client -c relocate -i a.replica.3.instance.com:3306 -d a.replica.4.instance.com 77 | a.replica.3.instance.com:3306<a.replica.4.instance.com:3306 78 | 79 | $ orchestrator-client -c topology -i a.replica.3.instance.com 80 | topology2.master.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 81 | + a.replica.1.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 82 | + a.replica.2.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 83 | + a.replica.4.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 84 | + a.replica.3.instance.com:3306 [OK,5.6.17-log,STATEMENT] 85 | + a.replica.5.instance.com:3306 [OK,5.6.17-log,STATEMENT] 86 | + a.replica.6.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 87 | + a.replica.7.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 88 | + a.replica.8.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 89 | 90 | # Move the replicas of a.replica.2.instance.com to replicate from a.replica.6.instance.com 91 | 92 | $ orchestrator-client -c relocate-replicas -i a.replica.2.instance.com:3306 -d a.replica.6.instance.com 93 | a.replica.4.instance.com:3306 94 | a.replica.5.instance.com:3306 95 | 96 | $ orchestrator-client -c topology -i a.replica.3.instance.com 97 | topology2.master.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 98 | + a.replica.1.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 99 | + a.replica.2.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 100 | + a.replica.6.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 101 | + a.replica.4.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 102 | + a.replica.3.instance.com:3306 [OK,5.6.17-log,STATEMENT] 103 | + a.replica.5.instance.com:3306 [OK,5.6.17-log,STATEMENT] 104 | + a.replica.7.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 105 | + a.replica.8.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 106 | 107 | ``` 108 | `relocate`和`relocate-replicas`会自动计算出如何重新指向一个副本. 也许是通过GTID; 也许是正常的binlog file:pos. 或者也许有Pseudo GTID, 或者有一个binlog server参与? 也支持其他变化. 
109 | 110 | If you want to have greater control: 111 | 112 | * Normal file:pos operations are done via `move-up`, `move-below` 113 | * Pseudo-GTID specific replica relocation, use `match`, `match-replicas`, `regroup-replicas`. 114 | * Binlog server operations are typically done with `repoint`, `repoint-replicas` 115 | 116 | #### Replication control 117 | You are easily able to see what the following do: 118 | 119 | ```Plain Text 120 | $ orchestrator-client -c stop-replica -i a.replica.8.instance.com 121 | $ orchestrator-client -c start-replica -i a.replica.8.instance.com 122 | $ orchestrator-client -c restart-replica -i a.replica.8.instance.com 123 | $ orchestrator-client -c set-read-only -i a.replica.8.instance.com 124 | $ orchestrator-client -c set-writeable -i a.replica.8.instance.com 125 | 126 | ``` 127 | Break replication by messing with a replica's master host: 128 | 129 | ```bash 130 | $ orchestrator-client -c detach-replica -i a.replica.8.instance.com 131 | ``` 132 | Don't worry, this is reversible: 133 | 134 | ```bash 135 | $ orchestrator-client -c reattach-replica -i a.replica.8.instance.com 136 | ``` 137 | #### Crash analysis & recovery 138 | Are your clusters healthy? 
139 | 140 | ```Plain Text 141 | $ orchestrator-client -c replication-analysis 142 | some.master.instance.com:3306 (cluster some.master.instance.com:3306): DeadMaster 143 | a.replica.6.instance.com:3306 (cluster topology2.master.instance.com:3306): DeadIntermediateMaster 144 | 145 | $ orchestrator-client -c topology -i a.replica.6.instance.com 146 | topology2.master.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 147 | + a.replica.1.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 148 | + a.replica.2.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 149 | + a.replica.6.instance.com:3306 [last check invalid,5.6.17-log,STATEMENT,>>] 150 | + a.replica.4.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 151 | + a.replica.3.instance.com:3306 [OK,5.6.17-log,STATEMENT] 152 | + a.replica.5.instance.com:3306 [OK,5.6.17-log,STATEMENT] 153 | + a.replica.7.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 154 | + a.replica.8.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 155 | 156 | ``` 157 | Ask `orchestrator` to recover the above dead intermediate master: 158 | 159 | ```Plain Text 160 | $ orchestrator-client -c recover -i a.replica.6.instance.com:3306 161 | a.replica.8.instance.com:3306 162 | 163 | $ orchestrator-client -c topology -i a.replica.8.instance.com 164 | topology2.master.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 165 | + a.replica.1.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 166 | + a.replica.2.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 167 | + a.replica.6.instance.com:3306 [last check invalid,5.6.17-log,STATEMENT,>>] 168 | + a.replica.8.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 169 | + a.replica.4.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 170 | + a.replica.3.instance.com:3306 [OK,5.6.17-log,STATEMENT] 171 | + a.replica.5.instance.com:3306 [OK,5.6.17-log,STATEMENT] 172 | + a.replica.7.instance.com:3306 [OK,5.6.17-log,STATEMENT,>>] 173 | 174 | ``` 175 | #### More 176 | The above should get you up and running. 
For more please consult the [TOC](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/TOC.md). For CLI commands listing just run: 177 | 178 | ```Plain Text 179 | orchestrator-client -help 180 | ``` 181 | -------------------------------------------------------------------------------- /Setup/部署/Orchestrator raft, consensus cluster.md: -------------------------------------------------------------------------------- 1 | # Orchestrator/raft, consensus cluster 2 | # [Orchestrator/raft, consensus cluster](https://github.com/openark/orchestrator/blob/master/docs/raft.md) 3 | ![image](images/IN2I79t-Cw8BgFvmfyyl5_T3SoaqtWPF2PfLy93AXbQ.png) 4 | 5 | `orchestrator/raft` 是一种部署方法, 集群中的`orchestrator` 节点通过`raft` 共识协议相互通信. 6 | 7 | `orchestrator/raft`的部署既解决了`orchestrator`本身的高可用性, 也解决了网络隔离的问题, 特别是跨数据中心的网络分区/围栏问题(network partitioning/fencing). 8 | 9 | > 译者注: 根据华为云文档:[区域和可用区](https://support.huaweicloud.com/productdesc-bms/bms_01_0004.html), 同区域(Region)中的不同可用区(AZ,Availability Zone)的服务器在不同的机房. 一个Region中的多个AZ间通过高速光纤相连. AZ间延迟没有官方, 华为工作人员反馈测试在0.5ms左右 10 | 11 | ![image](images/Gl-NJECOnWKQBz5y_3I_RIUzWsye0U_M0gNrkzkiih4.png) 12 | 13 | 14 | > 所以, 像broker集群, 其实现在已经是两地三中心部署了. bj1与bj2其实就是跨数据中心了, 华为文档说的很清楚了, [AZ有独立的风火水电](https://blog.csdn.net/wangjianno2/article/details/52145197) 15 | 16 | 17 | 18 | ### Very brief overview of traits of raft -- raft特点的简要概述 19 | 通过使用共识协议, `orchestrator`节点能够挑选一个拥有*quorum(*法定人数)的领导者, 这意味着它不是孤立的. 例如, 考虑一个3个节点的`orchestrator/raft`setup. 通常情况下, 这三个节点会互相通信, 其中一个会成为稳定的当选领导者. 然而, 面对网络分区, 比如说节点`n1`与节点`n2`和`n3`分开, 可以保证领导者将是`n2`或`n3`. `n1`将无法领导, 因为它没有*quorum(*法定人数)(在3个节点设置中, *quorum*法定人数是2; 在5个节点设置中, *quorum*法定人数是3) 20 | 21 | 这在跨数据中心(DC)的设置中变得非常有用. 假设你设置了三个`orchestrator`节点, 每个都在自己的DC上. 如果一个DC被隔离, 可以保证活动的`orchestrator`节点将是一个有共识的节点, 即从被隔离的DC之外操作( i.e. operates from outside the isolated DC). 22 | 23 | ### orchestrator/raft setup technical details 技术细节 24 | 另见: [orchestrator/raft vs. 
synchronous replication setup](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/部署/orchestrator%20raft%20vs.%20synchronous%20replication%20setup.md) 25 | 26 | #### Service nodes 27 | 你将设置`3`个或`5`个(推荐raft节点数)`orchestrator`节点. 其他数字也是合法的, 但你将希望至少有3个. 28 | 29 | 此时, `orchestrator`节点不会动态加入到集群中. 节点的列表是预先配置好的, 如: 30 | 31 | ```yaml 32 | "RaftEnabled": true, 33 | "RaftDataDir": "/var/lib/orchestrator", 34 | "RaftBind": "", 35 | "DefaultRaftPort": 10008, 36 | "RaftNodes": [ 37 | "", 38 | "", 39 | "" 40 | ], 41 | ``` 42 | #### Backend DB 43 | 每个`orchestrator`节点都有自己专用的后端数据库服务器. 这可能是: 44 | 45 | * MySQL(不需要配置从库, 不过有也没事) 46 | 根据部署建议, MySQL可以与`orchestrator` 运行在同一主机上. 47 | * SQLite: 48 | 49 | ```yaml 50 | "BackendDB": "sqlite", 51 | "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.db", 52 | ``` 53 | `orchestrator`与`sqlite`捆绑在一起, 不需要安装外部依赖. 54 | 55 | #### Proxy: leader 56 | Only the leader is allowed to make changes. 57 | 58 | 最简单的设置是, 通过在`orchestrator`服务上设置一个`HTTP`代理(如HAProxy), 只将流量路由到领导者. 59 | 60 | > 另一种方法请见[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%83%A8%E7%BD%B2/Orchestrator%20raft%2C%20consensus%20cluster.md#orchestrator-client) 61 | 62 | * 使用`/api/leader-check` 做健康检查. 在任何时候, 最多只有一个`orchestrator`节点会以`HTTP 200/OK`回复该检查; 其他节点会以`HTTP 404/Not found` 回复. 63 | * Hint: 你可以使用, 例如, `/api/leader-check/503`是你明确希望获得503响应代码, 或类似的任何其他代码. 
64 | * 只将流量导向通过该测试的节点(指`/api/leader-check` ) 65 | 66 | 作为例子, 这将是一个`HAProxy`的配置: 67 | 68 | ```bash 69 | listen orchestrator 70 | bind 0.0.0.0:80 process 1 71 | bind 0.0.0.0:80 process 2 72 | bind 0.0.0.0:80 process 3 73 | bind 0.0.0.0:80 process 4 74 | mode tcp 75 | option httpchk GET /api/leader-check 76 | maxconn 20000 77 | balance first 78 | retries 1 79 | timeout connect 1000 80 | timeout check 300 81 | timeout server 30s 82 | timeout client 30s 83 | 84 | default-server port 3000 fall 1 inter 1000 rise 1 downinter 1000 on-marked-down shutdown-sessions weight 10 85 | 86 | server orchestrator-node-0 orchestrator-node-0.fqdn.com:3000 check 87 | server orchestrator-node-1 orchestrator-node-1.fqdn.com:3000 check 88 | server orchestrator-node-2 orchestrator-node-2.fqdn.com:3000 check 89 | ``` 90 | 91 | 92 | #### Proxy: healthy raft nodes 93 | 对上述制约因素的放松 94 | 95 | > A relaxation of the above constraint. 96 | 97 | 健康的raft节点将反向代理你的请求给领导者. 你可以选择(对于kubernetes设置来说, 这恰好是可取的)与任何健康的raft成员交谈. 98 | 99 | 你不能访问*unhealthy*的raft成员, 即与*quorum*法定人数隔离的节点. 100 | 101 | > You *must not access unhealthy raft members, i.e. nodes that are isolated from the quorum*. 102 | 103 | * 使用`/api/raft-health`来识别一个节点是健康raft组的一部分 104 | * `HTTP 200/OK`的响应表明该节点是健康组的一部分, 你可以将流量导向该节点. 105 | * `HTTP 500/Internal Server Error`表明该节点不是健康组的一部分. 请注意, 在启动之后, 直到选出一个领导者, 你可能会发现有一段时间所有节点都报告为unhealthy. 请注意, 在领导者重新选举时, 你可能会观察到一个短暂的时期, 所有节点都报告为unhealthy. 106 | 107 | 108 | 109 | #### orchestrator-client 110 | 实现代理的另一种方法是使用`orchestrator-client`. 111 | 112 | [orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md)是一个shell脚本, 通过HTTP API访问`orchestrator` 服务, 并向用户提供一个命令行界面. 113 | 114 | 可以向`orchestrator-client`提供所有orchestrator API endpoints的完整列表. 在这种情况下, `orchestrator-client`会找出哪个endpoints是leader, 并将请求指向该endpoints. 
115 | 116 | 例如: 117 | 118 | ```bash 119 | export ORCHESTRATOR_API="https://orchestrator.host1:3000/api https://orchestrator.host2:3000/api https://orchestrator.host3:3000/api" 120 | ``` 121 | 调用`orchestrator-client`时, 它会先检查所列endpoints中哪一个是当前的leader, 然后再把请求发送给该endpoint. 122 | 123 | 或者, 如果你已经有一个代理, 也可以让`orchestrator-client`与代理一起工作, 例如: 124 | 125 | ```bash 126 | export ORCHESTRATOR_API="https://orchestrator.proxy:80/api" 127 | ``` 128 | ### Behavior and implications of orchestrator/raft setup 行为和影响 129 | * 在`raft`模式中, 每个`orchestrator`节点独立运行所有服务器的发现. 这意味着在三个节点的`orchestrator`集群中, 你的每个MySQL拓扑服务器将被三个不同的`orchestrator`节点独立访问. 130 | * 在正常情况下, 这三个节点将看到一个或多或少相同的拓扑结构图. 但他们将各自有自己的独立分析. 131 | * 每个`orchestrator`节点都向自己的专用后端DB服务器(无论是`MySQL`还是`sqlite`)写入数据. 132 | * `orchestrator`节点间的通信非常少. 它们不共享发现信息(因为它们各自独立发现). 相反, leader与其他节点共享被拦截的用户指令, 例如: 133 | * `begin-downtime` 134 | * `register-candidate` 135 | * etc. 136 | 137 | The *leader* will also educate its followers about ongoing failovers. 138 | 139 | > 译者注: 就是说failover只能leader做, leader做failover时也会告诉followers. 140 | 141 | `orchestrator`节点之间的通信与事务性数据库的提交不相关, 而且是稀疏的. 142 | 143 | * 所有用户变更必须通过leader, 尤其是通过`HTTP API`. 你不能直接操作后台数据库, 因为这样的改变不会被发布到其他节点. 144 | * 因此, 在`orchestrator/raft`上, 人们不能在命令行模式下使用`orchestrator` 命令: 当raft模式被启用时, 试图运行orchestrator cli将被拒绝. 我们正在进行的一些开发工作是允许一些命令通过cli运行. 145 | * 有一个实用脚本, 即[orchestrator-client](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Use/orchestrator-client.md), 它提供了与`orchestrator`命令类似的接口, 并使用和操作`HTTP API` . 146 | * 只需在`orchestrator`服务节点上安装`orchestrator`二进制文件即可, 无需在其他地方安装. 而`orchestrator-client`可以安装在您希望安装的任何地方. 147 | * 单个`orchestrator`节点的故障将不会影响`orchestrator`的可用性. 在`3`个节点的集群中, 最多只能有一个`orchestrator`节点发生故障. 在`5`个节点的设置中, 允许`2`个节点发生故障. 148 | * 如果没有后端数据库, `orchestrator`节点将无法运行. 对于 `sqlite` 后端, 这不是问题(译者注: 使用sqlite几乎不用担心后端数据库故障), 因为 `sqlite` 嵌入在 `orchestrator`中一起运行. 如果使用 `MySQL` 作为后端数据库, 当`orchestrator`在一段时间内无法连接到后端 DB 时, `orchestrator` 服务将退出. 149 | * `orchestrator`节点可能会宕机, 然后恢复. 
(当它恢复时)它将重新加入`raft`组, 并接收它在离开时错过的任何事件. 节点离开多长时间并不重要. 如果它没有相关的本地`raft`log/snapshots, 另一个节点将自动为它提供最近的快照. 150 | * 如果无法加入 `raft` 组, 那么 `orchestrator` 服务将退出. 151 | 152 | See also [KV and orchestrator/raft](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Key-Value%20stores.md#kv-and-orchestratorraft) via `orchestrator/raft`. 153 | 154 | ### Main advantages of orchestrator/raft 主要优势 155 | * 高可用 156 | * 共识: 故障转移是由作为法定人数成员的领导节点进行的(而不是被孤立的节点进行的) 157 | * 支持使用`SQLite` (嵌入式)作为后端数据库, 不必非要使用`MySQL` 作为后端数据库(尽管也支持). 158 | * 几乎没有跨节点通信. 适用于高延迟跨DC网络环境. 159 | 160 | ### DC fencing example 161 | 考虑以下三个数据中心的示例: `DC1`、`DC2`和`DC3`. 我们使用三个节点运行`orchestrator/raft`, 每个数据中心都有一个`orchestrator`节点. 162 | 163 | ![image](images/PHGaWHq8bNVaqspDGa7Oo1722vM3oHhME2jq2DJCNII.png) 164 | 165 | * 假设一个跨3数据中心部署的集群 166 | * 每个数据中心部署一个`orchestrator`节点 167 | * 主库和一部分从库在DC2 168 | 故障场景是: 网络分区. 例如DC2网络出现问题, 没有任何进出流量 169 | 170 | What happens when `DC2` gets network isolated? 171 | 172 | ![image](images/2xZfM-jvkPjEpqXASZPIf6I6O60hkOMSF7BHOckE0j8.png) 173 | 174 | * 从DC2的角度来看 175 | 从server的角度来看, 特别是从在DC2的`orchestrator`节点的角度来看: 176 | * 主库和DC2的从库是正常的 177 | * DC1和DC3的server都挂了 178 | * (因为主库是"正常的"的)所以没必要failover 179 | * 然而, DC2的`orchestrator`并不是法定人数的一部分, 因此不是领导者 180 | 181 | ![image](images/HuIf-L441loi0GgS4f5fxDsGdo_K2uVc2Uax3BrRsac.png) 182 | 183 | * 在DC1和DC3的`orchestrator` 眼中: 184 | * 所有DC2的server, 包括主库, 都挂了 185 | * 需要failover 186 | * DC1的和DC3的协调器节点形成一个法定人数. 其中一个将成为领导者. 187 | * leader将启动故障转移. 188 | 189 | ![image](images/bHw5VkhfbgEnaerjTGLn4y0CandtPoE5BxBJGFW_2Sk.png) 190 | 191 | * 可能的failover结果是 192 | DC3中的从库当选为new master. 193 | * 拓扑结构被分离并分成两部分. 194 | * `orchestrator`节点将继续尝试联系DC2的服务器 195 | * 当DC2恢复时: 196 | * DC2的MySQL节点仍然被识别为"broken" 197 | * DC2的`orchestrator` 将重新加入法定人数, 并与集群保持同步. 198 | 199 | > 译者注: 这部分是个人总结; 这个案例存在一个问题: 如果DC2的应用仍然在向DC2的old master写入数据, 不就脑裂了吗? xenon能解决这个问题. 如果是xenon, DC2的xenon发现在脱离了raft组后, 会将old master只读打开. 
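上面的DC fencing示例归根结底是raft法定人数的算术: 3个节点的法定人数是2, 5个节点的法定人数是3. 下面用一小段shell示意这一计算(仅为说明, 并非orchestrator自带的工具):

```shell
# n 个 raft 节点: 法定人数 = floor(n/2)+1, 可容忍的故障/被隔离节点数 = n - 法定人数
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

quorum 3      # → 2: 3节点集群中, 任意2个互通的节点即构成法定人数
tolerated 3   # → 1: 最多容忍1个节点故障或被网络隔离
quorum 5      # → 3
tolerated 5   # → 2
```

这也解释了为什么被隔离的DC2节点无法当选leader: 它只能看到自己, 凑不齐法定人数.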
200 | 201 | 202 | 203 | ### Roadmap 204 | 仍在进行中的TODO项: 205 | 206 | * 故障检测需要法定人数的同意(即`DeadMaster`需要由多个`orchestrator`节点进行分析), 以便启动故障转移/恢复. 207 | * 支持探测(probing)的共享(与上述相互排斥): 领导者将在所有节点之间划分要探测的服务器列表. 有可能按数据中心划分. 这将减少探测负载(每个MySQL服务器将由一个节点而不是所有节点探测). 所有协调器节点将看到相同的图片, 而不是独立的视图. 208 | -------------------------------------------------------------------------------- /Failure detection & recovery/Failure detection.md: -------------------------------------------------------------------------------- 1 | # Failure detection 2 | # [Failure detection](https://github.com/openark/orchestrator/blob/master/docs/failure-detection.md#masterwithtoomanysemisyncreplicas) 3 | `orchestrator`使用整体方法来检测master和intermediate master的故障. 4 | 5 | > `orchestrator` uses a holistic approach to detect master and intermediate master failures. 6 | 7 | 例如, 在一种简单的方法中, 监控工具会探测主库状态, 并在它无法连接主库或在主库执行查询时发出警报. 这种方法容易受到由网络故障引起的误报的影响. 这种方法通过运行以 `t` 时间间隔的 `n` 个测试来进一步减少误报. 在某些情况下, 这会减少误报的可能, 但在真正失败的情况下增加响应时间. 8 | 9 | `orchestrator`利用了复制拓扑结构. 它不仅观察(被监控的)server本身, 而且还观察其replicas. 例如, 为了诊断一个dead master scenario, `orchestrator`必须同时 10 | 11 | * 未能与上述主库联系 12 | * 能够联系 master 的副本, 并确认他们也看不到 master. (是联系部分还是所有副本, 还需确认) 13 | 14 | `orchestrator`不是按时间对错误进行分类, 而是由多个观察者、复制拓扑服务器本身进行分类. 事实上, 当一个 master 的所有副本都同意他们不能联系他们的 master 时, 复制拓扑实际上被破坏了, 并且故障转移是合理的. 15 | 16 | 众所周知, `orchestrator` 的整体故障检测方法在生产中非常可靠. 17 | 18 | ### Detection and recovery 探测和恢复 19 | 检测并不总是导致[Topology recovery](Failure%20detection%20%26%20recovery/Topology%20recovery.md). 有些情况下, 恢复是不可取的 20 | 21 | > Detection does not always lead to [recovery](https://github.com/openark/orchestrator/blob/master/docs/topology-recovery.md). There are scenarios where a recovery is undesired: 22 | 23 | * 该集群没有被列入自动失效器. 24 | > The cluster is not listed for auto-failovers. 25 | * admin用户表示不应该在特定的服务器上进行恢复. 26 | > The admin user has indicated a recovery should not take place on the specific server. 27 | * admin用户已全局禁用恢复功能. 28 | > The admin user has globally disabled recoveries. 
29 | * 前不久刚刚对该集群完成了恢复, 并且正处于anti-flapping状态 30 | > A previous recovery on same cluster completed shortly before, and anti-flapping takes place. 31 | * 该故障类型被认为不值得恢复. 32 | > The failure type is not deemed worthy of recovery. 33 | 34 | 在理想的情况下, 检测到故障后立即恢复. 在其他情况下, such as blocked recoveries, 恢复可能在检测后的许多分钟后进行. 35 | 36 | 检测是独立于恢复的, 并且总是被启用.` OnFailureDetectionProcesses`钩子在每次检测时执行, 详见[Configuration: Failure detection](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Failure%20detection.md) 37 | 38 | ### Failure detection scenarios 故障检测场景 39 | 请注意以下潜在故障列表: 40 | 41 | * DeadMaster 42 | * DeadMasterAndReplicas 43 | * DeadMasterAndSomeReplicas 44 | * DeadMasterWithoutReplicas 45 | * UnreachableMasterWithLaggingReplicas 46 | * UnreachableMaster 47 | * LockedSemiSyncMaster 48 | * MasterWithTooManySemiSyncReplicas 49 | * AllMasterReplicasNotReplicating 50 | * AllMasterReplicasNotReplicatingOrDead 51 | * DeadCoMaster 52 | * DeadCoMasterAndSomeReplicas 53 | * DeadIntermediateMaster 54 | * DeadIntermediateMasterWithSingleReplicaFailingToConnect 55 | * DeadIntermediateMasterWithSingleReplica 56 | * DeadIntermediateMasterAndSomeReplicas 57 | * DeadIntermediateMasterAndReplicas 58 | * AllIntermediateMasterReplicasFailingToConnectOrDead 59 | * AllIntermediateMasterReplicasNotReplicating 60 | * UnreachableIntermediateMasterWithLaggingReplicas 61 | * UnreachableIntermediateMaster 62 | * BinlogServerFailingToConnectToMaster 63 | 64 | 简单地看一下一些例子, 以下是`orchestrator`如何得出失败的结论: 65 | 66 | > Briefly looking at some examples, here is how `orchestrator` reaches failure conclusions: 67 | 68 | #### `DeadMaster`: 69 | 1. 主库访问失败 70 | > Master MySQL access failure 71 | 2. 所有主库的副本复制失败 72 | > All of master's replicas are failing replication 73 | 74 | 这就形成了一个潜在的恢复过程 75 | 76 | > This makes for a potential recovery process 77 | 78 | #### `DeadMasterAndSomeReplicas`: 79 | 1. 主库访问失败 80 | > Master MySQL access failure 81 | 2. 
该主库的一些副本也是unreachable的 82 | > Some of its replicas are also unreachable 83 | 3. 该主库的其余副本处于复制失败状态 84 | > Rest of the replicas are failing replication 85 | 86 | 这就形成了一个潜在的恢复过程 87 | 88 | > This makes for a potential recovery process 89 | 90 | #### `UnreachableMaster`: 91 | 1. 主库访问失败 92 | > Master MySQL access failure 93 | 2. 但它有复制状态正常的副本 94 | > But it has replicating replicas. 95 | 96 | 这并不会导致恢复过程. 然而, 为了改进分析, `orchestrator`将发出对副本(从库)的紧急重读, 以弄清它们是否真的对主站满意(在这种情况下,也许`orchestrator`由于网络故障而无法看到它), 还是实际上正在花时间弄清楚他们的复制失败了. 97 | 98 | > This does not make for a recovery process. However, to improve analysis, `orchestrator` will issue an emergent re-read of the replicas, to figure out whether they are really happy with the master (in which case maybe `orchestrator` cannot see it due to a network glitch) or were actually taking their time to figure out they were failing replication. 99 | 100 | #### `DeadIntermediateMaster`: 101 | 1. intermediate master 无法访问 102 | > An intermediate master (replica with replicas) cannot be reached 103 | 2. 该intermediate master所有的副本复制失败 104 | > All of its replicas are failing replication 105 | 106 | 这就形成了一个潜在的恢复过程 107 | 108 | > This makes for a potential recovery process. 109 | 110 | #### `UnreachableMasterWithLaggingReplicas`: 111 | 1. 主库无法访问 112 | > Master cannot be reached 113 | 2. 所有其直属从库(除延迟从库)都是滞后的(复制延迟) 114 | > All of its immediate replicas (excluding SQL delayed) are lagging 115 | 116 | 当主服务器过载时, 可能会发生这种情况. 客户端会收到"Too many connections"错误, 而很久以前就连接的副本却会声称主库没有问题(因为io\_thread是长连接). 类似地, 如果 master 由于某些元数据操作而被锁定, 则客户端将在连接时被阻塞, 而副本可能会声称一切正常. 但是, 由于应用程序无法连接到主库, 因此不会写入任何实际数据, 并且当使用诸如 `pt-heartbeat` 之类的心跳机制时, 我们可以观察到副本的延迟越来越大. 117 | 118 | `orchestrator`对这种情况的反应是重新启动主库的所有直属从库(猜测是restart slave io\_thread). 这将关闭这些副本上的旧客户端连接, 并试图启动新的连接. 现在, 这些连接可能会失败, 导致所有从库复制失败(复制状态异常). 这将导致`orchestrator`分析出一个`DeadMaster` (This will next lead `orchestrator` to analyze a `DeadMaster`.). 119 | 120 | > 貌似`DeadMaster` 会触发failover, 这与MHA机制不符. 
Too many connections 不会导致MHA触发failover. 121 | 122 | #### `LockedSemiSyncMaster` 123 | 1. 主库启用了半同步(`rpl_semi_sync_master_enabled=1`) 124 | > Master is running with semi-sync enabled (`rpl_semi_sync_master_enabled=1`) 125 | 2. 返回ack的从库数量小于`rpl_semi_sync_master_wait_for_slave_count` 126 | > Number of connected semi-sync replicas falls short of expected `rpl_semi_sync_master_wait_for_slave_count` 127 | 3. `rpl_semi_sync_master_timeout` 的值足够高, 主库的写入会被阻塞, 不会退化到异步复制. 128 | > `rpl_semi_sync_master_timeout` is high enough such that master locks writes and does not fall back to asynchronous replication 129 | 130 | 这个条件只在`ReasonableLockedSemiSyncMasterSeconds`过后触发. 如果没有设置`ReasonableLockedSemiSyncMasterSeconds`, 则在`ReasonableReplicationLagSeconds`之后触发. 131 | 132 | 这种情况的补救措施可以是在主库上禁用半同步, 或者启动(或启用)足够的半同步副本. 133 | 134 | 如果启用了 `EnforceExactSemiSyncReplicas`, 那么 `orchestrator` 将确定所需的半同步拓扑并enable/disable副本上的半同步(参数)以匹配它. 所需的拓扑由优先级顺序(见下文)和`rpl_semi_sync_master_wait_for_slave_count`值定义的. 135 | 136 | 如果启用了 `RecoverLockedSemiSyncMaster`, 那么 `orchestrator` 将按优先级顺序在副本上启用(但永远不会禁用)半同步, 直到半同步副本的数量与`rpl_semi_sync_master_wait_for_slave_count`匹配. 请注意, 如果设置了 `EnforceExactSemiSyncReplicas`, 则 `RecoverLockedSemiSyncMaster` 无效. 137 | 138 | 优先级顺序由 `DetectSemiSyncEnforcedQuery`(数字越大优先级越高)、提升规则 ([DetectPromotionRuleQuery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/%E9%85%8D%E7%BD%AE%E5%8F%82%E6%95%B0%E8%AF%A6%E8%A7%A3-%E2%85%A1.md#detectpromotionrulequery)) 和主机名(fallback)定义. 139 | 140 | 如果 `EnforceExactSemiSyncReplicas` 和 `RecoverLockedSemiSyncMaster` 均已禁用(默认), 则 `orchestrator` 不会为此类分析调用任何恢复过程. 141 | 142 | 另请参阅[Semi-sync topology](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Discovery%2C%20classifying%20servers.md#semi-sync-topology)文档以获取更多详细信息. 143 | 144 | #### `MasterWithTooManySemiSyncReplicas` 145 | 1. 
主库启用了半同步(rpl\_semi\_sync\_master\_enabled=1) 146 | > Master is running with semi-sync enabled (`rpl_semi_sync_master_enabled=1`) 147 | 2. 返回ack的从库数量大于rpl\_semi\_sync\_master\_wait\_for\_slave\_count 148 | > Number of connected semi-sync replicas is higher than the expected `rpl_semi_sync_master_wait_for_slave_count` 149 | 3. 启用`EnforceExactSemiSyncReplicas`(如果该标志未被启用, 则不会触发该分析) 150 | > `EnforceExactSemiSyncReplicas` is enabled (this analysis is not triggered if this flag is not enabled) 151 | 152 | 如果启用了 `EnforceExactSemiSyncReplicas`, 那么 `orchestrator` 将确定所需的半同步拓扑并启用/禁用副本上的半同步(参数)以匹配它. 所需的拓扑由优先级顺序和`rpl_semi_sync_master_timeout`定义. 153 | 154 | 优先级顺序由 `DetectSemiSyncEnforcedQuery`(数字越大优先级越高)、提升规则 ([DetectPromotionRuleQuery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/%E9%85%8D%E7%BD%AE%E5%8F%82%E6%95%B0%E8%AF%A6%E8%A7%A3-%E2%85%A1.md#detectpromotionrulequery)) 和主机名(fallback)定义. 155 | 156 | 如果`EnforceExactSemiSyncReplicas`被禁用(默认), `orchestrator`不会为这种类型的分析调用任何恢复进程。 157 | 158 | 另请参阅[Semi-sync topology](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Discovery%2C%20classifying%20servers.md#semi-sync-topology)文档以获取更多详细信息. 159 | 160 | ### Failures of no interest 161 | 以下场景对 `orchestrator` 来说不感兴趣, 虽然信息和状态可供 `orchestrator` 使用, 但它本身并不识别此类场景为故障; 没有调用检测钩子, 显然也没有尝试恢复: 162 | 163 | * 简单的从库复制异常. 例外: 半同步副本导致 `LockedSemiSyncMaster` 164 | > Failure of simple replicas (*leaves* on the replication topology graph) Exception: semi sync replicas causing `LockedSemiSyncMaster` 165 | * 复制延迟, 甚至更严重. 166 | > Replication lags, even severe. 167 | 168 | ### Visibility 可见性 169 | 最新的分析可通过以下方式获得: 170 | 171 | > An up-to-date analysis is available via: 172 | 173 | * Command line: `orchestrator-client -c replication-analysis` or `orchestrator -c replication-analysis` 174 | * Web API: `/api/replication-analysis` 175 | * Web: `/web/clusters-analysis/` page (`Clusters`->`Failure analysis`). 
这提供了一个不完整的问题列表, 只突出了可操作的问题. 176 | 177 | Read next: [Topology recovery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Topology%20recovery.md) 178 | -------------------------------------------------------------------------------- /Setup/配置/Configuration Recovery.md: -------------------------------------------------------------------------------- 1 | # Configuration: Recovery 2 | # [Configuration: recovery](https://github.com/openark/orchestrator/blob/master/docs/configuration-recovery.md) 3 | `orchestrator`将对您的拓扑结构进行故障恢复. 您将指示`orchestrator`对哪些集群进行自动恢复, 哪些集群需要人工来恢复. 您将为`orchestrator`配置钩子以移动VIP, 更新服务发现等. 4 | 5 | 恢复依赖于检测,在[Configuration: Failure detection](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/配置/Configuration%20%20Failure%20detection.md)中讨论过. 6 | 7 | 关于恢复的所有信息, 请参考 [Topology recovery](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Failure%20detection%20%26%20recovery/Topology%20recovery.md) 8 | 9 | 还要考虑到, 你的MySQL拓扑结构本身需要遵循一些规则, 参考[MySQL Configuration](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Recovery.md#mysql-configuration) 10 | 11 | ```yaml 12 | { 13 | "RecoveryPeriodBlockSeconds": 3600, 14 | "RecoveryIgnoreHostnameFilters": [], 15 | "RecoverMasterClusterFilters": [ 16 | "thiscluster", 17 | "thatcluster" 18 | ], 19 | "RecoverIntermediateMasterClusterFilters": [ 20 | "*" 21 | ], 22 | } 23 | ``` 24 | 上述配置中: 25 | 26 | * `orchestrator`将自动恢复所有集群intermediate master故障 27 | * `orchestrator`将自动恢复两个指定集群的主站故障(即`RecoverMasterClusterFilters`中指定的`thiscluster`和`thatcluster`); 其他集群的主库将不会自动恢复. 可以人工触发恢复操作. 28 | * 一旦集群经历了恢复, 协调器将在3600秒(1小时)之后阻止自动恢复. This is an anti-flapping mechanism. 29 | >  什么是anti-flapping?. 可以看[这篇文章](https://medium.com/@jonfinerty/flapping-and-anti-flapping-dcba5ba92a05)间接了解一下 30 | 31 | 请再次注意, 自动恢复是可选的. 32 | 33 | >  Note, again, that automated recovery is *opt in*. 
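在上述配置的基础上, 再给出一个更保守的示意配置: 只对一个集群(此处`mycluster`为假设的集群名)启用主库自动恢复, intermediate master故障全部人工处理——空的过滤器列表意味着没有集群匹配, 即不自动恢复:

```yaml
{
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoverMasterClusterFilters": [
    "mycluster"
  ],
  "RecoverIntermediateMasterClusterFilters": []
}
```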
34 | 35 | ### Promotion actions 36 | 不同的环境需要对recovery/promotion采取不同的行动 37 | 38 | ```yaml 39 | { 40 | "ApplyMySQLPromotionAfterMasterFailover": true, 41 | "PreventCrossDataCenterMasterFailover": false, 42 | "PreventCrossRegionMasterFailover": false, 43 | "FailMasterPromotionOnLagMinutes": 0, 44 | "FailMasterPromotionIfSQLThreadNotUpToDate": true, 45 | "DelayMasterPromotionIfSQLThreadNotUpToDate": false, 46 | "MasterFailoverLostInstancesDowntimeMinutes": 10, 47 | "DetachLostReplicasAfterMasterFailover": true, 48 | "MasterFailoverDetachReplicaMasterHost": false, 49 | 50 | "PostponeReplicaRecoveryOnLagMinutes": 0, 51 | } 52 | ``` 53 | * `ApplyMySQLPromotionAfterMasterFailover` : 当为`true` 时, `orchestrator` 将在选举出的新主库上执行`reset slave all` 和`set read_only=0` . 默认: `true` . 当该参数为`true` 时, 将覆盖`MasterFailoverDetachSlaveMasterHost` . 54 | * `PreventCrossDataCenterMasterFailover` : 默认`false` . 当为`true` 时, `orchestrator`将只用与故障集群主库位于同一DC的从库替换故障的主库. 它将尽最大努力从同一DC中找到一个替代者, 如果找不到, 将中止(失败)故障转移. 另请参阅`DetectDataCenterQuery`和`DataCenterPattern`配置变量. 55 | * `PreventCrossRegionMasterFailover` : 默认`false` . 当为`true` 时, `orchestrator`将只用与故障集群主库位于同一region的从库替换故障的主库. 它将尽最大努力找到同一region的替代者, 如果找不到, 将中止(失败)故障转移. 另请参阅`DetectRegionQuery`和`RegionPattern`配置变量. 56 | * `FailMasterPromotionOnLagMinutes` : 默认`0` (not failing promotion). 如果候选replica落后太多, 该参数可以用于将选举置为失败状态. 例如: 由于种种原因从库(候选主库)故障了5个小时(数据存在5小时延迟), 随后, 主库出现故障. 这种情况下, 我们可能希望阻止failover, 以便恢复那5个小时的复制延迟. 要使用这个参数, 你必须设置`ReplicationLagQuery` 并使用类似于`pt-heartbeat` 的心跳机制. 因为当复制故障时, `SHOW SLAVE STATUS` 中的`Seconds_behind_master` 不会显示延迟(会显示为Null). 57 | * `FailMasterPromotionIfSQLThreadNotUpToDate` : 如果在故障发生时, 所有的从库都是滞后的, 即使是拥有最新数据的、被选举为新主库的候选节点也可能有未应用的中继日志. 在这样的节点上执行`reset slave all`会丢失中继日志数据. 当该参数为`true` 时, 若候选节点的SQL线程尚未应用完所有中继日志, 提升将被判定为失败. 58 | * `DelayMasterPromotionIfSQLThreadNotUpToDate` : 如果在故障发生时, 所有的从库都是滞后的, 即使是拥有最新数据的、被选举为新主库的候选节点也可能有未应用的中继日志. 当该参数为`true` 时, `orchestrator` 将等待SQL thread应用完所有relay log, 然后再将候选从库提升为新主库. 
`FailMasterPromotionIfSQLThreadNotUpToDate`和`DelayMasterPromotionIfSQLThreadNotUpToDate`是相互排斥的. 59 | * `DetachLostReplicasAfterMasterFailover` : 一些从库在恢复过程中可能会丢失. 如果该参数为`true`, `orchestrator`将通过`detach-replica`命令强行中断它们的复制, 以确保没有人认为它们是正常的. 60 | >  some replicas may get lost during recovery. When `true`, `orchestrator` will forcibly break their replication via `detach-replica` command to make sure no one assumes they're at all functional. 61 | * `MasterFailoverDetachReplicaMasterHost` : 当该参数为`true` 时, `orchestrator`将对被选举为新主库的节点执行`detach-replica-master-host`(这确保了即使旧主库"复活了", 新主库也不会试图从旧主库复制数据). 默认值: `false`. 如果`ApplyMySQLPromotionAfterMasterFailover`为真, 这个参数将失去意义. `MasterFailoverDetachSlaveMasterHost`是它的一个别名. 62 | * `MasterFailoverLostInstancesDowntimeMinutes` : 主库故障转移后, 对所有丢失的服务器(包括故障主库和丢失的从库)设置downtime的分钟数. 设为0禁用. 默认: 0. 63 | >  number of minutes to downtime any server that was lost after a master failover (including failed master & lost replicas). Set to 0 to disable. Default: 0. 译者注: 个人理解为一个"置为下线"的时间, 节点在failover中丢失后, 不应该立即再次对其进行恢复. MHA也有类似机制, 默认8小时内不能再次failover 64 | * `PostponeReplicaRecoveryOnLagMinutes` : 在崩溃恢复时, 延迟超过给定分钟的副本只会在恢复过程的后期恢复, 即在 master/intermediate master 被选出、相关进程执行之后. 值 0 禁用此功能. 默认值: 0. `PostponeSlaveRecoveryOnLagMinutes` 是它的别名. 65 | 66 | ### Hooks 67 | 在恢复过程中可以使用以下钩子: 68 | 69 | * `PreGracefulTakeoverProcesses` : 在计划中的、优雅的主库接管时, 在主库进入只读状态之前立即执行. 70 | * `PreFailoverProcesses` : 在`orchestrator`采取恢复行动之前立即执行. 任何这些进程的失败(非零退出代码)都会中止恢复. 提示: 这让你有机会根据系统的一些内部状态中止恢复. 71 | * `PostMasterFailoverProcesses` : 在主库恢复成功后执行. 72 | * `PostIntermediateMasterFailoverProcesses` : 在intermediate master(或带从库的复制组成员)恢复成功后执行. 73 | * `PostFailoverProcesses` : 在任何成功恢复结束时执行(即在上述两个钩子之外额外执行). 74 | * `PostUnsuccessfulFailoverProcesses` : 在任何不成功的恢复结束时执行. 75 | * `PostGracefulTakeoverProcesses` : 在计划中的、优雅的主库接管时, 在旧主库成为新主库的从库之后执行. 
Any process command that ends with `"&"` is executed asynchronously, and failures of such processes are ignored.

The above are lists of commands, which `orchestrator` executes sequentially, in order of definition.

A simple implementation might look like:

```json
{
  "PreGracefulTakeoverProcesses": [
    "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [],
  "PostMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [],
  "PostGracefulTakeoverProcesses": [
    "echo 'Planned takeover complete' >> /tmp/recovery.log"
  ]
}
```

#### Hooks arguments and environment

`orchestrator` provides all hooks with information relevant to the failure/recovery, such as the identity of the failed instance, the identity of the promoted instance, the affected replicas, the failure type, the cluster name, and more.

This information is passed in two independent ways, and you may use either or both:

1. Environment variables: `orchestrator` will set the following, which can be retrieved by your hooks:

   * `ORC_FAILURE_TYPE`
   * `ORC_INSTANCE_TYPE` ("master", "co-master", "intermediate-master")
   * `ORC_IS_MASTER` (true/false)
   * `ORC_IS_CO_MASTER` (true/false)
   * `ORC_FAILURE_DESCRIPTION`
   * `ORC_FAILED_HOST`
   * `ORC_FAILED_PORT`
   * `ORC_FAILURE_CLUSTER`
   * `ORC_FAILURE_CLUSTER_ALIAS`
   * `ORC_FAILURE_CLUSTER_DOMAIN`
   * `ORC_COUNT_REPLICAS`
   * `ORC_IS_DOWNTIMED`
   * `ORC_AUTO_MASTER_RECOVERY`
   * `ORC_AUTO_INTERMEDIATE_MASTER_RECOVERY`
   * `ORC_ORCHESTRATOR_HOST`
   * `ORC_IS_SUCCESSFUL`
   * `ORC_LOST_REPLICAS`
   * `ORC_REPLICA_HOSTS`
   * `ORC_COMMAND` (`"force-master-failover"`, `"force-master-takeover"`, `"graceful-master-takeover"` if applicable)

   In addition, if the recovery was successful, `orchestrator` will also set:

   * `ORC_SUCCESSOR_HOST`
   * `ORC_SUCCESSOR_PORT`
   * `ORC_SUCCESSOR_BINLOG_COORDINATES`
   * `ORC_SUCCESSOR_ALIAS`

2. Command-line text replacement: `orchestrator` replaces the following magic tokens in your `*Processes` commands:

   * `{failureType}`
   * `{instanceType}` ("master", "co-master", "intermediate-master")
   * `{isMaster}` (true/false)
   * `{isCoMaster}` (true/false)
   * `{failureDescription}`
   * `{failedHost}`
   * `{failedPort}`
   * `{failureCluster}`
   * `{failureClusterAlias}`
   * `{failureClusterDomain}`
   * `{countReplicas}` (replaces `{countSlaves}`)
   * `{isDowntimed}`
   * `{autoMasterRecovery}`
   * `{autoIntermediateMasterRecovery}`
   * `{orchestratorHost}`
   * `{lostReplicas}` (replaces `{lostSlaves}`)
   * `{countLostReplicas}`
   * `{replicaHosts}` (replaces `{slaveHosts}`)
   * `{isSuccessful}`
   * `{command}` (`"force-master-failover"`, `"force-master-takeover"`, `"graceful-master-takeover"` if applicable)

   In addition, if the recovery was successful, `orchestrator` will also provide:

   * `{successorHost}`
   * `{successorPort}`
   * `{successorBinlogCoordinates}`
   * `{successorAlias}`

### MySQL Configuration

Your MySQL topology must meet some requirements in order to support failovers. These requirements largely depend on the type of topology/configuration you are using.

* Oracle/Percona with GTID: promotable servers must have `log_bin` and `log_slave_updates` enabled. Replicas must use `AUTO_POSITION=1` (via `CHANGE MASTER TO MASTER_AUTO_POSITION=1`).
* MariaDB GTID: promotable servers must have `log_bin` and `log_slave_updates` enabled.
* [Pseudo GTID](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Various/Pseudo%20GTID.md): promotable servers must have `log_bin` and `log_slave_updates` enabled. If using `5.7/8.0` parallel replication, set `slave_preserve_commit_order=1`.
* Binlog servers: promotable servers must have `log_bin` enabled.

Also consider optimizing failure detection by reading [MySQL configuration](https://github.com/Fanduzi/orchestrator-zh-doc/blob/master/Setup/%E9%85%8D%E7%BD%AE/Configuration%20%20Failure%20detection.md#mysql-configuration).
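As a sketch of how a hook might consume the environment variables above, the following minimal logging script could be referenced from `PostFailoverProcesses`; the log path and the choice of variables printed are illustrative, not prescribed by `orchestrator`:

```shell
#!/bin/bash
# Minimal failover-hook sketch: append recovery details, taken from the
# ORC_* environment variables orchestrator sets, to a log file.
# The log path is illustrative.
log_file="/tmp/recovery.log"

{
  echo "recovery on cluster=${ORC_FAILURE_CLUSTER_ALIAS:-unknown}"
  echo "  failure type:  ${ORC_FAILURE_TYPE}"
  echo "  failed server: ${ORC_FAILED_HOST}:${ORC_FAILED_PORT}"
  # Successor variables are only set when the recovery succeeded
  if [ -n "${ORC_SUCCESSOR_HOST}" ]; then
    echo "  promoted:      ${ORC_SUCCESSOR_HOST}:${ORC_SUCCESSOR_PORT}"
  fi
} >> "$log_file"
```

You would then reference the script by path, e.g. `"PostFailoverProcesses": ["/path/to/hook.sh"]`.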
--------------------------------------------------------------------------------
/Use/Scripting samples.md:
--------------------------------------------------------------------------------

# [Scripting samples](https://github.com/openark/orchestrator/blob/master/docs/script-samples.md)

This document presents scripting and automation samples and ideas for `orchestrator`.

#### Show all clusters with aliases

```bash
$ orchestrator-client -c clusters-alias
mysql-9766.dc1.domain.net:3306,cl1
mysql-0909.dc1.domain.net:3306,olap
mysql-0246.dc1.domain.net:3306,mycluster
mysql-1111.dc1.domain.net:3306,oltp1
mysql-9002.dc1.domain.net:3306,oltp2
mysql-3972.dc1.domain.net:3306,oltp3
mysql-0019.dc1.domain.net:3306,oltp4
```

#### Show only aliases

```bash
$ orchestrator-client -c clusters-alias | cut -d"," -f2 | sort
cl1
mycluster
olap
oltp1
oltp2
oltp3
oltp4
```

#### Master of cluster

```bash
$ orchestrator-client -c which-cluster-master -alias mycluster
mysql-0246.dc1.domain.net:3306
```

#### All instances of cluster

```bash
$ orchestrator-client -c which-cluster-instances -alias mycluster
mysql-0246.dc1.domain.net:3306
mysql-1357.dc2.domain.net:3306
mysql-bb00.dc1.domain.net:3306
mysql-00ff.dc1.domain.net:3306
mysql-8181.dc2.domain.net:3306
mysql-2222.dc1.domain.net:3306
mysql-ecec.dc2.domain.net:3306
```

The above reflects `orchestrator`'s knowledge of the replication topology. The list includes servers that may be offline/broken.

#### Shell loop over instances

```bash
$ orchestrator-client -c which-cluster-instances -alias mycluster | cut -d":" -f 1 | while read h ; do echo "Host is $h" ; done
Host is mysql-0246.dc1.domain.net
Host is mysql-1357.dc2.domain.net
Host is mysql-bb00.dc1.domain.net
Host is mysql-00ff.dc1.domain.net
Host is mysql-8181.dc2.domain.net
Host is mysql-2222.dc1.domain.net
Host is mysql-ecec.dc2.domain.net
```

> This may turn out to be more convenient than querying the CMDB with SQL...

#### Disable semi-sync on cluster

```bash
$ orchestrator-client -c which-cluster-instances -alias mycluster | while read i ; do
    orchestrator-client -c disable-semi-sync-master -i $i
  done
mysql-0246.dc1.domain.net:3306
mysql-1357.dc2.domain.net:3306
mysql-bb00.dc1.domain.net:3306
mysql-00ff.dc1.domain.net:3306
mysql-8181.dc2.domain.net:3306
mysql-2222.dc1.domain.net:3306
mysql-ecec.dc2.domain.net:3306
```

#### Enable semi-sync on cluster master

```bash
$ orchestrator-client -c which-cluster-master -alias mycluster | while read i ; do
    orchestrator-client -c enable-semi-sync-master -i $i
  done
mysql-0246.dc1.domain.net:3306
```

#### Let's try again. This time disable semi-sync on all instances *except* the master

```bash
$ master=$(orchestrator-client -c which-cluster-master -alias mycluster)
$ orchestrator-client -c which-cluster-instances -alias mycluster | grep -v $master | while read i ; do
    orchestrator-client -c disable-semi-sync-master -i $i
  done
```

#### Likewise, set read-only on all replicas

```bash
$ orchestrator-client -c which-cluster-instances -alias mycluster | grep -v $master | while read i ; do
    orchestrator-client -c set-read-only -i $i
  done
```

#### We don't really need to loop. We can use ccql

[ccql](https://github.com/github/ccql) is a concurrent, multi-server MySQL client. It plays well with scripting in general, and with `orchestrator` in particular.

```bash
$ orchestrator-client -c which-cluster-instances -alias mycluster | grep -v $master | ccql -C ~/.my.cnf -q "set global read_only=1"
```

#### Extract master hostname (no ":3306")

```bash
$ master_host=$(orchestrator-client -c which-cluster-master -alias mycluster | cut -d":" -f1)
$ echo $master_host
mysql-0246.dc1.domain.net
```

We will use the `master_host` variable in what follows.

#### Using the API to show all data of a specific host

```json
$ orchestrator-client -c api -path instance/$master_host/3306 | jq .
{
  "Key": {
    "Hostname": "mysql-0246.dc1.domain.net",
    "Port": 3306
  },
  "InstanceAlias": "",
  "Uptime": 12203,
  "ServerID": 65884260,
  "ServerUUID": "3e87bd92-2be0-13e8-ac1b-008cda544064",
  "Version": "5.7.18-log",
  "VersionComment": "MySQL Community Server (GPL)",
  "FlavorName": "MySQL",
  "ReadOnly": false,
  "Binlog_format": "ROW",
  "BinlogRowImage": "FULL",
  "LogBinEnabled": true,
  "LogReplicationUpdatesEnabled": true,
  "SelfBinlogCoordinates": {
    "LogFile": "mysql-bin.000002",
    "LogPos": 333006336,
    "Type": 0
  },
  "MasterKey": {
    "Hostname": "",
    "Port": 0
  },
  "IsDetachedMaster": false,
  "ReplicationSQLThreadRuning": false,
  "ReplicationIOThreadRuning": false,
  "HasReplicationFilters": false,
  "GTIDMode": "OFF",
  "SupportsOracleGTID": false,
  "UsingOracleGTID": false,
  "UsingMariaDBGTID": false,
  "UsingPseudoGTID": true,
  "ReadBinlogCoordinates": {
    "LogFile": "",
    "LogPos": 0,
    "Type": 0
  },
  "ExecBinlogCoordinates": {
    "LogFile": "",
    "LogPos": 0,
    "Type": 0
  },
  "IsDetached": false,
  "RelaylogCoordinates": {
    "LogFile": "",
    "LogPos": 0,
    "Type": 1
  },
  "LastSQLError": "",
  "LastIOError": "",
  "SecondsBehindMaster": {
    "Int64": 0,
    "Valid": false
  },
  "SQLDelay": 0,
  "ExecutedGtidSet": "",
  "GtidPurged": "",
  "ReplicationLagSeconds": {
    "Int64": 0,
    "Valid": true
  },
  "Replicas": [
    {
      "Hostname": "mysql-2222.dc1.domain.net",
      "Port": 3306
    },
    {
      "Hostname": "mysql-00ff.dc1.domain.net",
      "Port": 3306
    },
    {
      "Hostname": "mysql-1357.dc2.domain.net",
      "Port": 3306
    }
  ],
  "ClusterName": "mysql-0246.dc1.domain.net:3306",
  "SuggestedClusterAlias": "mycluster",
  "DataCenter": "dc1",
  "PhysicalEnvironment": "",
  "ReplicationDepth": 0,
  "IsCoMaster": false,
  "HasReplicationCredentials": false,
  "ReplicationCredentialsAvailable": false,
  "SemiSyncEnforced": false,
  "SemiSyncMasterEnabled": true,
  "SemiSyncReplicaEnabled": false,
  "LastSeenTimestamp": "2018-03-21 04:40:38",
  "IsLastCheckValid": true,
  "IsUpToDate": true,
  "IsRecentlyChecked": true,
  "SecondsSinceLastSeen": {
    "Int64": 2,
    "Valid": true
  },
  "CountMySQLSnapshots": 0,
  "IsCandidate": false,
  "PromotionRule": "neutral",
  "IsDowntimed": false,
  "DowntimeReason": "",
  "DowntimeOwner": "",
  "DowntimeEndTimestamp": "",
  "ElapsedDowntime": 0,
  "UnresolvedHostname": "",
  "AllowTLS": false,
  "LastDiscoveryLatency": 7233416
}
```

#### Extract the hostname from the JSON:

```bash
$ orchestrator-client -c api -path instance/$master_host/3306 | jq .Key.Hostname -r
mysql-0246.dc1.domain.net
```

#### Extract master's hostname from the JSON:

```bash
$ orchestrator-client -c api -path instance/$master_host/3306 | jq .MasterKey.Hostname -r

(empty, this is the master)
```

#### Another way of listing all hostnames in a cluster: using API and jq

```bash
$ orchestrator-client -c api -path cluster/alias/mycluster | jq .[].Key.Hostname -r
mysql-0246.dc1.domain.net
mysql-1357.dc2.domain.net
mysql-bb00.dc1.domain.net
mysql-00ff.dc1.domain.net
mysql-8181.dc2.domain.net
mysql-2222.dc1.domain.net
mysql-ecec.dc2.domain.net
```

#### Show the master host for each member in the cluster:

```bash
$ orchestrator-client -c api -path cluster/alias/mycluster | jq .[].MasterKey.Hostname -r

mysql-0246.dc1.domain.net
mysql-00ff.dc1.domain.net
mysql-0246.dc1.domain.net
mysql-bb00.dc1.domain.net
mysql-0246.dc1.domain.net
mysql-bb00.dc1.domain.net
```

#### What is the master hostname of a specific instance?

```bash
$ orchestrator-client -c api -path instance/mysql-bb00.dc1.domain.net/3306 | jq .MasterKey.Hostname -r
mysql-00ff.dc1.domain.net
```

#### How many replicas to a specific instance?

```bash
$ orchestrator-client -c api -path instance/$master_host/3306 | jq '.Replicas | length'
3
```

#### How many replicas to each of a cluster's members?
```bash
$ orchestrator-client -c api -path cluster/alias/mycluster | jq '.[].Replicas | length'
3
0
2
1
0
0
0
```

#### Another way of listing all replicas

We filter out those that don't have output for `show slave status`:

```bash
$ orchestrator-client -c which-cluster-instances -alias mycluster | ccql -C ~/.my.cnf -q "show slave status" | awk '{print $1}'
mysql-00ff.dc1.domain.net:3306
mysql-bb00.dc1.domain.net:3306
mysql-2222.dc1.domain.net:3306
mysql-ecec.dc2.domain.net:3306
mysql-1357.dc2.domain.net:3306
mysql-8181.dc2.domain.net:3306
```

#### Followup, restart replication on all cluster's instances

```bash
$ orchestrator-client -c which-cluster-instances -alias mycluster | ccql -C ~/.my.cnf -q "show slave status" | awk '{print $1}' | ccql -C ~/.my.cnf -q "stop slave; start slave;"
```

#### I'd like to apply changes to replication, without changing the replica's state (if it's running, I want it to keep running; if it's not running, I don't want to start replication)

```bash
$ orchestrator-client -c restart-replica-statements -i mysql-bb00.dc1.domain.net -query "change master to auto_position=1" | jq .[] -r
stop slave io_thread;
stop slave sql_thread;
change master to auto_position=1;
start slave sql_thread;
start slave io_thread;
```

Compare with:

```bash
$ orchestrator-client -c stop-replica -i mysql-bb00.dc1.domain.net
mysql-bb00.dc1.domain.net:3306

$ orchestrator-client -c restart-replica-statements -i mysql-bb00.dc1.domain.net -query "change master to auto_position=1" | jq .[] -r
change master to auto_position=1;
```

The above only prints the statements; we need to push them back to the server:

```bash
orchestrator-client -c restart-replica-statements -i mysql-bb00.dc1.domain.net -query "change master to auto_position=1" | jq .[] -r | mysql -h mysql-bb00.dc1.domain.net
```

#### In which DC (data center) is a specific instance?

This question and the next assume that either `DetectDataCenterQuery` or `DataCenterPattern` has been configured.

```bash
$ orchestrator-client -c api -path instance/mysql-bb00.dc1.domain.net/3306 | jq '.DataCenter' -r
dc1
```

#### In which DCs is a cluster deployed, and how many hosts in each DC?

```bash
$ orchestrator-client -c api -path cluster/mycluster | jq '.[].DataCenter' -r | sort | uniq -c
   4 dc1
   3 dc2
```

#### Which replicas are replicating cross DC?

```bash
$ orchestrator-client -c api -path cluster/mycluster |
    jq '.[] | select(.MasterKey.Hostname != "") |
      (.Key.Hostname + ":" + (.Key.Port | tostring) + " " + .DataCenter + " " + .MasterKey.Hostname + "/" + (.MasterKey.Port | tostring))' -r |
    while read h dc m ; do
      orchestrator-client -c api -path "instance/$m" | jq '.DataCenter' -r |
        { read master_dc ; [ "$master_dc" != "$dc" ] && echo $h ; } ;
    done

mysql-bb00.dc1.domain.net:3306
mysql-8181.dc2.domain.net:3306
```
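The DC-comparison step of the loop above can be exercised offline. A minimal sketch with hypothetical hosts and data centers, assuming each input line carries a replica, its DC, and its master's DC:

```shell
#!/bin/bash
# cross_dc_replicas: print every replica whose DC differs from its master's DC.
# Input lines (hypothetical data): "<replica-host:port> <replica-dc> <master-dc>"
cross_dc_replicas() {
  while read -r host dc master_dc ; do
    if [ "$dc" != "$master_dc" ]; then
      echo "$host"
    fi
  done
}

cross_dc_replicas <<'EOF'
mysql-1357.dc2.domain.net:3306 dc2 dc2
mysql-bb00.dc1.domain.net:3306 dc1 dc2
mysql-8181.dc2.domain.net:3306 dc2 dc1
EOF
# prints mysql-bb00.dc1.domain.net:3306 and mysql-8181.dc2.domain.net:3306
```

In the real pipeline the two DC columns come from `orchestrator`'s API (`.DataCenter` of the replica and of its master) rather than from a here-document.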