├── README.md
└── atlas_auto_setline.pl


/README.md:
--------------------------------------------------------------------------------
Atlas_auto_setline
==================

A tool that automatically sets unusable slave nodes offline, and recovered ones online, in the Atlas open source software.

This script works together with the 360 Atlas middleware. It checks the state of each slave (replication lag or slave thread errors) and automatically sets the slave nodes listed in the Atlas admin interface online or offline.


- Makes no changes to the master; only slave information is checked.
- Supports multiple slaves; see perldoc atlas_auto_setline for details.
- Multiple Atlas ports must belong to the same instance.
- Checks now run in a loop, every 10s by default; the INT and TERM signals sent by kill are ignored while a node is being set online or offline.


Required dependencies:
```
DBI
DBD::mysql
Config::Auto
```

Example db.conf configuration (multiple slaves under a single instance):
```
#slave host and atlas admin host info.
slave_host:172.30.0.15,172.30.0.16 #multiple slaves are separated by ','
slave_port:3306 #slave service port
slave_user:slave_user #user that is able to check the slave lag status
slave_pass:xxxxxx #slave_user password
atlas_host:172.30.0.18 #ip that atlas serves on, a virtual ip is recommended
atlas_port:5012 #atlas admin port; each atlas mysql-proxyd uses one port, if several are running list the ports separated by ','
atlas_user:admin #atlas admin account
atlas_pass:xxxxxxx #atlas admin account password
mail:chenzhe07@gmail.com
```
The atlas_port, atlas_user and atlas_pass parameters must specify the Atlas admin port and the admin user credentials; they are used to read the backend state from Atlas.

The script can be added to a scheduled task for continuous checking, for example:

```
#!/bin/bash
(
  flock -x -n 200
  if [[ $? -ne 0 ]]; then
    echo "Failed acquiring lock"
    exit 1
  fi
  perl atlas_auto_setline.pl --conf=db.conf --verbose --setline --interval=10 >>setline.log 2>&1
) 200>/web/scripts/atlas_auto/atlas.lock
```
Testing:
========

### Stopping SQL_THREAD:
```
mysql> select * from backends;
+-------------+-------------------+-------+------+
| backend_ndx | address           | state | type |
+-------------+-------------------+-------+------+
|           1 | 172.30.0.14:3306  | up    | rw   |
|           2 | 172.30.0.14:3306  | up    | ro   |
|           3 | 172.30.0.15:3306  | up    | ro   |
|           4 | 172.30.0.16:3306  | up    | ro   |
+-------------+-------------------+-------+------+
4 rows in set (0.00 sec)
```

After stopping the replication threads on the slave whose IP ends in 16 (with multiple ports there is one offline operation per port):
```
# perl atlas_auto_setline.pl --conf=db.conf --verbose --setline --threshold=30
+---2014-04-15 11:53:01, 172.30.0.15, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 13
+---2014-04-15 11:53:01, 172.30.0.16, Slave_IO_Running: No, Slave_SQL_Running: No, Seconds_Behind_Master: NULL
+-- 2014-04-15 11:53:01 OK SET offline node 172.30.0.16:3306
```
Atlas takes the node offline:
```
mysql> select * from backends;
+-------------+-------------------+-------+------+
| backend_ndx | address           | state | type |
+-------------+-------------------+-------+------+
|           1 | 172.30.0.14:3306  | up    | rw   |
|           2 | 172.30.0.14:3306  | up    | ro   |
|           3 | 172.30.0.15:3306  | up    | ro   |
|           4 | 172.30.0.16:3306  | offline| ro   |
+-------------+-------------------+-------+------+
4 rows in set (0.00 sec)
```

### Starting SQL_THREAD:
```
# perl atlas_auto_setline.pl --conf=db.conf --verbose --setline --threshold=30
+---2014-04-15 11:54:01, 172.30.0.15, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-04-15 11:54:01, 172.30.0.16, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+-- 2014-04-15 11:54:01 OK SET online node 172.30.0.16:5012
```

Manually setting a node offline:
================================
```
mysql> set offline 4;
+-------------+------------------+---------+------+
| backend_ndx | address          | state   | type |
+-------------+------------------+---------+------+
|           3 | 172.30.0.16:3306 | offline | ro   |
+-------------+------------------+---------+------+

1 row in set (0.00 sec)


mysql> select * from backends;
+-------------+-------------------+-------+------+
| backend_ndx | address           | state | type |
+-------------+-------------------+-------+------+
|           1 | 172.30.0.14:3306  | up    | rw   |
|           2 | 172.30.0.14:3306  | up    | ro   |
|           3 | 172.30.0.15:3306  | up    | ro   |
|           4 | 172.30.0.16:3306  | offline| ro   |
+-------------+-------------------+-------+------+
4 rows in set (0.00 sec)
```

### Running the script to bring it back online:
```
# perl atlas_auto_setline.pl --conf=db.conf --verbose --setline --threshold=30
+---2014-04-15 11:56:01, 172.30.0.15, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-04-15 11:56:01, 172.30.0.16, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+-- 2014-04-15 11:56:01 OK SET online node 172.30.0.16:5012
```
The node is brought back online successfully.

Loop checking
=============
```
# perl atlas_auto_setline.pl --conf=db.conf --verbose --setline --threshold=30 --interval=10
+---2014-09-22 16:22:42, 172.30.0.154, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-09-22 16:22:42, 172.30.0.133, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-09-22 16:22:52, 172.30.0.154, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-09-22 16:22:52, 172.30.0.133, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-09-22 16:23:02, 172.30.0.154, Slave_IO_Running: No, Slave_SQL_Running: No, Seconds_Behind_Master: NULL
+-- 2014-09-22 16:23:02 OK SET offline node 172.30.0.154:5012
+---2014-09-22 16:23:02, 172.30.0.133, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-09-22 16:23:12, 172.30.0.154, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+-- 2014-09-22 16:23:12 OK SET online node 172.30.0.154:5012
+---2014-09-22 16:23:12, 172.30.0.133, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-09-22 16:23:22, 172.30.0.154, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-09-22 16:23:22, 172.30.0.133, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
+---2014-09-22 16:23:32, 172.30.0.154, Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
```
Notes
=====

With MySQL Server 5.6, running the script prints the security warning ```Warning: Using a password on the command line interface can be insecure.```. This message can be ignored and does not affect the script. The Perl DBI driver is not compatible with the Atlas admin port, so results can only be fetched by invoking mysql -h. To avoid the 5.6 warning, either set the password in the [client] section of my.cnf and remove the -p$pass part from the script, or redirect stderr with 2>/dev/null.
A feature that hides the MySQL connection password has been added; see the article [safe-bash-with-mysql](http://highdb.com/%E5%A6%82%E4%BD%95%E5%AE%89%E5%85%A8%E7%9A%84%E4%BD%BF%E7%94%A8-bash-%E6%93%8D%E4%BD%9C-mysql/)

--------------------------------------------------------------------------------
/atlas_auto_setline.pl:
--------------------------------------------------------------------------------
#!/usr/bin/env perl
=pod

=head1 NAME

atlas_auto_setline: a tool that automatically sets unusable slave nodes offline/online in the Atlas open source software.

=head1 SYNOPSIS

Usage: atlas_auto_setline [OPTION...]

    perl atlas_auto_setline.pl --conf=db.conf --verbose --setline

atlas_auto_setline helps you monitor the Atlas middleware: it sets a slave node offline when the slave has an error and online again when the slave is ok.

=head1 RISKS

The slave Seconds_Behind_Master value is used to decide whether to set a node offline; this value may not be accurate.

The offline function connects to the Atlas admin interface to select the id of the slave that should be set offline.

The slave ip addresses in the db.conf file must be the addresses of the slave nodes, not the atlas node address.

user/pass should be the same on the slave and on atlas.

=cut

use strict;
use warnings;
use Getopt::Long;
use DBI;
use DBD::mysql;
use Data::Dumper;
use POSIX qw(strftime);
use Config::Auto;

my $help      = 0;
my $conf      = "db.conf";
my $setline   = 0;
my $verbose   = 0;
my $version   = 0;
my $threshold = 30;
my $interval  = 10;

my $VER = '0.0.2';

GetOptions(
    "conf=s"      => \$conf,
    "help!"       => \$help,
    "setline!"    => \$setline,
    "verbose!"    => \$verbose,
    "version!"    => \$version,
    "threshold=i" => \$threshold,
    "interval=i"  => \$interval,
);

# Show the embedded POD documentation and exit.
sub usage {
    my $name = shift;
    system("perldoc $name");
    exit 0;
}


# Make sure the mysql command line client is available.
sub mysql_setup {
    my $command = `which mysql`;
    chomp($command);
    if (! -e $command) {
        print "Unable to find mysql command in your \$PATH.\n";
        exit 1;
    }
}



# Return 'OK' when both slave threads run and the lag is below the threshold, otherwise 'ERR'.
sub get_slave_status {
    my ($host, $port, $user, $pass, $threshold) = @_;
    my $cur_time = strftime( "%Y-%m-%d %H:%M:%S", localtime(time) );

    # slave status.
    my %slave;
    my @slave_info;

    # Pass the credentials through an option file on stdin so the password
    # does not appear on the mysql command line.
    eval {
        @slave_info = `printf \\
            "%s\n" \\
            "\[client\]" \\
            "user=$user" \\
            "password=$pass" \\
            "host=$host" \\
            "port=$port" \\
            "database=information_schema" \\
            | mysql --defaults-file=/dev/stdin -Bse 'show slave status\\G'
        `;
    };
    if ($@ or not grep { /Slave/i } @slave_info) {
        print " +-- error in get slave $host:$port info. $@\n";
        return 'ERR';
    } else {
        foreach my $line (@slave_info) {
            next if $line =~ /1\. row/;
            # Only store lines that look like "Key: value".
            $slave{$1} = $2 if $line =~ /([a-zA-Z_]+):\s(.*)/;
        }
        print " +---$cur_time, $host, Slave_IO_Running: " . $slave{'Slave_IO_Running'} .
              ", Slave_SQL_Running: " . $slave{'Slave_SQL_Running'} .
              ", Seconds_Behind_Master: " . $slave{'Seconds_Behind_Master'} .
              "\n" if $verbose;
        if ($slave{'Slave_IO_Running'} eq 'Yes' and $slave{'Slave_SQL_Running'} eq 'Yes' and $slave{'Seconds_Behind_Master'} + 0 < $threshold) {
            return 'OK';
        } else {
            return 'ERR';
        }
    }
}


#+-------------+-------------------+-------+------+
#| backend_ndx | address           | state | type |
#+-------------+-------------------+-------+------+
#|           1 | 172.30.0.153:3306 | up    | rw   |
#|           2 | 172.30.0.153:3306 | up    | ro   |
#|           3 | 172.30.0.154:3306 | up    | ro   |
#|           4 | 172.30.0.133:3306 | up    | ro   |
#+-------------+-------------------+-------+------+
# Read the backend list from the Atlas admin port and return the entry for $slave_host.
sub atlas_ends {
    my ($host, $port, $user, $pass, $slave_host) = @_;
    my @atlas_state;
    eval {
        @atlas_state = `mysql -h $host -P $port -u$user -p$pass -Bse 'select * from backends'`;
    };

    if ($@ or not grep { /ro/i } @atlas_state) {
        print "+-- connect to atlas $host:$port error: $@\n";
        return;
    } else {
        my %admin_state;
        foreach my $line (@atlas_state) {
            # Match the complete ip literally, so 172.30.0.15 does not also match 172.30.0.154.
            next if $line !~ /(?<![\d.])\Q$slave_host\E:/;
            # Columns: backend_ndx, address, state, type.
            next if $line !~ /(\d+)\s+(.+)\s+(.+)\s+(.+)/;
            $admin_state{$port}{'id'}    = $1;
            $admin_state{$port}{'port'}  = $port;
            $admin_state{$port}{'state'} = $3;
            $admin_state{$port}{'type'}  = $4;
        }
        return \%admin_state;
    }
}

# Run SET OFFLINE/ONLINE for backend $id on the Atlas admin port and report the result.
sub atlas_setline {
    my ($tag, $slave_msg, $atlashost, $port, $user, $pass, $id) = @_;
    my $cur_time = strftime( "%Y-%m-%d %H:%M:%S", localtime(time) );
    eval {
        if ($tag eq 'offline') {
            my @off = `mysql -h $atlashost -P $port -u$user -p$pass -e "SET OFFLINE $id"`;
        }

        if ($tag eq 'online') {
            my @on = `mysql -h $atlashost -P $port -u$user -p$pass -e "SET ONLINE $id"`;
        }
        # Backticks never die, so turn a failed mysql call into an exception.
        die "mysql command failed (\$? = $?)\n" if $? != 0;
    };
    if ($@) {
        print " +-- $cur_time SET $tag - $slave_msg ERR :$@\n";
        send_msg("$cur_time SET $tag - $slave_msg ERR");
    } else {
        print " +-- $cur_time OK SET $tag node $slave_msg\n";
        send_msg("$cur_time OK SET $tag node $slave_msg");
    }
}

# SIGINT and SIGTERM should be ignored while a set online/offline operation is in progress.
sub catch_sig {
    my $signame = shift;
    local $SIG{$signame} = 'IGNORE' if $signame eq 'INT' or $signame eq 'TERM';
    our $halt = 1;
    print STDOUT "+-- signal $signame was ignored during the online/offline operation.\n";
    return $SIG{$signame};
}

if ($help) {
    usage($0);
}

if ($version) {
    print "Current version : $VER\n";
    exit 0;
}

$conf = "./$conf" if $conf && $conf =~ /^[^\/]/;
my $config   = Config::Auto::parse("$conf");
my $port_ref = $config->{'atlas_port'};
my $host_ref = $config->{'slave_host'};
my $mail_ref = $config->{'mail'};

# A comma separated value is parsed into an array reference, a single value into a scalar.
my @port;
if (ref($port_ref) eq "ARRAY") {
    foreach my $adminport (@$port_ref) {
        push @port, $adminport;
    }
} else {
    push @port, $port_ref;
}

my @slave_host;
if (ref($host_ref) eq "ARRAY") {
    foreach my $host (@$host_ref) {
        push @slave_host, $host;
    }
} else {
    push @slave_host, $host_ref;
}

my @mail;
if (ref($mail_ref) eq "ARRAY") {
    foreach my $recv (@$mail_ref) {
        push @mail, $recv;
    }
} else {
    push @mail, $mail_ref;
}

# Mail the given message lines to every configured recipient.
sub send_msg {
    # Build new strings instead of modifying the caller's arguments in place.
    my $data = join("\n", map { '+-- ' . $_ } @_);
    my $to   = join( ' ', @mail);
    eval {
        `echo "$data" | /bin/mail -r "atlas\@setline.com" -s "atlas auto setline" $to`;
    };

    if ( $@ ) {
        warn "error send: $@";
    }
}

mysql_setup;

while(1) {
    sleep($interval) if $interval;
    foreach my $slavehost (@slave_host) {
        my $state = get_slave_status($slavehost, $config->{'slave_port'},
                                     $config->{'slave_user'}, $config->{'slave_pass'}, $threshold);

        my $slave_msg = $slavehost . ":" . $config->{'slave_port'};
        {
            local $SIG{'INT'}  = \&catch_sig;
            local $SIG{'TERM'} = \&catch_sig;
            for my $atlas_port (@port) {
                my $atlas_info = atlas_ends($config->{'atlas_host'}, $atlas_port,
                                            $config->{'atlas_user'}, $config->{'atlas_pass'}, $slavehost);
                # Skip ports where this slave is not listed as a backend.
                if ( $atlas_info and $atlas_info->{$atlas_port} ) {
                    #set offline when slave has error but atlas is ok.
                    if ( $state eq 'ERR' and $atlas_info->{$atlas_port}->{'port'} + 0 == $atlas_port
                           and $atlas_info->{$atlas_port}->{'state'} eq 'up'
                           and $atlas_info->{$atlas_port}->{'type'} eq 'ro') {

                        atlas_setline('offline', $slave_msg, $config->{'atlas_host'}, $atlas_port,
                                      $config->{'atlas_user'}, $config->{'atlas_pass'},
                                      $atlas_info->{$atlas_port}->{'id'}) if $setline;
                    }

                    #set online when slave is ok but atlas is error.
                    if ( $state eq 'OK' and $atlas_info->{$atlas_port}->{'port'} + 0 == $atlas_port
                           and $atlas_info->{$atlas_port}->{'state'} eq 'offline'
                           and $atlas_info->{$atlas_port}->{'type'} eq 'ro') {

                        atlas_setline('online', $slave_msg, $config->{'atlas_host'}, $atlas_port,
                                      $config->{'atlas_user'}, $config->{'atlas_pass'},
                                      $atlas_info->{$atlas_port}->{'id'}) if $setline;
                    }
                }
            }
        }
    }
}

=pod

=head1 DESCRIPTION

Automatically sets a slave node offline/online when the slave has an error or is lagging.

=head1 OPTIONS

=over

=item --conf

type: var:value

Specifies the slave and atlas source configuration; a virtual ip address is recommended for atlas_host.
eg:

    #slave host and atlas admin host info.
    slave_host:172.30.0.15,172.30.0.16 #multi slave hosts, split with ','.
    slave_port:3306 #slave service port
    slave_user:slave_user #slave user, which can detect slave lag info.
    slave_pass:xxxxxx #slave_user password
    atlas_host:172.30.0.18 #atlas service ip address, virtual ip is recommended.
    atlas_port:5012 #atlas admin port, one port per mysql-proxyd
    atlas_user:admin #atlas user
    atlas_pass:xxxxxxx #atlas user password

=item --setline

Enable set online/offline mode. Without this option the script only reports the slave state.

=item --verbose

type: boolean

Whether to print the slave check info or not.

=item --version

type: boolean

Print the version of this script and exit.

=item --threshold

type: integer

Set a node offline if the slave lag is greater than or equal to the threshold value, default 30s.

=item --interval

type: integer

Check every interval seconds, default 10s.

=back

=head1 SYSTEM REQUIREMENTS

DBI, DBD::mysql, Config::Auto, Getopt::Long


=head1 BUGS

=head1 SEE ALSO

related tasks

=head1 AUTHOR

zhe.chen

=head1 CHANGELOG

v0.0.1 initial version

=cut

--------------------------------------------------------------------------------