├── Cluster.txt ├── Files.rar ├── HAProxy.txt ├── Hadoop.txt ├── High Performace Web .txt ├── LAMP(new).txt ├── LEMP.txt ├── LPT & Virtulization.txt ├── LVS Web php session memcached.txt ├── Linux Adminitration.txt ├── Linux Basics.txt ├── Linuxfiles11.rar ├── Little Linux.txt ├── Mini Linux Scripts.txt ├── MySQL 5.6 Replication.txt ├── MySQL Compiling and Installation.txt ├── MySQL Optimization.txt ├── MySQL.txt ├── Nagios.txt ├── Nginx On rhel6.4.txt ├── Nginx.txt ├── Openstack folsom installation.txt ├── README.md ├── RHCS.txt ├── Scalable System Design.txt ├── Services and Security(2).txt ├── Services and Security.txt ├── Tomcat.txt ├── Xtrabackup.txt ├── adduser.sh ├── adduser2.sh ├── addusers.sh ├── adminusers.sh ├── adminusers2.sh ├── adminusers3.sh ├── awk bascis.txt ├── awk.txt ├── bash.sh ├── bash2.sh ├── bincp.sh ├── calc.sh ├── case.sh ├── check_cpu.sh ├── check_cpu.sh.bak ├── check_mem.pl ├── check_mem.sh ├── check_mem.sh.bak ├── corosync.txt ├── debug.sh ├── delusers.sh ├── drbd+pacemaker.txt ├── drbd.txt ├── filetest.sh ├── filetest2.sh ├── filetest3.sh ├── first.sh ├── git.txt ├── hello.sh ├── initrd.gz ├── iscsi configuration.txt ├── jiaowu.sql ├── keepalived.txt ├── kernel-2.6.38.5-i686.cfg ├── lsof.txt ├── lvs.txt ├── make iso image.txt ├── memcached.txt ├── pcnet32.ko ├── postfix(10th).txt ├── postfix.txt ├── postfix_new.txt ├── quit.sh ├── random.sh ├── rc.functions ├── readme.txt ├── second.sh ├── service.sh ├── shift.sh ├── showlogged.sh ├── showshells.sh ├── showsum.sh ├── showusers.sh ├── showusers2.sh ├── sum.sh ├── sysroot.5.gz ├── sysroot.8.gz ├── testuser.sh ├── testuser2.sh ├── third.sh ├── tiny.2.gz ├── tiny.new1.gz ├── tiny.new3.gz ├── uptime ├── usertest.sh ├── varnish.txt ├── vmlinuz ├── vsftpd+pam+mysql(1).txt ├── vsftpd+pam+mysql.txt └── 开篇.txt /Files.rar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/Files.rar -------------------------------------------------------------------------------- /LAMP(new).txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/LAMP(new).txt -------------------------------------------------------------------------------- /LEMP.txt: -------------------------------------------------------------------------------- 1 | 2 | 传统上基于进程或线程模型架构的web服务通过每进程或每线程处理并发连接请求,这势必会在网络和I/O操作时产生阻塞,其另一个必然结果则是对内存或CPU的利用率低下。生成一个新的进程/线程需要事先备好其运行时环境,这包括为其分配堆内存和栈内存,以及为其创建新的执行上下文等。这些操作都需要占用CPU,而且过多的进程/线程还会带来线程抖动或频繁的上下文切换,系统性能也会由此进一步下降。 3 | 4 | 在设计的最初阶段,nginx的主要着眼点就是其高性能以及对物理计算资源的高密度利用,因此其采用了不同的架构模型。受启发于多种操作系统设计中基于“事件”的高级处理机制,nginx采用了模块化、事件驱动、异步、单线程及非阻塞的架构,并大量采用了多路复用及事件通知机制。在nginx中,连接请求由为数不多的几个仅包含一个线程的进程worker以高效的回环(run-loop)机制进行处理,而每个worker可以并行处理数千个的并发连接及请求。 5 | 6 | 如果负载以CPU密集型应用为主,如SSL或压缩应用,则worker数应与CPU数相同;如果负载以IO密集型为主,如响应大量内容给客户端,则worker数应该为CPU个数的1.5或2倍。 7 | 8 | Nginx会按需同时运行多个进程:一个主进程(master)和几个工作进程(worker),配置了缓存时还会有缓存加载器进程(cache loader)和缓存管理器进程(cache manager)等。所有进程均是仅含有一个线程,并主要通过“共享内存”的机制实现进程间通信。主进程以root用户身份运行,而worker、cache loader和cache manager均应以非特权用户身份运行。 9 | 10 | 主进程主要完成如下工作: 11 | 1. 读取并验正配置信息; 12 | 2. 创建、绑定及关闭套接字; 13 | 3. 启动、终止及维护worker进程的个数; 14 | 4. 无须中止服务而重新配置工作特性; 15 | 5. 控制非中断式程序升级,启用新的二进制程序并在需要时回滚至老版本; 16 | 6. 重新打开日志文件,实现日志滚动; 17 | 7. 编译嵌入式perl脚本; 18 | 19 | worker进程主要完成的任务包括: 20 | 1. 接收、传入并处理来自客户端的连接; 21 | 2. 提供反向代理及过滤功能; 22 | 3. nginx任何能完成的其它任务; 23 | 24 | cache loader进程主要完成的任务包括: 25 | 1. 
检查缓存存储中的缓存对象; 26 | 2. 使用缓存元数据建立内存数据库; 27 | 28 | cache manager进程的主要任务: 29 | 1. 缓存的失效及过期检验; 30 | 31 | Nginx的配置有着几个不同的上下文:main、http、server、upstream和location(还有实现邮件服务反向代理的mail)。配置语法的格式和定义方式遵循所谓的C风格,因此支持嵌套,还有着逻辑清晰并易于创建、阅读和维护等优势。 32 | 33 | 34 | Nginx的代码是由一个核心和一系列的模块组成, 核心主要用于提供Web Server的基本功能,以及Web和Mail反向代理的功能;还用于启用网络协议,创建必要的运行时环境以及确保不同的模块之间平滑地进行交互。不过,大多跟协议相关的功能和某应用特有的功能都是由nginx的模块实现的。这些功能模块大致可以分为事件模块、阶段性处理器、输出过滤器、变量处理器、协议、upstream和负载均衡几个类别,这些共同组成了nginx的http功能。事件模块主要用于提供OS独立的(不同操作系统的事件机制有所不同)事件通知机制如kqueue或epoll等。协议模块则负责实现nginx通过http、tls/ssl、smtp、pop3以及imap与对应的客户端建立会话。 35 | 36 | 在nginx内部,进程间的通信是通过模块的pipeline或chain实现的;换句话说,每一个功能或操作都由一个模块来实现。例如,压缩、通过FastCGI或uwsgi协议与upstream服务器通信,以及与memcached建立会话等。 37 | 38 | 一、安装Nginx: 39 | 40 | 1、解决依赖关系 41 | 42 | 编译安装nginx需要事先需要安装开发包组"Development Tools"和 "Development Libraries"。同时,还需要专门安装pcre-devel包: 43 | # yum -y install pcre-devel 44 | 45 | 2、安装 46 | 47 | 首先添加用户nginx,实现以之运行nginx服务进程: 48 | # groupadd -r nginx 49 | # useradd -r -g nginx nginx 50 | 51 | 接着开始编译和安装: 52 | # ./configure \ 53 | --prefix=/usr \ 54 | --sbin-path=/usr/sbin/nginx \ 55 | --conf-path=/etc/nginx/nginx.conf \ 56 | --error-log-path=/var/log/nginx/error.log \ 57 | --http-log-path=/var/log/nginx/access.log \ 58 | --pid-path=/var/run/nginx/nginx.pid \ 59 | --lock-path=/var/lock/nginx.lock \ 60 | --user=nginx \ 61 | --group=nginx \ 62 | --with-http_ssl_module \ 63 | --with-http_flv_module \ 64 | --with-http_stub_status_module \ 65 | --with-http_gzip_static_module \ 66 | --http-client-body-temp-path=/var/tmp/nginx/client/ \ 67 | --http-proxy-temp-path=/var/tmp/nginx/proxy/ \ 68 | --http-fastcgi-temp-path=/var/tmp/nginx/fcgi/ \ 69 | --http-uwsgi-temp-path=/var/tmp/nginx/uwsgi \ 70 | --http-scgi-temp-path=/var/tmp/nginx/scgi \ 71 | --with-pcre 72 | # make && make install 73 | 74 | 说明:如果想使用nginx的perl模块,可以通过为configure脚本添加--with-http_perl_module选项来实现,但目前此模块仍处于实验性使用阶段,可能会在运行中出现意外,因此,其实现方式这里不再介绍。如果想使用基于nginx的cgi功能,也可以基于FCGI来实现,具体实现方法请参照网上的文档。 75 | 76 | 3、为nginx提供SysV init脚本: 77 | 78 | 新建文件/etc/rc.d/init.d/nginx,内容如下: 79 | #!/bin/sh 80 | # 81 | # nginx - this script starts and stops the nginx daemon 82 | # 83 | # chkconfig: - 85 15 84 | # description: Nginx is an HTTP(S) server, HTTP(S) reverse \ 85 | # proxy and IMAP/POP3 proxy server 86 | # processname: nginx 87 | # config: /etc/nginx/nginx.conf 88 | # config: /etc/sysconfig/nginx 89 | # pidfile: /var/run/nginx.pid 90 | 91 | # Source function library. 92 | . /etc/rc.d/init.d/functions 93 | 94 | # Source networking configuration. 95 | . /etc/sysconfig/network 96 | 97 | # Check that networking is up. 98 | [ "$NETWORKING" = "no" ] && exit 0 99 | 100 | nginx="/usr/sbin/nginx" 101 | prog=$(basename $nginx) 102 | 103 | NGINX_CONF_FILE="/etc/nginx/nginx.conf" 104 | 105 | [ -f /etc/sysconfig/nginx ] && . /etc/sysconfig/nginx 106 | 107 | lockfile=/var/lock/subsys/nginx 108 | 109 | make_dirs() { 110 | # make required directories 111 | user=`nginx -V 2>&1 | grep "configure arguments:" | sed 's/[^*]*--user=\([^ ]*\).*/\1/g' -` 112 | options=`$nginx -V 2>&1 | grep 'configure arguments:'` 113 | for opt in $options; do 114 | if [ `echo $opt | grep '.*-temp-path'` ]; then 115 | value=`echo $opt | cut -d "=" -f 2` 116 | if [ ! -d "$value" ]; then 117 | # echo "creating" $value 118 | mkdir -p $value && chown -R $user $value 119 | fi 120 | fi 121 | done 122 | } 123 | 124 | start() { 125 | [ -x $nginx ] || exit 5 126 | [ -f $NGINX_CONF_FILE ] || exit 6 127 | make_dirs 128 | echo -n $"Starting $prog: " 129 | daemon $nginx -c $NGINX_CONF_FILE 130 | retval=$? 
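# 补充注释:daemon函数由前面引入的/etc/rc.d/init.d/functions提供,
# 用于启动$nginx并输出启动状态;retval保存其退出码,供下面决定是否创建锁文件。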
131 | echo 132 | [ $retval -eq 0 ] && touch $lockfile 133 | return $retval 134 | } 135 | 136 | stop() { 137 | echo -n $"Stopping $prog: " 138 | killproc $prog -QUIT 139 | retval=$? 140 | echo 141 | [ $retval -eq 0 ] && rm -f $lockfile 142 | return $retval 143 | } 144 | 145 | restart() { 146 | configtest || return $? 147 | stop 148 | sleep 1 149 | start 150 | } 151 | 152 | reload() { 153 | configtest || return $? 154 | echo -n $"Reloading $prog: " 155 | killproc $nginx -HUP 156 | RETVAL=$? 157 | echo 158 | } 159 | 160 | force_reload() { 161 | restart 162 | } 163 | 164 | configtest() { 165 | $nginx -t -c $NGINX_CONF_FILE 166 | } 167 | 168 | rh_status() { 169 | status $prog 170 | } 171 | 172 | rh_status_q() { 173 | rh_status >/dev/null 2>&1 174 | } 175 | 176 | case "$1" in 177 | start) 178 | rh_status_q && exit 0 179 | $1 180 | ;; 181 | stop) 182 | rh_status_q || exit 0 183 | $1 184 | ;; 185 | restart|configtest) 186 | $1 187 | ;; 188 | reload) 189 | rh_status_q || exit 7 190 | $1 191 | ;; 192 | force-reload) 193 | force_reload 194 | ;; 195 | status) 196 | rh_status 197 | ;; 198 | condrestart|try-restart) 199 | rh_status_q || exit 0 200 | ;; 201 | *) 202 | echo $"Usage: $0 {start|stop|status|restart|condrestart|try-restart|reload|force-reload|configtest}" 203 | exit 2 204 | esac 205 | 206 | 而后为此脚本赋予执行权限: 207 | # chmod +x /etc/rc.d/init.d/nginx 208 | 209 | 添加至服务管理列表,并让其开机自动启动: 210 | # chkconfig --add nginx 211 | # chkconfig nginx on 212 | 213 | 而后就可以启动服务并测试了: 214 | # service nginx start 215 | 216 | 217 | 二、安装mysql-5.5.28 218 | 219 | 1、准备数据存放的文件系统 220 | 221 | 新建一个逻辑卷,并将其挂载至特定目录即可。这里不再给出过程。 222 | 223 | 这里假设其逻辑卷的挂载目录为/mydata,而后需要创建/mydata/data目录做为mysql数据的存放目录。 224 | 225 | 2、新建用户以安全方式运行进程: 226 | 227 | # groupadd -r mysql 228 | # useradd -g mysql -r -s /sbin/nologin -M -d /mydata/data mysql 229 | # chown -R mysql:mysql /mydata/data 230 | 231 | 3、安装并初始化mysql-5.5.28 232 | 233 | 首先下载平台对应的mysql版本至本地,这里是32位平台,因此,选择的为mysql-5.5.28-linux2.6-i686.tar.gz,其下载位置为ftp://172.16.0.1/pub/Sources/mysql-5.5。 234 | 235 | # tar xf mysql-5.5.28-linux2.6-i686.tar.gz -C /usr/local 236 | # cd /usr/local/ 237 | # ln -sv mysql-5.5.28-linux2.6-i686 mysql 238 | # cd mysql 239 | 240 | # chown -R mysql:mysql . 241 | # scripts/mysql_install_db --user=mysql --datadir=/mydata/data 242 | # chown -R root .
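初始化完成后,可以顺带验证一下数据目录中是否已生成系统库(以下命令仅为示例,路径沿用前文假设):
# ls /mydata/data
若能看到mysql、test等目录,即说明mysql_install_db执行成功。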
243 | 244 | 4、为mysql提供主配置文件: 245 | 246 | # cd /usr/local/mysql 247 | # cp support-files/my-large.cnf /etc/my.cnf 248 | 249 | 并修改此文件中thread_concurrency的值为你的CPU个数乘以2,比如这里使用如下行: 250 | thread_concurrency = 2 251 | 252 | 另外还需要添加如下行指定mysql数据文件的存放位置: 253 | datadir = /mydata/data 254 | 255 | 256 | 5、为mysql提供sysv服务脚本: 257 | 258 | # cd /usr/local/mysql 259 | # cp support-files/mysql.server /etc/rc.d/init.d/mysqld 260 | 261 | 添加至服务列表: 262 | # chkconfig --add mysqld 263 | # chkconfig mysqld on 264 | 265 | 而后就可以启动服务测试使用了。 266 | 267 | 268 | 为了使mysql的安装符合系统使用规范,并将其开发组件导出给系统使用,这里还需要进行如下步骤: 269 | 270 | 6、输出mysql的man手册至man命令的查找路径: 271 | 272 | 编辑/etc/man.config,添加如下行即可: 273 | MANPATH /usr/local/mysql/man 274 | 275 | 7、输出mysql的头文件至系统头文件路径/usr/include: 276 | 277 | 这可以通过简单地创建链接实现: 278 | # ln -sv /usr/local/mysql/include /usr/include/mysql 279 | 280 | 8、输出mysql的库文件给系统库查找路径: 281 | 282 | # echo '/usr/local/mysql/lib' > /etc/ld.so.conf.d/mysql.conf 283 | 284 | 而后让系统重新载入系统库: 285 | # ldconfig 286 | 287 | 9、修改PATH环境变量,让系统可以直接使用mysql的相关命令。具体实现过程这里不再给出。 288 | 289 | 290 | 三、编译安装php-5.4.4 291 | 292 | 1、解决依赖关系: 293 | 294 | 请配置好yum源(可以是本地系统光盘)后执行如下命令: 295 | # yum -y groupinstall "X Software Development" 296 | 297 | 如果想让编译的php支持mcrypt、mhash扩展和libevent,此处还需要下载ftp://172.16.0.1/pub/Sources/ngnix目录中的如下几个rpm包并安装之: 298 | libmcrypt-2.5.8-4.el5.centos.i386.rpm 299 | libmcrypt-devel-2.5.8-4.el5.centos.i386.rpm 300 | mhash-0.9.9-1.el5.centos.i386.rpm 301 | mhash-devel-0.9.9-1.el5.centos.i386.rpm 302 | mcrypt-2.6.8-1.el5.i386.rpm 303 | 304 | 最好使用升级的方式安装上面的rpm包,命令格式如下: 305 | # rpm -Uvh 306 | 307 | 308 | 另外,也可以根据需要安装libevent。系统一般会自带libevent,但版本有些低,因此可以升级安装之,它包含如下两个rpm包。 309 | libevent-2.0.17-2.i386.rpm 310 | libevent-devel-2.0.17-2.i386.rpm 311 | 312 | 说明:libevent是一个异步事件通知库,其API提供了在某文件描述符上发生某事件或其超时时执行回调函数的机制,主要用于替换事件驱动型网络服务器中原有的event loop机制。目前来说,libevent支持/dev/poll、kqueue、select、poll、epoll及Solaris的event ports。 313 | 314 | 2、编译安装php-5.4.4 315 | 316 | 首先下载源码包至本地目录,下载位置ftp://172.16.0.1/pub/Sources/new_lamp。 317 | 318 | # tar xf php-5.4.4.tar.bz2 319 | # cd php-5.4.4 320 | # ./configure --prefix=/usr/local/php --with-mysql=/usr/local/mysql --with-openssl --enable-fpm --enable-sockets --enable-sysvshm --with-mysqli=/usr/local/mysql/bin/mysql_config --enable-mbstring --with-freetype-dir --with-jpeg-dir --with-png-dir --with-zlib-dir --with-libxml-dir=/usr --enable-xml --with-mhash --with-mcrypt --with-config-file-path=/etc --with-config-file-scan-dir=/etc/php.d --with-bz2 --with-curl 321 | 322 | 说明:上面的./configure命令已带上--with-mcrypt选项以让php支持mcrypt扩展,其前提是前面第1步解决依赖关系时已安装mcrypt相关的rpm包。--with-snmp选项则用于实现php的SNMP扩展,但此功能要求提前安装net-snmp相关软件包。 323 | 324 | # make 325 | # make test 326 | # make install 327 | 328 | 为php提供配置文件: 329 | # cp php.ini-production /etc/php.ini 330 | 331 | 为php-fpm提供Sysv init脚本,并将其添加至服务列表: 332 | # cp sapi/fpm/init.d.php-fpm /etc/rc.d/init.d/php-fpm 333 | # chmod +x /etc/rc.d/init.d/php-fpm 334 | # chkconfig --add php-fpm 335 | # chkconfig php-fpm on 336 | 337 | 为php-fpm提供配置文件: 338 | # cp /usr/local/php/etc/php-fpm.conf.default /usr/local/php/etc/php-fpm.conf 339 | 340 | 编辑php-fpm的配置文件: 341 | # vim /usr/local/php/etc/php-fpm.conf 342 | 配置fpm的相关选项为你所需要的值,并启用pid文件(如下最后一行): 343 | pm.max_children = 150 344 | pm.start_servers = 8 345 | pm.min_spare_servers = 5 346 | pm.max_spare_servers = 10 347 | pid = /usr/local/php/var/run/php-fpm.pid 348 | 349 | 接下来就可以启动php-fpm了: 350 | # service php-fpm start 351 | 352 | 使用如下命令来验证(如果此命令的输出中有几个php-fpm进程,就说明启动成功了): 353 | # ps aux | grep php-fpm 354 | 355 | 356 | 四、整合nginx和php5 357 | 358 |
1、编辑/etc/nginx/nginx.conf,启用如下选项: 359 | location ~ \.php$ { 360 | root html; 361 | fastcgi_pass 127.0.0.1:9000; 362 | fastcgi_index index.php; 363 | fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name; 364 | include fastcgi_params; 365 | } 366 | 367 | 2、编辑/etc/nginx/fastcgi_params,将其内容更改为如下内容: 368 | fastcgi_param GATEWAY_INTERFACE CGI/1.1; 369 | fastcgi_param SERVER_SOFTWARE nginx; 370 | fastcgi_param QUERY_STRING $query_string; 371 | fastcgi_param REQUEST_METHOD $request_method; 372 | fastcgi_param CONTENT_TYPE $content_type; 373 | fastcgi_param CONTENT_LENGTH $content_length; 374 | fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; 375 | fastcgi_param SCRIPT_NAME $fastcgi_script_name; 376 | fastcgi_param REQUEST_URI $request_uri; 377 | fastcgi_param DOCUMENT_URI $document_uri; 378 | fastcgi_param DOCUMENT_ROOT $document_root; 379 | fastcgi_param SERVER_PROTOCOL $server_protocol; 380 | fastcgi_param REMOTE_ADDR $remote_addr; 381 | fastcgi_param REMOTE_PORT $remote_port; 382 | fastcgi_param SERVER_ADDR $server_addr; 383 | fastcgi_param SERVER_PORT $server_port; 384 | fastcgi_param SERVER_NAME $server_name; 385 | 386 | 并在所支持的主页面格式中添加php格式的主页,类似如下: 387 | location / { 388 | root html; 389 | index index.php index.html index.htm; 390 | } 391 | 392 | 而后重新载入nginx的配置文件: 393 | # service nginx reload 394 | 395 | 3、在/usr/html新建index.php的测试页面,测试php是否能正常工作: 396 | # cat > /usr/html/index.php << EOF 397 | <?php 398 | phpinfo(); 399 | ?> 400 | EOF 401 | 接着就可以通过浏览器访问此测试页面了。 402 | 403 | 404 | 五、安装xcache,为php加速: 405 | 406 | 1、安装 407 | # tar xf xcache-2.0.0.tar.gz 408 | # cd xcache-2.0.0 409 | # /usr/local/php/bin/phpize 410 | # ./configure --enable-xcache --with-php-config=/usr/local/php/bin/php-config 411 | # make && make install 412 | 413 | 安装结束时,会出现类似如下行: 414 | Installing shared extensions: /usr/local/php/lib/php/extensions/no-debug-zts-20100525/ 415 | 416 | 2、编辑php.ini,整合php和xcache: 417 | 418 | 首先将xcache提供的样例配置放入php.ini的扫描目录: 419 | # mkdir /etc/php.d 420 | # cp xcache.ini /etc/php.d 421 | 422 | 说明:xcache.ini文件在xcache的源码目录中。 423 | 424 | 接下来编辑/etc/php.d/xcache.ini,找到zend_extension开头的行,修改为如下行: 425 | zend_extension = /usr/local/php/lib/php/extensions/no-debug-zts-20100525/xcache.so 426 | 427 | 注意:如果php.ini文件中有多条zend_extension指令行,要确保此新增的行排在第一位。 428 | 429 | 3、重新启动php-fpm 430 | # service php-fpm restart 431 | 432 | 433 | 六、补充说明 434 | 435 | 如果要在SSL中使用php,需要在php的location中添加此选项: 436 | 437 | fastcgi_param HTTPS on; 438 | 439 | 440 | 441 | 442 | 补充阅读材料: 443 | 444 | Events are one of the paradigms for achieving asynchronous execution, but not all asynchronous systems use events; that is the semantic relationship between the two terms - one is a super-entity of the other. 445 | 446 | epoll and aio use different metaphors: 447 | 448 | epoll is a blocking operation (epoll_wait()) - you block the thread until some event happens, and then you dispatch the event to different procedures/functions/branches in your code. 449 | 450 | In AIO, you pass the address of your callback function (completion routine) to the system, and the system calls your function when something happens. 451 | 452 | The problem with AIO is that your callback code runs on a system thread, and thus on top of a system stack - which, as you can imagine, brings a few problems.
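以上各组件就绪后,可以用如下命令对整个LEMP栈做一次快速验证(示例命令,端口与路径均沿用前文的假设):
# service nginx start; service php-fpm start; service mysqld start
# netstat -tnlp | grep -E ':80|:9000|:3306'    # 分别对应nginx、php-fpm与mysqld监听的端口
# curl -I http://localhost/index.php           # 正常时应返回200响应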
453 | 454 | 455 | 456 | -------------------------------------------------------------------------------- /LVS Web php session memcached.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/LVS Web php session memcached.txt -------------------------------------------------------------------------------- /Linux Adminitration.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/Linux Adminitration.txt -------------------------------------------------------------------------------- /Linux Basics.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/Linux Basics.txt -------------------------------------------------------------------------------- /Linuxfiles11.rar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/Linuxfiles11.rar -------------------------------------------------------------------------------- /Little Linux.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/Little Linux.txt -------------------------------------------------------------------------------- /Mini Linux Scripts.txt: -------------------------------------------------------------------------------- 1 | 2 | #!/bin/sh 3 | # 4 | echo "mounting proc and sys..." 5 | mount -t proc proc /proc 6 | mount -t sysfs sysfs /sys 7 | 8 | echo "Load ext3 module..." 9 | insmod /lib/modules/jbd.ko 10 | insmod /lib/modules/ext3.ko 11 | 12 | echo "Detect and export hardware information..." 13 | mdev -s 14 | 15 | echo "Mount real rootfs to /mnt/sysroot..." 16 | mount -t ext3 /dev/hda2 /mnt/sysroot 17 | 18 | echo "Switch to real rootfs ..." 19 | exec switch_root /mnt/sysroot /sbin/init 20 | 21 | 22 | 23 | #!/bin/sh 24 | # 25 | echo -e "\tWelcome to \033[34mMageEdu Tiny\033[0m Linux" 26 | 27 | echo "mounting proc and sys..." 28 | mount -t proc proc /proc 29 | mount -t sysfs sysfs /sys 30 | 31 | echo "Remount the rootfs..." 32 | mount -t ext3 -o remount,rw /dev/hda2 / 33 | 34 | echo "Detect and export hardware information..." 35 | mdev -s 36 | 37 | echo "Mount the other filesystems..." 38 | mount -a 39 | 40 | 41 | 42 | 43 | 44 | -------------------------------------------------------------------------------- /MySQL Compiling and Installation.txt: -------------------------------------------------------------------------------- 1 | 编译安装MySQL-5.5 2 | 3 | cmake的重要特性之一是其独立于源码(out-of-source)的编译功能,即编译工作可以在另一个指定的目录中而非源码目录中进行,这可以保证源码目录不受任何一次编译的影响,因此在同一个源码树上可以进行多次不同的编译,如针对于不同平台的编译。 4 | 5 | 6 | 7 | 8 | 一、安装cmake 9 | 10 | cmake是一个跨平台编译器: 11 | 12 | # tar xf cmake-2.8.8.tar.gz 13 | # cd cmake-2.8.8 14 | # ./bootstrap 15 | # make 16 | # make install 17 | 18 | 19 | 20 | 二、编译安装mysql-5.5.25a 21 | 22 | 1、使用cmake编译mysql-5.5 23 | cmake指定编译选项的方式不同于configure脚本,二者的对应关系如下: 24 | ./configure → cmake . 25 | ./configure --help → cmake . -LH 或 ccmake .
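下面给出一个独立于源码(out-of-source)编译的简单示例(其中的目录名均为假设,可按需调整):
# tar xf mysql-5.5.25a.tar.gz -C /usr/local/src
# mkdir /tmp/mysql-build && cd /tmp/mysql-build
# cmake /usr/local/src/mysql-5.5.25a -LH      # 列出常用编译选项及其帮助信息
# cmake /usr/local/src/mysql-5.5.25a -DCMAKE_INSTALL_PREFIX=/usr/local/mysql
# make && make install
如此一来,所有中间文件都生成于/tmp/mysql-build中,源码树本身保持干净。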
26 | 27 | 28 | 29 | 指定安装文件的安装路径时常用的选项: 30 | -DCMAKE_INSTALL_PREFIX=/usr/local/mysql 31 | -DMYSQL_DATADIR=/data/mysql 32 | -DSYSCONFDIR=/etc 33 | 34 | 35 | 默认编译的存储引擎包括:csv、myisam、myisammrg和heap。若要安装其它存储引擎,可以使用类似如下编译选项: 36 | -DWITH_INNOBASE_STORAGE_ENGINE=1 37 | -DWITH_ARCHIVE_STORAGE_ENGINE=1 38 | -DWITH_BLACKHOLE_STORAGE_ENGINE=1 39 | -DWITH_FEDERATED_STORAGE_ENGINE=1 40 | 41 | 若要明确指定不编译某存储引擎,可以使用类似如下的选项: 42 | -DWITHOUT__STORAGE_ENGINE=1 43 | 比如: 44 | -DWITHOUT_EXAMPLE_STORAGE_ENGINE=1 45 | -DWITHOUT_FEDERATED_STORAGE_ENGINE=1 46 | -DWITHOUT_PARTITION_STORAGE_ENGINE=1 47 | 48 | 如若要编译进其它功能,如SSL等,则可使用类似如下选项来实现编译时使用某库或不使用某库: 49 | -DWITH_READLINE=1 50 | -DWITH_SSL=system 51 | -DWITH_ZLIB=system 52 | -DWITH_LIBWRAP=0 53 | 54 | 其它常用的选项: 55 | -DMYSQL_TCP_PORT=3306 56 | -DMYSQL_UNIX_ADDR=/tmp/mysql.sock 57 | -DENABLED_LOCAL_INFILE=1 58 | -DEXTRA_CHARSETS=all 59 | -DDEFAULT_CHARSET=utf8 60 | -DDEFAULT_COLLATION=utf8_general_ci 61 | -DWITH_DEBUG=0 62 | -DENABLE_PROFILING=1 63 | 64 | 65 | 66 | 如果想清理此前的编译所生成的文件,则需要使用如下命令: 67 | make clean 68 | rm CMakeCache.txt 69 | 70 | 71 | 2、编译安装 72 | 73 | # groupadd -r mysql 74 | # useradd -g mysql -r -d /data/mydata mysql 75 | # tar xf mysql-5.5.25a.tar.gz 76 | # cd mysql-5.5.25a 77 | # cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/mysql \ 78 | -DMYSQL_DATADIR=/mydata/data \ 79 | -DSYSCONFDIR=/etc \ 80 | -DWITH_INNOBASE_STORAGE_ENGINE=1 \ 81 | -DWITH_ARCHIVE_STORAGE_ENGINE=1 \ 82 | -DWITH_BLACKHOLE_STORAGE_ENGINE=1 \ 83 | -DWITH_READLINE=1 \ 84 | -DWITH_SSL=system \ 85 | -DWITH_ZLIB=system \ 86 | -DWITH_LIBWRAP=0 \ 87 | -DMYSQL_UNIX_ADDR=/tmp/mysql.sock \ 88 | -DDEFAULT_CHARSET=utf8 \ 89 | -DDEFAULT_COLLATION=utf8_general_ci 90 | # make 91 | # make install -------------------------------------------------------------------------------- /MySQL Optimization.txt: -------------------------------------------------------------------------------- 1 | MySQL优化框架 2 | 3 | 1. SQL语句优化 4 | 2. 索引优化 5 | 3. 数据库结构优化 6 | 4. InnoDB表优化 7 | 5. MyISAM表优化 8 | 6. Memory表优化 9 | 7. 理解查询执行计划 10 | 8. 缓冲和缓存 11 | 9. 锁优化 12 | 10. MySQL服务器优化 13 | 11. 性能评估 14 | 12. 
MySQL优化内幕 15 | 16 | MySQL优化需要在三个不同层次上协调进行:MySQL级别、OS级别和硬件级别。MySQL级别的优化包括表优化、查询优化和MySQL服务器配置优化等,而MySQL的各种数据结构又最终作用于OS直至硬件设备,因此还需要了解每种结构对OS级别资源的需要及其最终导致的CPU和I/O操作等,并在此基础上尽量降低CPU及I/O操作的需要,以提升其效率。 17 | 18 | 数据库层面的优化着眼点: 19 | 1、是否正确设定了表结构的相关属性,尤其是每个字段的字段类型是否为最佳。同时,是否针对特定类型的工作负载组织了合适的表及表字段也将影响系统性能,比如,数据频繁更新的场景应该使用较多的表而每张表有着较少字段的结构,而复杂数据查询或分析的场景应该使用较少的表而每张表较多字段的结构等。 20 | 2、是否为高效进行查询创建了合适的索引。 21 | 3、是否为每张表选用了合适的存储引擎,并有效利用了选用的存储引擎本身的优势和特性。 22 | 4、是否基于存储引擎为表选用了合适的行格式(row format)。例如,压缩表在读写操作中会降低I/O操作需求并占用较少的磁盘空间,InnoDB支持在读写应用场景中使用压缩表,但MyISAM仅能在读环境中使用压缩表。 23 | 5、是否使用了合适的锁策略,如在并发操作场景中使用共享锁,而对较高优先级的需求使用独占锁等。同时,还应该考虑存储引擎所支持的锁类型。 24 | 6、是否为InnoDB的缓冲池、MyISAM的键缓存以及MySQL查询缓存设定了合适大小的内存空间,以便能够存储频繁访问的数据且又不会引起页面换出。 25 | 26 | 操作系统和硬件级别的优化着眼点: 27 | 1、是否为实际的工作负载选定了合适的CPU,如对于CPU密集型的应用场景要使用更快速度的CPU甚至更多数量的CPU,为有着更多查询的场景使用更多的CPU等。基于多核以及超线程(hyperthreading)技术,现代的CPU架构越来越复杂、性能也越来越强了,但MySQL对多CPU架构的并行计算能力的利用仍有不尽如人意之处,尤其是较老的版本如MySQL 5.1之前的版本甚至无法发挥多CPU的优势。不过,通常需要实现的CPU性能提升目标有两类:低延迟和高吞吐量。低延迟需要更快速度的CPU,因为单个查询只能使用一颗;而对于需要同时运行许多查询的场景,多CPU更能提供更好的吞吐能力,然而其能否奏效还依赖于实际工作场景,因为MySQL尚不能高效地运行于过多的CPU之上,并且其对CPU数量的支持也有着限制。一般来说,较新的版本可以支持16至24颗CPU甚至更多。 28 | 2、是否有着合适大小的物理内存,并通过合理的配置平衡内存和磁盘资源,降低甚至避免磁盘I/O。现代的程序设计为提高性能通常都会基于局部性原理使用到缓存技术,这对于频繁操作数据的数据库系统来说尤其如此——有着良好设计的数据库缓存通常比针对通用任务的操作系统的缓存效率更高。缓存可以有效地延迟写入、优化写入,但并不能消除写入。此外,综合考虑存储空间的可扩展性等因素,为业务选择合理的外部存储设备也是非常重要的工作。 29 | 3、是否选择了合适的网络设备并正确地配置了网络,这对整体系统性能也有着重大影响。延迟和带宽是网络连接的限制性因素,而对于丢包等常见的网络问题,即使是很小的丢包率也会造成性能的显著下降。更重要的是按需调整系统中有关网络方面的设置,以高效处理大量的连接和小查询。 30 | 4、是否基于操作系统选择了适用的文件系统。实际测试表明大部分文件系统的性能都非常接近,因此,为了性能而苦选文件系统并不划算。但考虑到文件系统的修复能力,应该使用日志文件系统如ext3、ext4、XFS等。同时,关闭文件系统的某些特性如访问时间和预读行为,并选择合理的磁盘调度器,通常都会给性能提升带来帮助。 31 | 5、MySQL为响应每个用户连接使用一个单独的线程,再加上内部使用的线程、特殊目的线程以及其它任何由存储引擎创建的线程等,MySQL需要对这些大量线程进行有效管理。Linux系统上的NPTL线程库更为轻量级也更有效率。MySQL 5.5引入了线程池插件,但其效用尚不明朗。 32 | 33 | 34 | 35 | 36 | 37 | 38 | 使用InnoDB存储引擎最佳实践: 39 | 1、基于MySQL查询语句中最常用的字段或字段组合创建主键,如果没有合适的主键,也最好使用AUTO_INCREMENT类型的某字段为主键。 40 | 2、根据需要考虑使用多表查询,将这些表通过外键建立约束关系。 41 | 3、关闭autocommit。 42 | 4、使用事务(START TRANSACTION和COMMIT语句)组合相关的修改操作或一个整体的工作单元,当然也不应该创建过大的执行单元。 43 | 5、停止使用LOCK TABLES语句,InnoDB可以高效地处理来自多个会话的并发读写请求。如果需要在一系列的行上获取独占访问权限,可以使用SELECT ... 
FOR UPDATE锁定仅需要更新的行。 44 | 6、启用innodb_file_per_table选项,将各表的数据和索引分别进行存放。 45 | 7、评估数据和访问模式是否能从InnoDB的表压缩功能中受益(在创建表时使用ROW_FORMAT=COMPRESSED选项),如果可以,则应该启用压缩功能。 46 | 47 | 48 | 49 | 50 | 51 | EXPLAIN语句解析: 52 | id:SELECT语句的标识符,一般为数字,表示对应的SELECT语句在原始语句中的位置。没有子查询或联合的整个查询只有一个SELECT语句,因此其id通常为1。在联合或子查询语句中,内层的SELECT语句通常按它们在原始语句中的次序进行编号。但UNION操作通常最后会有一个id为NULL的行,因为UNION的结果通常保存至临时表中,而MySQL需要到此临时表中取得结果。 53 | 54 | select_type: 55 | 即SELECT类型,有如下值列表: 56 | SIMPLE:简单查询,即没有使用联合或子查询; 57 | PRIMARY:UNION的最外围的查询或者最先进行的查询; 58 | UNION:相对于PRIMARY,为联合查询的第二个及以后的查询; 59 | DEPENDENT UNION:与UNION相同,但其位于联合子查询中(即UNION查询本身是子查询); 60 | UNION RESULT:UNION的执行结果; 61 | SUBQUERY:非从属子查询,优化器通常认为其只需要运行一次; 62 | DEPENDENT SUBQUERY:从属子查询,优化器认为需要为外围的查询的每一行运行一次,如用于IN操作符中的子查询; 63 | DERIVED:用于FROM子句的子查询,即派生表查询; 64 | 65 | 66 | table: 67 | 输出信息所关系到的表的表名,也有可能会显示为如下格式: 68 | :id为M和N的查询执行联合查询后的结果; 69 | :id为N的查询执行的结果集; 70 | 71 | 72 | type: 73 | MySQL官方手册中解释type的作用为“type of join(联结的类型)”,但其更确切的意思应该是“记录(record)访问类型”,因为其主要目的在于展示MySQL在表中找到所需行的方式。通常有如下所示的记录访问类型: 74 | system: 表中仅有一行,是const类型的一种特殊情况; 75 | const:表中至多有一个匹配的行,该行仅在查询开始时读取一次,因此,该行此字段中的值可以被优化器看作是个常量(constant);当基于PRIMARY KEY或UNIQUE NOT NULL字段查询,且与某常量进行等值比较时其类型就为const,其执行速度非常快; 76 | eq_ref:类似于const,表中至多有一个匹配的行,但比较的数值不是某常量,而是来自于其它表;ed_ref出现在PRIMARY KEY或UNIQUE NOT NULL类型的索引完全用于联结操作中进行等值(=)比较时;这是除了system和const之外最好的访问类型; 77 | ref:查询时的索引类型不是PRIMARY KEY或UNIQUE NOT NULL导致匹配到的行可能不惟一,或者仅能用到索引的左前缀而非全部时的访问类型;ref可被用于基于索引的字段进行=或<=>操作; 78 | fulltext:用于FULLTEXT索引中用纯文本匹配的方法来检索记录。 79 | ref_or_null:类似于ref,但可以额外搜索NULL值; 80 | index_merge:使用“索引合并优化”的记录访问类型,相应地,其key字段(EXPLAIN的输出结果)中会出现用到的多个索引,key_len字段中会出现被使用索引的最长长度列表;将多个“范围扫描(range scan)”获取到的行进行合并成一个结果集的操作即索引合并(index merge)。 81 | unique_subquery:用于IN比较操作符中的子查询中进行的“键值惟一”的访问类型场景中,如 value IN (SELECT primary_key FROM single_table WHERE some_expr); 82 | index_subquery:类似于unique_subquery,但子查询中键值不惟一; 83 | range:带有范围限制的索引扫描,而非全索引扫描,它开始于索引里的某一点,返回匹配那个值的范围的行;相应地,其key字段(EXPLAIN的输出结果)中会输出所用到的索引,key_len字段中会包含用到的索引的最长部分的长度;range通常用于将索引与常量进行=、<>、>、>=、<、<=、IS NULL、<=>、BETWEEN或IN()类的比较操作中; 84 | index:同全表扫描(ALL),只不过是按照索引的次序进行而不行的次序;其优点是避免了排序,但是要承担按索引次序读取整个表的开销,这意味着若是按随机次序访问行,代价将非常大; 85 | ALL:“全表扫描”的方式查找所需要的行,如果第一张表的查询类型(EXPLAIN的输出结果)为const,其性能可能不算太坏,而第一张表的查询类型为其它结果时,其性能通常会非常差; 86 | 87 | 88 | Extra: 89 | Using where:MySQL服务器将在存储引擎收到数据后进行“后过滤(post-filter)”以限定发送给下张表或客户端的行;如果WHERE条件中使用了索引列,其读取索引时就由存储引擎检查,因此,并非所有带有WHERE子句的查询都会显示“Using where”; 90 | Using index:表示所需要的数据从索引就能够全部获取到,从而不再需要从表中查询获取所需要数据,这意味着MySQL将使用覆盖索引;但如果同时还出现了Using where,则表示索引将被用于查找特定的键值; 91 | Using index for group-by:类似于Using index,它表示MySQL可仅通过索引中的数据完成GROUP BY或DISTINCT类的查询; 92 | Using filesort:表示MySQL会对结果使用一个外部索引排序,而不是从表里按索引次序来读取行; 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | -------------------------------------------------------------------------------- /Nagios.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/Nagios.txt -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/README.md -------------------------------------------------------------------------------- /RHCS.txt: -------------------------------------------------------------------------------- 1 | 2 | 前提: 3 | 
1)本配置共有三个测试节点,分别node1.magedu.com、node2.magedu.com和node3.magedu.com,相的IP地址分别为172.16.100.6、172.16.100.7和172.16.100.8;系统为rhel5.8 32bit; 4 | 2)集群服务为apache的httpd服务; 5 | 3)提供web服务的地址为172.16.100.1; 6 | 4)为集群中的每个节点事先配置好yum源; 7 | 5) 额外提供了主机172.16.100.100做为跳板机,以其为平台实现对集群中各节点的管理;其主机名称为stepping.magedu.com; 8 | 9 | 一、准备工作 10 | 11 | 为了配置一台Linux主机成为HA的节点,通常需要做出如下的准备工作: 12 | 13 | 1.1 设定主机名称解析 14 | 15 | 所有节点的主机名称和对应的IP地址解析服务可以正常工作,且每个节点的主机名称需要跟"uname -n“命令的结果保持一致;因此,需要保证三个节点上的/etc/hosts文件均为下面的内容: 16 | 172.16.100.6 node1.magedu.com node1 17 | 172.16.100.7 node2.magedu.com node2 18 | 172.16.100.8 node3.magedu.com node3 19 | 20 | 为了使得重新启动系统后仍能保持如上的主机名称,还分别需要在各节点执行类似如下的命令: 21 | 22 | Node1: 23 | # sed -i 's@\(HOSTNAME=\).*@\1node1.magedu.com@g' /etc/sysconfig/network 24 | # hostname node1.magedu.com 25 | 26 | Node2: 27 | # sed -i 's@\(HOSTNAME=\).*@\1node2.magedu.com@g' /etc/sysconfig/network 28 | # hostname node2.magedu.com 29 | 30 | Node3: 31 | # sed -i 's@\(HOSTNAME=\).*@\1node3.magedu.com@g' /etc/sysconfig/network 32 | # hostname node3.magedu.com 33 | 34 | 1.2 管理机设定 35 | 36 | 后续的诸多设定,如rpm包安装、配置文件修改等都需要在三个节点上同时进行;为了便于实现此过程,我们这里提供了一台跳板机172.16.100.100,其可以以基于密钥认证的方式分别与三个节点进行通信。实际使用中,如果没有专用的跳板机,也可以以三个节点中的某节点来替代。 37 | 38 | 首先让跳板机能以主机名称与各节点进行通信,此为非必须,仅为使用方便。在跳板机上建立/etc/hosts文件,内容如下: 39 | 172.16.100.6 node1.magedu.com node1 40 | 172.16.100.7 node2.magedu.com node2 41 | 172.16.100.8 node3.magedu.com node3 42 | 43 | 接着在跳板机上为ssh生成密钥: 44 | # ssh-keygen -t rsa -P '' 45 | 46 | 其次生成的密钥的公钥传输至集群中的每个节点: 47 | # ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1 48 | # ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2 49 | # ssh-copy-id -i ~/.ssh/id_rsa.pub root@node3 50 | 51 | 说明:如果不想使用跳板机,后面演示过程中,但凡在跳板机上以循环方式执行的命令均可以分别在各节点执行的方式进行。 52 | 53 | 二、集群安装 54 | 55 | RHCS的核心组件为cman和rgmanager,其中cman为基于openais的“集群基础架构层”,rgmanager为资源管理器。RHCS的集群中资源的配置需要修改其主配置文件/etc/cluster/cluster.xml实现,这对于很多用户来说是比较有挑战性的,因此,RHEL提供了system-config-cluster这个GUI工具,其仅安装在集群中的某一节点上即可,而cman和rgmanager需要分别安装在集群中的每个节点上。这里选择将此三个rpm包分别安装在了集群中的每个节点上,这可以在跳板机上执行如下命令实现: 56 | 57 | # for I in {1..3}; do ssh node$I 'yum -y install cman rgmanager system-config-cluster'; done 58 | 59 | 三、集群配置及其启动 60 | 61 | 3.1 为集群创建配置文件 62 | 63 | RHCS的配置文件/etc/cluster/cluster.conf,其在每个节点上都必须有一份,且内容均相同,其默认不存在,因此需要事先创建,ccs_tool命令可以完成此任务。另外,每个集群通过集群ID来标识自身,因此,在创建集群配置文件时需要为其选定一个集群名称,这里假设其为tcluster。此命令需要在集群中的某个节点上执行。 64 | 65 | # ccs_tool create tcluster 66 | 67 | 查看生成的配置文件的内容: 68 | # cat /etc/cluster/cluster.conf 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | ccs_tool命令用于在线更新CCS的配置文件,其有许多子命令,可以使用-h获取其使用帮助及每个子命令的使用帮助。 80 | 81 | 3.2 为集群添加fence设备 82 | 83 | 一个RHCS集群至少需要一个fence设备,正常环境中,可以选择将其配置到集群中来。这里为演示环境,没有可用的fence设备,因此,选择使用“manual fence”,即手动fence。创建fence设备也需要使用ccs_tool命令进行,其需要在集群中的某节点上执行,而且需要与前面创建集群配置文件时所执行的命令在同一个节点上进行。 84 | 85 | 查看fence代理的名称,可以通过查看cman安装生成文件来实现。 86 | # rpm -ql cman | grep /sbin/fence 87 | /sbin/fence_ack_manual 88 | /sbin/fence_apc 89 | /sbin/fence_apc_snmp 90 | /sbin/fence_bladecenter 91 | /sbin/fence_brocade 92 | /sbin/fence_bullpap 93 | /sbin/fence_cisco_mds 94 | /sbin/fence_cisco_ucs 95 | /sbin/fence_drac 96 | /sbin/fence_drac5 97 | /sbin/fence_egenera 98 | /sbin/fence_ifmib 99 | /sbin/fence_ilo 100 | /sbin/fence_ilo_mp 101 | /sbin/fence_ipmilan 102 | /sbin/fence_lpar 103 | /sbin/fence_manual 104 | /sbin/fence_mcdata 105 | /sbin/fence_node 106 | /sbin/fence_rhevm 107 | /sbin/fence_rps10 108 | /sbin/fence_rsa 109 | /sbin/fence_rsb 110 | /sbin/fence_sanbox2 111 | /sbin/fence_scsi 112 | /sbin/fence_scsi_test 113 | /sbin/fence_tool 114 | /sbin/fence_virsh 115 | /sbin/fence_vixel 116 | 
/sbin/fence_vmware 117 | /sbin/fence_vmware_helper 118 | /sbin/fence_vmware_soap 119 | /sbin/fence_wti 120 | /sbin/fence_xvm 121 | /sbin/fence_xvmd 122 | /sbin/fenced 123 | 124 | 这里为tcluster添加名为meatware的fence设备,其fence代理为fence_manual。 125 | # ccs_tool addfence meatware fence_manual 126 | 127 | 接着可以使用ccs_tool lsfence查看fence设备: 128 | # ccs_tool lsfence 129 | Name Agent 130 | meatware fence_manual 131 | 132 | 3.3 为集群添加节点 133 | 134 | RHCS集群需要配置好各节点及相关的fence设备后才能启动,因此,这里需要事先将各节点添加进集群配置文件。每个节点在添加进集群时,需要至少为其配置node id(每个节点的id必须惟一)及相关的fence设备两个属性。ccs_tool的addnode子命令可以完成节点添加。例如将前面规划的三个集群节点添加至集群中,可以使用如下命令实现。 135 | 136 | # ccs_tool addnode -n 1 -f meatware node1.magedu.com 137 | # ccs_tool addnode -n 2 -f meatware node2.magedu.com 138 | # ccs_tool addnode -n 3 -f meatware node3.magedu.com 139 | 140 | 查看已经添加完成的节点及相关信息: 141 | # ccs_tool lsnode 142 | 143 | Cluster name: tcluster, config_version: 5 144 | 145 | Nodename Votes Nodeid Fencetype 146 | node1.magedu.com 1 1 meatware 147 | node2.magedu.com 1 2 meatware 148 | node3.magedu.com 1 3 meatware 149 | 150 | 3.4 启动集群 151 | 152 | RHCS集群会等待各节点都启动后方才进入正常工作状态,因此,需要把集群各节点上的cman服务同时启动起来。这分别需要在各节点上执行如下命令。 153 | 154 | # /etc/rc.d/init.d/cman start 155 | 156 | 查看服务监听的端口,以验证服务启动状况: 157 | # netstat -tunlp | grep -E "ccsd|aisexec" 158 | tcp 0 0 127.0.0.1:50006 0.0.0.0:* LISTEN 14544/ccsd 159 | tcp 0 0 0.0.0.0:50008 0.0.0.0:* LISTEN 14544/ccsd 160 | udp 0 0 172.16.100.6:5405 0.0.0.0:* 14552/aisexec 161 | udp 0 0 172.16.100.6:5149 0.0.0.0:* 14552/aisexec 162 | udp 0 0 239.192.110.162:5405 0.0.0.0:* 14552/aisexec 163 | udp 0 0 0.0.0.0:50007 0.0.0.0:* 14544/ccsd 164 | 165 | 而后在各节点启动rgmanager服务,这可以在跳板机上执行如下命令实现: 166 | # for I in {1..3}; do ssh node$I '/etc/init.d/rgmanager start'; done 167 | 168 | 3.5 查看集群状态信息 169 | 170 | clustat命令可用于显示集群成员信息、法定票数信息及服务相关信息。 171 | # clustat 172 | Cluster Status for tcluster @ Mon May 13 12:06:53 2013 173 | Member Status: Quorate 174 | 175 | Member Name ID Status 176 | ------ ---- ---- ------ 177 | node1.magedu.com 1 Online, Local 178 | node2.magedu.com 2 Online 179 | node3.magedu.com 3 Online 180 | 181 | cman_tool的status子命令则以当前节点为视角来显示集群的相关信息。 182 | # cman_tool status 183 | Version: 6.2.0 184 | Config Version: 5 185 | Cluster Name: tcluster 186 | Cluster Id: 28212 187 | Cluster Member: Yes 188 | Cluster Generation: 12 189 | Membership state: Cluster-Member 190 | Nodes: 3 191 | Expected votes: 3 192 | Total votes: 3 193 | Node votes: 1 194 | Quorum: 2 195 | Active subsystems: 8 196 | Flags: Dirty 197 | Ports Bound: 0 177 198 | Node name: node1.magedu.com 199 | Node ID: 1 200 | Multicast addresses: 239.192.110.162 201 | Node addresses: 172.16.100.6 202 | 203 | cman_tool的nodes子命令则可以列出集群中每个节点的相关信息。 204 | # cman_tool nodes 205 | Node Sts Inc Joined Name 206 | 1 M 4 2013-05-13 12:00:09 node1.magedu.com 207 | 2 M 8 2013-05-13 12:00:28 node2.magedu.com 208 | 3 M 12 2013-05-13 12:00:39 node3.magedu.com 209 | 210 | cman_tool的services子命令则可以列出集群中每个服务的相关信息。 211 | # cman_tool services 212 | type level name id state 213 | fence 0 default 00010001 none 214 | [1 2 3] 215 | dlm 1 rgmanager 00020001 none 216 | [1 2 3] 217 | 218 | 四、配置集群服务 219 | 220 | 配置集群服务涉及到配置故障转移域、服务及资源,这些需要手动修改集群配置文件,或使用system-config-cluster这个GUI程序完成。 221 | 222 | 223 | 五、配置使用gfs2文件系统 224 | 225 | 这里假设集群节点均已经正常登录某iscsi target,本地正常映射了磁盘/dev/sdb,且已创建分区/dev/sdb1和/dev/sdb2。 226 | 227 | 5.1 在集群节点上安装gfs2-utils 228 | 229 | 以下命令在跳板机上执行,实现在集群所有节点上统一部署安装gfs2-utils并启动gfs2的服务: 230 | 231 | # for I in {1..3}; do ssh node$I 'yum -y install gfs2-utils; service gfs2 start'; done 232 | 
在集群中的某节点上执行如下命令,查看gfs2模块的装载情况: 234 | 235 | # lsmod | grep gfs 236 | gfs2 354825 1 lock_dlm 237 | configfs 28625 2 dlm 238 | 239 | 5.2 gfs2相关命令行工具的使用 240 | 241 | mkfs.gfs2为gfs2文件系统创建工具,其一般常用的选项有: 242 | 243 | -b BlockSize:指定文件系统块大小,最小为512,默认为4096; 244 | -J MegaBytes:指定gfs2日志区域大小,默认为128MB,最小值为8MB; 245 | -j Number:指定创建gfs2文件系统时所创建的日志区域个数,一般需要为每个挂载的客户端指定一个日志区域; 246 | -p LockProtoName:所使用的锁协议名称,通常为lock_dlm或lock_nolock之一; 247 | -t LockTableName:锁表名称,一般来说一个集群文件系统需要一个锁表名,以便让集群节点在施加文件锁时得悉其所关联到的集群文件系统。锁表名称格式为clustername:fsname,其中的clustername必须跟集群配置文件中的集群名称保持一致,因此,也仅有此集群内的节点可访问此集群文件系统;此外,同一个集群内,每个文件系统的名称必须惟一; 248 | 249 | 因此,若要在前面的/dev/sdb1上创建集群文件系统gfs2,可以使用如下命令: 250 | # mkfs.gfs2 -j 3 -p lock_dlm -t tcluster:sdb1 /dev/sdb1 251 | 252 | 六、配置使用cLVM(集群逻辑卷) 253 | 254 | 在RHCS集群节点上安装lvm2-cluster: 255 | # for I in {1..3}; do ssh node$I 'yum -y install lvm2-cluster;'; done 256 | 257 | 在RHCS的各节点上,为lvm启用集群功能: 258 | # for I in {1..3}; do ssh node$I 'lvmconf --enable-cluster'; done 259 | 260 | 而后,为RHCS各节点启动clvmd服务: 261 | # for I in {1..3}; do ssh node$I 'service clvmd start'; done 262 | 263 | 如果需要创建物理卷、卷组和逻辑卷,使用管理单机逻辑卷的相关命令即可;比如,将/dev/sdb2创建为物理卷: 264 | # pvcreate /dev/sdb2 265 | # pvs 266 | 267 | 此时,在另外的其它节点上也能够看到刚刚创建的物理卷。 268 | 269 | 创建卷组和逻辑卷: 270 | # vgcreate clustervg /dev/sdb2 271 | # lvcreate -L 2G -n clusterlv clustervg 272 | 273 | 七、gfs2的其它管理工具 274 | 275 | 7.1 gfs2_tool 276 | 277 | 查看挂载至/mydata目录上的某gfs2文件系统上的日志相关信息: 278 | # gfs2_tool journals /mydata 279 | 280 | 7.2 gfs2_jadd 281 | 282 | 为挂载至/mydata的gfs2文件系统添加新的日志区域: 283 | # gfs2_jadd -j 1 /mydata 284 | 285 | 7.3 gfs2_grow 286 | 287 | 扩展集群逻辑卷的方式与普通逻辑卷相同,只是其上gfs2文件系统的扩展需要使用gfs2_grow进行,其需要以挂载点为参数: 288 | 289 | # gfs2_grow /mydata 290 | 291 | 7.4 gfs2_tool gettune 292 | 293 | 294 | 295 | 296 | 297 | 298 | 299 | 300 | 附: 301 | 302 | Red Hat Resource Group Manager provides high availability of critical server applications in the event of planned or unplanned system downtime. 
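把第五、六节的步骤串联起来,即可在集群逻辑卷上创建并挂载gfs2文件系统,以下为示例(卷名沿用第六节创建的clusterlv,挂载点为假设):
# mkfs.gfs2 -j 3 -p lock_dlm -t tcluster:clusterlv /dev/clustervg/clusterlv
# mkdir -p /mydata
# mount -t gfs2 /dev/clustervg/clusterlv /mydata
在其余节点上执行同样的mount命令,即可实现多节点同时挂载读写。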
303 | 304 | 创建一个GFS文件系统: 305 | 需要提供的信息: 306 | 1、锁类型: 307 | lock_nolock 308 | lock_dlm 309 | 2、锁文件的名字,通常即文件系统名 310 | cluster_name:fs_name 311 | 3、日志的个数,通常一个节点对应一个日志文件,但建议提供比节点数更多的日志数目,以提供冗余; 312 | 4、日志文件大小 313 | 5、文件系统大小 314 | 315 | Syntax: gfs_mkfs -p lock_dlm -t ClusterName:FSName -j Number -b block_size -J journal_size BlockDevice 316 | 317 | 如: 318 | # gfs_mkfs -p lock_dlm -t gfscluster:gfslv -j 5 /dev/vg0/gfslv 319 | 320 | 可以通过其对应挂载点查看gfs文件系统属性信息; 321 | # gfs_tool df /mount_point 322 | 323 | 324 | 325 | 326 | 挂载GFS文件系统: 327 | mount -o StdMountOpts,GFSOptions -t gfs DEVICE MOUNTPOINT 328 | 329 | 前提:挂载GFS文件系统的主机必须是对应集群中的节点; 330 | 331 | 挂载GFS文件系统时有如下常用选项可用: 332 | lockproto=[lock_dlm,lock_nolock] 333 | locktable=clustername:fsname 334 | upgrade # GFS版本升级时有用 335 | acl 336 | 337 | 如果不想每一次启用GFS时都得指定某选项,也可以通过类似如下命令为其一次性指定: 338 | # gfs_tool margs "lockproto=lock_dlm,acl" 339 | 340 | 341 | 342 | 载入相应的gfs模块,并查看lv是否正常: 343 | 344 | # modprobe gfs 345 | # modprobe gfs2 346 | # chkconfig gfs on 347 | # chkconfig gfs2 on 348 | 349 | # chkconfig clvmd on 350 | 351 | # /etc/init.d/gfs restart 352 | # /etc/init.d/gfs2 restart 353 | # /etc/init.d/clvmd restart 354 | 355 | # lvscan 356 | 357 | 358 | lvmconf --enable-cluster 359 | 360 | 361 | 配置故障切换域 362 | 故障切换域是一个命名的集群节点子集,它可在节点失败事件中运行集群服务。故障切换域有以下特征: 363 | 无限制 — 允许您指定首选成员子集,但分配给这个域的集群服务可在任意可用成员中运行。 364 | 限制 — 允许您限制可运行具体集群服务的成员。如果在限制故障切换域中没有可用成员,则无法启动集群服务(手动或者使用集群软件均不可行)。 365 | 无序 — 当将一个集群服务分配给一个无序故障切换域时,则可从可用故障切换域成员中随机选择运行集群服务的成员,没有优先顺序。 366 | 有序的 — 可让您在故障切换域的成员间指定顺序。该列表顶端的成员是首选成员,接下来是列表中的第二个成员,依此类推。 367 | 故障恢复 — 允许您指定在故障切换域中的服务是否应该恢复到节点失败前最初运行的节点。配置这个特性,在作为有序故障切换域一部分的节点重复失败的环境中很有帮助。在那种情况下,如果某个节点是故障切换域中的首选节点,某个服务就可能会在首选节点和其它节点间反复切换和恢复,从而对性能产生严重影响。 368 | -------------------------------------------------------------------------------- /Scalable System Design.txt: -------------------------------------------------------------------------------- 1 | Building scalable systems is becoming a hotter and hotter topic. Mainly because more and more people are using computers these days, both the transaction volume and performance expectations have grown tremendously. 2 | 3 | This one covers general considerations. I have other blogs with more specific coverage of DB scalability as well as Web site scalability. 4 | 5 | General Principles 6 | "Scalability" is not equivalent to "Raw Performance" 7 | Scalability is about reducing the adverse impact due to growth on performance, cost, maintainability and many other aspects 8 | e.g. Running every component in one box will give higher performance when the load is small. But it is not scalable, because performance drops drastically once the load increases beyond the machine's capacity 9 | 10 | Understand the environmental workload conditions that the system is designed for 11 | Dimensions of growth and growth rate: e.g. Number of users, Transaction volume, Data volume 12 | Measurements and their targets: e.g. 
Response time, Throughput 13 | Understand who your priority customers are 14 | Rank the importance of traffic so you know what to sacrifice in case you cannot handle all of it 15 | Scale out, not up 16 | Scale the system horizontally (by adding more cheap machines), not vertically (by upgrading to a more powerful machine) 17 | Keep your code modular and simple 18 | The ability to swap out old code and replace it with new code without worrying about breaking other parts of the system allows you to experiment with different ways of optimization quickly 19 | Never sacrifice code modularity for any (including performance-related) reason 20 | Don't guess the bottleneck; measure it 21 | Bottlenecks are slow code paths that are frequently executed. Don't optimize slow code that is rarely executed 22 | Write performance unit tests so you can collect fine-grained performance data at the component level 23 | Set up a performance lab so you can conduct end-to-end performance improvement measurements easily 24 | Plan for growth 25 | Do regular capacity planning. Collect usage statistics and predict the growth rate 26 | 27 | Common Techniques 28 | Server Farm (real time access) 29 | If there is a large number of independent (potentially concurrent) requests, then you can use a server farm, which is basically a set of identically configured machines fronted by a load balancer. 30 | The application itself needs to be stateless so requests can be dispatched purely based on load conditions and not other factors. 31 | Incoming requests will be dispatched by the load balancer to different machines and hence the workload is spread and shared across the servers in the farm. 32 | The architecture allows horizontal growth, so when the workload increases, you can just add more server instances into the farm. 33 | This strategy is even more effective when combined with Cloud computing, as adding more VM instances into the farm is just an API call. 34 | Data Partitioning 35 | Spread your data across multiple DBs so that the data access workload can be distributed across multiple servers 36 | By nature, data is stateful. So there must be a deterministic mechanism to dispatch data requests to the server that hosts the data 37 | The data partitioning mechanism also needs to take the data access patterns into consideration. Data that needs to be accessed together should stay on the same server. A more sophisticated approach can migrate data continuously according to shifts in the data access pattern. 38 | Most distributed key/value stores do this 39 | Map / Reduce (Batch Parallel Processing) 40 | The algorithm itself needs to be parallelizable. This usually means the steps of execution should be relatively independent of each other. 41 | Google's Map/Reduce is a good framework for this model. There is also Hadoop, an open source Java framework. 42 | Content Delivery Network (Static Cache) 43 | This is common for static media content. The idea is to create many copies of the content, distributed geographically across servers. 44 | User requests will be routed to the server replica in close proximity 45 | Cache Engine (Dynamic Cache) 46 | This is a time vs space tradeoff. Some executions may use the same set of input parameters over and over again. Therefore, instead of redoing the same execution for the same input parameters, we can remember the previous execution's result. 47 | This is typically implemented as a lookup cache. 
48 | Memcached and EHCache are some of the popular caching packages 49 | Resource Pool 50 | DB sessions and TCP connections are expensive to create, so reuse them across multiple requests 51 | Calculate an approximate result 52 | Instead of calculating an accurate answer, see if you can trade off some accuracy for speed. 53 | In real life, some degree of inaccuracy is usually tolerable 54 | Filtering at the source 55 | Try to do more processing upstream (where the data gets generated) than downstream, because it reduces the amount of data being propagated 56 | Asynchronous Processing 57 | You make a call which returns a result, but you don't need to use the result until a much later stage of your process. Therefore, you don't need to wait immediately after making the call; instead, you can proceed to do other things until you reach the point where you need the result. 58 | In addition, a waiting thread is idle but consumes system resources. For high transaction volume, the number of idle threads is (arrival_rate * processing_time), which can be a very big number if the arrival_rate is high, leaving the system running in a very inefficient mode. 59 | Such service calls are better handled using an asynchronous processing model. This is typically done in 2 ways: Callback and Polling 60 | In callback mode, the caller needs to provide a response handler when making the call. The call itself will return immediately, before the actual work is done at the server side. When the work is done later, the response will come back on a separate thread, which will execute the previously registered response handler. Some kind of co-ordination may be required between the calling thread and the callback thread. 61 | In polling mode, the call itself will return a "future" handle immediately. The caller can go off doing other things and later poll the "future" handle to see if the response is ready. In this model, no extra thread is created, so no extra thread co-ordination is needed. 62 | Implementation design considerations 63 | Use efficient algorithms and data structures. Analyze the time (CPU) and space (memory) complexity of logic that is executed frequently (i.e. hot spots). For example, carefully decide whether a hash table or a binary tree should be used for lookups. 64 | Analyze your concurrent access scenarios when multiple threads access shared data. Carefully analyze the synchronization scenario and make sure the locking is fine-grained enough. Also watch for any possibility of a deadlock situation and how you would detect or prevent it. A wrong concurrent access model can have a huge impact on your system's scalability. Also consider using lock-free data structures (e.g. Java's concurrent package has a couple of them). 65 | Analyze the memory usage patterns in your logic. Determine where new objects are created and where they become eligible for garbage collection. Be aware of creating a lot of short-lived temporary objects, as they will put a high load on the Garbage Collector. 66 | However, never trade off code readability for performance (e.g. don't try to bundle too much logic into a single method). Let the VM handle the execution for you. 67 | 68 | References 69 | 70 | Scalable System Design 71 | Published at DZone with permission of Ricky Ho, author and DZone MVB. 
(source) -------------------------------------------------------------------------------- /Xtrabackup.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/Xtrabackup.txt -------------------------------------------------------------------------------- /adduser.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | DEBUG=0 4 | 5 | case $1 in 6 | -v|--verbose) 7 | DEBUG=1 8 | ;; 9 | esac 10 | 11 | useradd tom &> /dev/null 12 | [ $DEBUG -eq 1 ] && echo "Add user tom finished." 13 | -------------------------------------------------------------------------------- /adduser2.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ! id user1 &> /dev/null && useradd user1 && echo "user1" | passwd --stdin user1 &> /dev/null || echo "user1 exists." 3 | ! id user2 &> /dev/null && useradd user2 && echo "user2" | passwd --stdin user2 &> /dev/null || echo "user2 exists." 4 | ! id user3 &> /dev/null && useradd user3 && echo "user3" | passwd --stdin user3 &> /dev/null || echo "user3 exists." 5 | 6 | USERS=`wc -l /etc/passwd | cut -d: -f1` 7 | echo "$USERS users." 8 | -------------------------------------------------------------------------------- /addusers.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | useradd user1 3 | echo "user1" | passwd --stdin user1 &> /dev/null 4 | echo "Add user1 finished." 5 | -------------------------------------------------------------------------------- /adminusers.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | 4 | if [ $# -lt 1 ]; then 5 | echo "Usage: adminusers ARG" 6 | exit 7 7 | fi 8 | 9 | if [ $1 == '--add' ]; then 10 | for I in {1..10}; do 11 | if id user$I &> /dev/null; then 12 | echo "user$I exists." 13 | else 14 | useradd user$I 15 | echo user$I | passwd --stdin user$I &> /dev/null 16 | echo "Add user$I finished." 17 | fi 18 | done 19 | elif [ $1 == '--del' ]; then 20 | for I in {1..10}; do 21 | if id user$I &> /dev/null; then 22 | userdel -r user$I 23 | echo "Delete user$I finished." 24 | else 25 | echo "No user$I." 26 | fi 27 | done 28 | else 29 | echo "Unknown ARG" 30 | exit 8 31 | fi 32 | 33 | 34 | -------------------------------------------------------------------------------- /adminusers2.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | 4 | if [ $1 == '--add' ]; then 5 | for I in `echo $2 | sed 's/,/ /g'`; do 6 | if id $I &> /dev/null; then 7 | echo "$I exists." 8 | else 9 | useradd $I 10 | echo $I | passwd --stdin $I &> /dev/null 11 | echo "add $I finished." 12 | fi 13 | done 14 | elif [ $1 == '--del' ];then 15 | for I in `echo $2 | sed 's/,/ /g'`; do 16 | if id $I &> /dev/null; then 17 | userdel -r $I 18 | echo "Delete $I finished." 19 | else 20 | echo "$I NOT exist." 21 | fi 22 | done 23 | elif [ $1 == '--help' ]; then 24 | echo "Usage: adminuser2.sh --add USER1,USER2,... | --del USER1,USER2,...| --help" 25 | else 26 | echo "Unknown options." 
27 | fi 28 | 29 | -------------------------------------------------------------------------------- /adminusers3.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | DEBUG=0 4 | ADD=0 5 | DEL=0 6 | 7 | for I in `seq 0 $#`; do 8 | if [ $# -gt 0 ]; then 9 | case $1 in 10 | -v|--verbose) 11 | DEBUG=1 12 | shift ;; 13 | -h|--help) 14 | echo "Usage: `basename $0` --add USER_LIST --del USER_LIST -v|--verbose -h|--help" 15 | exit 0 16 | ;; 17 | --add) 18 | ADD=1 19 | ADDUSERS=$2 20 | shift 2 21 | ;; 22 | --del) 23 | DEL=1 24 | DELUSERS=$2 25 | shift 2 26 | ;; 27 | *) 28 | echo "Usage: `basename $0` --add USER_LIST --del USER_LIST -v|--verbose -h|--help" 29 | exit 7 30 | ;; 31 | esac 32 | fi 33 | done 34 | 35 | if [ $ADD -eq 1 ]; then 36 | for USER in `echo $ADDUSERS | sed 's@,@ @g'`; do 37 | if id $USER &> /dev/null; then 38 | [ $DEBUG -eq 1 ] && echo "$USER exists." 39 | else 40 | useradd $USER 41 | [ $DEBUG -eq 1 ] && echo "Add user $USER finished." 42 | fi 43 | done 44 | fi 45 | 46 | if [ $DEL -eq 1 ]; then 47 | for USER in `echo $DELUSERS | sed 's@,@ @g'`; do 48 | if id $USER &> /dev/null; then 49 | userdel -r $USER 50 | [ $DEBUG -eq 1 ] && echo "Delete $USER finished." 51 | else 52 | [ $DEBUG -eq 1 ] && echo "$USER not exist." 53 | fi 54 | done 55 | fi 56 | -------------------------------------------------------------------------------- /awk bascis.txt: -------------------------------------------------------------------------------- 1 | 2 | 3 | grep: 文本过滤器 4 | grep 'pattern' input_file ... 5 | 6 | sed:流编辑器 7 | 8 | awk: 报告生成器 9 | 格式化以后,显示 10 | 11 | AWK a.k.a. Aho, Kernighan and Weinberger 12 | 13 | new awk: nawk 14 | 15 | gawk, awk 16 | 17 | 18 | 19 | # awk [options] 'script' file1 file2, ... 20 | # awk [options] 'PATTERN { action }' file1 file2, ... 21 | print, printf 22 | 23 | -F 24 | 25 | 26 | 27 | 28 | 29 | awk的输出: 30 | 31 | 一、print 32 | print的使用格式: 33 | print item1, item2, ... 34 | 要点: 35 | 1、各项目之间使用逗号隔开,而输出时则以空白字符分隔; 36 | 2、输出的item可以为字符串或数值、当前记录的字段(如$1)、变量或awk的表达式;数值会先转换为字符串,而后再输出; 37 | 3、print命令后面的item可以省略,此时其功能相当于print $0, 因此,如果想输出空白行,则需要使用print ""; 38 | 39 | 例子: 40 | # awk 'BEGIN { print "line one\nline two\nline three" }' 41 | awk -F: '{ print $1, $3 }' /etc/passwd 42 | 43 | 44 | 二、awk变量 45 | 46 | 2.1 awk内置变量之记录变量: 47 | FS: field separator,读取文件本时,所使用字段分隔符; 48 | RS: Record separator,输入文本信息所使用的换行符; 49 | OFS: Output Filed Separator: 50 | ORS:Output Row Separator: 51 | 52 | awk -F: 53 | OFS="#" 54 | FS=":" 55 | 56 | 57 | 2.2 awk内置变量之数据变量: 58 | NR: The number of input records,awk命令所处理的记录数;如果有多个文件,这个数目会把处理的多个文件中行统一计数; 59 | NF:Number of Field,当前记录的field个数; 60 | FNR: 与NR不同的是,FNR用于记录正处理的行是当前这一文件中被总共处理的行数; 61 | ARGV: 数组,保存命令行本身这个字符串,如awk '{print $0}' a.txt b.txt这个命令中,ARGV[0]保存awk,ARGV[1]保存a.txt; 62 | ARGC: awk命令的参数的个数; 63 | FILENAME: awk命令所处理的文件的名称; 64 | ENVIRON:当前shell环境变量及其值的关联数组; 65 | 66 | 如:awk 'BEGIN{print ENVIRON["PATH"]}' 67 | 68 | 2.3 用户自定义变量 69 | 70 | gawk允许用户自定义自己的变量以便在程序代码中使用,变量名命名规则与大多数编程语言相同,只能使用字母、数字和下划线,且不能以数字开头。gawk变量名称区分字符大小写。 71 | 72 | 2.3.1 在脚本中赋值变量 73 | 74 | 在gawk中给变量赋值使用赋值语句进行,例如: 75 | awk 'BEGIN{var="variable testing";print var}' 76 | 77 | 2.3.2 在命令行中使用赋值变量 78 | 79 | gawk命令也可以在“脚本”外为变量赋值,并在脚本中进行引用。例如,上述的例子还可以改写为: 80 | awk -v var="variable testing" 'BEGIN{print var}' 81 | 82 | 三、printf 83 | printf命令的使用格式: 84 | printf format, item1, item2, ... 
85 | 86 | 要点: 87 | 1、其与print命令的最大不同是,printf需要指定format; 88 | 2、format用于指定后面的每个item的输出格式; 89 | 3、printf语句不会自动打印换行符;\n 90 | 91 | format格式的指示符都以%开头,后跟一个字符;如下: 92 | %c: 显示字符的ASCII码; 93 | %d, %i:十进制整数; 94 | %e, %E:科学计数法显示数值; 95 | %f: 显示浮点数; 96 | %g, %G: 以科学计数法的格式或浮点数的格式显示数值; 97 | %s: 显示字符串; 98 | %u: 无符号整数; 99 | %%: 显示%自身; 100 | 101 | 修饰符: 102 | N: 显示宽度; 103 | -: 左对齐; 104 | +:显示数值符号; 105 | 106 | 例子: 107 | # awk -F: '{printf "%-15s %i\n",$1,$3}' /etc/passwd 108 | 109 | 四、输出重定向 110 | 111 | print items > output-file 112 | print items >> output-file 113 | print items | command 114 | 115 | 特殊文件描述符: 116 | /dev/stdin:标准输入 117 | /dev/sdtout: 标准输出 118 | /dev/stderr: 错误输出 119 | /dev/fd/N: 某特定文件描述符,如/dev/stdin就相当于/dev/fd/0; 120 | 121 | 例子: 122 | # awk -F: '{printf "%-15s %i\n",$1,$3 > "/dev/stderr" }' /etc/passwd 123 | 124 | 125 | 六、awk的操作符: 126 | 127 | 6.1 算术操作符: 128 | 129 | -x: 负值 130 | +x: 转换为数值; 131 | x^y: 132 | x**y: 次方 133 | x*y: 乘法 134 | x/y:除法 135 | x+y: 136 | x-y: 137 | x%y: 138 | 139 | 6.2 字符串操作符: 140 | 只有一个,而且不用写出来,用于实现字符串连接; 141 | 142 | 6.3 赋值操作符: 143 | = 144 | += 145 | -= 146 | *= 147 | /= 148 | %= 149 | ^= 150 | **= 151 | 152 | ++ 153 | -- 154 | 155 | 需要注意的是,如果某模式为=号,此时使用/=/可能会有语法错误,应以/[=]/替代; 156 | 157 | 6.4 布尔值 158 | 159 | awk中,任何非0值或非空字符串都为真,反之就为假; 160 | 161 | 6.5 比较操作符: 162 | x < y True if x is less than y. 163 | x <= y True if x is less than or equal to y. 164 | x > y True if x is greater than y. 165 | x >= y True if x is greater than or equal to y. 166 | x == y True if x is equal to y. 167 | x != y True if x is not equal to y. 168 | x ~ y True if the string x matches the regexp denoted by y. 169 | x !~ y True if the string x does not match the regexp denoted by y. 170 | subscript in array True if the array array has an element with the subscript subscript. 171 | 172 | 6.7 表达式间的逻辑关系符: 173 | && 174 | || 175 | 176 | 6.8 条件表达式: 177 | selector?if-true-exp:if-false-exp 178 | 179 | if selector; then 180 | if-true-exp 181 | else 182 | if-false-exp 183 | fi 184 | 185 | a=3 186 | b=4 187 | a>b?a is max:b ia max 188 | 189 | 6.9 函数调用: 190 | function_name (para1,para2) 191 | 192 | 193 | 194 | 195 | 七 awk的模式: 196 | 197 | awk 'program' input-file1 input-file2 ... 198 | 其中的program为: 199 | pattern { action } 200 | pattern { action } 201 | ... 
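把上面的结构落到实处,下面是一个同时用到BEGIN模式、普通模式和END模式的小例子(文件与UID阈值仅为示意):
# awk -F: 'BEGIN{n=0} $3>=500{n++} END{print n, "common users"}' /etc/passwd
其中BEGIN块在读入任何输入行之前执行一次,中间的pattern{action}对每一行进行判断并执行,END块则在处理完所有行之后执行一次。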
202 | 203 | 7.1 常见的模式类型: 204 | 1、Regexp: 正则表达式,格式为/regular expression/ 205 | 2、expresssion: 表达式,其值非0或为非空字符时满足条件,如:$1 ~ /foo/ 或 $1 == "magedu",用运算符~(匹配)和!~(不匹配)。 206 | 3、Ranges: 指定的匹配范围,格式为pat1,pat2 207 | 4、BEGIN/END:特殊模式,仅在awk命令执行前运行一次或结束前运行一次 208 | 5、Empty(空模式):匹配任意输入行; 209 | 210 | 7.2 常见的Action 211 | 1、Expressions: 212 | 2、Control statements 213 | 3、Compound statements 214 | 4、Input statements 215 | 5、Output statements 216 | 217 | 218 | /正则表达式/:使用通配符的扩展集。 219 | 220 | 关系表达式:可以用下面运算符表中的关系运算符进行操作,可以是字符串或数字的比较,如$2>%1选择第二个字段比第一个字段长的行。 221 | 222 | 模式匹配表达式: 223 | 224 | 模式,模式:指定一个行的范围。该语法不能包括BEGIN和END模式。 225 | 226 | BEGIN:让用户指定在第一条输入记录被处理之前所发生的动作,通常可在这里设置全局变量。 227 | 228 | END:让用户在最后一条输入记录被读取之后发生的动作。 229 | 230 | 231 | 232 | 233 | 234 | 八 控制语句: 235 | 8.1 if-else 236 | 语法:if (condition) {then-body} else {[ else-body ]} 237 | 例子: 238 | awk -F: '{if ($1=="root") print $1, "Admin"; else print $1, "Common User"}' /etc/passwd 239 | awk -F: '{if ($1=="root") printf "%-15s: %s\n", $1,"Admin"; else printf "%-15s: %s\n", $1, "Common User"}' /etc/passwd 240 | awk -F: -v sum=0 '{if ($3>=500) sum++}END{print sum}' /etc/passwd 241 | 242 | 8.2 while 243 | 语法: while (condition){statement1; statment2; ...} 244 | awk -F: '{i=1;while (i<=3) {print $i;i++}}' /etc/passwd 245 | awk -F: '{i=1;while (i<=NF) { if (length($i)>=4) {print $i}; i++ }}' /etc/passwd 246 | 247 | 8.3 do-while 248 | 语法: do {statement1, statement2, ...} while (condition) 249 | awk -F: '{i=1;do {print $i;i++}while(i<=3)}' /etc/passwd 250 | 251 | 8.4 for 252 | 语法: for ( variable assignment; condition; iteration process) { statement1, statement2, ...} 253 | awk -F: '{for(i=1;i<=3;i++) print $i}' /etc/passwd 254 | awk -F: '{for(i=1;i<=NF;i++) { if (length($i)>=4) {print $i}}}' /etc/passwd 255 | 256 | for循环还可以用来遍历数组元素: 257 | 语法: for (i in array) {statement1, statement2, ...} 258 | awk -F: '$NF!~/^$/{BASH[$NF]++}END{for(A in BASH){printf "%15s:%i\n",A,BASH[A]}}' /etc/passwd 259 | 260 | 8.5 case 261 | 语法:switch (expression) { case VALUE or /REGEXP/: statement1, statement2,... default: statement1, ...} 262 | 263 | 8.6 break 和 continue 264 | 常用于循环或case语句中 265 | 266 | 8.7 next 267 | 提前结束对本行文本的处理,并接着处理下一行;例如,下面的命令将显示其ID号为奇数的用户: 268 | # awk -F: '{if($3%2==0) next;print $1,$3}' /etc/passwd 269 | 270 | 271 | 九 awk中使用数组 272 | 273 | 9.1 数组 274 | 275 | array[index-expression] 276 | 277 | index-expression可以使用任意字符串;需要注意的是,如果某数据组元素事先不存在,那么在引用其时,awk会自动创建此元素并初始化为空串;因此,要判断某数据组中是否存在某元素,需要使用index in array的方式。 278 | 279 | 要遍历数组中的每一个元素,需要使用如下的特殊结构: 280 | for (var in array) { statement1, ... 
271 | 9. Using arrays in awk
272 |
273 | 9.1 Arrays
274 |
275 | array[index-expression]
276 |
277 | The index-expression may be any string. Note that if an array element does not exist beforehand, referencing it makes awk create the element automatically and initialize it to the empty string; therefore, to test whether an array contains some element, use the form index in array.
278 |
279 | To walk through every element of an array, use the following special construct:
280 | for (var in array) { statement1, ... }
281 | Here var holds the array subscript, not the element's value;
282 |
283 | Examples:
284 | netstat -ant | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
285 | For every line matched by the /^tcp/ pattern, array element S[$NF] is incremented; $NF is the last field of the current line (the connection state here), so its value serves as the index into array S;
286 |
287 | awk '{counts[$1]++}; END {for(url in counts) print counts[url], url}' /var/log/httpd/access_log
288 | Same usage as the previous example; it counts the number of hits per IP address in a log file
289 |
290 | 9.2 Deleting array elements
291 |
292 | Removing an index from an associative array is done with the delete command. Its format is:
293 |
294 | delete array[index]
295 |
296 |
297 |
298 | 10. awk built-in functions
299 |
300 | split(string, array [, fieldsep [, seps ] ])
301 | Purpose: splits the string string on the separator fieldsep and stores the pieces in the array named array; the subscripts are consecutive integers starting at 1;
302 |
303 | netstat -ant | awk '/:80\>/{split($5,clients,":");IP[clients[1]]++}END{for(i in IP){print IP[i],i}}' | sort -rn | head -50
304 |
305 | length([string])
306 | Purpose: returns the number of characters in string;
307 |
308 |
309 | substr(string, start [, length])
310 | Purpose: extracts a substring of string, taking length characters starting at position start; start counts from 1;
311 |
312 | system(command)
313 | Purpose: executes the system command command and returns the result to the awk command
314 |
315 | systime()
316 | Purpose: returns the current system time
317 |
318 | tolower(s)
319 | Purpose: converts all letters in s to lowercase
320 |
321 | toupper(s)
322 | Purpose: converts all letters in s to uppercase
323 |
324 | 11. User-defined functions
325 |
326 | Functions are defined with the function keyword. The format is as follows (a short sketch follows below):
327 |
328 | function F_NAME([variable])
329 | {
330 | statements
331 | }
332 |
333 | A function may also return a value via the return statement, in the form "return value".
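A short sketch of such a function; max is a hypothetical name, and the whole thing runs in any awk. It combines a user-defined function with the usual per-line action and an END block:

awk 'function max(x, y) { return x > y ? x : y }
     { longest = max(longest, length($0)) }
     END { print "longest line:", longest }' /etc/passwd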
334 |
335 |
336 |
337 |
338 |
339 |
--------------------------------------------------------------------------------
/awk.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joojfork/Linuxyunwei/5f8bb5f726229735c504c1f655f8b7bdad0d2b62/awk.txt
--------------------------------------------------------------------------------
/bash.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | #
3 | grep "\<$1\>" /etc/passwd &> /dev/null   # assumption: checks /etc/passwd for user $1 (pattern was garbled in this copy)
4 | RETVAL=$?
5 |
6 | if [ $RETVAL -eq 0 ]; then
7 | AUSER=`grep "\ ${TMPFILE}
110 | for a in $(seq 0 1 ${cpucount} ); do
111 | echo "old_system[${a}]=${system[${a}]}" >> ${TMPFILE}
112 | echo "old_user[${a}]=${user[${a}]}" >> ${TMPFILE}
113 | echo "old_nice[${a}]=${nice[${a}]}" >> ${TMPFILE}
114 | echo "old_iowait[${a}]=${iowait[${a}]}" >> ${TMPFILE}
115 | echo "old_irq[${a}]=${irq[${a}]}" >> ${TMPFILE}
116 | echo "old_softirq[${a}]=${softirq[${a}]}" >> ${TMPFILE}
117 | echo "old_idle[${a}]=${idle[${a}]}" >> ${TMPFILE}
118 | echo "old_used[${a}]=${used[${a}]}" >> ${TMPFILE}
119 | echo "old_total[${a}]=${total[${a}]}" >> ${TMPFILE}
120 | done
121 | }
122 |
123 | read_tmpfile() {
124 | if [ -e ${TMPFILE} ]; then
125 | source ${TMPFILE} # include the vars from the tmp file
126 | fi
127 | (( DEBUG )) && cat ${TMPFILE}
128 | }
129 |
130 | ########################
131 | # MAIN
132 | ########################
133 |
134 | parse_options $@
135 |
136 | read_tmpfile
137 |
138 | procstat=$(cat /proc/stat 2>&1)
139 | (( DEBUG )) && echo "$procstat"
140 | cpucount=$(( $(grep -i cpu <<< "${procstat}" | tail -n 1 | cut -d' ' -f 1 | grep -Eo [0-9]+) + 1 ))
141 | (( DEBUG )) && echo "cpucount=${cpucount}"
142 |
143 | for a in $(seq 0 1 ${cpucount} ); do
144 | if [ $a -eq ${cpucount} ]; then
145 | cpu[$a]=$(head -n 1 <<< "${procstat}" | sed 's/  */ /g')   # squeeze runs of spaces so the cut -d' ' fields below line up
146 | else
147 | cpu[$a]=$(grep cpu${a} <<< "${procstat}")
148 | fi
149 | user[$a]=$(cut -d' ' -f 2 <<< ${cpu[$a]})
150 | nice[$a]=$(cut -d' ' -f 3 <<< ${cpu[$a]})
151 | system[$a]=$(cut -d' ' -f 4 <<< ${cpu[$a]})
152 | idle[$a]=$(cut -d' ' -f 5 <<< ${cpu[$a]})
153 | iowait[$a]=$(cut -d' ' -f 6 <<< ${cpu[$a]})
154 | irq[$a]=$(cut -d' ' -f 7 <<< ${cpu[$a]})
155 | softirq[$a]=$(cut -d' ' -f 8 <<< ${cpu[$a]})
156 | used[$a]=$((( ${user[$a]} + ${nice[$a]} + ${system[$a]} + ${iowait[$a]} + ${irq[$a]} + ${softirq[$a]} )))
157 | total[$a]=$((( ${user[$a]} + ${nice[$a]} + ${system[$a]} + ${idle[$a]} + ${iowait[$a]} + ${irq[$a]} + ${softirq[$a]} )))
158 |
159 | [ -z ${old_user[${a}]} ] && old_user[${a}]=0
160 | [ -z ${old_nice[${a}]} ] && old_nice[${a}]=0
161 | [ -z ${old_system[${a}]} ] && old_system[${a}]=0
162 | [ -z ${old_idle[${a}]} ] && old_idle[${a}]=0
163 | [ -z ${old_iowait[${a}]} ] && old_iowait[${a}]=0
164 | [ -z ${old_irq[${a}]} ] && old_irq[${a}]=0
165 | [ -z ${old_softirq[${a}]} ] && old_softirq[${a}]=0
166 | [ -z ${old_used[${a}]} ] && old_used[${a}]=0
167 | [ -z ${old_total[${a}]} ] && old_total[${a}]=0
168 |
169 | diff_user[$a]=$(((${user[$a]}-${old_user[${a}]})))
170 | diff_nice[$a]=$(((${nice[$a]}-${old_nice[${a}]})))
171 | diff_system[$a]=$(((${system[$a]}-${old_system[${a}]})))
172 | diff_idle[$a]=$(((${idle[$a]}-${old_idle[${a}]})))
173 | diff_iowait[$a]=$(((${iowait[$a]}-${old_iowait[${a}]})))
174 | diff_irq[$a]=$(((${irq[$a]}-${old_irq[${a}]})))
175 | diff_softirq[$a]=$(((${softirq[$a]}-${old_softirq[${a}]})))
176 | diff_used[$a]=$(((${used[$a]}-${old_used[${a}]})))
177 | diff_total[$a]=$(((${total[$a]}-${old_total[${a}]})))
178 |
179 | pct_user[$a]=$(bc <<< "scale=${scale};${diff_user[$a]}*100/${diff_total[$a]}")
180 | pct_nice[$a]=$(bc <<< "scale=${scale};${diff_nice[$a]}*100/${diff_total[$a]}")
181 | pct_system[$a]=$(bc <<< "scale=${scale};${diff_system[$a]}*100/${diff_total[$a]}")
182 | pct_idle[$a]=$(bc <<< "scale=${scale};${diff_idle[$a]}*100/${diff_total[$a]}")
183 | pct_iowait[$a]=$(bc <<< "scale=${scale};${diff_iowait[$a]}*100/${diff_total[$a]}")
184 | pct_irq[$a]=$(bc <<< "scale=${scale};${diff_irq[$a]}*100/${diff_total[$a]}")
185 |
pct_softirq[$a]=$(bc <<< "scale=${scale};${diff_softirq[$a]}*100/${diff_total[$a]}") 186 | pct_used[$a]=$(bc <<< "scale=${scale};${diff_used[$a]}*100/${diff_total[$a]}") 187 | done 188 | 189 | write_tmpfile 190 | 191 | [ $(cut -d'.' -f 1 <<< ${pct_used[${cpucount}]}) -ge ${warning} ] && exitstatus=1 192 | [ $(cut -d'.' -f 1 <<< ${pct_used[${cpucount}]}) -ge ${critical} ] && exitstatus=2 193 | 194 | result="CPU=${pct_used[${cpucount}]}" 195 | if [ $show_all -gt 0 ]; then 196 | for a in $(seq 0 1 $(((${cpucount} - 1)))); do 197 | result="${result}, CPU${a}=${pct_used[${a}]}" 198 | done 199 | fi 200 | 201 | if [ "${warning}" = "999" ]; then 202 | warning="" 203 | fi 204 | if [ "${critical}" = "999" ]; then 205 | critical="" 206 | fi 207 | 208 | perfdata="used=${pct_used[${cpucount}]};${warning};${critical};; system=${pct_system[${cpucount}]};;;; user=${pct_user[${cpucount}]};;;; nice=${pct_nice[${cpucount}]};;;; iowait=${pct_iowait[${cpucount}]};;;; irq=${pct_irq[${cpucount}]};;;; softirq=${pct_softirq[${cpucount}]};;;;" 209 | if [ $show_all -gt 0 ]; then 210 | for a in $(seq 0 1 $(((${cpucount} - 1)))); do 211 | perfdata="${perfdata} used${a}=${pct_used[${a}]};;;; system${a}=${pct_system[${a}]};;;; user${a}=${pct_user[${a}]};;;; nice${a}=${pct_nice[${a}]};;;; iowait${a}=${pct_iowait[${a}]};;;; irq${a}=${pct_irq[${a}]};;;; softirq${a}=${pct_softirq[${a}]};;;;" 212 | done 213 | fi 214 | 215 | echo "${status[$exitstatus]}${result} | ${perfdata}" 216 | exit $exitstatus 217 | 218 | -------------------------------------------------------------------------------- /check_cpu.sh.bak: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Check CPU Usage via /proc/stats 4 | 5 | ######################## 6 | # DECLARATIONS 7 | ######################## 8 | 9 | PROGNAME=`basename $0` 10 | REVISION=`echo '$Revision: 1.0 $' | sed -e 's/[^0-9.]//g'` 11 | 12 | DEBUG=0 13 | 14 | exitstatus=0 15 | result="" 16 | perfdata="" 17 | scale=2 18 | show_all=0 19 | warning=999 20 | critical=999 21 | 22 | TMPFILE="/tmp/check_cpu.tmp" 23 | 24 | status[0]="OK: " 25 | status[1]="WARNING: " 26 | status[2]="CRITICAL: " 27 | status[3]="UNKNOWN: " 28 | 29 | ######################## 30 | # FUNCTIONS 31 | ######################## 32 | 33 | print_usage() { 34 | echo "Usage: $PROGNAME [options]" 35 | echo " e.g. 
$PROGNAME -w 75 -c 90 -s 2 --all" 36 | echo 37 | echo "Options:" 38 | echo -e "\t --help | -h print help" 39 | echo -e "\t --version | -V print version" 40 | echo -e "\t --verbose | -v be verbose (debug mode)" 41 | echo -e "\t --scale | -s [int] decimal precision of results" 42 | echo -e "\t default=2" 43 | echo -e "\t --all | -a return values for all cpus individually" 44 | echo -e "\t default= summary data only" 45 | echo -e "\t -w [int] set warning value" 46 | echo -e "\t -c [int] set critical value" 47 | echo 48 | echo 49 | } 50 | 51 | print_help() { 52 | # print_revision $PROGNAME $REVISION 53 | echo "${PROGNAME} Revision: ${REVISION}" 54 | echo 55 | echo "This plugin checks local cpu usage using /proc/stat" 56 | echo 57 | print_usage 58 | echo 59 | # support 60 | exit 3 61 | } 62 | 63 | parse_options() { 64 | # parse cmdline arguments 65 | (( DEBUG )) && echo "Parsing options $1 $2 $3 $4 $5 $6 $7 $8" 66 | if [ "$#" -gt 0 ]; then 67 | while [ "$#" -gt 0 ]; do 68 | case "$1" in 69 | '--help'|'-h') 70 | print_help 71 | exit 3 72 | ;; 73 | '--version'|'-V') 74 | #print_revision $PROGNAME $REVISION 75 | echo "${PROGNAME} Revision: ${REVISION}" 76 | exit 3 77 | ;; 78 | '--verbose'|'-v') 79 | DEBUG=1 80 | shift 1 81 | ;; 82 | '--scale'|'-s') 83 | scale="$2" 84 | shift 2 85 | ;; 86 | '--all'|'-a') 87 | show_all=1 88 | shift 1 89 | ;; 90 | '-c') 91 | critical="$2" 92 | shift 2 93 | ;; 94 | '-w') 95 | warning="$2" 96 | shift 2 97 | ;; 98 | *) 99 | echo "Unknown option!" 100 | print_usage 101 | exit 3 102 | ;; 103 | esac 104 | done 105 | fi 106 | } 107 | 108 | write_tmpfile() { 109 | echo "old_date=$(date +%s)" > ${TMPFILE} 110 | for a in $(seq 0 1 ${cpucount} ); do 111 | echo "old_system[${a}]=${system[${a}]}" >> ${TMPFILE} 112 | echo "old_user[${a}]=${user[${a}]}" >> ${TMPFILE} 113 | echo "old_nice[${a}]=${nice[${a}]}" >> ${TMPFILE} 114 | echo "old_iowait[${a}]=${iowait[${a}]}" >> ${TMPFILE} 115 | echo "old_irq[${a}]=${irq[${a}]}" >> ${TMPFILE} 116 | echo "old_softirq[${a}]=${softirq[${a}]}" >> ${TMPFILE} 117 | echo "old_idle[${a}]=${idle[${a}]}" >> ${TMPFILE} 118 | echo "old_used[${a}]=${used[${a}]}" >> ${TMPFILE} 119 | echo "old_total[${a}]=${total[${a}]}" >> ${TMPFILE} 120 | done 121 | } 122 | 123 | read_tmpfile() { 124 | if [ -e ${TMPFILE} ]; then 125 | source ${TMPFILE} # include the vars from the tmp file 126 | fi 127 | (( DEBUG )) && cat ${TMPFILE} 128 | } 129 | 130 | ######################## 131 | # MAIN 132 | ######################## 133 | 134 | parse_options $@ 135 | 136 | read_tmpfile 137 | 138 | procstat=$(cat /proc/stat 2>&1) 139 | (( DEBUG )) && echo "$procstat" 140 | cpucount=$(( $(grep -i cpu <<< "${procstat}" | tail -n 1 | cut -d' ' -f 1 | grep -Eo [0-9]+) + 1 )) 141 | (( DEBUG )) && echo "cpucount=${cpucount}" 142 | 143 | for a in $(seq 0 1 ${cpucount} ); do 144 | if [ $a -eq ${cpucount} ]; then 145 | cpu[$a]=$(head -n 1 <<< "${procstat}" | sed 's/ / /g') 146 | else 147 | cpu[$a]=$(grep cpu${a} <<< "${procstat}") 148 | fi 149 | user[$a]=$(cut -d' ' -f 2 <<< ${cpu[$a]}) 150 | nice[$a]=$(cut -d' ' -f 3 <<< ${cpu[$a]}) 151 | system[$a]=$(cut -d' ' -f 4 <<< ${cpu[$a]}) 152 | idle[$a]=$(cut -d' ' -f 5 <<< ${cpu[$a]}) 153 | iowait[$a]=$(cut -d' ' -f 6 <<< ${cpu[$a]}) 154 | irq[$a]=$(cut -d' ' -f 7 <<< ${cpu[$a]}) 155 | softirq[$a]=$(cut -d' ' -f 8 <<< ${cpu[$a]}) 156 | used[$a]=$((( ${user[$a]} + ${nice[$a]} + ${system[$a]} + ${iowait[$a]} + ${irq[$a]} + ${softirq[$a]} ))) 157 | total[$a]=$((( ${user[$a]} + ${nice[$a]} + ${system[$a]} + ${idle[$a]} + ${iowait[$a]} + ${irq[$a]} + 
${softirq[$a]} ))) 158 | 159 | [ -z ${old_user[${a}]} ] && old_user[${a}]=0 160 | [ -z ${old_nice[${a}]} ] && old_nice[${a}]=0 161 | [ -z ${old_system[${a}]} ] && old_system[${a}]=0 162 | [ -z ${old_idle[${a}]} ] && old_idle[${a}]=0 163 | [ -z ${old_iowait[${a}]} ] && old_iowait[${a}]=0 164 | [ -z ${old_irq[${a}]} ] && old_irq[${a}]=0 165 | [ -z ${old_softirq[${a}]} ] && old_softirq[${a}]=0 166 | [ -z ${old_used[${a}]} ] && old_used[${a}]=0 167 | [ -z ${old_total[${a}]} ] && old_total[${a}]=0 168 | 169 | diff_user[$a]=$(((${user[$a]}-${old_user[${a}]}))) 170 | diff_nice[$a]=$(((${nice[$a]}-${old_nice[${a}]}))) 171 | diff_system[$a]=$(((${system[$a]}-${old_system[${a}]}))) 172 | diff_idle[$a]=$(((${idle[$a]}-${old_idle[${a}]}))) 173 | diff_iowait[$a]=$(((${iowait[$a]}-${old_iowait[${a}]}))) 174 | diff_irq[$a]=$(((${irq[$a]}-${old_irq[${a}]}))) 175 | diff_softirq[$a]=$(((${softirq[$a]}-${old_softirq[${a}]}))) 176 | diff_used[$a]=$(((${used[$a]}-${old_used[${a}]}))) 177 | diff_total[$a]=$(((${total[$a]}-${old_total[${a}]}))) 178 | 179 | pct_user[$a]=$(bc <<< "scale=${scale};${diff_user[$a]}*100/${diff_total[$a]}") 180 | pct_nice[$a]=$(bc <<< "scale=${scale};${diff_nice[$a]}*100/${diff_total[$a]}") 181 | pct_system[$a]=$(bc <<< "scale=${scale};${diff_system[$a]}*100/${diff_total[$a]}") 182 | pct_idle[$a]=$(bc <<< "scale=${scale};${diff_idle[$a]}*100/${diff_total[$a]}") 183 | pct_iowait[$a]=$(bc <<< "scale=${scale};${diff_iowait[$a]}*100/${diff_total[$a]}") 184 | pct_irq[$a]=$(bc <<< "scale=${scale};${diff_irq[$a]}*100/${diff_total[$a]}") 185 | pct_softirq[$a]=$(bc <<< "scale=${scale};${diff_softirq[$a]}*100/${diff_total[$a]}") 186 | pct_used[$a]=$(bc <<< "scale=${scale};${diff_used[$a]}*100/${diff_total[$a]}") 187 | done 188 | 189 | write_tmpfile 190 | 191 | [ $(cut -d'.' -f 1 <<< ${pct_used[${cpucount}]}) -ge ${warning} ] && exitstatus=1 192 | [ $(cut -d'.' -f 1 <<< ${pct_used[${cpucount}]}) -ge ${critical} ] && exitstatus=2 193 | 194 | result="CPU=${pct_used[${cpucount}]}" 195 | if [ $show_all -gt 0 ]; then 196 | for a in $(seq 0 1 $(((${cpucount} - 1)))); do 197 | result="${result}, CPU${a}=${pct_used[${a}]}" 198 | done 199 | fi 200 | 201 | if [ "${warning}" = "999" ]; then 202 | warning="" 203 | fi 204 | if [ "${critical}" = "999" ]; then 205 | critical="" 206 | fi 207 | 208 | perfdata="used=${pct_used[${cpucount}]};${warning};${critical};; system=${pct_system[${cpucount}]};;;; user=${pct_user[${cpucount}]};;;; nice=${pct_nice[${cpucount}]};;;; iowait=${pct_iowait[${cpucount}]};;;; irq=${pct_irq[${cpucount}]};;;; softirq=${pct_softirq[${cpucount}]};;;;" 209 | if [ $show_all -gt 0 ]; then 210 | for a in $(seq 0 1 $(((${cpucount} - 1)))); do 211 | perfdata="${perfdata} used${a}=${pct_used[${a}]};;;; system${a}=${pct_system[${a}]};;;; user${a}=${pct_user[${a}]};;;; nice${a}=${pct_nice[${a}]};;;; iowait${a}=${pct_iowait[${a}]};;;; irq${a}=${pct_irq[${a}]};;;; softirq${a}=${pct_softirq[${a}]};;;;" 212 | done 213 | fi 214 | 215 | echo "${status[$exitstatus]}${result} | ${perfdata}" 216 | exit $exitstatus 217 | 218 | -------------------------------------------------------------------------------- /check_mem.pl: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/perl -w 2 | # 3 | # $Id: check_mem.pl 8 2008-08-23 08:59:52Z rhomann $ 4 | # 5 | # check_mem v1.7 plugin for nagios 6 | # 7 | # uses the output of `free` to find the percentage of memory used 8 | # 9 | # Copyright Notice: GPL 10 | # 11 | # History: 12 | # v1.8 Rouven Homann - rouven.homann@cimt.de 13 | # + added findbin patch from Duane Toler 14 | # + added backward compatibility patch from Timour Ezeev 15 | # 16 | # v1.7 Ingo Lantschner - ingo AT boxbe DOT com 17 | # + adapted for systems with no swap (avoiding divison through 0) 18 | # 19 | # v1.6 Cedric Temple - cedric DOT temple AT cedrictemple DOT info 20 | # + add swap monitoring 21 | # + if warning and critical threshold are 0, exit with OK 22 | # + add a directive to exclude/include buffers 23 | # 24 | # v1.5 Rouven Homann - rouven.homann@cimt.de 25 | # + perfomance tweak with free -mt (just one sub process started instead of 7) 26 | # + more code cleanup 27 | # 28 | # v1.4 Garrett Honeycutt - gh@3gupload.com 29 | # + Fixed PerfData output to adhere to standards and show crit/warn values 30 | # 31 | # v1.3 Rouven Homann - rouven.homann@cimt.de 32 | # + Memory installed, used and free displayed in verbose mode 33 | # + Bit Code Cleanup 34 | # 35 | # v1.2 Rouven Homann - rouven.homann@cimt.de 36 | # + Bug fixed where verbose output was required (nrpe2) 37 | # + Bug fixed where perfomance data was not displayed at verbose output 38 | # + FindBin Module used for the nagios plugin path of the utils.pm 39 | # 40 | # v1.1 Rouven Homann - rouven.homann@cimt.de 41 | # + Status Support (-c, -w) 42 | # + Syntax Help Informations (-h) 43 | # + Version Informations Output (-V) 44 | # + Verbose Output (-v) 45 | # + Better Error Code Output (as described in plugin guideline) 46 | # 47 | # v1.0 Garrett Honeycutt - gh@3gupload.com 48 | # + Initial Release 49 | # 50 | use strict; 51 | use FindBin; 52 | FindBin::again(); 53 | use lib $FindBin::Bin; 54 | use utils qw($TIMEOUT %ERRORS &print_revision &support); 55 | use vars qw($PROGNAME $PROGVER); 56 | use Getopt::Long; 57 | use vars qw($opt_V $opt_h $verbose $opt_w $opt_c); 58 | 59 | $PROGNAME = "check_mem"; 60 | $PROGVER = "1.8"; 61 | 62 | # add a directive to exclude buffers: 63 | my $DONT_INCLUDE_BUFFERS = 0; 64 | 65 | sub print_help (); 66 | sub print_usage (); 67 | 68 | Getopt::Long::Configure('bundling'); 69 | GetOptions ("V" => \$opt_V, "version" => \$opt_V, 70 | "h" => \$opt_h, "help" => \$opt_h, 71 | "v" => \$verbose, "verbose" => \$verbose, 72 | "w=s" => \$opt_w, "warning=s" => \$opt_w, 73 | "c=s" => \$opt_c, "critical=s" => \$opt_c); 74 | 75 | if ($opt_V) { 76 | print_revision($PROGNAME,'$Revision: '.$PROGVER.' 
$'); 77 | exit $ERRORS{'UNKNOWN'}; 78 | } 79 | 80 | if ($opt_h) { 81 | print_help(); 82 | exit $ERRORS{'UNKNOWN'}; 83 | } 84 | 85 | print_usage() unless (($opt_c) && ($opt_w)); 86 | 87 | my ($mem_critical, $swap_critical); 88 | my ($mem_warning, $swap_warning); 89 | ($mem_critical, $swap_critical) = ($1,$2) if ($opt_c =~ /([0-9]+)[%]?(?:,([0-9]+)[%]?)?/); 90 | ($mem_warning, $swap_warning) = ($1,$2) if ($opt_w =~ /([0-9]+)[%]?(?:,([0-9]+)[%]?)?/); 91 | 92 | # Check if swap params were supplied 93 | $swap_critical ||= 100; 94 | $swap_warning ||= 100; 95 | 96 | # print threshold in output message 97 | my $mem_threshold_output = " ("; 98 | my $swap_threshold_output = " ("; 99 | 100 | if ( $mem_warning > 0 && $mem_critical > 0) { 101 | $mem_threshold_output .= "W> $mem_warning, C> $mem_critical"; 102 | } 103 | elsif ( $mem_warning > 0 ) { 104 | $mem_threshold_output .= "W> $mem_warning"; 105 | } 106 | elsif ( $mem_critical > 0 ) { 107 | $mem_threshold_output .= "C> $mem_critical"; 108 | } 109 | 110 | if ( $swap_warning > 0 && $swap_critical > 0) { 111 | $swap_threshold_output .= "W> $swap_warning, C> $swap_critical"; 112 | } 113 | elsif ( $swap_warning > 0 ) { 114 | $swap_threshold_output .= "W> $swap_warning"; 115 | } 116 | elsif ( $swap_critical > 0 ) { 117 | $swap_threshold_output .= "C> $swap_critical"; 118 | } 119 | 120 | $mem_threshold_output .= ")"; 121 | $swap_threshold_output .= ")"; 122 | 123 | my $verbose = $verbose; 124 | 125 | my ($mem_percent, $mem_total, $mem_used, $swap_percent, $swap_total, $swap_used) = &sys_stats(); 126 | my $free_mem = $mem_total - $mem_used; 127 | my $free_swap = $swap_total - $swap_used; 128 | 129 | # set output message 130 | my $output = "Memory Usage".$mem_threshold_output.": ". $mem_percent.'%
'; 131 | $output .= "Swap Usage".$swap_threshold_output.": ". $swap_percent.'%'; 132 | 133 | # set verbose output message 134 | my $verbose_output = "Memory Usage:".$mem_threshold_output.": ". $mem_percent.'% '."- Total: $mem_total MB, used: $mem_used MB, free: $free_mem MB
"; 135 | $verbose_output .= "Swap Usage:".$swap_threshold_output.": ". $swap_percent.'% '."- Total: $swap_total MB, used: $swap_used MB, free: $free_swap MB
"; 136 | 137 | # set perfdata message 138 | my $perfdata_output = "MemUsed=$mem_percent\%;$mem_warning;$mem_critical"; 139 | $perfdata_output .= " SwapUsed=$swap_percent\%;$swap_warning;$swap_critical"; 140 | 141 | 142 | # if threshold are 0, exit with OK 143 | if ( $mem_warning == 0 ) { $mem_warning = 101 }; 144 | if ( $swap_warning == 0 ) { $swap_warning = 101 }; 145 | if ( $mem_critical == 0 ) { $mem_critical = 101 }; 146 | if ( $swap_critical == 0 ) { $swap_critical = 101 }; 147 | 148 | 149 | if ($mem_percent>$mem_critical || $swap_percent>$swap_critical) { 150 | if ($verbose) { print "CRITICAL: ".$verbose_output."|".$perfdata_output."\n";} 151 | else { print "CRITICAL: ".$output."|".$perfdata_output."\n";} 152 | exit $ERRORS{'CRITICAL'}; 153 | } elsif ($mem_percent>$mem_warning || $swap_percent>$swap_warning) { 154 | if ($verbose) { print "WARNING: ".$verbose_output."|".$perfdata_output."\n";} 155 | else { print "WARNING: ".$output."|".$perfdata_output."\n";} 156 | exit $ERRORS{'WARNING'}; 157 | } else { 158 | if ($verbose) { print "OK: ".$verbose_output."|".$perfdata_output."\n";} 159 | else { print "OK: ".$output."|".$perfdata_output."\n";} 160 | exit $ERRORS{'OK'}; 161 | } 162 | 163 | sub sys_stats { 164 | my @memory = split(" ", `free -mt`); 165 | my $mem_total = $memory[7]; 166 | my $mem_used; 167 | if ( $DONT_INCLUDE_BUFFERS) { $mem_used = $memory[15]; } 168 | else { $mem_used = $memory[8];} 169 | my $swap_total = $memory[18]; 170 | my $swap_used = $memory[19]; 171 | my $mem_percent = ($mem_used / $mem_total) * 100; 172 | my $swap_percent; 173 | if ($swap_total == 0) { 174 | $swap_percent = 0; 175 | } else { 176 | $swap_percent = ($swap_used / $swap_total) * 100; 177 | } 178 | return (sprintf("%.0f",$mem_percent),$mem_total,$mem_used, sprintf("%.0f",$swap_percent),$swap_total,$swap_used); 179 | } 180 | 181 | sub print_usage () { 182 | print "Usage: $PROGNAME -w -c [-v] [-h]\n"; 183 | exit $ERRORS{'UNKNOWN'} unless ($opt_h); 184 | } 185 | 186 | sub print_help () { 187 | print_revision($PROGNAME,'$Revision: '.$PROGVER.' $'); 188 | print "Copyright (c) 2005 Garrett Honeycutt/Rouven Homann/Cedric Temple\n"; 189 | print "\n"; 190 | print_usage(); 191 | print "\n"; 192 | print "-w , = Memory and Swap usage to activate a warning message (eg: -w 90,25 ) .\n"; 193 | print "-c , = Memory and Swap usage to activate a critical message (eg: -c 95,50 ).\n"; 194 | print "-v = Verbose Output.\n"; 195 | print "-h = This screen.\n\n"; 196 | support(); 197 | } 198 | -------------------------------------------------------------------------------- /check_mem.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Check Memory Usage via `free -mt` 4 | 5 | ######################## 6 | # DECLARATIONS 7 | ######################## 8 | 9 | PROGNAME=`basename $0` 10 | REVISION=`echo '$Revision: 1.0 $' | sed -e 's/[^0-9.]//g'` 11 | 12 | DEBUG=0 13 | 14 | exitstatus=0 15 | result="" 16 | perfdata="" 17 | pctWarning="" 18 | pctCritical="" 19 | pctSwpWarning="" 20 | pctSwpCritical="" 21 | rawOutput=0 22 | 23 | status[0]="OK: " 24 | status[1]="WARNING: " 25 | status[2]="CRITICAL: " 26 | status[3]="UNKNOWN: " 27 | 28 | ######################## 29 | # FUNCTIONS 30 | ######################## 31 | 32 | print_usage() { 33 | echo "Usage: $PROGNAME [options]" 34 | echo " e.g. 
$PROGNAME -w 75 -c 95" 35 | echo 36 | echo "Options:" 37 | echo -e "\t --help | -h print help" 38 | echo -e "\t --version | -V print version" 39 | echo -e "\t --verbose | -v be verbose (debug mode)" 40 | echo -e "\t --raw | -r Use MB instead of % for output data" 41 | echo -e "\t -w [int] set warning value for physical RAM used %" 42 | echo -e "\t -c [int] set critical value for physical RAM used %" 43 | echo 44 | echo 45 | } 46 | 47 | print_help() { 48 | # print_revision $PROGNAME $REVISION 49 | echo "${PROGNAME} Revision: ${REVISION}" 50 | echo 51 | echo "This plugin checks local memory usage using 'free -mt' and 'ps axo comm,rss" 52 | echo 53 | print_usage 54 | echo 55 | # support 56 | exit 3 57 | } 58 | 59 | parse_options() { 60 | # parse cmdline arguments 61 | (( DEBUG )) && echo "Parsing options $1 $2 $3 $4 $5 $6 $7 $8" 62 | if [ "$#" -gt 0 ]; then 63 | while [ "$#" -gt 0 ]; do 64 | case "$1" in 65 | '--help'|'-h') 66 | print_help 67 | exit 3 68 | ;; 69 | '--version'|'-V') 70 | #print_revision $PROGNAME $REVISION 71 | echo "${PROGNAME} Revision: ${REVISION}" 72 | exit 3 73 | ;; 74 | '--verbose'|'-v') 75 | DEBUG=1 76 | shift 1 77 | ;; 78 | '--raw'|'-r') 79 | rawOutput=1 80 | shift 1 81 | ;; 82 | '-c') 83 | pctCritical="$2" 84 | shift 2 85 | ;; 86 | '-w') 87 | pctWarning="$2" 88 | shift 2 89 | ;; 90 | *) 91 | echo "Unknown option!" 92 | print_usage 93 | exit 3 94 | ;; 95 | esac 96 | done 97 | fi 98 | } 99 | 100 | ######################## 101 | # MAIN 102 | ######################## 103 | if ps axo comm,rss | grep java &> /dev/null; then 104 | MemUsedList=$(ps axo comm,rss | grep java | awk '{print $2}') 105 | for I in $MemUsedList; do 106 | javaUsed+=$I 107 | (( DEBUG )) && echo "javaUsed=$javaUsed" 108 | done 109 | else 110 | echo "Java was not started yet." 
111 | exit 3 112 | fi 113 | 114 | parse_options $@ 115 | 116 | memory=$(free -mt) 117 | (( DEBUG )) && echo "memory=$memory" 118 | 119 | phyTotal=$(cut -d' ' -f 8 <<< $memory) 120 | (( DEBUG )) && echo "phyTotal=$phyTotal" 121 | phyShared=$(cut -d' ' -f 11 <<< $memory) 122 | (( DEBUG )) && echo "phyShared=$phyShared" 123 | phyBuffers=$(cut -d' ' -f 12 <<< $memory) 124 | (( DEBUG )) && echo "phyBuffers=$phyBuffers" 125 | phyCached=$(cut -d' ' -f 13 <<< $memory) 126 | (( DEBUG )) && echo "phyCached=$phyCached" 127 | phyUsed=$(cut -d' ' -f 16 <<< $memory) 128 | (( DEBUG )) && echo "phyUsed=$phyUsed" 129 | phyAllUsed=$(cut -d' ' -f 9 <<< $memory) 130 | (( DEBUG )) && echo "phyAllUsed=$phyAllUsed" 131 | 132 | pctPhyShared=$(bc <<< "scale=2;$phyShared*100/$phyTotal") 133 | (( DEBUG )) && echo "pctPhyShared=$pctPhyShared" 134 | pctPhyBuffers=$(bc <<< "scale=2;$phyBuffers*100/$phyTotal") 135 | (( DEBUG )) && echo "pctPhyBuffers=$pctPhyBuffers" 136 | pctPhyCached=$(bc <<< "scale=2;$phyCached*100/$phyTotal") 137 | (( DEBUG )) && echo "pctPhyCached=$pctPhyCached" 138 | pctPhyUsed=$(bc <<< "scale=2;$phyUsed*100/$phyTotal") 139 | (( DEBUG )) && echo "pctPhyUsed=$pctPhyUsed" 140 | pctPhyAllUsed=$(bc <<< "scale=2;$phyAllUsed*100/$phyTotal") 141 | (( DEBUG )) && echo "pctPhyAllUsed=$pctPhyAllUsed" 142 | 143 | (( DEBUG )) && echo "rawOutput=$rawOutput" 144 | (( DEBUG )) && echo "pctWarning=$pctWarning" 145 | (( DEBUG )) && echo "pctCritical=$pctCritical" 146 | 147 | if [ -n "$pctWarning" ]; then 148 | warning=$(bc <<< "scale=0;$pctWarning * $phyTotal / 100") 149 | (( DEBUG )) && echo "warning=$warning" 150 | if [ $(bc <<< "$javaUsed >= $pctWarning") -ne 0 ]; then 151 | exitstatus=1 152 | fi 153 | fi 154 | 155 | if [ -n "$pctCritical" ]; then 156 | critical=$(bc <<< "scale=0;$pctCritical * $phyTotal / 100") 157 | (( DEBUG )) && echo "critical=$critical" 158 | if [ $(bc <<< "$javaUsed >= $pctCritical") -ne 0 ]; then 159 | exitstatus=2 160 | fi 161 | fi 162 | 163 | result="Memory Usage - ${phyUsed}MB of ${phyTotal}MB RAM used" 164 | perfdata="phyUsed=${phyUsed};${warning};${critical};0;${phyTotal} phyShared=${phyShared};;;0;${phyTotal} phyBuffers=${phyBuffers};;;0;${phyTotal} phyCached=${phyCached};;;0;${phyTotal} phyAllUsed=${phyAllUsed};;;0;${phyTotal}" 165 | 166 | echo "${status[$exitstatus]}${result} | ${perfdata}" 167 | exit $exitstatus 168 | -------------------------------------------------------------------------------- /check_mem.sh.bak: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Check Memory Usage via `free -mt` 4 | 5 | ######################## 6 | # DECLARATIONS 7 | ######################## 8 | 9 | PROGNAME=`basename $0` 10 | REVISION=`echo '$Revision: 1.0 $' | sed -e 's/[^0-9.]//g'` 11 | 12 | DEBUG=0 13 | 14 | exitstatus=0 15 | result="" 16 | perfdata="" 17 | pctWarning="" 18 | pctCritical="" 19 | pctSwpWarning="" 20 | pctSwpCritical="" 21 | rawOutput=0 22 | 23 | status[0]="OK: " 24 | status[1]="WARNING: " 25 | status[2]="CRITICAL: " 26 | status[3]="UNKNOWN: " 27 | 28 | ######################## 29 | # FUNCTIONS 30 | ######################## 31 | 32 | print_usage() { 33 | echo "Usage: $PROGNAME [options]" 34 | echo " e.g. 
$PROGNAME -w 75 -W 25 -c 95 -C 75" 35 | echo 36 | echo "Options:" 37 | echo -e "\t --help | -h print help" 38 | echo -e "\t --version | -V print version" 39 | echo -e "\t --verbose | -v be verbose (debug mode)" 40 | echo -e "\t --raw | -r Use MB instead of % for output data" 41 | echo -e "\t -w [int] set warning value for physical RAM used %" 42 | echo -e "\t -c [int] set critical value for physical RAM used %" 43 | echo -e "\t -W [int] set warning value for swap used %" 44 | echo -e "\t -C [int] set critical value for swap used %" 45 | echo 46 | echo 47 | } 48 | 49 | print_help() { 50 | # print_revision $PROGNAME $REVISION 51 | echo "${PROGNAME} Revision: ${REVISION}" 52 | echo 53 | echo "This plugin checks local memory usage using 'free -mt'" 54 | echo 55 | print_usage 56 | echo 57 | # support 58 | exit 3 59 | } 60 | 61 | parse_options() { 62 | # parse cmdline arguments 63 | (( DEBUG )) && echo "Parsing options $1 $2 $3 $4 $5 $6 $7 $8" 64 | if [ "$#" -gt 0 ]; then 65 | while [ "$#" -gt 0 ]; do 66 | case "$1" in 67 | '--help'|'-h') 68 | print_help 69 | exit 3 70 | ;; 71 | '--version'|'-V') 72 | #print_revision $PROGNAME $REVISION 73 | echo "${PROGNAME} Revision: ${REVISION}" 74 | exit 3 75 | ;; 76 | '--verbose'|'-v') 77 | DEBUG=1 78 | shift 1 79 | ;; 80 | '--raw'|'-r') 81 | rawOutput=1 82 | shift 1 83 | ;; 84 | '-c') 85 | pctCritical="$2" 86 | shift 2 87 | ;; 88 | '-w') 89 | pctWarning="$2" 90 | shift 2 91 | ;; 92 | '-C') 93 | pctSwpCritical="$2" 94 | shift 2 95 | ;; 96 | '-W') 97 | pctSwpWarning="$2" 98 | shift 2 99 | ;; 100 | *) 101 | echo "Unknown option!" 102 | print_usage 103 | exit 3 104 | ;; 105 | esac 106 | done 107 | fi 108 | } 109 | 110 | ######################## 111 | # MAIN 112 | ######################## 113 | 114 | parse_options $@ 115 | 116 | memory=$(free -mt) 117 | (( DEBUG )) && echo "memory=$memory" 118 | 119 | phyTotal=$(cut -d' ' -f 8 <<< $memory) 120 | (( DEBUG )) && echo "phyTotal=$phyTotal" 121 | swpTotal=$(cut -d' ' -f 19 <<< $memory) 122 | (( DEBUG )) && echo "swpTotal=$swpTotal" 123 | phyShared=$(cut -d' ' -f 11 <<< $memory) 124 | (( DEBUG )) && echo "phyShared=$phyShared" 125 | phyBuffers=$(cut -d' ' -f 12 <<< $memory) 126 | (( DEBUG )) && echo "phyBuffers=$phyBuffers" 127 | phyCached=$(cut -d' ' -f 13 <<< $memory) 128 | (( DEBUG )) && echo "phyCached=$phyCached" 129 | phyUsed=$(cut -d' ' -f 16 <<< $memory) 130 | (( DEBUG )) && echo "phyUsed=$phyUsed" 131 | phyAllUsed=$(cut -d' ' -f 9 <<< $memory) 132 | (( DEBUG )) && echo "phyAllUsed=$phyAllUsed" 133 | swpUsed=$(cut -d' ' -f 20 <<< $memory) 134 | (( DEBUG )) && echo "swpUsed=$swpUsed" 135 | 136 | pctPhyShared=$(bc <<< "scale=2;$phyShared*100/$phyTotal") 137 | (( DEBUG )) && echo "pctPhyShared=$pctPhyShared" 138 | pctPhyBuffers=$(bc <<< "scale=2;$phyBuffers*100/$phyTotal") 139 | (( DEBUG )) && echo "pctPhyBuffers=$pctPhyBuffers" 140 | pctPhyCached=$(bc <<< "scale=2;$phyCached*100/$phyTotal") 141 | (( DEBUG )) && echo "pctPhyCached=$pctPhyCached" 142 | pctPhyUsed=$(bc <<< "scale=2;$phyUsed*100/$phyTotal") 143 | (( DEBUG )) && echo "pctPhyUsed=$pctPhyUsed" 144 | pctPhyAllUsed=$(bc <<< "scale=2;$phyAllUsed*100/$phyTotal") 145 | (( DEBUG )) && echo "pctPhyAllUsed=$pctPhyAllUsed" 146 | if [ $swpTotal -eq 0 ]; then 147 | pctSwpUsed=0 148 | else 149 | pctSwpUsed=$(bc <<< "scale=2;$swpUsed*100/$swpTotal") 150 | fi 151 | (( DEBUG )) && echo "pctSwpUsed=$pctSwpUsed" 152 | (( DEBUG )) && echo "rawOutput=$rawOutput" 153 | (( DEBUG )) && echo "pctWarning=$pctWarning" 154 | (( DEBUG )) && echo "pctCritical=$pctCritical" 155 | 
(( DEBUG )) && echo "pctSwpWarning=$pctSwpWarning" 156 | (( DEBUG )) && echo "pctSwpCritical=$pctSwpCritical" 157 | 158 | if [ -n "$pctWarning" ]; then 159 | warning=$(bc <<< "scale=0;$pctWarning * $phyTotal / 100") 160 | (( DEBUG )) && echo "warning=$warning" 161 | if [ $(bc <<< "$pctPhyUsed >= $pctWarning") -ne 0 ]; then 162 | exitstatus=1 163 | fi 164 | fi 165 | 166 | if [ -n "$pctSwpWarning" ]; then 167 | swpWarning=$(bc <<< "scale=0;$pctSwpWarning * $swpTotal / 100") 168 | (( DEBUG )) && echo "swpWarning=$swpWarning" 169 | if [ $(bc <<< "$pctSwpUsed >= $pctSwpWarning") -ne 0 ]; then 170 | exitstatus=1 171 | fi 172 | fi 173 | 174 | if [ -n "$pctCritical" ]; then 175 | critical=$(bc <<< "scale=0;$pctCritical * $phyTotal / 100") 176 | (( DEBUG )) && echo "critical=$critical" 177 | if [ $(bc <<< "$pctPhyUsed >= $pctCritical") -ne 0 ]; then 178 | exitstatus=2 179 | fi 180 | fi 181 | 182 | if [ -n "$pctSwpCritical" ]; then 183 | swpCritical=$(bc <<< "scale=0;$pctSwpCritical * $swpTotal / 100") 184 | (( DEBUG )) && echo "swpCritical=$swpCritical" 185 | if [ $(bc <<< "$pctSwpUsed >= $pctSwpCritical") -ne 0 ]; then 186 | exitstatus=2 187 | fi 188 | fi 189 | 190 | if [ $rawOutput -eq 1 ]; then 191 | result="Memory Usage - ${phyUsed}MB of ${phyTotal}MB RAM used, ${swpUsed}MB of ${swpTotal}MB Swap used" 192 | perfdata="phyUsed=${phyUsed};${warning};${critical};0;${phyTotal} phyShared=${phyShared};;;0;${phyTotal} phyBuffers=${phyBuffers};;;0;${phyTotal} phyCached=${phyCached};;;0;${phyTotal} phyAllUsed=${phyAllUsed};;;0;${phyTotal} swpUsed=${swpUsed};${swpWarning};${swpCritical};0;${swpTotal}" 193 | else 194 | result="Memory Usage - ${pctPhyUsed}% RAM, ${pctSwpUsed}% Swap" 195 | perfdata="phyUsed=${pctPhyUsed}%;${pctWarning};${pctCritical};0;100 phyShared=${pctPhyShared}%;;;0;100 phyBuffers=${pctPhyBuffers}%;;;0;100 phyCached=${pctPhyCached}%;;;0;100 phyAllUsed=${pctPhyAllUsed}%;;;0;100 swpUsed=${pctSwpUsed}%;${pctSwpWarning};${pctSwpCritical};0;100" 196 | fi 197 | 198 | echo "${status[$exitstatus]}${result} | ${perfdata}" 199 | exit $exitstatus 200 | 201 | -------------------------------------------------------------------------------- /corosync.txt: -------------------------------------------------------------------------------- 1 | 2 | 3 | 前提: 4 | 1)本配置共有两个测试节点,分别node1.magedu.com和node2.magedu.com,相的IP地址分别为172.16.100.11和172.16.100.12; 5 | 2)集群服务为apache的httpd服务; 6 | 3)提供web服务的地址为172.16.100.1; 7 | 4)系统为rhel5.8 8 | 9 | 1、准备工作 10 | 11 | 为了配置一台Linux主机成为HA的节点,通常需要做出如下的准备工作: 12 | 13 | 1)所有节点的主机名称和对应的IP地址解析服务可以正常工作,且每个节点的主机名称需要跟"uname -n“命令的结果保持一致;因此,需要保证两个节点上的/etc/hosts文件均为下面的内容: 14 | 172.16.100.11 node1.magedu.com node1 15 | 172.16.100.12 node2.magedu.com node2 16 | 17 | 为了使得重新启动系统后仍能保持如上的主机名称,还分别需要在各节点执行类似如下的命令: 18 | 19 | Node1: 20 | # sed -i 's@\(HOSTNAME=\).*@\1node1.magedu.com@g' /etc/sysconfig/network 21 | # hostname node1.magedu.com 22 | 23 | Node2: 24 | # sed -i 's@\(HOSTNAME=\).*@\1node2.magedu.com@g' /etc/sysconfig/network 25 | # hostname node2.magedu.com 26 | 27 | 2)设定两个节点可以基于密钥进行ssh通信,这可以通过类似如下的命令实现: 28 | Node1: 29 | # ssh-keygen -t rsa 30 | # ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2 31 | 32 | Node2: 33 | # ssh-keygen -t rsa 34 | # ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1 35 | 36 | 37 | 2、安装如下rpm包: 38 | libibverbs, librdmacm, lm_sensors, libtool-ltdl, openhpi-libs, openhpi, perl-TimeDate 39 | 40 | 3、安装corosync和pacemaker,首先下载所需要如下软件包至本地某专用目录(这里为/root/cluster): 41 | cluster-glue 42 | cluster-glue-libs 43 | heartbeat 44 | resource-agents 45 | corosync 46 | heartbeat-libs 47 | pacemaker 
40 | 3. Install corosync and pacemaker. First download the required packages below to a dedicated local directory (here /root/cluster):
41 | cluster-glue
42 | cluster-glue-libs
43 | heartbeat
44 | resource-agents
45 | corosync
46 | heartbeat-libs
47 | pacemaker
48 | corosynclib
49 | libesmtp
50 | pacemaker-libs
51 |
52 | Download site: http://clusterlabs.org/rpm/. Choose the packages matching your hardware platform and operating system; using the latest available version of each package is recommended.
53 | 32-bit rpm download: http://clusterlabs.org/rpm/epel-5/i386/
54 | 64-bit rpm download: http://clusterlabs.org/rpm/epel-5/x86_64/
55 |
56 | Install them with:
57 | # cd /root/cluster
58 | # yum -y --nogpgcheck localinstall *.rpm
59 |
60 | 4. Configure corosync (run the following commands on node1.magedu.com)
61 |
62 | # cd /etc/corosync
63 | # cp corosync.conf.example corosync.conf
64 |
65 | Then edit corosync.conf and add the following content:
66 | service {
67 | ver: 0
68 | name: pacemaker
69 | # use_mgmtd: yes
70 | }
71 |
72 | aisexec {
73 | user: root
74 | group: root
75 | }
76 |
77 | Also set the IP address after bindnetaddr in this file to the network address of the network your NICs are on; our two nodes are on the 172.16.0.0 network, so it is set as follows:
78 | bindnetaddr: 172.16.0.0
79 |
80 | Generate the authentication key file used for inter-node communication:
81 | # corosync-keygen
82 |
83 | Copy corosync.conf and authkey over to node2:
84 | # scp -p corosync.conf authkey node2:/etc/corosync/
85 |
86 | Create the directory holding the logs corosync generates, on each of the two nodes:
87 | # mkdir /var/log/cluster
88 | # ssh node2 'mkdir /var/log/cluster'
89 |
90 | 5. Try starting it (run the following command on node1):
91 |
92 | # /etc/init.d/corosync start
93 |
94 | Check whether the corosync engine started normally:
95 | # grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
96 | Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
97 | Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
98 | Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
99 | Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
100 | Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
101 |
102 | Check whether the initial membership notices were sent out normally:
103 | # grep TOTEM /var/log/messages
104 | Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
105 | Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
106 | Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [172.16.100.11] is now up.
107 | Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
108 |
109 | Check whether any errors occurred during startup:
110 | # grep ERROR: /var/log/messages | grep -v unpack_resources
111 |
112 | Check whether pacemaker started normally:
113 | # grep pcmk_startup /var/log/messages
114 | Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
115 | Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
116 | Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
117 | Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
118 | Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.magedu.com
119 |
120 | If all of the commands above ran without problems, corosync can then be started on node2 with:
121 | # ssh node2 -- /etc/init.d/corosync start
122 |
123 | Note: node2 must be started from node1 with the command above; do not start corosync directly on the node2 side;
124 |
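The same engine/TOTEM checks can be repeated against node2 without leaving node1, reusing the ssh trust configured earlier:

# ssh node2 -- grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
# ssh node2 -- grep TOTEM /var/log/messages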
125 | Check the startup state of the cluster nodes with:
126 | # crm status
127 | ============
128 | Last updated: Tue Jun 14 19:07:06 2011
129 | Stack: openais
130 | Current DC: node1.magedu.com - partition with quorum
131 | Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
132 | 2 Nodes configured, 2 expected votes
133 | 0 Resources configured.
134 | ============
135 |
136 | Online: [ node1.magedu.com node2.magedu.com ]
137 |
138 | The information above shows that both nodes have started normally and the cluster is already in a normal working state.
139 |
140 | Running ps auxf shows the processes started by corosync.
141 | root 4665 0.4 0.8 86736 4244 ? Ssl 17:00 0:04 corosync
142 | root 4673 0.0 0.4 11720 2260 ? S 17:00 0:00 \_ /usr/lib/heartbeat/stonithd
143 | 101 4674 0.0 0.7 12628 4100 ? S 17:00 0:00 \_ /usr/lib/heartbeat/cib
144 | root 4675 0.0 0.3 6392 1852 ? S 17:00 0:00 \_ /usr/lib/heartbeat/lrmd
145 | 101 4676 0.0 0.4 12056 2528 ? S 17:00 0:00 \_ /usr/lib/heartbeat/attrd
146 | 101 4677 0.0 0.5 8692 2784 ? S 17:00 0:00 \_ /usr/lib/heartbeat/pengine
147 | 101 4678 0.0 0.5 12136 3012 ? S 17:00 0:00 \_ /usr/lib/heartbeat/crmd
148 |
149 |
150 | 6. Configure the cluster's working properties: disable stonith
151 |
152 | corosync enables stonith by default, yet this cluster has no corresponding stonith device, so the default configuration is not yet usable; this can be verified with:
153 |
154 | # crm_verify -L
155 | crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
156 | crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
157 | crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
158 | Errors found during check: config not valid
159 | -V may provide more details
160 |
161 | For now we can disable stonith with:
162 | # crm configure property stonith-enabled=false
163 |
164 | View the current configuration information with:
165 | # crm configure show
166 | node node1.magedu.com
167 | node node2.magedu.com
168 | property $id="cib-bootstrap-options" \
169 | dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
170 | cluster-infrastructure="openais" \
171 | expected-quorum-votes="2" \
172 | stonith-enabled="false"
173 |
174 | This shows that stonith has been disabled.
175 |
176 | The crm and crm_verify commands above are command-line cluster management tools provided by pacemaker 1.0 and later; they can be run on any node in the cluster.
177 |
178 | 7. Add cluster resources
179 |
180 | corosync supports resource agent types such as heartbeat, LSB and ocf; the more commonly used today are the LSB and OCF classes, while the stonith class is used specifically for configuring stonith devices;
181 |
182 | The resource agent classes supported by the current cluster can be viewed with:
183 |
184 | # crm ra classes
185 | heartbeat
186 | lsb
187 | ocf / heartbeat pacemaker
188 | stonith
189 |
190 | To view the list of all resource agents in a given class, commands like the following can be used:
191 | # crm ra list lsb
192 | # crm ra list ocf heartbeat
193 | # crm ra list ocf pacemaker
194 | # crm ra list stonith
195 |
196 | # crm ra info [class:[provider:]]resource_agent
197 | For example:
198 | # crm ra info ocf:heartbeat:IPaddr
199 |
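crm ra info simply renders the agent's own metadata; the same XML can also be pulled straight from the OCF script itself, which may help when the crm shell is unavailable. A sketch, with the path and OCF_ROOT value following the usual OCF layout (treat both as assumptions for your build):

# OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/IPaddr meta-data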
200 | 8. Next, create an IP address resource for the web cluster about to be built, for use when the web service is provided through the cluster; this is done as follows:
201 |
202 | Syntax:
203 | primitive <rsc> [<class>:[<provider>:]]<type>
204 | [params attr_list]
205 | [operations id_spec]
206 | [op op_type [<attribute>=<value>...] ...]
207 |
208 | op_type :: start | stop | monitor
209 |
210 | Example:
211 | primitive apcfence stonith:apcsmart \
212 | params ttydev=/dev/ttyS0 hostlist="node1 node2" \
213 | op start timeout=60s \
214 | op monitor interval=30m timeout=60s
215 |
216 | Applied here:
217 | # crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=172.16.100.1
218 |
219 | The output of the following command shows that this resource has already started on node1.magedu.com:
220 | # crm status
221 | ============
222 | Last updated: Tue Jun 14 19:31:05 2011
223 | Stack: openais
224 | Current DC: node1.magedu.com - partition with quorum
225 | Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
226 | 2 Nodes configured, 2 expected votes
227 | 1 Resources configured.
228 | ============
229 |
230 | Online: [ node1.magedu.com node2.magedu.com ]
231 |
232 | WebIP (ocf::heartbeat:IPaddr): Started node1.magedu.com
233 |
234 | Naturally, running ifconfig on node1 also shows the address active as an alias of eth0:
235 | # ifconfig
236 | eth0:0 Link encap:Ethernet HWaddr 00:0C:29:AA:DD:CF
237 | inet addr:172.16.100.1 Bcast:192.168.0.255 Mask:255.255.255.0
238 | UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
239 | Interrupt:67 Base address:0x2000
240 |
241 | Next, from node2, stop the corosync service on node1:
242 | # ssh node1 -- /etc/init.d/corosync stop
243 |
244 | Check the cluster's working state:
245 | # crm status
246 | ============
247 | Last updated: Tue Jun 14 19:37:23 2011
248 | Stack: openais
249 | Current DC: node2.magedu.com - partition WITHOUT quorum
250 | Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
251 | 2 Nodes configured, 2 expected votes
252 | 1 Resources configured.
253 | ============
254 |
255 | Online: [ node2.magedu.com ]
256 | OFFLINE: [ node1.magedu.com ]
257 |
258 | The information above shows that node1.magedu.com is offline, yet the WebIP resource could not start on node2.magedu.com. The reason is that the cluster state is "WITHOUT quorum": quorum has been lost, so the cluster itself no longer satisfies the conditions for normal operation, which is unreasonable for a cluster of only two nodes. We can therefore tell the cluster to ignore the failed quorum check with the following command:
259 |
260 | # crm configure property no-quorum-policy=ignore
261 |
262 | A moment later, the cluster starts the resource on node2, the node still running, as shown below:
263 | # crm status
264 | ============
265 | Last updated: Tue Jun 14 19:43:42 2011
266 | Stack: openais
267 | Current DC: node2.magedu.com - partition WITHOUT quorum
268 | Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
269 | 2 Nodes configured, 2 expected votes
270 | 1 Resources configured.
271 | ============
272 |
273 | Online: [ node2.magedu.com ]
274 | OFFLINE: [ node1.magedu.com ]
275 |
276 | WebIP (ocf::heartbeat:IPaddr): Started node2.magedu.com
277 |
278 | Good. With that verified, start node1.magedu.com again normally:
279 | # ssh node1 -- /etc/init.d/corosync start
280 |
281 | Once node1.magedu.com is back up, the WebIP resource will very likely migrate from node2.magedu.com back to node1.magedu.com. Each such back-and-forth move between nodes leaves the resource unreachable for a while, so after a resource has moved to another node because of a node failure, we sometimes want to keep it from flowing back even when the original node recovers. This is achieved by defining resource stickiness. Stickiness can be set either when a resource is created or afterwards.
282 |
283 | Stickiness value ranges and their effects:
284 | 0: the default. The resource is placed at the most suitable spot in the system, meaning it is moved when a node with "better" or worse load capacity becomes available. This option is essentially equivalent to automatic failback, except the resource may move to a node other than the one it was previously active on;
285 | greater than 0: the resource prefers to stay where it is but moves if a more suitable node becomes available. Higher values mean a stronger preference for staying put;
286 | less than 0: the resource prefers to move away from its current location. Higher absolute values mean a stronger preference to leave;
287 | INFINITY: unless the resource is forced off the node (node shutdown, node standby, migration-threshold reached, or configuration change), it always stays where it is. This option is almost equivalent to disabling automatic failback entirely;
288 | -INFINITY: the resource always moves away from its current location;
289 |
290 | Here we give resources a default stickiness value with:
291 | # crm configure rsc_defaults resource-stickiness=100
292 |
293 | 9. Combining the IP address resource configured above, turn this cluster into an active/passive web (httpd) service cluster
294 |
295 | To enable this cluster as a web (httpd) server cluster, httpd must first be installed on each node and configured to serve a local test page on each.
296 |
297 | Node1:
298 | # yum -y install httpd
299 | # echo "<h1>Node1.magedu.com</h1>" > /var/www/html/index.html
300 |
301 | Node2:
302 | # yum -y install httpd
303 | # echo "<h1>Node2.magedu.com</h1>" > /var/www/html/index.html
304 |
305 | Then start the httpd service by hand on each node and confirm it serves properly. Afterwards stop it with the commands below and make sure it will not start automatically (run once on each of the two nodes):
306 | # /etc/init.d/httpd stop
307 | # chkconfig httpd off
308 |
309 |
310 | Next we add this httpd service as a cluster resource. Two resource agents are available for adding httpd as a cluster resource: lsb and ocf:heartbeat; for simplicity, the lsb type is used here:
311 |
312 | First, the syntax of the lsb-type httpd resource can be viewed with:
313 | # crm ra info lsb:httpd
314 | lsb:httpd
315 |
316 | Apache is a World Wide Web server. It is used to serve \
317 | HTML files and CGI.
318 |
319 | Operations' defaults (advisory minimum):
320 |
321 | start timeout=15
322 | stop timeout=15
323 | status timeout=15
324 | restart timeout=15
325 | force-reload timeout=15
326 | monitor interval=15 timeout=15 start-delay=15
327 |
328 | Now create the resource WebSite:
329 | # crm configure primitive WebSite lsb:httpd
330 |
331 | View the definitions generated in the configuration:
332 | node node1.magedu.com
333 | node node2.magedu.com
334 | primitive WebIP ocf:heartbeat:IPaddr \
335 | params ip="172.16.100.1"
336 | primitive WebSite lsb:httpd
337 | property $id="cib-bootstrap-options" \
338 | dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
339 | cluster-infrastructure="openais" \
340 | expected-quorum-votes="2" \
341 | stonith-enabled="false" \
342 | no-quorum-policy="ignore"
343 |
344 | Check the resources' state:
345 | # crm status
346 | ============
347 | Last updated: Tue Jun 14 19:57:31 2011
348 | Stack: openais
349 | Current DC: node2.magedu.com - partition with quorum
350 | Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
351 | 2 Nodes configured, 2 expected votes
352 | 2 Resources configured.
353 | ============
354 |
355 | Online: [ node1.magedu.com node2.magedu.com ]
356 |
357 | WebIP (ocf::heartbeat:IPaddr): Started node1.magedu.com
358 | WebSite (lsb:httpd): Started node2.magedu.com
359 |
360 | The information above shows that WebIP and WebSite may run on two different nodes. For an application providing web service through this IP, that cannot stand: the two resources must run together on one node.
361 |
362 | As this shows, even when a cluster owns all the required resources, it may still be unable to handle them correctly. Resource constraints are used to specify on which cluster nodes resources run, in which order they are loaded, and on which other resources a given resource depends. pacemaker provides three kinds of resource constraints:
363 | 1) Resource Location: defines the nodes on which a resource may, may not, or preferably runs;
364 | 2) Resource Colocation: colocation constraints define which cluster resources may or may not run together on the same node;
365 | 3) Resource Order: ordering constraints define the order in which cluster resources start on a node;
366 |
367 | When defining constraints, scores must also be assigned. Scores of every kind are an essential part of how the cluster works; in fact the whole process, from migrating resources to deciding which resources to stop in a degraded cluster, is carried out by modifying scores in some way. Scores are computed per resource, and any node whose score for a resource is negative cannot run that resource. After computing a resource's scores, the cluster picks the node with the highest score. INFINITY is currently defined as 1,000,000. Adding and subtracting infinity follows three basic rules:
368 | 1) any value + INFINITY = INFINITY
369 | 2) any value - INFINITY = -INFINITY
370 | 3) INFINITY - INFINITY = -INFINITY
371 | For example, a node granted 200 points by a location preference but -INFINITY by some ban still totals -INFINITY (rule 2) and is ruled out.
372 | When defining resource constraints you may also give each constraint its own score. Constraints with higher scores are applied before those with lower scores. By creating additional location constraints with different scores for a given resource, you can control the order of the nodes it fails over to.
373 |
374 | The problem described above, WebIP and WebSite possibly running on different nodes, can therefore be solved with the following command:
375 | # crm configure colocation website-with-ip INFINITY: WebSite WebIP
376 |
377 | Next, we must also make sure WebIP is started before WebSite on a node, which is done with:
378 | # crm configure order httpd-after-ip mandatory: WebIP WebSite
379 |
380 | Furthermore, since an HA cluster does not itself require every node to have equal or similar performance, we may at times prefer that, in normal operation, the service always run on some more powerful node. This is done with a location constraint:
381 | # crm configure location prefer-node1 WebSite rule 200: node1
382 | This command pins WebSite to node1 with a score of 200;
383 |
384 |
385 |
386 |
387 |
388 |
389 | Supplementary knowledge:
390 | A multicast address, i.e. a group address, is an identifier for a group of hosts that have joined a multicast group. On Ethernet, a multicast address is a 48-bit identifier naming a group of stations on the network that should receive a packet sent to it. In IPv4 it was historically known as a class D address, a type of IP address ranging from 224.0.0.0 to 239.255.255.255, or equivalently 224.0.0.0/4. In IPv6, multicast addresses carry the prefix ff00::/8.
391 |
392 | Ethernet multicast addresses are all those whose first byte has the lowest bit set to 1, for example 01-12-0f-00-00-02. The broadcast address, the all-ones 48-bit address, is also a multicast address. Broadcast is, however, a special case of multicast, much as a square is a special case of a rectangle while having properties a general rectangle lacks.
393 |
394 |
395 |
396 |
397 |
398 | colocation (collocation)
399 |
400 | This constraint expresses the placement relation between two or more resources.
If there are more than two resources, then the constraint is called a resource set. Collocation resource sets have an extra attribute to allow for sets of resources which don’t depend on each other in terms of state. The shell syntax for such sets is to put resources in parentheses.
401 |
402 | Usage:
403 |
404 | colocation <id> <score>: <rsc>[:<role>] <rsc>[:<role>] ...
405 | Example:
406 |
407 | colocation dummy_and_apache -inf: apache dummy
408 | colocation c1 inf: A ( B C )
409 |
410 |
411 |
412 |
413 |
414 |
415 | order
416 |
417 | This constraint expresses the order of actions on two resources or more resources. If there are more than two resources, then the constraint is called a resource set. Ordered resource sets have an extra attribute to allow for sets of resources whose actions may run in parallel. The shell syntax for such sets is to put resources in parentheses.
418 |
419 | Usage:
420 |
421 | order <id> score-type: <rsc>[:<action>] <rsc>[:<action>] ...
422 | [symmetrical=<bool>]
423 |
424 | score-type :: advisory | mandatory | <score>
425 | Example:
426 |
427 | order c_apache_1 mandatory: apache:start ip_1
428 | order o1 inf: A ( B C )
429 |
430 |
431 |
432 |
433 |
434 |
435 | property
436 |
437 | Set the cluster (crm_config) options.
438 |
439 | Usage:
440 |
441 | property [$id=<set_id>] <option>=<value> [<option>=<value> ...]
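An example in the same style as the colocation and order entries above, using the two cluster options that were set earlier in this section:

Example:

property stonith-enabled=false no-quorum-policy=ignore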