├── README.md
├── empty_dict.lua
├── output.lua
└── record.lua


/README.md:
--------------------------------------------------------------------------------
## Introduction

We used to collect nginx statistics by analyzing logs, which was cumbersome. These scripts, built on the ngx_lua module, collect site status in real time instead and save a lot of manual work.

## Features

- Statistics can be kept per virtual host, and per location within the same virtual host.
- Collects data related to query count, request time, status codes, and speed.


## Dependencies

- nginx + ngx_http_lua_module

## Installation

```
http://wiki.nginx.org/HttpLuaModule#Installation
```

## Usage

### Add the global dictionaries

Add the dict initialization to the nginx configuration, for example:

```
lua_shared_dict log_dict 20M;
lua_shared_dict result_dict 20M;
```

### Enable statistics for a specific location

A single line is enough: reference the Lua script from the nginx configuration, for example:

```
server {
    listen 8080;
    server_name weatherapi.market.xiaomi.com;
    access_log /home/work/nginx/logs/weatherapi.market.xiaomi.com.log milog;
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass http://weatherapi.market.xiaomi.com_backend;

        log_by_lua_file ./site-enable/record.lua;
    }
}

```

### Output the results

Configure a server so that every result stored in the dictionary can be fetched with curl:

```
server {
    listen 8080 default;
    server_name _;

    location / {
        return 404;
    }

    location /status {
        content_by_lua_file ./site-enable/output.lua;
    }

    location /empty_dict {
        content_by_lua_file ./site-enable/empty_dict.lua;
    }
}
```

The results can then be fetched with:

```
curl ip_addr:8080/status
```

### Clearing the dictionary
After running for a while, the dictionary grows.
It can be cleared through the following endpoint:

```
curl ip_addr:8080/empty_dict
```

### Supported statistics

The following data is currently collected. The raw output looks like this:

```

--------------------------
key: weatherapi.market.xiaomi.com__upstream_time_10.0.3.32:8250_counter
0.375
key: weatherapi.market.xiaomi.com__upstream_time_10.0.3.32:8250_nb_counter
124
key: weatherapi.market.xiaomi.com__upstream_time_10.0.4.93:8250_counter
0.131
key: weatherapi.market.xiaomi.com__upstream_time_10.0.4.93:8250_nb_counter
123
key: weatherapi.market.xiaomi.com__upstream_time_10.20.12.49:8250_counter
0.081
key: weatherapi.market.xiaomi.com__upstream_time_10.20.12.49:8250_nb_counter
127
key: weatherapi.market.xiaomi.com__query_counter
500
key: weatherapi.market.xiaomi.com__request_time_counter
0.683
key: weatherapi.market.xiaomi.com__upstream_time_counter
0.683
key: weatherapi.market.xiaomi.com__upstream_time_10.20.12.59:8250_counter
0.096
key: weatherapi.market.xiaomi.com__upstream_time_10.20.12.59:8250_nb_counter
126
key: weatherapi.market.xiaomi.com__bytes_sent_counter
81500

```

- `__` separates the virtual host (including the prefix) from the data item that follows, which makes the data easier to process.
- `counter` means the value is cumulative.
- `nb` means a number of occurrences.


The available data includes: query count, request_time, bytes_sent, and upstream_time.

- `upstream_time_10.20.12.49:8250_counter` is the cumulative upstream_time spent on one specific backend.
- `upstream_time_10.20.12.49:8250_nb_counter` is the cumulative number of requests sent to that backend.


## How to process the data

Most of the collected values are counters, so the monitoring system has to support delta calculations. Our company's perf-counter monitoring system supports simple arithmetic, which makes this easy for us. If you do not have such a system, you need to process the data yourself to derive the deltas and any more complex metrics.

- `delta(bytes_sent_counter)/delta(query_counter)` gives the average number of bytes sent per request over the interval, which is used here as the HTTP transfer speed figure.
- `delta(upstream_time_10.20.12.49:8250_counter)/delta(upstream_time_10.20.12.49:8250_nb_counter)` gives the average upstream_time of that backend.

## ToDo

Support for percentiles is the main goal of the next step.


## Help!
Contact xiedanbo <xiedanbo@xiaomi.com>

--------------------------------------------------------------------------------
/empty_dict.lua:
--------------------------------------------------------------------------------
----
local log_dict = ngx.shared.log_dict
local result_dict = ngx.shared.result_dict
---- Reset the value of every key in the dictionary to 0
-- get_keys(0) returns every key; without an argument only the first 1024 are returned
for _, key in ipairs(result_dict:get_keys(0)) do
    result_dict:set(key, 0)
    log_dict:set(key, 0)
end

--------------------------------------------------------------------------------
/output.lua:
--------------------------------------------------------------------------------
----
local log_dict = ngx.shared.log_dict
local result_dict = ngx.shared.result_dict
---- Print every value stored in the dictionary
-- get_keys(0) returns every key; without an argument only the first 1024 are returned
for _, key in ipairs(result_dict:get_keys(0)) do
    ngx.say("key: ", key)
    ngx.say(result_dict:get(key))
end

--------------------------------------------------------------------------------
/record.lua:
--------------------------------------------------------------------------------

---- log_dict is used for temporary records; result_dict holds the final data to be collected
local log_dict = ngx.shared.log_dict
local result_dict = ngx.shared.result_dict


---- Use the server and the location as the identifier.
---- The location is derived from the uri.
local server_name = ngx.var.server_name
local uri = ngx.var.uri
local var_prefix = server_name.."_"
local m, err = ngx.re.match(uri, "(/[0-9a-zA-Z]+)(/[0-9a-zA-Z]+)", "ix")

-- var_prefix identifies each site + location; "__" is the delimiter, which makes the data easier to process
-- Assign to the outer var_prefix (no `local` here) so that the location part is kept
if m then
    if m[1] then
        var_prefix = server_name.."_"..m[1].."_"
    else
        var_prefix = server_name.."_"..m[0].."_"
    end
else
    var_prefix = server_name.."_"
end


---- Request count, counter
local query_nb_var = var_prefix.."_query_counter"

local newval, err = result_dict:incr(query_nb_var, 1)
if not newval and err == "not found" then
    result_dict:add(query_nb_var, 0)
    result_dict:incr(query_nb_var, 1)
end


---- request_time statistics, counter
local request_time_var = var_prefix.."_request_time_counter"

local request_time = tonumber(ngx.var.request_time) or 0

-- add() only creates the key when it is missing; incr() then updates it atomically
result_dict:add(request_time_var, 0)
result_dict:incr(request_time_var, request_time)


---- upstream_time statistics, counter
local upstream_time_var = var_prefix.."_upstream_time_counter"

-- upstream_response_time is missing when no upstream was contacted and is a
-- comma-separated list when several were tried; tonumber() returns nil in both cases
local upstream_time = tonumber(ngx.var.upstream_response_time) or 0

result_dict:add(upstream_time_var, 0)
result_dict:incr(upstream_time_var, upstream_time)


---- upstream_time per backend address, counter
-- Fall back to "-" when no upstream was contacted, so the concatenation below cannot fail
local upstream_addr = ngx.var.upstream_addr or "-"
local upstream_time_addr_var = var_prefix.."_upstream_time_"..upstream_addr.."_counter"

local upstream_time_addr = tonumber(ngx.var.upstream_response_time) or 0

result_dict:add(upstream_time_addr_var, 0)
result_dict:incr(upstream_time_addr_var, upstream_time_addr)

-- Accumulated number of queries sent to this backend address, counter
local upstream_time_addr_nb_var = var_prefix.."_upstream_time_"..upstream_addr.."_nb_counter"

local newval, err =
    result_dict:incr(upstream_time_addr_nb_var, 1)
if not newval and err == "not found" then
    result_dict:add(upstream_time_addr_nb_var, 0)
    result_dict:incr(upstream_time_addr_nb_var, 1)
end



---- Accumulate bytes_sent so that speed can be derived, counter
local bytes_sent_var = var_prefix.."_bytes_sent_counter"

local bytes_sent = tonumber(ngx.var.bytes_sent) or 0

-- add() only creates the key when it is missing; incr() then updates it atomically
result_dict:add(bytes_sent_var, 0)
result_dict:incr(bytes_sent_var, bytes_sent)


---- Status code statistics, 4xx, 5xx, counter
local status_code = tonumber(ngx.var.status) or 0
local status_code_4xx_nb_var = var_prefix.."_status_code_4xx_counter"
local status_code_5xx_nb_var = var_prefix.."_status_code_5xx_counter"

-- Count only 400-499 here so that 5xx responses are not also counted as 4xx
if status_code >= 400 and status_code < 500 then
    local newval, err = result_dict:incr(status_code_4xx_nb_var, 1)
    if not newval and err == "not found" then
        result_dict:add(status_code_4xx_nb_var, 0)
        result_dict:incr(status_code_4xx_nb_var, 1)
    end
end

if status_code >= 500 then
    local newval, err = result_dict:incr(status_code_5xx_nb_var, 1)
    if not newval and err == "not found" then
        result_dict:add(status_code_5xx_nb_var, 0)
        result_dict:incr(status_code_5xx_nb_var, 1)
    end
end

--------------------------------------------------------------------------------