├── README ├── config ├── nginx.patch ├── ngx_http_healthcheck_module.c ├── ngx_http_healthcheck_module.h └── sample_ngx_config.conf /README: -------------------------------------------------------------------------------- 1 | # Update 2 | 3 | This module is no longer maintained. I recommend using https://github.com/yaoweibin/nginx_upstream_check_module instead. 4 | 5 | If you're curious about how this module used to work, read ahead: 6 | 7 | 8 | Healthcheck plugin for nginx. It polls backends and if they respond with 9 | HTTP 200 + an optional request body, they are marked good. Otherwise, they 10 | are marked bad. Similar to haproxy/varnish health checks. 11 | 12 | For help on all the options, see the docblocks inside the .c file where each 13 | option is defined. 14 | 15 | Note this also gives you access to a health status page that lets you see 16 | how well your healthcheck are doing. 17 | 18 | 19 | ==Important= 20 | Nginx gives you full freedom which server peer to pick when you write an 21 | upstream. This means that the healthchecking plugin is only a tool that 22 | other upstreams must know about to use. So your upstream code MUST SUPPORT 23 | HEALTHCHECKS. It's actually pretty easy to modify the code to support them. 24 | 25 | See the .h file for how as well as the upstream_hash patch which shows 26 | how to modify upstream_hash to support healthchecks. 27 | 28 | For an example plugin modified to support healthchecks, see my modifications 29 | to the upstream_hash plugin here: 30 | 31 | http://github.com/cep21/nginx_upstream_hash/tree/support_http_healthchecks 32 | 33 | ==Limitations== 34 | The module only supports HTTP 1.0, not 1.1. What that really means is it 35 | doesn't understand chunked encoding. You should ask for a 1.0 reponse with 36 | your healthcheck, unless you're sure the upstream won't send back chunked 37 | encoding. See the sample config for an example. 38 | 39 | ==INSTALL== 40 | # Similar to the upstream_hash module 41 | 42 | cd nginx-0.7.62 # or whatever 43 | patch -p1 < /path/to/this/directory/nginx.patch 44 | ./configure --add-module=/path/to/this/directory 45 | make 46 | make install 47 | 48 | ==How the module works== 49 | My first attempt was to spawn a pthread inside the master process, but nginx 50 | freaks out on all kinds of levels when you try to have multiple threads 51 | running at the same time. Then I thought, fine I'll just fork my own child. 52 | But that caused lots of issues when I tried to HUP the master process because 53 | my own child wasn't getting signals. I was thinking to myself, these just 54 | don't feel like the nginx way of doing things. So, I figured I would just 55 | work directly with the worker process model. 56 | 57 | When each worker process starts, they add an repeating event to the event 58 | tree asking for ownership of a server's healthcheck. When that ownership 59 | event comes up, they lock the server's healthcheck and try to claim it with 60 | their pid. If the process can't claim it, then it retries to claim the 61 | healthcheck later, cause maybe the worker that does own it dies or something. 62 | 63 | For the worker that does own it, it inserts a healthcheck event into nginx's 64 | event tree. When that triggers, then it starts a peer connection to the 65 | server and goes to town sending and getting data. When the healthcheck 66 | finishes, or times out, it updates the shared memory structure and signals for 67 | another healthcheck later. 68 | 69 | A few random issues I had were: 70 | 1) When nginx tries to shut down, it waits for the event tree to empty out. 71 | To get around this, I check for ngx_quit and all kinds of other variables. 72 | This means that when you do HUP nginx, your worker needs to sit around doing 73 | nothing until *something* in the healthcheck event tree comes up, after which 74 | it can clear all the healthcheck events and move on. I could fix this if 75 | nginx added a per module callback on HUP. Maybe a 'cleanup' or something. 76 | The current exit_process callback is called after the event tree is empty, not 77 | after a request to shutdown a worker. 78 | 79 | ==Extending== 80 | It should be very easy to extend this module to work with fastcgi or even 81 | generic TCP backends. You would need to just change, or abstract out, 82 | ngx_http_healthcheck_process_recv. Patches that do that are welcome, and I'm 83 | happy to help out with any questions. I'm also happy to help out with 84 | extending your upstream picking modules to work with healthchecks as well. 85 | Your code can even be no healthcheck compatable by surrounding the changes 86 | with #if (NGX_HTTP_HEALTHCHECK) 87 | 88 | ==Config== 89 | See sample_ngx_config.conf 90 | 91 | Author: Jack Lindamood 92 | 93 | ==License== 94 | 95 | Apache License, Version 2.0 96 | -------------------------------------------------------------------------------- /config: -------------------------------------------------------------------------------- 1 | ngx_addon_name=ngx_http_healthcheck_module 2 | HTTP_INCS="$HTTP_INCS $ngx_addon_dir" 3 | HTTP_MODULES="$HTTP_MODULES ngx_http_healthcheck_module" 4 | NGX_ADDON_SRCS="$NGX_ADDON_SRCS $ngx_addon_dir/ngx_http_healthcheck_module.c" 5 | have=NGX_HTTP_HEALTHCHECK . auto/have 6 | -------------------------------------------------------------------------------- /nginx.patch: -------------------------------------------------------------------------------- 1 | diff -ru nginx-1.0.10/src/http/ngx_http_upstream.c nginx-1.0.10-patched/src/http/ngx_http_upstream.c 2 | --- nginx-1.0.10/src/http/ngx_http_upstream.c 2011-11-01 10:18:10.000000000 -0400 3 | +++ nginx-1.0.10-patched/src/http/ngx_http_upstream.c 2011-11-30 20:34:10.000000000 -0500 4 | @@ -4293,6 +4293,17 @@ 5 | uscf->line = cf->conf_file->line; 6 | uscf->port = u->port; 7 | uscf->default_port = u->default_port; 8 | +#if (NGX_HTTP_HEALTHCHECK) 9 | + uscf->healthcheck_enabled = 0; 10 | + uscf->health_delay = 10000; 11 | + uscf->health_timeout = 2000; 12 | + uscf->health_failcount = 2; 13 | + uscf->health_buffersize = 1000; 14 | + uscf->health_send.data = (u_char*)""; 15 | + uscf->health_send.len = 0; 16 | + uscf->health_expected.len = NGX_CONF_UNSET_SIZE; 17 | + uscf->health_expected.data = NGX_CONF_UNSET_PTR; 18 | +#endif 19 | 20 | if (u->naddrs == 1) { 21 | uscf->servers = ngx_array_create(cf->pool, 1, 22 | Only in nginx-1.0.10-patched/src/http: ngx_http_upstream.c.orig 23 | diff -ru nginx-1.0.10/src/http/ngx_http_upstream.h nginx-1.0.10-patched/src/http/ngx_http_upstream.h 24 | --- nginx-1.0.10/src/http/ngx_http_upstream.h 2011-11-01 10:18:10.000000000 -0400 25 | +++ nginx-1.0.10-patched/src/http/ngx_http_upstream.h 2011-11-30 20:34:10.000000000 -0500 26 | @@ -109,6 +109,24 @@ 27 | 28 | ngx_array_t *servers; /* ngx_http_upstream_server_t */ 29 | 30 | +#if (NGX_HTTP_HEALTHCHECK) 31 | + // If true, healthchecking is enabled for this upstream 32 | + unsigned healthcheck_enabled:1; 33 | + // Delay between healthchecks (in sec) 34 | + time_t health_delay; 35 | + // Total time a healthcheck is allowed to execute 36 | + ngx_msec_t health_timeout; 37 | + // Number of good/bad results indicate the node is up/down 38 | + ngx_int_t health_failcount; 39 | + // Size of the body+headers buffer 40 | + ngx_int_t health_buffersize; 41 | + // What is sent to initiate the healthcheck 42 | + ngx_str_t health_send; 43 | + // Expected from healthcheck, excluding headers 44 | + ngx_str_t health_expected; 45 | +#endif 46 | + 47 | + 48 | ngx_uint_t flags; 49 | ngx_str_t host; 50 | u_char *file_name; 51 | Only in nginx-1.0.10-patched/src/http: ngx_http_upstream.h.orig 52 | diff -ru nginx-1.0.10/src/http/ngx_http_upstream_round_robin.c nginx-1.0.10-patched/src/http/ngx_http_upstream_round_robin.c 53 | --- nginx-1.0.10/src/http/ngx_http_upstream_round_robin.c 2011-09-30 10:30:01.000000000 -0400 54 | +++ nginx-1.0.10-patched/src/http/ngx_http_upstream_round_robin.c 2011-11-30 20:34:10.000000000 -0500 55 | @@ -4,6 +4,8 @@ 56 | */ 57 | 58 | 59 | +/* on top, so it won't collide with ngx_supervisord's patch */ 60 | +#include 61 | #include 62 | #include 63 | #include 64 | @@ -12,7 +14,8 @@ 65 | static ngx_int_t ngx_http_upstream_cmp_servers(const void *one, 66 | const void *two); 67 | static ngx_uint_t 68 | -ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers); 69 | +ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers, 70 | + ngx_log_t * log); 71 | 72 | #if (NGX_HTTP_SSL) 73 | 74 | @@ -32,6 +35,7 @@ 75 | ngx_uint_t i, j, n; 76 | ngx_http_upstream_server_t *server; 77 | ngx_http_upstream_rr_peers_t *peers, *backup; 78 | + ngx_int_t health_index; 79 | 80 | us->peer.init = ngx_http_upstream_init_round_robin_peer; 81 | 82 | @@ -66,6 +70,14 @@ 83 | continue; 84 | } 85 | 86 | + /* on top, so it won't collide with ngx_supervisord's patch */ 87 | + health_index = ngx_http_healthcheck_add_peer(us, 88 | + &server[i].addrs[j], cf->pool); 89 | + if (health_index == NGX_ERROR) { 90 | + return NGX_ERROR; 91 | + } 92 | + peers->peer[n].health_index = health_index; 93 | + 94 | peers->peer[n].sockaddr = server[i].addrs[j].sockaddr; 95 | peers->peer[n].socklen = server[i].addrs[j].socklen; 96 | peers->peer[n].name = server[i].addrs[j].name; 97 | @@ -377,6 +389,7 @@ 98 | ngx_connection_t *c; 99 | ngx_http_upstream_rr_peer_t *peer; 100 | ngx_http_upstream_rr_peers_t *peers; 101 | + ngx_int_t healthy; 102 | 103 | ngx_log_debug1(NGX_LOG_DEBUG_HTTP, pc->log, 0, 104 | "get rr peer, try: %ui", pc->tries); 105 | @@ -422,7 +435,7 @@ 106 | i = pc->tries; 107 | 108 | for ( ;; ) { 109 | - rrp->current = ngx_http_upstream_get_peer(rrp->peers); 110 | + rrp->current = ngx_http_upstream_get_peer(rrp->peers, pc->log); 111 | 112 | ngx_log_debug2(NGX_LOG_DEBUG_HTTP, pc->log, 0, 113 | "get rr peer, current: %ui %i", 114 | @@ -483,7 +496,11 @@ 115 | 116 | peer = &rrp->peers->peer[rrp->current]; 117 | 118 | - if (!peer->down) { 119 | + healthy = !ngx_http_healthcheck_is_down( 120 | + peer->health_index, 121 | + pc->log); 122 | + 123 | + if ((!peer->down) && healthy) { 124 | 125 | if (peer->max_fails == 0 126 | || peer->fails < peer->max_fails) 127 | @@ -588,12 +605,14 @@ 128 | 129 | 130 | static ngx_uint_t 131 | -ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers) 132 | +ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers, ngx_log_t *log) 133 | { 134 | ngx_uint_t i, n, reset = 0; 135 | ngx_http_upstream_rr_peer_t *peer; 136 | + ngx_uint_t health_check_rounds; 137 | 138 | peer = &peers->peer[0]; 139 | + health_check_rounds = 2; 140 | 141 | for ( ;; ) { 142 | 143 | @@ -613,6 +632,11 @@ 144 | continue; 145 | } 146 | 147 | + if (health_check_rounds && ngx_http_healthcheck_is_down( 148 | + peer[i].health_index, 149 | + log)) 150 | + continue; 151 | + 152 | if (peer[n].current_weight * 1000 / peer[i].current_weight 153 | > peer[n].weight * 1000 / peer[i].weight) 154 | { 155 | @@ -633,6 +657,9 @@ 156 | return 0; 157 | } 158 | 159 | + if (health_check_rounds) 160 | + --health_check_rounds; 161 | + 162 | for (i = 0; i < peers->number; i++) { 163 | peer[i].current_weight = peer[i].weight; 164 | } 165 | Only in nginx-1.0.10-patched/src/http: ngx_http_upstream_round_robin.c.orig 166 | diff -ru nginx-1.0.10/src/http/ngx_http_upstream_round_robin.h nginx-1.0.10-patched/src/http/ngx_http_upstream_round_robin.h 167 | --- nginx-1.0.10/src/http/ngx_http_upstream_round_robin.h 2009-11-02 07:41:56.000000000 -0500 168 | +++ nginx-1.0.10-patched/src/http/ngx_http_upstream_round_robin.h 2011-11-30 20:34:10.000000000 -0500 169 | @@ -26,6 +26,7 @@ 170 | 171 | ngx_uint_t max_fails; 172 | time_t fail_timeout; 173 | + ngx_int_t health_index; 174 | 175 | ngx_uint_t down; /* unsigned down:1; */ 176 | 177 | -------------------------------------------------------------------------------- /ngx_http_healthcheck_module.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Does health checks of servers in an upstream 3 | * 4 | * Author: Jack Lindamood 5 | * 6 | */ 7 | 8 | #include 9 | #include 10 | #include 11 | #include 12 | #ifdef NGX_SUPERVISORD_MODULE 13 | #include 14 | #if (NGX_SUPERVISORD_API_VERSION != 2) 15 | #error "ngx_http_upstream_fair_module requires NGX_SUPERVISORD_API v2" 16 | #endif 17 | #endif 18 | 19 | #if (!NGX_HAVE_ATOMIC_OPS) 20 | #error "Healthcheck module only works with atomic ops" 21 | #endif 22 | 23 | typedef enum { 24 | // In progress states 25 | NGX_HEALTH_UNINIT_STATE = 0, 26 | NGX_HEALTH_WAITING, 27 | NGX_HEALTH_SENDING_CHECK, 28 | NGX_HEALTH_READING_STAT_LINE, 29 | NGX_HEALTH_READING_STAT_CODE, 30 | NGX_HEALTH_READING_HEADER, 31 | NGX_HEALTH_HEADER_ALMOST_DONE, 32 | NGX_HEALTH_READING_BODY, 33 | // Good + final states 34 | NGX_HEALTH_OK = 100, 35 | // bad + final states 36 | NGX_HEALTH_BAD_HEADER = 200, 37 | NGX_HEALTH_BAD_STATUS, 38 | NGX_HEALTH_BAD_BODY, 39 | NGX_HEALTH_BAD_STATE, 40 | NGX_HEALTH_BAD_CONN, 41 | NGX_HEALTH_BAD_CODE, 42 | NGX_HEALTH_TIMEOUT, 43 | NGX_HEALTH_FULL_BUFFER, 44 | NGX_HEALTH_EARLY_CLOSE 45 | } ngx_http_health_state; 46 | 47 | typedef struct { 48 | // Worker pid processing this healthcheck 49 | ngx_pid_t owner; 50 | // matches the non shared memory index 51 | ngx_uint_t index; 52 | // Last time any action (read/write/timeout) was taken on this structure 53 | ngx_msec_t action_time; 54 | // Number of concurrent bad or good responses 55 | ngx_int_t concurrent; 56 | // How long this server's been concurrently bad or good 57 | ngx_msec_t since; 58 | // If true, the server's last response was bad 59 | unsigned last_down:1; 60 | // Code (above ngx_http_health_state) of last finished check 61 | ngx_http_health_state down_code; 62 | // Used so multiple processes don't try to healthcheck the same peer 63 | ngx_atomic_t lock; 64 | /** 65 | * If true, the server is actually down. This is 66 | * different than last_down because a server needs 67 | * X concurrent good or bad connections to actually 68 | * be down 69 | */ 70 | ngx_atomic_t down; 71 | } ngx_http_healthcheck_status_shm_t; 72 | 73 | 74 | typedef struct { 75 | // Upstream this peer belongs to 76 | ngx_http_upstream_srv_conf_t *conf; 77 | // The peer to check 78 | #if defined(nginx_version) && nginx_version >= 8022 79 | ngx_addr_t *peer; 80 | #else 81 | ngx_peer_addr_t *peer; 82 | #endif 83 | // Index of the peer. Matches shm segment and is used for 'down' checking 84 | // by external clients 85 | ngx_uint_t index; 86 | // Current state of the healthcheck. Different than shm->down_state 87 | // because this is an active state and that is a finisehd state. 88 | ngx_http_health_state state; 89 | // Connection to the peer. We reuse this memory each healthcheck, but 90 | // memset zero it 91 | ngx_peer_connection_t *pc; 92 | // When the check began so we can diff it with action_time and time the 93 | // check out 94 | ngx_msec_t check_start_time; 95 | // Event that triggers a health check 96 | ngx_event_t health_ev; 97 | // Event that triggers an attempt at ownership of this healthcheck 98 | ngx_event_t ownership_ev; 99 | ngx_buf_t *read_buffer; 100 | // Where I am reading the entire connection, headers + body 101 | ssize_t read_pos; 102 | // Where I am in conf->health_expected (the body only) 103 | ssize_t body_read_pos; 104 | // Where I am in conf->health_send 105 | ssize_t send_pos; 106 | // HTTP status code returned (200, 404, etc) 107 | ngx_uint_t stat_code; 108 | ngx_http_healthcheck_status_shm_t *shm; 109 | } ngx_http_healthcheck_status_t; 110 | 111 | // This one is not shared. Created when the config is parsed 112 | static ngx_array_t *ngx_http_healthchecks_arr; 113 | // This is the same as the above data ->elts. For ease of use 114 | #define ngx_http_healthchecks \ 115 | ((ngx_http_healthcheck_status_t*) ngx_http_healthchecks_arr->elts) 116 | static ngx_http_healthcheck_status_shm_t *ngx_http_healthchecks_shm; 117 | 118 | static ngx_int_t ngx_http_healthcheck_init(ngx_conf_t *cf); 119 | static char* ngx_http_healthcheck_enabled(ngx_conf_t *cf, ngx_command_t *cmd, 120 | void *conf); 121 | static char* ngx_http_healthcheck_delay(ngx_conf_t *cf, ngx_command_t *cmd, 122 | void *conf); 123 | static char* ngx_http_healthcheck_timeout(ngx_conf_t *cf, ngx_command_t *cmd, 124 | void *conf); 125 | static char* ngx_http_healthcheck_failcount(ngx_conf_t *cf, ngx_command_t *cmd, 126 | void *conf); 127 | static char* ngx_http_healthcheck_send(ngx_conf_t *cf, ngx_command_t *cmd, 128 | void *conf); 129 | static char* ngx_http_healthcheck_expected(ngx_conf_t *cf, ngx_command_t *cmd, 130 | void *conf); 131 | static char* ngx_http_healthcheck_buffer(ngx_conf_t *cf, ngx_command_t *cmd, 132 | void *conf); 133 | static char* ngx_http_set_healthcheck_status(ngx_conf_t *cf, ngx_command_t *cmd, 134 | void*conf); 135 | static ngx_int_t ngx_http_healthcheck_procinit(ngx_cycle_t *cycle); 136 | static ngx_int_t ngx_http_healthcheck_preconfig(ngx_conf_t *cf); 137 | static ngx_int_t ngx_http_healthcheck_init_zone(ngx_shm_zone_t *shm_zone, 138 | void *data); 139 | static ngx_int_t ngx_http_healthcheck_process_recv( 140 | ngx_http_healthcheck_status_t *stat); 141 | static char* ngx_http_healthcheck_statestr( 142 | ngx_http_health_state state); 143 | 144 | // I really wish there was a way to make nginx call this when you HUP the 145 | // master 146 | void ngx_http_healthcheck_clear_events(ngx_log_t *log); 147 | 148 | static ngx_command_t ngx_http_healthcheck_commands[] = { 149 | /** 150 | * If mentioned, enable healthchecks for this upstream 151 | */ 152 | { ngx_string("healthcheck_enabled"), 153 | NGX_HTTP_UPS_CONF|NGX_CONF_NOARGS, 154 | ngx_http_healthcheck_enabled, 155 | 0, 156 | 0, 157 | NULL }, 158 | /** 159 | * Delay in msec between healthchecks for a single peer 160 | */ 161 | { ngx_string("healthcheck_delay"), 162 | NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1, 163 | ngx_http_healthcheck_delay, 164 | 0, 165 | 0, 166 | NULL } , 167 | /** 168 | * How long in msec a healthcheck is allowed to take place 169 | */ 170 | { ngx_string("healthcheck_timeout"), 171 | NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1, 172 | ngx_http_healthcheck_timeout, 173 | 0, 174 | 0, 175 | NULL }, 176 | /** 177 | * Number of healthchecks good or bad in a row it takes to switch from 178 | * down to up and back. Good to prevent flapping 179 | */ 180 | { ngx_string("healthcheck_failcount"), 181 | NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1, 182 | ngx_http_healthcheck_failcount, 183 | 0, 184 | 0, 185 | NULL } , 186 | /** 187 | * What to send for the healthcheck. Each argument is appended by \r\n 188 | * and the entire thing is suffixed with another \r\n. For example, 189 | * 190 | * healthcheck_send 'GET /health HTTP/1.1' 191 | * 'Host: www.facebook.com' 'Connection: close'; 192 | * 193 | * Note that you probably want to end your health check with some directive 194 | * that closes the connection, like Connection: close. 195 | * 196 | */ 197 | { ngx_string("healthcheck_send"), 198 | NGX_HTTP_UPS_CONF|NGX_CONF_1MORE, 199 | ngx_http_healthcheck_send, 200 | 0, 201 | 0, 202 | NULL }, 203 | /** 204 | * What to expect in the HTTP BODY, (meaning not the headers), in a correct 205 | * response 206 | */ 207 | { ngx_string("healthcheck_expected"), 208 | NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1, 209 | ngx_http_healthcheck_expected, 210 | 0, 211 | 0, 212 | NULL }, 213 | /** 214 | * How big a buffer to use for the health check. Remember to include 215 | * headers PLUS body, not just body. 216 | */ 217 | { ngx_string("healthcheck_buffer"), 218 | NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1, 219 | ngx_http_healthcheck_buffer, 220 | 0, 221 | 0, 222 | NULL }, 223 | /** 224 | * When inside a /location block, replaced the HTTP body with backend 225 | * health status. Use similarly to the stub_status module 226 | */ 227 | { ngx_string("healthcheck_status"), 228 | NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_NOARGS, 229 | ngx_http_set_healthcheck_status, 230 | 0, 231 | 0, 232 | NULL }, 233 | ngx_null_command 234 | }; 235 | 236 | 237 | // Note: I tried using the "create server configuration" section rather than 238 | // patching the nginx code, but it didn't work. When you set the options 239 | // you're in a different config context than when you use them in the upstream. 240 | // It's very strange and unintuitive, but it's nginx 241 | 242 | static ngx_http_module_t ngx_http_healthcheck_module_ctx = { 243 | ngx_http_healthcheck_preconfig, /* preconfiguration */ 244 | ngx_http_healthcheck_init, /* postconfiguration */ 245 | 246 | NULL, /* create main configuration */ 247 | NULL, /* init main configuration */ 248 | 249 | NULL, /* create server configuration */ 250 | NULL, /* merge server configuration */ 251 | 252 | NULL, /* create location configuration */ 253 | NULL /* merge location configuration */ 254 | }; 255 | 256 | ngx_module_t ngx_http_healthcheck_module = { 257 | NGX_MODULE_V1, 258 | &ngx_http_healthcheck_module_ctx, /* module context */ 259 | ngx_http_healthcheck_commands, /* module directives */ 260 | NGX_HTTP_MODULE, /* module type */ 261 | NULL, /* init master */ 262 | NULL, /* init module */ 263 | ngx_http_healthcheck_procinit, /* init process */ 264 | NULL, /* init thread */ 265 | NULL, /* exit thread */ 266 | NULL, /* exit process */ 267 | NULL, /* exit master */ 268 | NGX_MODULE_V1_PADDING 269 | }; 270 | 271 | 272 | void ngx_http_healthcheck_mark_finished(ngx_http_healthcheck_status_t *stat) { 273 | #ifdef NGX_SUPERVISORD_MODULE 274 | ngx_http_upstream_rr_peers_t *peers = stat->conf->peer.data; 275 | #endif 276 | ngx_log_debug2(NGX_LOG_DEBUG_HTTP, stat->health_ev.log, 0, 277 | "healthcheck: Finished %V, state %d", &stat->peer->name, 278 | stat->state); 279 | if (stat->state == NGX_HEALTH_OK) { 280 | if (stat->shm->last_down) { 281 | stat->shm->last_down = 0; 282 | stat->shm->concurrent = 1; 283 | stat->shm->since = ngx_current_msec; 284 | #ifdef NGX_SUPERVISORD_MODULE 285 | (void) ngx_supervisord_execute(stat->conf, 286 | NGX_SUPERVISORD_CMD_START, 287 | peers->peer[stat->index].onumber, 288 | NULL); 289 | #endif 290 | } else { 291 | stat->shm->concurrent++; 292 | } 293 | } else { 294 | if (stat->shm->last_down) { 295 | stat->shm->concurrent++; 296 | } else { 297 | stat->shm->last_down = 1; 298 | stat->shm->concurrent = 1; 299 | stat->shm->since = ngx_current_msec; 300 | #ifdef NGX_SUPERVISORD_MODULE 301 | (void) ngx_supervisord_execute(stat->conf, 302 | NGX_SUPERVISORD_CMD_STOP, 303 | peers->peer[stat->index].onumber, 304 | NULL); 305 | #endif 306 | } 307 | } 308 | if (stat->shm->concurrent >= stat->conf->health_failcount) { 309 | stat->shm->down = stat->shm->last_down; 310 | } 311 | stat->shm->down_code = stat->state; 312 | ngx_close_connection(stat->pc->connection); 313 | stat->pc->connection = NULL; 314 | stat->state = NGX_HEALTH_WAITING; 315 | if (!ngx_terminate && !ngx_exiting && !ngx_quit) { 316 | ngx_add_timer(&stat->health_ev, stat->conf->health_delay); 317 | } else { 318 | ngx_http_healthcheck_clear_events(stat->health_ev.log); 319 | } 320 | stat->shm->action_time = ngx_current_msec; 321 | } 322 | 323 | void ngx_http_healthcheck_send_request(ngx_connection_t *); 324 | 325 | void ngx_http_healthcheck_write_handler(ngx_event_t *wev) { 326 | ngx_connection_t *c; 327 | 328 | c = wev->data; 329 | 330 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, wev->log, 0, 331 | "healthcheck: Write handler called"); 332 | 333 | ngx_http_healthcheck_send_request(c); 334 | } 335 | 336 | void ngx_http_healthcheck_send_request(ngx_connection_t *c) { 337 | ngx_http_healthcheck_status_t *stat = c->data; 338 | ssize_t size; 339 | 340 | if (stat->state != NGX_HEALTH_SENDING_CHECK) { 341 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, c->log, 0, 342 | "healthcheck: Ignoring a write. Not in writing state"); 343 | return; 344 | } 345 | 346 | do { 347 | size = 348 | c->send(c, stat->conf->health_send.data + stat->send_pos, 349 | stat->conf->health_send.len - stat->send_pos); 350 | ngx_log_debug1(NGX_LOG_DEBUG_HTTP, c->log, 0, 351 | "healthcheck: Send size %z", size); 352 | if (size == NGX_ERROR || size == 0) { 353 | // If the send fails, the connection is bad. Close it out 354 | stat->state = NGX_HEALTH_BAD_CONN; 355 | ngx_http_healthcheck_mark_finished(stat); 356 | stat->shm->action_time = ngx_current_msec; 357 | break; 358 | } else if (size == NGX_AGAIN) { 359 | // I guess this means return and try again later 360 | break; 361 | } else { 362 | stat->shm->action_time = ngx_current_msec; 363 | stat->send_pos += size; 364 | } 365 | } while (stat->send_pos < (ssize_t)stat->conf->health_send.len); 366 | 367 | if (stat->send_pos > (ssize_t)stat->conf->health_send.len) { 368 | ngx_log_error(NGX_LOG_WARN, c->log, 0, 369 | "healthcheck: Logic error. %z send pos bigger than buffer len %i", 370 | stat->send_pos, stat->conf->health_send.len); 371 | } else if (stat->send_pos == (ssize_t)stat->conf->health_send.len) { 372 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, c->log, 0, 373 | "healthcheck: Finished sending request"); 374 | stat->state = NGX_HEALTH_READING_STAT_LINE; 375 | } 376 | } 377 | 378 | void ngx_http_healthcheck_read_handler(ngx_event_t *rev) { 379 | ngx_connection_t *c; 380 | ngx_buf_t *rb; 381 | ngx_int_t rc; 382 | ssize_t size; 383 | ngx_http_healthcheck_status_t *stat; 384 | ngx_int_t expect_finished; 385 | 386 | c = rev->data; 387 | stat = c->data; 388 | rb = stat->read_buffer; 389 | 390 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, rev->log, 0, 391 | "healthcheck: Read handler called"); 392 | 393 | stat->shm->action_time = ngx_current_msec; 394 | if (ngx_current_msec - stat->check_start_time >= 395 | stat->conf->health_timeout) { 396 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, rev->log, 0, 397 | "healthcheck: timeout!"); 398 | stat->state = NGX_HEALTH_TIMEOUT; 399 | ngx_http_healthcheck_mark_finished(stat); 400 | return; 401 | } 402 | expect_finished = 0; 403 | do { 404 | size = c->recv(c, rb->pos, rb->end - rb->pos); 405 | ngx_log_debug2(NGX_LOG_DEBUG_HTTP, rev->log, 0, 406 | "healthcheck: Recv size %z when I wanted %O", size, 407 | rb->end - rb->pos); 408 | if (size == NGX_ERROR) { 409 | // If the send fails, the connection is bad. Close it out 410 | stat->state = NGX_HEALTH_BAD_CONN; 411 | break; 412 | } else if (size == NGX_AGAIN) { 413 | break; 414 | } else if (size == 0) { 415 | expect_finished = 1; 416 | break; 417 | } else { 418 | rb->pos += size; 419 | } 420 | } while (rb->pos < rb->end); 421 | 422 | if (stat->state != NGX_HEALTH_BAD_CONN) { 423 | rc = ngx_http_healthcheck_process_recv(stat); 424 | switch (rc) { 425 | case NGX_AGAIN: 426 | if (expect_finished) { 427 | stat->state = NGX_HEALTH_EARLY_CLOSE; 428 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, rev->log, 0, 429 | "healthcheck: prematurely closed connection"); 430 | } else if (rb->end == rb->pos) { 431 | // We used up our read buffer and STILL can't verify 432 | stat->state = NGX_HEALTH_FULL_BUFFER; 433 | ngx_http_healthcheck_mark_finished(stat); 434 | } 435 | // We want more data to see if the body is OK or not 436 | break; 437 | case NGX_ERROR: 438 | ngx_http_healthcheck_mark_finished(stat); 439 | break; 440 | case NGX_OK: 441 | ngx_http_healthcheck_mark_finished(stat); 442 | break; 443 | default: 444 | ngx_log_error(NGX_LOG_WARN, rev->log, 0, 445 | "healthcheck: Unknown process_recv code %i", rc); 446 | break; 447 | } 448 | } else { 449 | ngx_http_healthcheck_mark_finished(stat); 450 | } 451 | } 452 | 453 | static ngx_int_t ngx_http_healthcheck_process_recv( 454 | ngx_http_healthcheck_status_t *stat) { 455 | 456 | ngx_buf_t *rb; 457 | u_char ch; 458 | ngx_str_t *health_expected; 459 | 460 | rb = stat->read_buffer; 461 | health_expected = &stat->conf->health_expected; 462 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, stat->health_ev.log, 0, 463 | "healthcheck: Process recv"); 464 | 465 | while (rb->start + stat->read_pos < rb->pos) { 466 | ch = *(rb->start+stat->read_pos); 467 | stat->read_pos++; 468 | #if 0 469 | // Useful for debugging 470 | ngx_log_debug2(NGX_LOG_DEBUG_HTTP, stat->health_ev.log, 0, 471 | "healthcheck: CH %c state %d", ch, stat->state); 472 | #endif 473 | switch (stat->state) { 474 | case NGX_HEALTH_READING_STAT_LINE: 475 | // Look for regex '/ \d+/' 476 | if (ch == ' ') { 477 | stat->state = NGX_HEALTH_READING_STAT_CODE; 478 | stat->stat_code = 0; 479 | } else if (ch == '\r' || ch == '\n') { 480 | stat->state = NGX_HEALTH_BAD_STATUS; 481 | return NGX_ERROR; 482 | } 483 | break; 484 | case NGX_HEALTH_READING_STAT_CODE: 485 | if (ch == ' ') { 486 | if (stat->stat_code != NGX_HTTP_OK /*200*/) { 487 | stat->state = NGX_HEALTH_BAD_CODE; 488 | return NGX_ERROR; 489 | } else { 490 | stat->state = NGX_HEALTH_READING_HEADER; 491 | } 492 | } else if (ch < '0' || ch > '9') { 493 | stat->state = NGX_HEALTH_BAD_STATUS; 494 | return NGX_ERROR; 495 | } else { 496 | stat->stat_code = stat->stat_code * 10 + (ch - '0'); 497 | } 498 | break; 499 | case NGX_HEALTH_READING_HEADER: 500 | if (ch == '\n') { 501 | stat->state = NGX_HEALTH_HEADER_ALMOST_DONE; 502 | } 503 | break; 504 | case NGX_HEALTH_HEADER_ALMOST_DONE: 505 | if (ch == '\n') { 506 | if (health_expected->len == NGX_CONF_UNSET_SIZE) { 507 | stat->state = NGX_HEALTH_OK; 508 | return NGX_OK; 509 | } else { 510 | stat->state = NGX_HEALTH_READING_BODY; 511 | } 512 | } else if (ch != '\r') { 513 | stat->state = NGX_HEALTH_READING_HEADER; 514 | } 515 | break; 516 | case NGX_HEALTH_READING_BODY: 517 | if (stat->body_read_pos == (ssize_t)health_expected->len) { 518 | // Body was ok, but is now too long 519 | stat->state = NGX_HEALTH_BAD_BODY; 520 | return NGX_ERROR; 521 | } else if (ch != health_expected->data[stat->body_read_pos]) { 522 | // Body was actually bad 523 | stat->state = NGX_HEALTH_BAD_BODY; 524 | return NGX_ERROR; 525 | } else { 526 | stat->body_read_pos++; 527 | } 528 | break; 529 | default: 530 | ngx_log_error(NGX_LOG_CRIT, stat->health_ev.log, 0, 531 | "healthcheck: Logic error. Invalid state: %d", 532 | stat->state); 533 | stat->state = NGX_HEALTH_BAD_STATE; 534 | return NGX_ERROR; 535 | } 536 | } 537 | if (stat->state == NGX_HEALTH_READING_BODY && 538 | stat->body_read_pos == (ssize_t)health_expected->len) { 539 | stat->state = NGX_HEALTH_OK; 540 | return NGX_OK; 541 | } else if (stat->state == NGX_HEALTH_OK) { 542 | return NGX_OK; 543 | } else { 544 | return NGX_AGAIN; 545 | } 546 | } 547 | 548 | static void ngx_http_healthcheck_begin_healthcheck(ngx_event_t *event) { 549 | ngx_http_healthcheck_status_t * stat; 550 | ngx_connection_t *c; 551 | ngx_int_t rc; 552 | 553 | stat = event->data; 554 | if (stat->state != NGX_HEALTH_WAITING) { 555 | ngx_log_error(NGX_LOG_WARN, event->log, 0, 556 | "healthcheck: State not waiting, is %d", stat->state); 557 | } 558 | ngx_log_debug1(NGX_LOG_DEBUG_HTTP, event->log, 0, 559 | "healthcheck: begun healthcheck of index %i", stat->index); 560 | 561 | ngx_memzero(stat->pc, sizeof(ngx_peer_connection_t)); 562 | ngx_log_debug1(NGX_LOG_DEBUG_HTTP, event->log, 0, 563 | "healthcheck: Memzero done", stat->index); 564 | 565 | stat->pc->get = ngx_event_get_peer; 566 | 567 | stat->pc->sockaddr = stat->peer->sockaddr; 568 | stat->pc->socklen = stat->peer->socklen; 569 | stat->pc->name = &stat->peer->name; 570 | 571 | stat->pc->log = event->log; 572 | stat->pc->log_error = NGX_ERROR_ERR; // Um I guess (???) 573 | 574 | stat->pc->cached = 0; 575 | stat->pc->connection = NULL; 576 | ngx_log_debug1(NGX_LOG_DEBUG_HTTP, event->log, 0, 577 | "healthcheck: Connecting peer", stat->index); 578 | 579 | rc = ngx_event_connect_peer(stat->pc); 580 | if (rc == NGX_ERROR || rc == NGX_BUSY || rc == NGX_DECLINED) { 581 | ngx_log_error(NGX_LOG_CRIT, event->log, 0, 582 | "healthcheck: Could not connect to peer. This is" 583 | " pretty bad and probably means your health checks won't" 584 | " work anymore: %i", rc); 585 | if (stat->pc->connection) { 586 | ngx_close_connection(stat->pc->connection); 587 | } 588 | // Try to do it again later, but if you're getting errors when you 589 | // try to connect to a peer, this probably won't work 590 | ngx_add_timer(&stat->health_ev, stat->conf->health_delay); 591 | return; 592 | } 593 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, event->log, 0, 594 | "healthcheck: connected so far"); 595 | 596 | 597 | c = stat->pc->connection; 598 | c->data = stat; 599 | c->log = stat->pc->log; 600 | c->write->handler = ngx_http_healthcheck_write_handler; 601 | c->read->handler = ngx_http_healthcheck_read_handler; 602 | c->sendfile = 0; 603 | c->read->log = c->log; 604 | c->write->log = c->log; 605 | 606 | stat->state = NGX_HEALTH_SENDING_CHECK; 607 | stat->shm->action_time = ngx_current_msec; 608 | stat->read_pos = 0; 609 | stat->send_pos = 0; 610 | stat->body_read_pos = 0; 611 | stat->read_buffer->pos = stat->read_buffer->start; 612 | stat->read_buffer->last = stat->read_buffer->start; 613 | stat->check_start_time = ngx_current_msec; 614 | ngx_add_timer(c->read, stat->conf->health_timeout); 615 | ngx_log_debug1(NGX_LOG_DEBUG_HTTP, event->log, 0, 616 | "healthcheck: Peer connected", stat->index); 617 | 618 | ngx_http_healthcheck_send_request(c); 619 | } 620 | 621 | static void ngx_http_healthcheck_try_for_ownership(ngx_event_t *event) { 622 | ngx_http_healthcheck_status_t * stat; 623 | ngx_int_t i_own_it; 624 | 625 | stat = event->data; 626 | if (ngx_terminate || ngx_exiting || ngx_quit) { 627 | ngx_http_healthcheck_clear_events(stat->health_ev.log); 628 | return; 629 | } 630 | 631 | i_own_it = 0; 632 | // nxg_time_update(0, 0); 633 | // Spinlock. So don't own for a long time! 634 | // Use spinlock so two worker processes don't try to healthcheck the same 635 | // peer 636 | ngx_spinlock(&stat->shm->lock, ngx_pid, 1024); 637 | if (stat->shm->owner == ngx_pid) { 638 | i_own_it = 1; 639 | } else if (ngx_current_msec - stat->shm->action_time >= 640 | (stat->conf->health_delay + stat->conf->health_timeout) * 3) { 641 | stat->shm->owner = ngx_pid; 642 | stat->shm->action_time = ngx_current_msec; 643 | stat->state = NGX_HEALTH_WAITING; 644 | ngx_http_healthcheck_begin_healthcheck(&stat->health_ev); 645 | i_own_it = 1; 646 | } 647 | if (!ngx_atomic_cmp_set(&stat->shm->lock, ngx_pid, 0)) { 648 | ngx_log_error(NGX_LOG_CRIT, event->log, 0, 649 | "healthcheck: spinlock didn't work. Should be %P, but isn't", 650 | ngx_pid); 651 | 652 | stat->shm->lock = 0; 653 | } 654 | if (!i_own_it) { 655 | // Try again for ownership later in case the guy that DOES own it dies or 656 | // something 657 | ngx_add_timer(&stat->ownership_ev, 5000); 658 | } 659 | } 660 | 661 | void ngx_http_healthcheck_clear_events(ngx_log_t *log) { 662 | ngx_uint_t i; 663 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, log, 0, 664 | "healthcheck: Clearing events"); 665 | 666 | // Note: From what I can tell it is safe to ngx_del_timer events 667 | // that are not in the event tree 668 | for (i=0; inelts; i++) { 669 | ngx_del_timer(&ngx_http_healthchecks[i].health_ev); 670 | ngx_del_timer(&ngx_http_healthchecks[i].ownership_ev); 671 | } 672 | } 673 | 674 | static ngx_int_t ngx_http_healthcheck_procinit(ngx_cycle_t *cycle) { 675 | ngx_uint_t i; 676 | ngx_msec_t t; 677 | 678 | if (ngx_http_healthchecks_arr->nelts == 0) { 679 | return NGX_OK; 680 | } 681 | 682 | // Otherwise, the distribution isn't very random because each process 683 | // is a fork, so they all have the same seed 684 | srand(ngx_pid); 685 | ngx_log_debug1(NGX_LOG_DEBUG_HTTP, cycle->log, 0, 686 | "healthcheck: Adding events to worker process %P", ngx_pid); 687 | for (i=0; inelts; i++) { 688 | ngx_http_healthchecks[i].shm = &ngx_http_healthchecks_shm[i]; 689 | 690 | if (ngx_http_healthchecks[i].conf->healthcheck_enabled) { 691 | 692 | ngx_http_healthchecks[i].ownership_ev.handler = 693 | ngx_http_healthcheck_try_for_ownership; 694 | ngx_http_healthchecks[i].ownership_ev.log = cycle->log; 695 | ngx_http_healthchecks[i].ownership_ev.data = 696 | &ngx_http_healthchecks[i]; 697 | // I'm not sure why the timer_set needs to be reset to zero. 698 | // It shouldn't (??), but it does when you HUP the process 699 | ngx_http_healthchecks[i].ownership_ev.timer_set = 0; 700 | 701 | ngx_http_healthchecks[i].health_ev.handler = 702 | ngx_http_healthcheck_begin_healthcheck; 703 | ngx_http_healthchecks[i].health_ev.log = cycle->log; 704 | ngx_http_healthchecks[i].health_ev.data = 705 | &ngx_http_healthchecks[i]; 706 | ngx_http_healthchecks[i].health_ev.timer_set = 0; 707 | 708 | t = abs(ngx_random() % ngx_http_healthchecks[i].conf->health_delay); 709 | ngx_add_timer(&ngx_http_healthchecks[i].ownership_ev, t); 710 | } 711 | } 712 | return NGX_OK; 713 | } 714 | 715 | static ngx_int_t ngx_http_healthcheck_preconfig(ngx_conf_t *cf) { 716 | ngx_http_healthchecks_arr = ngx_array_create(cf->pool, 10, 717 | sizeof(ngx_http_healthcheck_status_t)); 718 | if (ngx_http_healthchecks_arr == NULL) { 719 | return NGX_ERROR; 720 | } 721 | return NGX_OK; 722 | } 723 | 724 | static ngx_int_t ngx_http_healthcheck_init(ngx_conf_t *cf) { 725 | ngx_str_t *shm_name; 726 | ngx_shm_zone_t *shm_zone; 727 | ngx_uint_t i; 728 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, cf->log, 0, 729 | "healthcheck: healthcheck_init"); 730 | 731 | if (ngx_http_healthchecks_arr->nelts == 0) { 732 | ngx_http_healthchecks_shm = NULL; 733 | return NGX_OK; 734 | } 735 | 736 | shm_name = ngx_palloc(cf->pool, sizeof *shm_name); 737 | shm_name->len = sizeof("http_healthcheck") - 1; 738 | shm_name->data = (unsigned char *) "http_healthcheck"; 739 | 740 | // I guess a page each is good enough (?) 741 | shm_zone = ngx_shared_memory_add(cf, shm_name, 742 | ngx_pagesize * (ngx_http_healthchecks_arr->nelts + 1), 743 | &ngx_http_healthcheck_module); 744 | 745 | if (shm_zone == NULL) { 746 | return NGX_ERROR; 747 | } 748 | shm_zone->init = ngx_http_healthcheck_init_zone; 749 | 750 | for (i=0; inelts; i++) { 751 | // It says 'temp', but it should last forever-ish 752 | ngx_http_healthchecks[i].read_buffer = ngx_create_temp_buf(cf->pool, 753 | ngx_http_healthchecks[i].conf->health_buffersize); 754 | if (ngx_http_healthchecks[i].read_buffer == NULL) { 755 | return NGX_ERROR; 756 | } 757 | } 758 | 759 | return NGX_OK; 760 | } 761 | 762 | static ngx_int_t 763 | ngx_http_healthcheck_init_zone(ngx_shm_zone_t *shm_zone, void *data) { 764 | ngx_uint_t i; 765 | ngx_slab_pool_t *shpool; 766 | 767 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, shm_zone->shm.log, 0, 768 | "healthcheck: Init zone"); 769 | 770 | // If we're being HUP'd, I can't just use the same 'data' segment because 771 | // the number of servers may of changed. Instead, I need to recreate a 772 | // slab 773 | 774 | shpool = (ngx_slab_pool_t *) shm_zone->shm.addr; 775 | 776 | ngx_http_healthchecks_shm = ngx_slab_alloc(shpool, 777 | (sizeof (ngx_http_healthcheck_status_shm_t)) * 778 | ngx_http_healthchecks_arr->nelts); 779 | if (ngx_http_healthchecks_shm == NULL) { 780 | return NGX_ERROR; 781 | } 782 | for (i=0; inelts; i++) { 783 | ngx_http_healthchecks_shm[i].index = i; 784 | ngx_http_healthchecks_shm[i].action_time = 0; 785 | ngx_http_healthchecks_shm[i].down = 0; 786 | ngx_http_healthchecks_shm[i].since = ngx_current_msec; 787 | } 788 | shm_zone->data = ngx_http_healthchecks_shm; 789 | 790 | return NGX_OK; 791 | } 792 | 793 | 794 | // --- BEGIN PUBLIC METHODS --- 795 | ngx_int_t 796 | ngx_http_healthcheck_add_peer(ngx_http_upstream_srv_conf_t *uscf, 797 | #if defined(nginx_version) && nginx_version >= 8022 798 | ngx_addr_t *peer, ngx_pool_t *pool) { 799 | #else 800 | ngx_peer_addr_t *peer, ngx_pool_t *pool) { 801 | #endif 802 | ngx_http_healthcheck_status_t *status; 803 | status = ngx_array_push(ngx_http_healthchecks_arr); 804 | if (status == NULL) { 805 | return NGX_ERROR; 806 | } 807 | status->conf = uscf; 808 | status->peer = peer; 809 | status->index = ngx_http_healthchecks_arr->nelts - 1; 810 | status->pc = ngx_pcalloc(pool, sizeof(ngx_peer_connection_t)); 811 | if (status->pc == NULL) { 812 | return NGX_ERROR; 813 | } 814 | return ngx_http_healthchecks_arr->nelts - 1; 815 | } 816 | 817 | ngx_int_t ngx_http_healthcheck_is_down(ngx_uint_t index, ngx_log_t *log) { 818 | if (index >= ngx_http_healthchecks_arr->nelts) { 819 | ngx_log_error(NGX_LOG_CRIT, log, 0, 820 | "healthcheck: Invalid index to is_down: %i", index); 821 | return 0; 822 | } else { 823 | return ngx_http_healthchecks[index].conf->healthcheck_enabled && 824 | ngx_http_healthchecks[index].shm->down; 825 | } 826 | } 827 | // --- END PUBLIC METHODS --- 828 | 829 | // Health status page 830 | static char* ngx_http_healthcheck_statestr( 831 | ngx_http_health_state state) { 832 | switch (state) { 833 | case NGX_HEALTH_OK: 834 | return "OK"; 835 | case NGX_HEALTH_BAD_HEADER: 836 | return "Malformed header"; 837 | case NGX_HEALTH_BAD_STATUS: 838 | return "Bad status line. Maybe not HTTP"; 839 | case NGX_HEALTH_BAD_BODY: 840 | return "Bad HTTP body contents"; 841 | case NGX_HEALTH_BAD_STATE: 842 | return "Internal error. Bad healthcheck state"; 843 | case NGX_HEALTH_BAD_CONN: 844 | return "Error reading contents. Bad connection"; 845 | case NGX_HEALTH_BAD_CODE: 846 | return "Non 200 HTTP status code"; 847 | case NGX_HEALTH_TIMEOUT: 848 | return "Healthcheck timed out"; 849 | case NGX_HEALTH_FULL_BUFFER: 850 | return "Contents could not fit read buffer"; 851 | case NGX_HEALTH_EARLY_CLOSE: 852 | return "Connection closed early"; 853 | default: 854 | return "Unknown state"; 855 | } 856 | } 857 | 858 | ngx_buf_t* ngx_http_healthcheck_buf_append(ngx_buf_t *dst, ngx_buf_t *src, 859 | ngx_pool_t *pool) { 860 | //TODO: Consider using a buffer chain 861 | ngx_buf_t *new_buf; 862 | if (dst->last + (src->last - src->pos) > dst->end) { 863 | new_buf = ngx_create_temp_buf(pool, ((dst->last - dst->pos) + (src->last - src->pos)) * 2 + 1); 864 | if (new_buf == NULL) { 865 | return NULL; 866 | } 867 | ngx_memcpy(new_buf->last, dst->pos, (dst->last - dst->pos)); 868 | new_buf->last += (dst->last - dst->pos); 869 | // TODO: I don't think there's a way to uncreate the dst buffer (??) 870 | // Should be ok because these are small and cleared at the end of 871 | // the status request 872 | dst = new_buf; 873 | } 874 | ngx_memcpy(dst->last, src->pos, (src->last - src->pos)); 875 | dst->last += (src->last - src->pos); 876 | return dst; 877 | } 878 | 879 | #define NGX_HEALTH_APPEND_CHECK(dst, src, pool) \ 880 | do { \ 881 | dst = ngx_http_healthcheck_buf_append(b, tmp, pool); \ 882 | if (dst == NULL) { \ 883 | return NGX_HTTP_INTERNAL_SERVER_ERROR; \ 884 | } \ 885 | } while (0); 886 | 887 | static ngx_int_t ngx_http_healthcheck_status_handler(ngx_http_request_t *r) { 888 | ngx_int_t rc; 889 | ngx_buf_t *b, *tmp; 890 | ngx_chain_t out; 891 | ngx_uint_t i; 892 | ngx_http_healthcheck_status_t *stat; 893 | ngx_http_healthcheck_status_shm_t *shm; 894 | if (r->method != NGX_HTTP_GET && r->method != NGX_HTTP_HEAD) { 895 | return NGX_HTTP_NOT_ALLOWED; 896 | } 897 | 898 | rc = ngx_http_discard_request_body(r); 899 | 900 | if (rc != NGX_OK) { 901 | return rc; 902 | } 903 | 904 | ngx_str_t str_tmp = ngx_string("text/html; charset=utf-8"); 905 | r->headers_out.content_type = str_tmp; 906 | 907 | if (r->method == NGX_HTTP_HEAD) { 908 | r->headers_out.status = NGX_HTTP_OK; 909 | 910 | rc = ngx_http_send_header(r); 911 | 912 | if (rc == NGX_ERROR || rc > NGX_OK || r->header_only) { 913 | return rc; 914 | } 915 | } 916 | 917 | b = ngx_create_temp_buf(r->pool, 10); 918 | tmp = ngx_create_temp_buf(r->pool, 1000); 919 | if (b == NULL || tmp == NULL) { 920 | return NGX_HTTP_INTERNAL_SERVER_ERROR; 921 | } 922 | 923 | tmp->last = ngx_snprintf(tmp->pos, tmp->end - tmp->pos, 924 | "\n" 926 | "\n" 927 | "\n" 928 | " NGINX Healthcheck status\n" 929 | "\n" 930 | "\n" 931 | "\n" 932 | " \n" 933 | " \n" 934 | " \n" 935 | " \n" 936 | " \n" 937 | " \n" 938 | " \n" 939 | " \n" 940 | " \n" 941 | " \n" 942 | " \n"); 943 | 944 | NGX_HEALTH_APPEND_CHECK(b, tmp, (r->pool)); 945 | 946 | for (i=0; inelts; i++) { 947 | stat = &ngx_http_healthchecks[i]; 948 | shm = stat->shm; 949 | 950 | tmp->last = ngx_snprintf(tmp->pos, tmp->end - tmp->pos, 951 | " \n" 952 | " \n" // Index 953 | " \n" // Name 954 | " \n" // PID 955 | " \n" // action time 956 | " \n" // concurrent status values 957 | " \n" // Time concurrent 958 | " \n" // Last response down? 959 | " \n" // Code of last response 960 | " \n" // Is down? 961 | " \n", stat->index, &stat->peer->name, shm->owner, 962 | shm->action_time, shm->concurrent, 963 | shm->since, (int)shm->last_down, 964 | ngx_http_healthcheck_statestr(shm->down_code), 965 | shm->down); 966 | NGX_HEALTH_APPEND_CHECK(b, tmp, r->pool); 967 | } 968 | 969 | tmp->last = ngx_snprintf(tmp->pos, tmp->end - tmp->pos, 970 | "
IndexNameOwner PIDLast action timeConcurrent status valuesTime of concurrent valuesLast response downLast health statusIs down?
%i%V%P%M%i%M%d%s%A
\n" 971 | "\n" 972 | "\n"); 973 | NGX_HEALTH_APPEND_CHECK(b, tmp, r->pool); 974 | 975 | r->headers_out.status = NGX_HTTP_OK; 976 | r->headers_out.content_length_n = b->last - b->pos; 977 | 978 | b->last_buf = 1; 979 | out.buf = b; 980 | out.next = NULL; 981 | 982 | rc = ngx_http_send_header(r); 983 | 984 | if (rc == NGX_ERROR || rc > NGX_OK || r->header_only) { 985 | return rc; 986 | } 987 | 988 | return ngx_http_output_filter(r, &out); 989 | } 990 | #undef NGX_HEALTH_APPEND_CHECK 991 | // end health status page 992 | 993 | // 994 | // 995 | // BEGIN THE BORING PART: Setting config variables 996 | // 997 | // 998 | 999 | static char* ngx_http_healthcheck_enabled(ngx_conf_t *cf, ngx_command_t *cmd, 1000 | void *conf) { 1001 | ngx_http_upstream_srv_conf_t *uscf; 1002 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1003 | uscf->healthcheck_enabled = 1; 1004 | return NGX_CONF_OK; 1005 | } 1006 | 1007 | static char* ngx_http_healthcheck_delay(ngx_conf_t *cf, ngx_command_t *cmd, 1008 | void *conf) { 1009 | ngx_http_upstream_srv_conf_t *uscf; 1010 | ngx_str_t *value; 1011 | value = cf->args->elts; 1012 | 1013 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1014 | uscf->health_delay = (ngx_uint_t)ngx_atoi(value[1].data, value[1].len); 1015 | if (uscf->health_delay == NGX_ERROR) { 1016 | return "Invalid healthcheck delay"; 1017 | } 1018 | return NGX_CONF_OK; 1019 | } 1020 | static char* ngx_http_healthcheck_timeout(ngx_conf_t *cf, ngx_command_t *cmd, 1021 | void *conf) { 1022 | ngx_http_upstream_srv_conf_t *uscf; 1023 | ngx_str_t *value; 1024 | value = cf->args->elts; 1025 | 1026 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1027 | uscf->health_timeout = ngx_atoi(value[1].data, value[1].len); 1028 | if (uscf->health_timeout == (ngx_msec_t)NGX_ERROR) { 1029 | return "Invalid healthcheck timeout "; 1030 | } 1031 | return NGX_CONF_OK; 1032 | } 1033 | static char* ngx_http_healthcheck_failcount(ngx_conf_t *cf, ngx_command_t *cmd, 1034 | void *conf) { 1035 | ngx_http_upstream_srv_conf_t *uscf; 1036 | ngx_str_t *value; 1037 | value = cf->args->elts; 1038 | 1039 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1040 | uscf->health_failcount = ngx_atoi(value[1].data, value[1].len); 1041 | if (uscf->health_failcount == NGX_ERROR) { 1042 | return "Invalid healthcheck failcount"; 1043 | } 1044 | return NGX_CONF_OK; 1045 | } 1046 | static char* ngx_http_healthcheck_send(ngx_conf_t *cf, ngx_command_t *cmd, 1047 | void *conf) { 1048 | ngx_http_upstream_srv_conf_t *uscf; 1049 | ngx_str_t *value; 1050 | ngx_int_t num; 1051 | int i; 1052 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1053 | value = cf->args->elts; 1054 | num = cf->args->nelts; 1055 | uscf->health_send.len = 0; 1056 | size_t at; 1057 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1058 | for (i = 1; ihealth_send.len += 2; // \r\n 1061 | } 1062 | uscf->health_send.len += value[i].len; 1063 | } 1064 | uscf->health_send.len += (sizeof(CRLF) - 1) * 2; 1065 | uscf->health_send.data = ngx_pnalloc(cf->pool, uscf->health_send.len + 1); 1066 | if (uscf->health_send.data == NULL) { 1067 | return "Unable to alloc data to send"; 1068 | } 1069 | at = 0; 1070 | for (i = 1; ihealth_send.data + at, CRLF, sizeof(CRLF) - 1); 1073 | at += sizeof(CRLF) - 1; 1074 | } 1075 | ngx_memcpy(uscf->health_send.data + at, value[i].data, value[i].len); 1076 | at += value[i].len; 1077 | } 1078 | ngx_memcpy(uscf->health_send.data + at, CRLF CRLF, (sizeof(CRLF) - 1) * 2); 1079 | at += (sizeof(CRLF) - 1) * 2; 1080 | uscf->health_send.data[at] = 0; 1081 | if (at != uscf->health_send.len) { 1082 | return "healthcheck: Logic error. Length doesn't match"; 1083 | } 1084 | 1085 | return NGX_CONF_OK; 1086 | } 1087 | 1088 | static char* ngx_http_healthcheck_expected(ngx_conf_t *cf, ngx_command_t *cmd, 1089 | void *conf) { 1090 | ngx_http_upstream_srv_conf_t *uscf; 1091 | ngx_str_t *value; 1092 | value = cf->args->elts; 1093 | 1094 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1095 | uscf->health_expected.data = value[1].data; 1096 | uscf->health_expected.len = value[1].len; 1097 | 1098 | return NGX_CONF_OK; 1099 | } 1100 | 1101 | static char* ngx_http_healthcheck_buffer(ngx_conf_t *cf, ngx_command_t *cmd, 1102 | void *conf) { 1103 | ngx_http_upstream_srv_conf_t *uscf; 1104 | ngx_str_t *value; 1105 | value = cf->args->elts; 1106 | 1107 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1108 | uscf->health_buffersize = ngx_atoi(value[1].data, value[1].len); 1109 | if (uscf->health_buffersize == NGX_ERROR) { 1110 | return "Invalid healthcheck buffer size"; 1111 | } 1112 | return NGX_CONF_OK; 1113 | } 1114 | 1115 | 1116 | static char* ngx_http_set_healthcheck_status(ngx_conf_t *cf, ngx_command_t *cmd, 1117 | void*conf) { 1118 | 1119 | ngx_http_core_loc_conf_t *clcf; 1120 | 1121 | clcf = ngx_http_conf_get_module_loc_conf(cf, ngx_http_core_module); 1122 | clcf->handler = ngx_http_healthcheck_status_handler; 1123 | 1124 | return NGX_CONF_OK; 1125 | } 1126 | 1127 | #undef ngx_http_healthchecks 1128 | -------------------------------------------------------------------------------- /ngx_http_healthcheck_module.h: -------------------------------------------------------------------------------- 1 | #ifndef _NGX_HEALTHCHECK_MODULE_H_ 2 | #define _NGX_HEALTHCHECK_MODULE_H_ 3 | 4 | #include 5 | #include 6 | #include 7 | 8 | // I don't define everything here, just the stuff external users will 9 | // want to call 10 | 11 | /** 12 | * Add a peer for healthchecking 13 | * 14 | * @param uscf The upstream the peer belongs to 15 | * @param peer The peer to check 16 | * @param pool Pool of memory to create peer checking data from 17 | * 18 | * @return Integer identifier for this healthcheck or NGX_ERROR if stuff 19 | * went bad. 20 | */ 21 | #if defined(nginx_version) && nginx_version >= 8022 22 | ngx_int_t ngx_http_healthcheck_add_peer(ngx_http_upstream_srv_conf_t *uscf, 23 | ngx_addr_t *peer, ngx_pool_t *pool); 24 | #else 25 | ngx_int_t ngx_http_healthcheck_add_peer(ngx_http_upstream_srv_conf_t *uscf, 26 | ngx_peer_addr_t *peer, ngx_pool_t *pool); 27 | #endif 28 | 29 | /** 30 | * Check the health of a peer 31 | * 32 | * @param index Integer identifier index to check 33 | * @param log Gets warning and error messages 34 | * @return True if the given peer has failed its healthcheck 35 | */ 36 | ngx_int_t ngx_http_healthcheck_is_down(ngx_uint_t index, ngx_log_t *log); 37 | 38 | #endif 39 | -------------------------------------------------------------------------------- /sample_ngx_config.conf: -------------------------------------------------------------------------------- 1 | worker_processes 5; 2 | #daemon off; 3 | 4 | events { 5 | worker_connections 1000; 6 | } 7 | 8 | # Only if you want to see lots of spam 9 | error_log log/error_log debug_http; 10 | 11 | http { 12 | 13 | upstream test_upstreams { 14 | server localhost:11114; 15 | server localhost:11115; 16 | hash $filename; 17 | hash_again 10; 18 | healthcheck_enabled; 19 | healthcheck_delay 1000; 20 | healthcheck_timeout 1000; 21 | healthcheck_failcount 1; 22 | # Important: There is no \n at the end of this. Or \r. Make sure you 23 | # don't have a \n or \r or anything else at the end of your healthcheck 24 | # response 25 | healthcheck_expected 'I_AM_ALIVE'; 26 | # Important: HTTP/1.0 27 | healthcheck_send "GET /health HTTP/1.0" 'Host: www.mysite.com'; 28 | # Optional supervisord module support 29 | #supervisord none; 30 | #supervisord_inherit_backend_status; 31 | } 32 | 33 | server { 34 | listen 11114; 35 | location / { 36 | root html_11114; 37 | } 38 | } 39 | server { 40 | listen 11115; 41 | location / { 42 | root html_11115; 43 | } 44 | } 45 | 46 | server { 47 | listen 81; 48 | 49 | location / { 50 | set $filename $request_uri; 51 | if ($request_uri ~* ".*/(.*)") { 52 | set $filename $1; 53 | } 54 | proxy_set_header Host $http_host; 55 | proxy_pass http://test_upstreams; 56 | proxy_connect_timeout 3; 57 | } 58 | location /stat { 59 | healthcheck_status; 60 | } 61 | } 62 | } 63 | --------------------------------------------------------------------------------