├── README
├── config
├── nginx.patch
├── ngx_http_healthcheck_module.c
├── ngx_http_healthcheck_module.h
└── sample_ngx_config.conf

/README:
--------------------------------------------------------------------------------
# Update

This module is no longer maintained. I recommend using https://github.com/yaoweibin/nginx_upstream_check_module instead.

If you're curious about how this module used to work, read ahead:


Healthcheck plugin for nginx. It polls backends, and if they respond with
HTTP 200 plus an optional expected body, they are marked good. Otherwise, they
are marked bad. Similar to haproxy/varnish health checks.

For help on all the options, see the docblocks inside the .c file where each
option is defined.

Note this also gives you access to a health status page that lets you see
how well your healthchecks are doing.


==Important==
Nginx gives you full freedom in which server peer to pick when you write an
upstream. This means that the healthchecking plugin is only a tool that
other upstreams must know about to use. So your upstream code MUST SUPPORT
HEALTHCHECKS. It's actually pretty easy to modify the code to support them.

See the .h file for how, as well as the upstream_hash patch, which shows
how to modify upstream_hash to support healthchecks.

For an example plugin modified to support healthchecks, see my modifications
to the upstream_hash plugin here:

http://github.com/cep21/nginx_upstream_hash/tree/support_http_healthchecks

==Limitations==
The module only supports HTTP 1.0, not 1.1. What that really means is it
doesn't understand chunked encoding. You should ask for a 1.0 response with
your healthcheck, unless you're sure the upstream won't send back chunked
encoding. See the sample config for an example.

==INSTALL==
# Similar to the upstream_hash module

    cd nginx-0.7.62   # or whatever
    patch -p1 < /path/to/this/directory/nginx.patch
    ./configure --add-module=/path/to/this/directory
    make
    make install

==How the module works==
My first attempt was to spawn a pthread inside the master process, but nginx
freaks out on all kinds of levels when you try to have multiple threads
running at the same time. Then I thought, fine, I'll just fork my own child.
But that caused lots of issues when I tried to HUP the master process, because
my own child wasn't getting signals. Neither approach felt like the nginx way
of doing things, so I settled on working directly with the worker process
model.

When each worker process starts, it adds a repeating event to the event
tree asking for ownership of a server's healthcheck. When that ownership
event comes up, the worker locks the server's healthcheck and tries to claim
it with its pid. If it can't claim it, it retries the claim later, in case
the worker that does own it dies.

The worker that does own it inserts a healthcheck event into nginx's
event tree. When that triggers, it starts a peer connection to the
server and goes to town sending and receiving data. When the healthcheck
finishes, or times out, it updates the shared memory structure and schedules
another healthcheck.
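In rough C, the ownership attempt looks like this (a condensed sketch of
ngx_http_healthcheck_try_for_ownership from the .c file, with logging and
shutdown handling stripped out):

    // Claim the peer's healthcheck if we already own it, or if the current
    // owner has been idle for three full delay+timeout cycles (likely dead).
    ngx_spinlock(&shm->lock, ngx_pid, 1024);
    if (shm->owner == ngx_pid) {
        i_own_it = 1;
    } else if (ngx_current_msec - shm->action_time >=
               (conf->health_delay + conf->health_timeout) * 3) {
        shm->owner = ngx_pid;                      // steal the stale claim
        shm->action_time = ngx_current_msec;
        ngx_http_healthcheck_begin_healthcheck(&stat->health_ev);
        i_own_it = 1;
    }
    ngx_atomic_cmp_set(&shm->lock, ngx_pid, 0);    // unlock
    if (!i_own_it) {
        ngx_add_timer(&stat->ownership_ev, 5000);  // retry in 5 seconds
    }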

A few random issues I had were:
1) When nginx tries to shut down, it waits for the event tree to empty out.
To get around this, I check for ngx_quit and all kinds of other variables.
This means that when you do HUP nginx, your worker needs to sit around doing
nothing until *something* in the healthcheck event tree comes up, after which
it can clear all the healthcheck events and move on. I could fix this if
nginx added a per-module callback on HUP. Maybe a 'cleanup' or something.
The current exit_process callback is called after the event tree is empty, not
after a request to shut down a worker.

==Extending==
It should be very easy to extend this module to work with fastcgi or even
generic TCP backends. You would just need to change, or abstract out,
ngx_http_healthcheck_process_recv. Patches that do that are welcome, and I'm
happy to help out with any questions. I'm also happy to help out with
extending your upstream-picking modules to work with healthchecks as well.
Your code can even stay compatible with healthcheck-less builds by
surrounding the changes with #if (NGX_HTTP_HEALTHCHECK); see the sketch
after the /config section below.

==Config==
See sample_ngx_config.conf

Author: Jack Lindamood

==License==

Apache License, Version 2.0
--------------------------------------------------------------------------------
/config:
--------------------------------------------------------------------------------
ngx_addon_name=ngx_http_healthcheck_module
HTTP_INCS="$HTTP_INCS $ngx_addon_dir"
HTTP_MODULES="$HTTP_MODULES ngx_http_healthcheck_module"
NGX_ADDON_SRCS="$NGX_ADDON_SRCS $ngx_addon_dir/ngx_http_healthcheck_module.c"
have=NGX_HTTP_HEALTHCHECK . auto/have
--------------------------------------------------------------------------------
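Because the config script runs have=NGX_HTTP_HEALTHCHECK, other modules can
compile their healthcheck support conditionally. A minimal sketch of what the
==Extending== section in the README means (the surrounding peer-selection
loop is hypothetical; only the guarded call is this module's real API):

    for (i = 0; i < peers->number; i++) {
    #if (NGX_HTTP_HEALTHCHECK)
        /* Skip peers the healthchecker has marked down */
        if (ngx_http_healthcheck_is_down(peers->peer[i].health_index, log)) {
            continue;
        }
    #endif
        /* ... normal peer-selection logic ... */
    }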
/nginx.patch:
--------------------------------------------------------------------------------
diff -ru nginx-1.0.10/src/http/ngx_http_upstream.c nginx-1.0.10-patched/src/http/ngx_http_upstream.c
--- nginx-1.0.10/src/http/ngx_http_upstream.c  2011-11-01 10:18:10.000000000 -0400
+++ nginx-1.0.10-patched/src/http/ngx_http_upstream.c  2011-11-30 20:34:10.000000000 -0500
@@ -4293,6 +4293,17 @@
     uscf->line = cf->conf_file->line;
     uscf->port = u->port;
     uscf->default_port = u->default_port;
+#if (NGX_HTTP_HEALTHCHECK)
+    uscf->healthcheck_enabled = 0;
+    uscf->health_delay = 10000;
+    uscf->health_timeout = 2000;
+    uscf->health_failcount = 2;
+    uscf->health_buffersize = 1000;
+    uscf->health_send.data = (u_char*)"";
+    uscf->health_send.len = 0;
+    uscf->health_expected.len = NGX_CONF_UNSET_SIZE;
+    uscf->health_expected.data = NGX_CONF_UNSET_PTR;
+#endif

     if (u->naddrs == 1) {
         uscf->servers = ngx_array_create(cf->pool, 1,
Only in nginx-1.0.10-patched/src/http: ngx_http_upstream.c.orig
diff -ru nginx-1.0.10/src/http/ngx_http_upstream.h nginx-1.0.10-patched/src/http/ngx_http_upstream.h
--- nginx-1.0.10/src/http/ngx_http_upstream.h  2011-11-01 10:18:10.000000000 -0400
+++ nginx-1.0.10-patched/src/http/ngx_http_upstream.h  2011-11-30 20:34:10.000000000 -0500
@@ -109,6 +109,24 @@

     ngx_array_t                     *servers;   /* ngx_http_upstream_server_t */

+#if (NGX_HTTP_HEALTHCHECK)
+    // If true, healthchecking is enabled for this upstream
+    unsigned                         healthcheck_enabled:1;
+    // Delay between healthchecks (in msec)
+    time_t                           health_delay;
+    // Total time a healthcheck is allowed to execute
+    ngx_msec_t                       health_timeout;
+    // Number of good/bad results that indicate the node is up/down
+    ngx_int_t                        health_failcount;
+    // Size of the body+headers buffer
+    ngx_int_t                        health_buffersize;
+    // What is sent to initiate the healthcheck
+    ngx_str_t                        health_send;
+    // Expected from the healthcheck, excluding headers
+    ngx_str_t                        health_expected;
+#endif
+
+
     ngx_uint_t                       flags;
     ngx_str_t                        host;
     u_char                          *file_name;
Only in nginx-1.0.10-patched/src/http: ngx_http_upstream.h.orig
diff -ru nginx-1.0.10/src/http/ngx_http_upstream_round_robin.c nginx-1.0.10-patched/src/http/ngx_http_upstream_round_robin.c
--- nginx-1.0.10/src/http/ngx_http_upstream_round_robin.c  2011-09-30 10:30:01.000000000 -0400
+++ nginx-1.0.10-patched/src/http/ngx_http_upstream_round_robin.c  2011-11-30 20:34:10.000000000 -0500
@@ -4,6 +4,8 @@
  */


+/* on top, so it won't collide with ngx_supervisord's patch */
+#include <ngx_http_healthcheck_module.h>
 #include <ngx_config.h>
 #include <ngx_core.h>
 #include <ngx_http.h>
@@ -12,7 +14,8 @@
 static ngx_int_t ngx_http_upstream_cmp_servers(const void *one,
     const void *two);
 static ngx_uint_t
-ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers);
+ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers,
+    ngx_log_t *log);

 #if (NGX_HTTP_SSL)

@@ -32,6 +35,7 @@
     ngx_uint_t                    i, j, n;
     ngx_http_upstream_server_t   *server;
     ngx_http_upstream_rr_peers_t *peers, *backup;
+    ngx_int_t                     health_index;

     us->peer.init = ngx_http_upstream_init_round_robin_peer;

@@ -66,6 +70,14 @@
                 continue;
             }

+                /* on top, so it won't collide with ngx_supervisord's patch */
+                health_index = ngx_http_healthcheck_add_peer(us,
+                    &server[i].addrs[j], cf->pool);
+                if (health_index == NGX_ERROR) {
+                    return NGX_ERROR;
+                }
+                peers->peer[n].health_index = health_index;
+
                 peers->peer[n].sockaddr = server[i].addrs[j].sockaddr;
                 peers->peer[n].socklen = server[i].addrs[j].socklen;
                 peers->peer[n].name = server[i].addrs[j].name;
@@ -377,6 +389,7 @@
     ngx_connection_t              *c;
     ngx_http_upstream_rr_peer_t   *peer;
     ngx_http_upstream_rr_peers_t  *peers;
+    ngx_int_t                      healthy;

     ngx_log_debug1(NGX_LOG_DEBUG_HTTP, pc->log, 0,
                    "get rr peer, try: %ui", pc->tries);
@@ -422,7 +435,7 @@
         i = pc->tries;

         for ( ;; ) {
-            rrp->current = ngx_http_upstream_get_peer(rrp->peers);
+            rrp->current = ngx_http_upstream_get_peer(rrp->peers, pc->log);

             ngx_log_debug2(NGX_LOG_DEBUG_HTTP, pc->log, 0,
                            "get rr peer, current: %ui %i",
@@ -483,7 +496,11 @@

         peer = &rrp->peers->peer[rrp->current];

-        if (!peer->down) {
+        healthy = !ngx_http_healthcheck_is_down(
+            peer->health_index,
+            pc->log);
+
+        if ((!peer->down) && healthy) {

             if (peer->max_fails == 0
                 || peer->fails < peer->max_fails)
@@ -588,12 +605,14 @@


 static ngx_uint_t
-ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers)
+ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers, ngx_log_t *log)
 {
     ngx_uint_t                    i, n, reset = 0;
     ngx_http_upstream_rr_peer_t  *peer;
+    ngx_uint_t                    health_check_rounds;

     peer = &peers->peer[0];
+    health_check_rounds = 2;

     for ( ;; ) {

@@ -613,6 +632,11 @@
             continue;
         }

+        if (health_check_rounds && ngx_http_healthcheck_is_down(
+                peer[i].health_index,
+                log))
+            continue;
+
         if (peer[n].current_weight * 1000 / peer[i].current_weight
             > peer[n].weight * 1000 / peer[i].weight)
         {
@@ -633,6 +657,9 @@
         return 0;
     }

+    if (health_check_rounds)
+        --health_check_rounds;
+
     for (i = 0; i < peers->number; i++) {
         peer[i].current_weight = peer[i].weight;
     }
Only in nginx-1.0.10-patched/src/http: ngx_http_upstream_round_robin.c.orig
diff -ru nginx-1.0.10/src/http/ngx_http_upstream_round_robin.h nginx-1.0.10-patched/src/http/ngx_http_upstream_round_robin.h
--- nginx-1.0.10/src/http/ngx_http_upstream_round_robin.h  2009-11-02 07:41:56.000000000 -0500
+++ nginx-1.0.10-patched/src/http/ngx_http_upstream_round_robin.h  2011-11-30 20:34:10.000000000 -0500
@@ -26,6 +26,7 @@

     ngx_uint_t                      max_fails;
     time_t                          fail_timeout;
+    ngx_int_t                       health_index;

     ngx_uint_t                      down;          /* unsigned down:1; */

--------------------------------------------------------------------------------
/ngx_http_healthcheck_module.c:
--------------------------------------------------------------------------------
/*
 * Does health checks of servers in an upstream
 *
 * Author: Jack Lindamood
 *
 */

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>
#include <ngx_http_healthcheck_module.h>
#ifdef NGX_SUPERVISORD_MODULE
#include <ngx_supervisord.h>
#if (NGX_SUPERVISORD_API_VERSION != 2)
#error "ngx_http_healthcheck_module requires NGX_SUPERVISORD_API v2"
#endif
#endif

#if (!NGX_HAVE_ATOMIC_OPS)
#error "Healthcheck module only works with atomic ops"
#endif

typedef enum {
    // In-progress states
    NGX_HEALTH_UNINIT_STATE = 0,
    NGX_HEALTH_WAITING,
    NGX_HEALTH_SENDING_CHECK,
    NGX_HEALTH_READING_STAT_LINE,
    NGX_HEALTH_READING_STAT_CODE,
    NGX_HEALTH_READING_HEADER,
    NGX_HEALTH_HEADER_ALMOST_DONE,
    NGX_HEALTH_READING_BODY,
    // Good + final states
    NGX_HEALTH_OK = 100,
    // Bad + final states
    NGX_HEALTH_BAD_HEADER = 200,
    NGX_HEALTH_BAD_STATUS,
    NGX_HEALTH_BAD_BODY,
    NGX_HEALTH_BAD_STATE,
    NGX_HEALTH_BAD_CONN,
    NGX_HEALTH_BAD_CODE,
    NGX_HEALTH_TIMEOUT,
    NGX_HEALTH_FULL_BUFFER,
    NGX_HEALTH_EARLY_CLOSE
} ngx_http_health_state;

typedef struct {
    // Worker pid processing this healthcheck
    ngx_pid_t owner;
    // Matches the non-shared-memory index
    ngx_uint_t index;
    // Last time any action (read/write/timeout) was taken on this structure
    ngx_msec_t action_time;
    // Number of concurrent bad or good responses
    ngx_int_t concurrent;
    // How long this server's been concurrently bad or good
    ngx_msec_t since;
    // If true, the server's last response was bad
    unsigned last_down:1;
    // Code (above in ngx_http_health_state) of the last finished check
    ngx_http_health_state down_code;
    // Used so multiple processes don't try to healthcheck the same peer
    ngx_atomic_t lock;
    /**
     * If true, the server is actually down. This is
     * different than last_down because a server needs
     * X concurrent good or bad connections to actually
     * flip between up and down
     */
    ngx_atomic_t down;
} ngx_http_healthcheck_status_shm_t;


typedef struct {
    // Upstream this peer belongs to
    ngx_http_upstream_srv_conf_t *conf;
    // The peer to check
#if defined(nginx_version) && nginx_version >= 8022
    ngx_addr_t *peer;
#else
    ngx_peer_addr_t *peer;
#endif
    // Index of the peer. Matches the shm segment and is used for 'down'
    // checking by external clients
    ngx_uint_t index;
    // Current state of the healthcheck. Different than shm->down_code
    // because this is an active state and that is a finished state.
    ngx_http_health_state state;
    // Connection to the peer. We reuse this memory each healthcheck, but
    // memset it to zero
    ngx_peer_connection_t *pc;
    // When the check began, so we can diff it against action_time and time
    // the check out
    ngx_msec_t check_start_time;
    // Event that triggers a health check
    ngx_event_t health_ev;
    // Event that triggers an attempt at ownership of this healthcheck
    ngx_event_t ownership_ev;
    ngx_buf_t *read_buffer;
    // Where I am reading the entire connection, headers + body
    ssize_t read_pos;
    // Where I am in conf->health_expected (the body only)
    ssize_t body_read_pos;
    // Where I am in conf->health_send
    ssize_t send_pos;
    // HTTP status code returned (200, 404, etc.)
    ngx_uint_t stat_code;
    ngx_http_healthcheck_status_shm_t *shm;
} ngx_http_healthcheck_status_t;

// This one is not shared. Created when the config is parsed
static ngx_array_t *ngx_http_healthchecks_arr;
// This is the same as the above array's ->elts, for ease of use
#define ngx_http_healthchecks \
    ((ngx_http_healthcheck_status_t*) ngx_http_healthchecks_arr->elts)
static ngx_http_healthcheck_status_shm_t *ngx_http_healthchecks_shm;

static ngx_int_t ngx_http_healthcheck_init(ngx_conf_t *cf);
static char* ngx_http_healthcheck_enabled(ngx_conf_t *cf, ngx_command_t *cmd,
    void *conf);
static char* ngx_http_healthcheck_delay(ngx_conf_t *cf, ngx_command_t *cmd,
    void *conf);
static char* ngx_http_healthcheck_timeout(ngx_conf_t *cf, ngx_command_t *cmd,
    void *conf);
static char* ngx_http_healthcheck_failcount(ngx_conf_t *cf, ngx_command_t *cmd,
    void *conf);
static char* ngx_http_healthcheck_send(ngx_conf_t *cf, ngx_command_t *cmd,
    void *conf);
static char* ngx_http_healthcheck_expected(ngx_conf_t *cf, ngx_command_t *cmd,
    void *conf);
static char* ngx_http_healthcheck_buffer(ngx_conf_t *cf, ngx_command_t *cmd,
    void *conf);
static char* ngx_http_set_healthcheck_status(ngx_conf_t *cf, ngx_command_t *cmd,
    void *conf);
static ngx_int_t ngx_http_healthcheck_procinit(ngx_cycle_t *cycle);
static ngx_int_t ngx_http_healthcheck_preconfig(ngx_conf_t *cf);
static ngx_int_t ngx_http_healthcheck_init_zone(ngx_shm_zone_t *shm_zone,
    void *data);
static ngx_int_t ngx_http_healthcheck_process_recv(
    ngx_http_healthcheck_status_t *stat);
static char* ngx_http_healthcheck_statestr(
    ngx_http_health_state state);

// I really wish there was a way to make nginx call this when you HUP the
// master
void ngx_http_healthcheck_clear_events(ngx_log_t *log);

static ngx_command_t ngx_http_healthcheck_commands[] = {
    /**
     * If mentioned, enable healthchecks for this upstream
     */
    { ngx_string("healthcheck_enabled"),
      NGX_HTTP_UPS_CONF|NGX_CONF_NOARGS,
      ngx_http_healthcheck_enabled,
      0,
      0,
      NULL },
    /**
     * Delay in msec between healthchecks for a single peer
     */
    { ngx_string("healthcheck_delay"),
      NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1,
      ngx_http_healthcheck_delay,
      0,
      0,
      NULL },
    /**
     * How long in msec a healthcheck is allowed to take
     */
    { ngx_string("healthcheck_timeout"),
      NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1,
      ngx_http_healthcheck_timeout,
      0,
      0,
      NULL },
    /**
     * Number of healthchecks good or bad in a row it takes to switch from
     * down to up and back. Good to prevent flapping
     */
    { ngx_string("healthcheck_failcount"),
      NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1,
      ngx_http_healthcheck_failcount,
      0,
      0,
      NULL },
    /**
     * What to send for the healthcheck. Each argument has \r\n appended,
     * and the entire thing is suffixed with another \r\n. For example,
     *
     *   healthcheck_send 'GET /health HTTP/1.1'
     *       'Host: www.facebook.com' 'Connection: close';
     *
     * Note that you probably want to end your health check with some
     * directive that closes the connection, like Connection: close.
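     *
     * To spell out the framing: with the sample config's directive
     *   healthcheck_send "GET /health HTTP/1.0" 'Host: www.mysite.com';
     * the bytes written to the peer are
     *   "GET /health HTTP/1.0\r\nHost: www.mysite.com\r\n\r\n"
     * i.e. the arguments joined by \r\n, plus the blank line that ends an
     * HTTP request.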
     *
     */
    { ngx_string("healthcheck_send"),
      NGX_HTTP_UPS_CONF|NGX_CONF_1MORE,
      ngx_http_healthcheck_send,
      0,
      0,
      NULL },
    /**
     * What to expect in the HTTP BODY (meaning not the headers) of a
     * correct response
     */
    { ngx_string("healthcheck_expected"),
      NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1,
      ngx_http_healthcheck_expected,
      0,
      0,
      NULL },
    /**
     * How big a buffer to use for the health check. Remember to include
     * headers PLUS body, not just body.
     */
    { ngx_string("healthcheck_buffer"),
      NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1,
      ngx_http_healthcheck_buffer,
      0,
      0,
      NULL },
    /**
     * When inside a location block, replaces the HTTP body with backend
     * health status. Use similarly to the stub_status module
     */
    { ngx_string("healthcheck_status"),
      NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_NOARGS,
      ngx_http_set_healthcheck_status,
      0,
      0,
      NULL },
    ngx_null_command
};


// Note: I tried using the "create server configuration" section rather than
// patching the nginx code, but it didn't work. When you set the options
// you're in a different config context than when you use them in the
// upstream. It's very strange and unintuitive, but it's nginx.

static ngx_http_module_t ngx_http_healthcheck_module_ctx = {
    ngx_http_healthcheck_preconfig,        /* preconfiguration */
    ngx_http_healthcheck_init,             /* postconfiguration */

    NULL,                                  /* create main configuration */
    NULL,                                  /* init main configuration */

    NULL,                                  /* create server configuration */
    NULL,                                  /* merge server configuration */

    NULL,                                  /* create location configuration */
    NULL                                   /* merge location configuration */
};

ngx_module_t ngx_http_healthcheck_module = {
    NGX_MODULE_V1,
    &ngx_http_healthcheck_module_ctx,      /* module context */
    ngx_http_healthcheck_commands,         /* module directives */
    NGX_HTTP_MODULE,                       /* module type */
    NULL,                                  /* init master */
    NULL,                                  /* init module */
    ngx_http_healthcheck_procinit,         /* init process */
    NULL,                                  /* init thread */
    NULL,                                  /* exit thread */
    NULL,                                  /* exit process */
    NULL,                                  /* exit master */
    NGX_MODULE_V1_PADDING
};


void ngx_http_healthcheck_mark_finished(ngx_http_healthcheck_status_t *stat) {
#ifdef NGX_SUPERVISORD_MODULE
    ngx_http_upstream_rr_peers_t *peers = stat->conf->peer.data;
#endif
    ngx_log_debug2(NGX_LOG_DEBUG_HTTP, stat->health_ev.log, 0,
        "healthcheck: Finished %V, state %d", &stat->peer->name,
        stat->state);
    if (stat->state == NGX_HEALTH_OK) {
        if (stat->shm->last_down) {
            stat->shm->last_down = 0;
            stat->shm->concurrent = 1;
            stat->shm->since = ngx_current_msec;
#ifdef NGX_SUPERVISORD_MODULE
            (void) ngx_supervisord_execute(stat->conf,
                                           NGX_SUPERVISORD_CMD_START,
                                           peers->peer[stat->index].onumber,
                                           NULL);
#endif
        } else {
            stat->shm->concurrent++;
        }
    } else {
        if (stat->shm->last_down) {
            stat->shm->concurrent++;
        } else {
            stat->shm->last_down = 1;
            stat->shm->concurrent = 1;
            stat->shm->since = ngx_current_msec;
#ifdef NGX_SUPERVISORD_MODULE
            (void) ngx_supervisord_execute(stat->conf,
                                           NGX_SUPERVISORD_CMD_STOP,
                                           peers->peer[stat->index].onumber,
                                           NULL);
#endif
        }
    }
    if (stat->shm->concurrent >= stat->conf->health_failcount) {
        stat->shm->down = stat->shm->last_down;
    }
    stat->shm->down_code = stat->state;
    ngx_close_connection(stat->pc->connection);
    stat->pc->connection = NULL;
    stat->state = NGX_HEALTH_WAITING;
    if (!ngx_terminate && !ngx_exiting && !ngx_quit) {
        ngx_add_timer(&stat->health_ev, stat->conf->health_delay);
    } else {
        ngx_http_healthcheck_clear_events(stat->health_ev.log);
    }
    stat->shm->action_time = ngx_current_msec;
}

void ngx_http_healthcheck_send_request(ngx_connection_t *);

void ngx_http_healthcheck_write_handler(ngx_event_t *wev) {
    ngx_connection_t *c;

    c = wev->data;

    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, wev->log, 0,
        "healthcheck: Write handler called");

    ngx_http_healthcheck_send_request(c);
}

void ngx_http_healthcheck_send_request(ngx_connection_t *c) {
    ngx_http_healthcheck_status_t *stat = c->data;
    ssize_t size;

    if (stat->state != NGX_HEALTH_SENDING_CHECK) {
        ngx_log_debug0(NGX_LOG_DEBUG_HTTP, c->log, 0,
            "healthcheck: Ignoring a write. Not in writing state");
        return;
    }

    do {
        size =
            c->send(c, stat->conf->health_send.data + stat->send_pos,
                    stat->conf->health_send.len - stat->send_pos);
        ngx_log_debug1(NGX_LOG_DEBUG_HTTP, c->log, 0,
            "healthcheck: Send size %z", size);
        if (size == NGX_ERROR || size == 0) {
            // If the send fails, the connection is bad. Close it out
            stat->state = NGX_HEALTH_BAD_CONN;
            ngx_http_healthcheck_mark_finished(stat);
            stat->shm->action_time = ngx_current_msec;
            break;
        } else if (size == NGX_AGAIN) {
            // The socket would block; return and try again later
            break;
        } else {
            stat->shm->action_time = ngx_current_msec;
            stat->send_pos += size;
        }
    } while (stat->send_pos < (ssize_t)stat->conf->health_send.len);

    if (stat->send_pos > (ssize_t)stat->conf->health_send.len) {
        ngx_log_error(NGX_LOG_WARN, c->log, 0,
            "healthcheck: Logic error. %z send pos bigger than buffer len %i",
            stat->send_pos, stat->conf->health_send.len);
    } else if (stat->send_pos == (ssize_t)stat->conf->health_send.len) {
        ngx_log_debug0(NGX_LOG_DEBUG_HTTP, c->log, 0,
            "healthcheck: Finished sending request");
        stat->state = NGX_HEALTH_READING_STAT_LINE;
    }
}

void ngx_http_healthcheck_read_handler(ngx_event_t *rev) {
    ngx_connection_t *c;
    ngx_buf_t *rb;
    ngx_int_t rc;
    ssize_t size;
    ngx_http_healthcheck_status_t *stat;
    ngx_int_t expect_finished;

    c = rev->data;
    stat = c->data;
    rb = stat->read_buffer;

    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, rev->log, 0,
        "healthcheck: Read handler called");

    stat->shm->action_time = ngx_current_msec;
    if (ngx_current_msec - stat->check_start_time >=
            stat->conf->health_timeout) {
        ngx_log_debug0(NGX_LOG_DEBUG_HTTP, rev->log, 0,
            "healthcheck: timeout!");
        stat->state = NGX_HEALTH_TIMEOUT;
        ngx_http_healthcheck_mark_finished(stat);
        return;
    }
    expect_finished = 0;
    do {
        size = c->recv(c, rb->pos, rb->end - rb->pos);
        ngx_log_debug2(NGX_LOG_DEBUG_HTTP, rev->log, 0,
            "healthcheck: Recv size %z when I wanted %O", size,
            rb->end - rb->pos);
        if (size == NGX_ERROR) {
            // If the recv fails, the connection is bad. Close it out
            stat->state = NGX_HEALTH_BAD_CONN;
            break;
        } else if (size == NGX_AGAIN) {
            break;
        } else if (size == 0) {
            expect_finished = 1;
            break;
        } else {
            rb->pos += size;
        }
    } while (rb->pos < rb->end);

    if (stat->state != NGX_HEALTH_BAD_CONN) {
        rc = ngx_http_healthcheck_process_recv(stat);
        switch (rc) {
            case NGX_AGAIN:
                if (expect_finished) {
                    stat->state = NGX_HEALTH_EARLY_CLOSE;
                    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, rev->log, 0,
                        "healthcheck: prematurely closed connection");
                } else if (rb->end == rb->pos) {
                    // We used up our read buffer and STILL can't verify
                    stat->state = NGX_HEALTH_FULL_BUFFER;
                    ngx_http_healthcheck_mark_finished(stat);
                }
                // We want more data to see if the body is OK or not
                break;
            case NGX_ERROR:
                ngx_http_healthcheck_mark_finished(stat);
                break;
            case NGX_OK:
                ngx_http_healthcheck_mark_finished(stat);
                break;
            default:
                ngx_log_error(NGX_LOG_WARN, rev->log, 0,
                    "healthcheck: Unknown process_recv code %i", rc);
                break;
        }
    } else {
        ngx_http_healthcheck_mark_finished(stat);
    }
}
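// To make the state machine in ngx_http_healthcheck_process_recv below
// concrete, here is how a healthy response is walked byte by byte (the
// response text is only an illustration):
//
//   "HTTP/1.0 200 OK\r\nContent-Length: 10\r\n\r\nI_AM_ALIVE"
//
// READING_STAT_LINE eats "HTTP/1.0" until the first space,
// READING_STAT_CODE accumulates 200 digit by digit and accepts it at the
// next space, READING_HEADER skips ahead to each '\n',
// HEADER_ALMOST_DONE decides whether the next line is another header or the
// blank line ending the headers, and READING_BODY then matches each byte
// against conf->health_expected ("I_AM_ALIVE"), yielding NGX_HEALTH_OK once
// every expected byte has been seen.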
static ngx_int_t ngx_http_healthcheck_process_recv(
        ngx_http_healthcheck_status_t *stat) {

    ngx_buf_t *rb;
    u_char ch;
    ngx_str_t *health_expected;

    rb = stat->read_buffer;
    health_expected = &stat->conf->health_expected;
    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, stat->health_ev.log, 0,
        "healthcheck: Process recv");

    while (rb->start + stat->read_pos < rb->pos) {
        ch = *(rb->start + stat->read_pos);
        stat->read_pos++;
#if 0
        // Useful for debugging
        ngx_log_debug2(NGX_LOG_DEBUG_HTTP, stat->health_ev.log, 0,
            "healthcheck: CH %c state %d", ch, stat->state);
#endif
        switch (stat->state) {
            case NGX_HEALTH_READING_STAT_LINE:
                // Look for the regex '/ \d+ /'
                if (ch == ' ') {
                    stat->state = NGX_HEALTH_READING_STAT_CODE;
                    stat->stat_code = 0;
                } else if (ch == '\r' || ch == '\n') {
                    stat->state = NGX_HEALTH_BAD_STATUS;
                    return NGX_ERROR;
                }
                break;
            case NGX_HEALTH_READING_STAT_CODE:
                if (ch == ' ') {
                    if (stat->stat_code != NGX_HTTP_OK /* 200 */) {
                        stat->state = NGX_HEALTH_BAD_CODE;
                        return NGX_ERROR;
                    } else {
                        stat->state = NGX_HEALTH_READING_HEADER;
                    }
                } else if (ch < '0' || ch > '9') {
                    stat->state = NGX_HEALTH_BAD_STATUS;
                    return NGX_ERROR;
                } else {
                    stat->stat_code = stat->stat_code * 10 + (ch - '0');
                }
                break;
            case NGX_HEALTH_READING_HEADER:
                if (ch == '\n') {
                    stat->state = NGX_HEALTH_HEADER_ALMOST_DONE;
                }
                break;
            case NGX_HEALTH_HEADER_ALMOST_DONE:
                if (ch == '\n') {
                    if (health_expected->len == NGX_CONF_UNSET_SIZE) {
                        stat->state = NGX_HEALTH_OK;
                        return NGX_OK;
                    } else {
                        stat->state = NGX_HEALTH_READING_BODY;
                    }
                } else if (ch != '\r') {
                    stat->state = NGX_HEALTH_READING_HEADER;
                }
                break;
            case NGX_HEALTH_READING_BODY:
                if (stat->body_read_pos == (ssize_t)health_expected->len) {
                    // Body was ok, but is now too long
                    stat->state = NGX_HEALTH_BAD_BODY;
                    return NGX_ERROR;
                } else if (ch != health_expected->data[stat->body_read_pos]) {
                    // Body was actually bad
                    stat->state = NGX_HEALTH_BAD_BODY;
                    return NGX_ERROR;
                } else {
                    stat->body_read_pos++;
                }
                break;
            default:
                ngx_log_error(NGX_LOG_CRIT, stat->health_ev.log, 0,
                    "healthcheck: Logic error. Invalid state: %d",
                    stat->state);
                stat->state = NGX_HEALTH_BAD_STATE;
                return NGX_ERROR;
        }
    }
    if (stat->state == NGX_HEALTH_READING_BODY &&
            stat->body_read_pos == (ssize_t)health_expected->len) {
        stat->state = NGX_HEALTH_OK;
        return NGX_OK;
    } else if (stat->state == NGX_HEALTH_OK) {
        return NGX_OK;
    } else {
        return NGX_AGAIN;
    }
}

static void ngx_http_healthcheck_begin_healthcheck(ngx_event_t *event) {
    ngx_http_healthcheck_status_t *stat;
    ngx_connection_t *c;
    ngx_int_t rc;

    stat = event->data;
    if (stat->state != NGX_HEALTH_WAITING) {
        ngx_log_error(NGX_LOG_WARN, event->log, 0,
            "healthcheck: State not waiting, is %d", stat->state);
    }
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, event->log, 0,
        "healthcheck: begun healthcheck of index %i", stat->index);

    ngx_memzero(stat->pc, sizeof(ngx_peer_connection_t));
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, event->log, 0,
        "healthcheck: Memzero done (index %i)", stat->index);

    stat->pc->get = ngx_event_get_peer;

    stat->pc->sockaddr = stat->peer->sockaddr;
    stat->pc->socklen = stat->peer->socklen;
    stat->pc->name = &stat->peer->name;

    stat->pc->log = event->log;
    stat->pc->log_error = NGX_ERROR_ERR; // Um, I guess (???)

    stat->pc->cached = 0;
    stat->pc->connection = NULL;
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, event->log, 0,
        "healthcheck: Connecting peer (index %i)", stat->index);

    rc = ngx_event_connect_peer(stat->pc);
    if (rc == NGX_ERROR || rc == NGX_BUSY || rc == NGX_DECLINED) {
        ngx_log_error(NGX_LOG_CRIT, event->log, 0,
            "healthcheck: Could not connect to peer. This is"
            " pretty bad and probably means your health checks won't"
            " work anymore: %i", rc);
        if (stat->pc->connection) {
            ngx_close_connection(stat->pc->connection);
        }
        // Try to do it again later, but if you're getting errors when you
        // try to connect to a peer, this probably won't work
        ngx_add_timer(&stat->health_ev, stat->conf->health_delay);
        return;
    }
    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, event->log, 0,
        "healthcheck: connected so far");


    c = stat->pc->connection;
    c->data = stat;
    c->log = stat->pc->log;
    c->write->handler = ngx_http_healthcheck_write_handler;
    c->read->handler = ngx_http_healthcheck_read_handler;
    c->sendfile = 0;
    c->read->log = c->log;
    c->write->log = c->log;

    stat->state = NGX_HEALTH_SENDING_CHECK;
    stat->shm->action_time = ngx_current_msec;
    stat->read_pos = 0;
    stat->send_pos = 0;
    stat->body_read_pos = 0;
    stat->read_buffer->pos = stat->read_buffer->start;
    stat->read_buffer->last = stat->read_buffer->start;
    stat->check_start_time = ngx_current_msec;
    ngx_add_timer(c->read, stat->conf->health_timeout);
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, event->log, 0,
        "healthcheck: Peer connected (index %i)", stat->index);

    ngx_http_healthcheck_send_request(c);
}
This is" 583 | " pretty bad and probably means your health checks won't" 584 | " work anymore: %i", rc); 585 | if (stat->pc->connection) { 586 | ngx_close_connection(stat->pc->connection); 587 | } 588 | // Try to do it again later, but if you're getting errors when you 589 | // try to connect to a peer, this probably won't work 590 | ngx_add_timer(&stat->health_ev, stat->conf->health_delay); 591 | return; 592 | } 593 | ngx_log_debug0(NGX_LOG_DEBUG_HTTP, event->log, 0, 594 | "healthcheck: connected so far"); 595 | 596 | 597 | c = stat->pc->connection; 598 | c->data = stat; 599 | c->log = stat->pc->log; 600 | c->write->handler = ngx_http_healthcheck_write_handler; 601 | c->read->handler = ngx_http_healthcheck_read_handler; 602 | c->sendfile = 0; 603 | c->read->log = c->log; 604 | c->write->log = c->log; 605 | 606 | stat->state = NGX_HEALTH_SENDING_CHECK; 607 | stat->shm->action_time = ngx_current_msec; 608 | stat->read_pos = 0; 609 | stat->send_pos = 0; 610 | stat->body_read_pos = 0; 611 | stat->read_buffer->pos = stat->read_buffer->start; 612 | stat->read_buffer->last = stat->read_buffer->start; 613 | stat->check_start_time = ngx_current_msec; 614 | ngx_add_timer(c->read, stat->conf->health_timeout); 615 | ngx_log_debug1(NGX_LOG_DEBUG_HTTP, event->log, 0, 616 | "healthcheck: Peer connected", stat->index); 617 | 618 | ngx_http_healthcheck_send_request(c); 619 | } 620 | 621 | static void ngx_http_healthcheck_try_for_ownership(ngx_event_t *event) { 622 | ngx_http_healthcheck_status_t * stat; 623 | ngx_int_t i_own_it; 624 | 625 | stat = event->data; 626 | if (ngx_terminate || ngx_exiting || ngx_quit) { 627 | ngx_http_healthcheck_clear_events(stat->health_ev.log); 628 | return; 629 | } 630 | 631 | i_own_it = 0; 632 | // nxg_time_update(0, 0); 633 | // Spinlock. So don't own for a long time! 634 | // Use spinlock so two worker processes don't try to healthcheck the same 635 | // peer 636 | ngx_spinlock(&stat->shm->lock, ngx_pid, 1024); 637 | if (stat->shm->owner == ngx_pid) { 638 | i_own_it = 1; 639 | } else if (ngx_current_msec - stat->shm->action_time >= 640 | (stat->conf->health_delay + stat->conf->health_timeout) * 3) { 641 | stat->shm->owner = ngx_pid; 642 | stat->shm->action_time = ngx_current_msec; 643 | stat->state = NGX_HEALTH_WAITING; 644 | ngx_http_healthcheck_begin_healthcheck(&stat->health_ev); 645 | i_own_it = 1; 646 | } 647 | if (!ngx_atomic_cmp_set(&stat->shm->lock, ngx_pid, 0)) { 648 | ngx_log_error(NGX_LOG_CRIT, event->log, 0, 649 | "healthcheck: spinlock didn't work. 

void ngx_http_healthcheck_clear_events(ngx_log_t *log) {
    ngx_uint_t i;
    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, log, 0,
        "healthcheck: Clearing events");

    // Note: From what I can tell it is safe to ngx_del_timer events
    // that are not in the event tree
    for (i = 0; i < ngx_http_healthchecks_arr->nelts; i++) {
        ngx_del_timer(&ngx_http_healthchecks[i].health_ev);
        ngx_del_timer(&ngx_http_healthchecks[i].ownership_ev);
    }
}

static ngx_int_t ngx_http_healthcheck_procinit(ngx_cycle_t *cycle) {
    ngx_uint_t i;
    ngx_msec_t t;

    if (ngx_http_healthchecks_arr->nelts == 0) {
        return NGX_OK;
    }

    // Without reseeding, the distribution isn't very random because each
    // process is a fork, so they would all have the same seed
    srand(ngx_pid);
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, cycle->log, 0,
        "healthcheck: Adding events to worker process %P", ngx_pid);
    for (i = 0; i < ngx_http_healthchecks_arr->nelts; i++) {
        ngx_http_healthchecks[i].shm = &ngx_http_healthchecks_shm[i];

        if (ngx_http_healthchecks[i].conf->healthcheck_enabled) {

            ngx_http_healthchecks[i].ownership_ev.handler =
                ngx_http_healthcheck_try_for_ownership;
            ngx_http_healthchecks[i].ownership_ev.log = cycle->log;
            ngx_http_healthchecks[i].ownership_ev.data =
                &ngx_http_healthchecks[i];
            // I'm not sure why the timer_set needs to be reset to zero.
            // It shouldn't (??), but it does when you HUP the process
            ngx_http_healthchecks[i].ownership_ev.timer_set = 0;

            ngx_http_healthchecks[i].health_ev.handler =
                ngx_http_healthcheck_begin_healthcheck;
            ngx_http_healthchecks[i].health_ev.log = cycle->log;
            ngx_http_healthchecks[i].health_ev.data =
                &ngx_http_healthchecks[i];
            ngx_http_healthchecks[i].health_ev.timer_set = 0;

            t = abs(ngx_random() % ngx_http_healthchecks[i].conf->health_delay);
            ngx_add_timer(&ngx_http_healthchecks[i].ownership_ev, t);
        }
    }
    return NGX_OK;
}

static ngx_int_t ngx_http_healthcheck_preconfig(ngx_conf_t *cf) {
    ngx_http_healthchecks_arr = ngx_array_create(cf->pool, 10,
        sizeof(ngx_http_healthcheck_status_t));
    if (ngx_http_healthchecks_arr == NULL) {
        return NGX_ERROR;
    }
    return NGX_OK;
}

static ngx_int_t ngx_http_healthcheck_init(ngx_conf_t *cf) {
    ngx_str_t *shm_name;
    ngx_shm_zone_t *shm_zone;
    ngx_uint_t i;
    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, cf->log, 0,
        "healthcheck: healthcheck_init");

    if (ngx_http_healthchecks_arr->nelts == 0) {
        ngx_http_healthchecks_shm = NULL;
        return NGX_OK;
    }

    shm_name = ngx_palloc(cf->pool, sizeof *shm_name);
    shm_name->len = sizeof("http_healthcheck") - 1;
    shm_name->data = (unsigned char *) "http_healthcheck";

    // I guess a page each is good enough (?)
    shm_zone = ngx_shared_memory_add(cf, shm_name,
        ngx_pagesize * (ngx_http_healthchecks_arr->nelts + 1),
        &ngx_http_healthcheck_module);

    if (shm_zone == NULL) {
        return NGX_ERROR;
    }
    shm_zone->init = ngx_http_healthcheck_init_zone;

    for (i = 0; i < ngx_http_healthchecks_arr->nelts; i++) {
        // It says 'temp', but it should last forever-ish
        ngx_http_healthchecks[i].read_buffer = ngx_create_temp_buf(cf->pool,
            ngx_http_healthchecks[i].conf->health_buffersize);
        if (ngx_http_healthchecks[i].read_buffer == NULL) {
            return NGX_ERROR;
        }
    }

    return NGX_OK;
}

static ngx_int_t
ngx_http_healthcheck_init_zone(ngx_shm_zone_t *shm_zone, void *data) {
    ngx_uint_t i;
    ngx_slab_pool_t *shpool;

    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, shm_zone->shm.log, 0,
        "healthcheck: Init zone");

    // If we're being HUP'd, I can't just reuse the same 'data' segment
    // because the number of servers may have changed. Instead, I need to
    // recreate a slab

    shpool = (ngx_slab_pool_t *) shm_zone->shm.addr;

    ngx_http_healthchecks_shm = ngx_slab_alloc(shpool,
        (sizeof(ngx_http_healthcheck_status_shm_t)) *
        ngx_http_healthchecks_arr->nelts);
    if (ngx_http_healthchecks_shm == NULL) {
        return NGX_ERROR;
    }
    for (i = 0; i < ngx_http_healthchecks_arr->nelts; i++) {
        ngx_http_healthchecks_shm[i].index = i;
        ngx_http_healthchecks_shm[i].action_time = 0;
        ngx_http_healthchecks_shm[i].down = 0;
        ngx_http_healthchecks_shm[i].since = ngx_current_msec;
    }
    shm_zone->data = ngx_http_healthchecks_shm;

    return NGX_OK;
}


// --- BEGIN PUBLIC METHODS ---
ngx_int_t
ngx_http_healthcheck_add_peer(ngx_http_upstream_srv_conf_t *uscf,
#if defined(nginx_version) && nginx_version >= 8022
        ngx_addr_t *peer, ngx_pool_t *pool) {
#else
        ngx_peer_addr_t *peer, ngx_pool_t *pool) {
#endif
    ngx_http_healthcheck_status_t *status;
    status = ngx_array_push(ngx_http_healthchecks_arr);
    if (status == NULL) {
        return NGX_ERROR;
    }
    status->conf = uscf;
    status->peer = peer;
    status->index = ngx_http_healthchecks_arr->nelts - 1;
    status->pc = ngx_pcalloc(pool, sizeof(ngx_peer_connection_t));
    if (status->pc == NULL) {
        return NGX_ERROR;
    }
    return ngx_http_healthchecks_arr->nelts - 1;
}

ngx_int_t ngx_http_healthcheck_is_down(ngx_uint_t index, ngx_log_t *log) {
    if (index >= ngx_http_healthchecks_arr->nelts) {
        ngx_log_error(NGX_LOG_CRIT, log, 0,
            "healthcheck: Invalid index to is_down: %i", index);
        return 0;
    } else {
        return ngx_http_healthchecks[index].conf->healthcheck_enabled &&
            ngx_http_healthchecks[index].shm->down;
    }
}
// --- END PUBLIC METHODS ---
Bad connection"; 845 | case NGX_HEALTH_BAD_CODE: 846 | return "Non 200 HTTP status code"; 847 | case NGX_HEALTH_TIMEOUT: 848 | return "Healthcheck timed out"; 849 | case NGX_HEALTH_FULL_BUFFER: 850 | return "Contents could not fit read buffer"; 851 | case NGX_HEALTH_EARLY_CLOSE: 852 | return "Connection closed early"; 853 | default: 854 | return "Unknown state"; 855 | } 856 | } 857 | 858 | ngx_buf_t* ngx_http_healthcheck_buf_append(ngx_buf_t *dst, ngx_buf_t *src, 859 | ngx_pool_t *pool) { 860 | //TODO: Consider using a buffer chain 861 | ngx_buf_t *new_buf; 862 | if (dst->last + (src->last - src->pos) > dst->end) { 863 | new_buf = ngx_create_temp_buf(pool, ((dst->last - dst->pos) + (src->last - src->pos)) * 2 + 1); 864 | if (new_buf == NULL) { 865 | return NULL; 866 | } 867 | ngx_memcpy(new_buf->last, dst->pos, (dst->last - dst->pos)); 868 | new_buf->last += (dst->last - dst->pos); 869 | // TODO: I don't think there's a way to uncreate the dst buffer (??) 870 | // Should be ok because these are small and cleared at the end of 871 | // the status request 872 | dst = new_buf; 873 | } 874 | ngx_memcpy(dst->last, src->pos, (src->last - src->pos)); 875 | dst->last += (src->last - src->pos); 876 | return dst; 877 | } 878 | 879 | #define NGX_HEALTH_APPEND_CHECK(dst, src, pool) \ 880 | do { \ 881 | dst = ngx_http_healthcheck_buf_append(b, tmp, pool); \ 882 | if (dst == NULL) { \ 883 | return NGX_HTTP_INTERNAL_SERVER_ERROR; \ 884 | } \ 885 | } while (0); 886 | 887 | static ngx_int_t ngx_http_healthcheck_status_handler(ngx_http_request_t *r) { 888 | ngx_int_t rc; 889 | ngx_buf_t *b, *tmp; 890 | ngx_chain_t out; 891 | ngx_uint_t i; 892 | ngx_http_healthcheck_status_t *stat; 893 | ngx_http_healthcheck_status_shm_t *shm; 894 | if (r->method != NGX_HTTP_GET && r->method != NGX_HTTP_HEAD) { 895 | return NGX_HTTP_NOT_ALLOWED; 896 | } 897 | 898 | rc = ngx_http_discard_request_body(r); 899 | 900 | if (rc != NGX_OK) { 901 | return rc; 902 | } 903 | 904 | ngx_str_t str_tmp = ngx_string("text/html; charset=utf-8"); 905 | r->headers_out.content_type = str_tmp; 906 | 907 | if (r->method == NGX_HTTP_HEAD) { 908 | r->headers_out.status = NGX_HTTP_OK; 909 | 910 | rc = ngx_http_send_header(r); 911 | 912 | if (rc == NGX_ERROR || rc > NGX_OK || r->header_only) { 913 | return rc; 914 | } 915 | } 916 | 917 | b = ngx_create_temp_buf(r->pool, 10); 918 | tmp = ngx_create_temp_buf(r->pool, 1000); 919 | if (b == NULL || tmp == NULL) { 920 | return NGX_HTTP_INTERNAL_SERVER_ERROR; 921 | } 922 | 923 | tmp->last = ngx_snprintf(tmp->pos, tmp->end - tmp->pos, 924 | "\n" 926 | "\n" 927 | "\n" 928 | " NGINX Healthcheck status\n" 929 | "\n" 930 | "\n" 931 | "\n" 932 | " \n" 933 | " \n" 934 | " \n" 935 | " \n" 936 | " \n" 937 | " \n" 938 | " \n" 939 | " \n" 940 | " \n" 941 | " \n" 942 | " \n"); 943 | 944 | NGX_HEALTH_APPEND_CHECK(b, tmp, (r->pool)); 945 | 946 | for (i=0; inelts; i++) { 947 | stat = &ngx_http_healthchecks[i]; 948 | shm = stat->shm; 949 | 950 | tmp->last = ngx_snprintf(tmp->pos, tmp->end - tmp->pos, 951 | " \n" 952 | " \n" // Index 953 | " \n" // Name 954 | " \n" // PID 955 | " \n" // action time 956 | " \n" // concurrent status values 957 | " \n" // Time concurrent 958 | " \n" // Last response down? 959 | " \n" // Code of last response 960 | " \n" // Is down? 
961 | " \n", stat->index, &stat->peer->name, shm->owner, 962 | shm->action_time, shm->concurrent, 963 | shm->since, (int)shm->last_down, 964 | ngx_http_healthcheck_statestr(shm->down_code), 965 | shm->down); 966 | NGX_HEALTH_APPEND_CHECK(b, tmp, r->pool); 967 | } 968 | 969 | tmp->last = ngx_snprintf(tmp->pos, tmp->end - tmp->pos, 970 | "
IndexNameOwner PIDLast action timeConcurrent status valuesTime of concurrent valuesLast response downLast health statusIs down?
%i%V%P%M%i%M%d%s%A
\n" 971 | "\n" 972 | "\n"); 973 | NGX_HEALTH_APPEND_CHECK(b, tmp, r->pool); 974 | 975 | r->headers_out.status = NGX_HTTP_OK; 976 | r->headers_out.content_length_n = b->last - b->pos; 977 | 978 | b->last_buf = 1; 979 | out.buf = b; 980 | out.next = NULL; 981 | 982 | rc = ngx_http_send_header(r); 983 | 984 | if (rc == NGX_ERROR || rc > NGX_OK || r->header_only) { 985 | return rc; 986 | } 987 | 988 | return ngx_http_output_filter(r, &out); 989 | } 990 | #undef NGX_HEALTH_APPEND_CHECK 991 | // end health status page 992 | 993 | // 994 | // 995 | // BEGIN THE BORING PART: Setting config variables 996 | // 997 | // 998 | 999 | static char* ngx_http_healthcheck_enabled(ngx_conf_t *cf, ngx_command_t *cmd, 1000 | void *conf) { 1001 | ngx_http_upstream_srv_conf_t *uscf; 1002 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1003 | uscf->healthcheck_enabled = 1; 1004 | return NGX_CONF_OK; 1005 | } 1006 | 1007 | static char* ngx_http_healthcheck_delay(ngx_conf_t *cf, ngx_command_t *cmd, 1008 | void *conf) { 1009 | ngx_http_upstream_srv_conf_t *uscf; 1010 | ngx_str_t *value; 1011 | value = cf->args->elts; 1012 | 1013 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1014 | uscf->health_delay = (ngx_uint_t)ngx_atoi(value[1].data, value[1].len); 1015 | if (uscf->health_delay == NGX_ERROR) { 1016 | return "Invalid healthcheck delay"; 1017 | } 1018 | return NGX_CONF_OK; 1019 | } 1020 | static char* ngx_http_healthcheck_timeout(ngx_conf_t *cf, ngx_command_t *cmd, 1021 | void *conf) { 1022 | ngx_http_upstream_srv_conf_t *uscf; 1023 | ngx_str_t *value; 1024 | value = cf->args->elts; 1025 | 1026 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1027 | uscf->health_timeout = ngx_atoi(value[1].data, value[1].len); 1028 | if (uscf->health_timeout == (ngx_msec_t)NGX_ERROR) { 1029 | return "Invalid healthcheck timeout "; 1030 | } 1031 | return NGX_CONF_OK; 1032 | } 1033 | static char* ngx_http_healthcheck_failcount(ngx_conf_t *cf, ngx_command_t *cmd, 1034 | void *conf) { 1035 | ngx_http_upstream_srv_conf_t *uscf; 1036 | ngx_str_t *value; 1037 | value = cf->args->elts; 1038 | 1039 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1040 | uscf->health_failcount = ngx_atoi(value[1].data, value[1].len); 1041 | if (uscf->health_failcount == NGX_ERROR) { 1042 | return "Invalid healthcheck failcount"; 1043 | } 1044 | return NGX_CONF_OK; 1045 | } 1046 | static char* ngx_http_healthcheck_send(ngx_conf_t *cf, ngx_command_t *cmd, 1047 | void *conf) { 1048 | ngx_http_upstream_srv_conf_t *uscf; 1049 | ngx_str_t *value; 1050 | ngx_int_t num; 1051 | int i; 1052 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1053 | value = cf->args->elts; 1054 | num = cf->args->nelts; 1055 | uscf->health_send.len = 0; 1056 | size_t at; 1057 | uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module); 1058 | for (i = 1; ihealth_send.len += 2; // \r\n 1061 | } 1062 | uscf->health_send.len += value[i].len; 1063 | } 1064 | uscf->health_send.len += (sizeof(CRLF) - 1) * 2; 1065 | uscf->health_send.data = ngx_pnalloc(cf->pool, uscf->health_send.len + 1); 1066 | if (uscf->health_send.data == NULL) { 1067 | return "Unable to alloc data to send"; 1068 | } 1069 | at = 0; 1070 | for (i = 1; ihealth_send.data + at, CRLF, sizeof(CRLF) - 1); 1073 | at += sizeof(CRLF) - 1; 1074 | } 1075 | ngx_memcpy(uscf->health_send.data + at, value[i].data, value[i].len); 1076 | at += value[i].len; 1077 | } 1078 | 
    ngx_memcpy(uscf->health_send.data + at, CRLF CRLF, (sizeof(CRLF) - 1) * 2);
    at += (sizeof(CRLF) - 1) * 2;
    uscf->health_send.data[at] = 0;
    if (at != uscf->health_send.len) {
        return "healthcheck: Logic error. Length doesn't match";
    }

    return NGX_CONF_OK;
}

static char* ngx_http_healthcheck_expected(ngx_conf_t *cf, ngx_command_t *cmd,
        void *conf) {
    ngx_http_upstream_srv_conf_t *uscf;
    ngx_str_t *value;
    value = cf->args->elts;

    uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module);
    uscf->health_expected.data = value[1].data;
    uscf->health_expected.len = value[1].len;

    return NGX_CONF_OK;
}

static char* ngx_http_healthcheck_buffer(ngx_conf_t *cf, ngx_command_t *cmd,
        void *conf) {
    ngx_http_upstream_srv_conf_t *uscf;
    ngx_str_t *value;
    value = cf->args->elts;

    uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module);
    uscf->health_buffersize = ngx_atoi(value[1].data, value[1].len);
    if (uscf->health_buffersize == NGX_ERROR) {
        return "Invalid healthcheck buffer size";
    }
    return NGX_CONF_OK;
}


static char* ngx_http_set_healthcheck_status(ngx_conf_t *cf, ngx_command_t *cmd,
        void *conf) {

    ngx_http_core_loc_conf_t *clcf;

    clcf = ngx_http_conf_get_module_loc_conf(cf, ngx_http_core_module);
    clcf->handler = ngx_http_healthcheck_status_handler;

    return NGX_CONF_OK;
}

#undef ngx_http_healthchecks
--------------------------------------------------------------------------------
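The nginx.patch above shows the other half of integrating an upstream module:
at upstream-init time each peer is registered so its health_index can be
consulted later during peer selection. Condensed from the round-robin patch
(a sketch; the us/server/peers variables belong to the host upstream module):

    ngx_int_t health_index;

    /* once per peer, while building the peer list */
    health_index = ngx_http_healthcheck_add_peer(us,
        &server[i].addrs[j], cf->pool);
    if (health_index == NGX_ERROR) {
        return NGX_ERROR;
    }
    peers->peer[n].health_index = health_index;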
/ngx_http_healthcheck_module.h:
--------------------------------------------------------------------------------
#ifndef _NGX_HEALTHCHECK_MODULE_H_
#define _NGX_HEALTHCHECK_MODULE_H_

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

// I don't define everything here, just the stuff external users will
// want to call

/**
 * Add a peer for healthchecking
 *
 * @param uscf The upstream the peer belongs to
 * @param peer The peer to check
 * @param pool Pool of memory to create peer checking data from
 *
 * @return Integer identifier for this healthcheck, or NGX_ERROR if something
 *         went wrong.
 */
#if defined(nginx_version) && nginx_version >= 8022
ngx_int_t ngx_http_healthcheck_add_peer(ngx_http_upstream_srv_conf_t *uscf,
    ngx_addr_t *peer, ngx_pool_t *pool);
#else
ngx_int_t ngx_http_healthcheck_add_peer(ngx_http_upstream_srv_conf_t *uscf,
    ngx_peer_addr_t *peer, ngx_pool_t *pool);
#endif

/**
 * Check the health of a peer
 *
 * @param index Integer identifier index to check
 * @param log Gets warning and error messages
 * @return True if the given peer has failed its healthcheck
 */
ngx_int_t ngx_http_healthcheck_is_down(ngx_uint_t index, ngx_log_t *log);

#endif
--------------------------------------------------------------------------------
/sample_ngx_config.conf:
--------------------------------------------------------------------------------
worker_processes 5;
#daemon off;

events {
    worker_connections 1000;
}

# Only if you want to see lots of spam
error_log log/error_log debug_http;

http {

    upstream test_upstreams {
        server localhost:11114;
        server localhost:11115;
        hash $filename;
        hash_again 10;
        healthcheck_enabled;
        healthcheck_delay 1000;
        healthcheck_timeout 1000;
        healthcheck_failcount 1;
        # Important: There is no \n at the end of this. Or \r. Make sure you
        # don't have a \n or \r or anything else at the end of your
        # healthcheck response
        healthcheck_expected 'I_AM_ALIVE';
        # Important: HTTP/1.0
        healthcheck_send "GET /health HTTP/1.0" 'Host: www.mysite.com';
        # Optional supervisord module support
        #supervisord none;
        #supervisord_inherit_backend_status;
    }

    server {
        listen 11114;
        location / {
            root html_11114;
        }
    }
    server {
        listen 11115;
        location / {
            root html_11115;
        }
    }

    server {
        listen 81;

        location / {
            set $filename $request_uri;
            if ($request_uri ~* ".*/(.*)") {
                set $filename $1;
            }
            proxy_set_header Host $http_host;
            proxy_pass http://test_upstreams;
            proxy_connect_timeout 3;
        }
        location /stat {
            healthcheck_status;
        }
    }
}
--------------------------------------------------------------------------------
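With this config running, the status page declared in "location /stat" can be
fetched from port 81, e.g. (assuming nginx is up locally):

    curl http://localhost:81/stat

which returns the HTML table of per-peer health that
ngx_http_healthcheck_status_handler builds.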