├── Setup.hs
├── bin
│   ├── make-plots
│   ├── reload
│   ├── test
│   ├── setup-qdiscs
│   ├── plotBaseline.p
│   └── plotReloading.p
├── test
│   └── Spec.hs
├── baseline.png
├── reloading.png
├── .gitignore
├── stack.yaml
├── LICENSE
├── reuse-port-example.cabal
├── reload
│   └── Main.hs
├── Vagrantfile
├── README.md
└── src
    └── Main.lhs
/Setup.hs:
--------------------------------------------------------------------------------
1 | import Distribution.Simple
2 | main = defaultMain
3 |
--------------------------------------------------------------------------------
/bin/make-plots:
--------------------------------------------------------------------------------
1 | gnuplot bin/plotBaseline.p
2 | gnuplot bin/plotReloading.p
3 |
--------------------------------------------------------------------------------
/test/Spec.hs:
--------------------------------------------------------------------------------
1 | main :: IO ()
2 | main = putStrLn "Test suite not yet implemented"
3 |
--------------------------------------------------------------------------------
/baseline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jfischoff/reuse-port-example/HEAD/baseline.png
--------------------------------------------------------------------------------
/reloading.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jfischoff/reuse-port-example/HEAD/reloading.png
--------------------------------------------------------------------------------
/bin/reload:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | while true; do
3 | .stack-work/install/x86_64-linux/lts-7.3/8.0.1/bin/reload .stack-work/install/x86_64-linux/lts-7.3/8.0.1/bin/reuse-server
4 | sleep 0.1
5 | done
6 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .stack-work
2 | reloading.tsv
3 | *.log
4 | baseline.tsv
5 | .DS_Store
6 | .stack
7 | error.log
8 | .cabal-sandbox/
9 | cabal.sandbox.config
10 | dist
11 | *.hp
12 | *.ps
13 | *.prof
14 | *.aux
15 | .vagrant
16 | ~*
17 | *~
18 | .osx-cabal-sandbox/
19 |
--------------------------------------------------------------------------------
/bin/test:
--------------------------------------------------------------------------------
1 | stack build
2 |
3 | sudo ./bin/setup-qdiscs
4 |
5 | stack exec reuse-server > /dev/null &
6 | server=$!
7 |
8 | trap "kill $server || true" SIGTERM SIGINT
9 |
10 | while ! nc -q 1 localhost 7000
16 | ./bin/reload > /dev/null &
17 | reloader=$!
18 |
19 | trap "kill $reloader || true" SIGTERM SIGINT
20 |
21 | while ! nc -q 1 localhost 7000
--------------------------------------------------------------------------------
/stack.yaml:
--------------------------------------------------------------------------------
24 | # require-stack-version: ">= 1.0.0"
25 |
26 | # Override the architecture used by stack, especially useful on Windows
27 | # arch: i386
28 | # arch: x86_64
29 |
30 | # Extra directories used by stack for building
31 | # extra-include-dirs: [/path/to/dir]
32 | # extra-lib-dirs: [/path/to/dir]
33 |
34 | # Allow a newer minor version of GHC than the snapshot specifies
35 | # compiler-check: newer-minor
36 | # compiler: ghc-8.0
37 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright Author name here (c) 2016
2 |
3 | All rights reserved.
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are met:
7 |
8 | * Redistributions of source code must retain the above copyright
9 | notice, this list of conditions and the following disclaimer.
10 |
11 | * Redistributions in binary form must reproduce the above
12 | copyright notice, this list of conditions and the following
13 | disclaimer in the documentation and/or other materials provided
14 | with the distribution.
15 |
16 | * Neither the name of Author name here nor the names of other
17 | contributors may be used to endorse or promote products derived
18 | from this software without specific prior written permission.
19 |
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
21 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
22 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
23 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
24 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
25 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
26 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
27 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
28 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
29 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
30 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/reuse-port-example.cabal:
--------------------------------------------------------------------------------
1 | name: reuse-port-example
2 | version: 0.1.0.0
3 | synopsis: SO_REUSEPORT example
4 | description: A simple example of using SO_REUSEPORT for zero downtime deploys
5 | homepage: https://github.com/jfischoff/reuse-port-example#readme
6 | license: BSD3
7 | license-file: LICENSE
8 | author: Jonathan Fischoff
9 | maintainer: Jonathan Fischoff
10 | copyright: 2016 Jonathan Fischoff
11 | category: Web
12 | build-type: Simple
13 | -- extra-source-files:
14 | cabal-version: >=1.10
15 |
16 | executable reload
17 | main-is: reload/Main.hs
18 | ghc-options: -Wall
19 | -fno-warn-unused-do-bind
20 | -threaded
21 | -rtsopts
22 | "-with-rtsopts=-N -I0 -qg"
23 | -pgmL markdown-unlit
24 | build-depends: base
25 | , process
26 | , unix
27 | , wreq
28 | , lens
29 | , bytestring
30 |
31 | default-language: Haskell2010
32 |
33 | executable reuse-server
34 | main-is: src/Main.lhs
35 | ghc-options: -Wall
36 | -fno-warn-unused-do-bind
37 | -threaded
38 | -rtsopts
39 | "-with-rtsopts=-N -I0 -qg"
40 | -pgmL markdown-unlit
41 | build-depends: base
42 | , warp
43 | , wai
44 | , http-types
45 | , unix
46 | , network
47 | , bytestring
48 | , markdown-unlit
49 | default-language: Haskell2010
50 |
--------------------------------------------------------------------------------
/reload/Main.hs:
--------------------------------------------------------------------------------
1 | {-# LANGUAGE OverloadedStrings #-}
2 | import Network.Wreq
3 | import Control.Lens
4 | import Control.Concurrent
5 | import Data.Function
6 | import System.Process
7 | import System.Process.Internals
8 | import Control.Monad (unless)
9 | import System.Posix.Types
10 | import Data.List
11 | import Data.ByteString.Lazy (ByteString)
12 | import qualified Data.ByteString.Lazy.Char8 as BSC
13 | import System.Posix.Signals
14 | import Data.Monoid ((<>))
15 | import System.Environment
16 | import Control.Exception
17 |
18 | pidToBS :: CPid -> ByteString
19 | pidToBS (CPid pid) = BSC.pack $ show pid
20 |
21 | waitForNewServer :: CPid -> IO ()
22 | waitForNewServer pid = fix $ \next -> do
23 | resp <- view responseBody <$> get "http://localhost:7000/"
24 | let pidAsBS = pidToBS pid
25 | unless (resp == pidAsBS) $ do
26 | BSC.putStrLn $ "looking for new pid " <> pidAsBS
27 | BSC.putStrLn $ "got " <> resp
28 | threadDelay 100000
29 | next
30 |
31 | findProcesses :: String -> IO [CPid]
32 | findProcesses name = do
33 | processes <- readProcess "ps" ["-e"] []
34 | return $ map (CPid . read . head . words)
35 | $ filter (name `isInfixOf`)
36 | $ lines processes
37 |
38 | withPid :: ProcessHandle -> (CPid -> IO ()) -> IO ()
39 | withPid han f = withProcessHandle han $ \secretHandle ->
40 | case secretHandle of
41 | OpenHandle pid -> f pid
42 | ClosedHandle _ -> return ()
43 |
44 | pauseSYN :: IO a -> IO a
45 | pauseSYN = bracket_ (system $ "sudo nl-qdisc-add --dev=lo --parent=1:4 "
46 | ++ "--id=40: --update plug --buffer"
47 | )
48 | (system $ "sudo nl-qdisc-add --dev=lo --parent=1:4 "
49 | ++ "--id=40: --update plug --release-indefinite"
50 | )
51 |
52 | reload :: String -> IO ()
53 | reload cmdPath = do
54 | processHandle <- pauseSYN $ runProcess cmdPath [] Nothing Nothing Nothing
55 | Nothing Nothing
56 |
57 | withPid processHandle $ \newPid -> do
58 | waitForNewServer newPid
59 |
60 | oldPids <- filter (newPid /=)
61 | <$> findProcesses "reuse-server"
62 |
63 | pauseSYN $ mapM_ (signalProcess sigTERM) oldPids
64 |
65 | main :: IO ()
66 | main = reload . head =<< getArgs
67 |
--------------------------------------------------------------------------------
/Vagrantfile:
--------------------------------------------------------------------------------
1 | # -*- mode: ruby -*-
2 | # vi: set ft=ruby :
3 |
4 | # All Vagrant configuration is done below. The "2" in Vagrant.configure
5 | # configures the configuration version (we support older styles for
6 | # backwards compatibility). Please don't change it unless you know what
7 | # you're doing.
8 | Vagrant.configure("2") do |config|
9 | # The most common configuration options are documented and commented below.
10 | # For a complete reference, please see the online documentation at
11 | # https://docs.vagrantup.com.
12 |
13 | # Every Vagrant development environment requires a box. You can search for
14 | # boxes at https://atlas.hashicorp.com/search.
15 | config.vm.box = "ubuntu/xenial64"
16 |
17 | # Disable automatic box update checking. If you disable this, then
18 | # boxes will only be checked for updates when the user runs
19 | # `vagrant box outdated`. This is not recommended.
20 | # config.vm.box_check_update = false
21 |
22 | # Create a forwarded port mapping which allows access to a specific port
23 | # within the machine from a port on the host machine. In the example below,
24 | # accessing "localhost:8080" will access port 80 on the guest machine.
25 | # config.vm.network "forwarded_port", guest: 80, host: 8080
26 |
27 | # Create a private network, which allows host-only access to the machine
28 | # using a specific IP.
29 | # config.vm.network "private_network", ip: "192.168.33.10"
30 |
31 | # Create a public network, which generally matches bridged networking.
32 | # Bridged networks make the machine appear as another physical device on
33 | # your network.
34 | # config.vm.network "public_network"
35 |
36 | # Share an additional folder to the guest VM. The first argument is
37 | # the path on the host to the actual folder. The second argument is
38 | # the path on the guest to mount the folder. And the optional third
39 | # argument is a set of non-required options.
40 | # config.vm.synced_folder "../data", "/vagrant_data"
41 |
42 | # Provider-specific configuration so you can fine-tune various
43 | # backing providers for Vagrant. These expose provider-specific options.
44 | # Example for VirtualBox:
45 | #
46 | config.vm.provider "virtualbox" do |vb|
47 | # # Display the VirtualBox GUI when booting the machine
48 | # vb.gui = true
49 | #
50 | # # Customize the amount of memory on the VM:
51 | vb.memory = "8192"
52 | vb.cpus = 7
53 | end
54 | #
55 | # View the documentation for the provider you are using for more
56 | # information on available options.
57 |
58 | # Define a Vagrant Push strategy for pushing to Atlas. Other push strategies
59 | # such as FTP and Heroku are also available. See the documentation at
60 | # https://docs.vagrantup.com/v2/push/atlas.html for more information.
61 | # config.push.define "atlas" do |push|
62 | # push.app = "YOUR_ATLAS_USERNAME/YOUR_APPLICATION_NAME"
63 | # end
64 |
65 | # Enable provisioning with a shell script. Additional provisioners such as
66 | # Puppet, Chef, Ansible, Salt, and Docker are also available. Please see the
67 | # documentation for more information about their specific syntax and use.
68 | config.vm.provision "shell", inline: <<-SHELL
69 | apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 575159689BEFB442
70 | echo 'deb http://download.fpcomplete.com/ubuntu xenial main'|sudo tee /etc/apt/sources.list.d/fpco.list
71 |
72 | apt-get update
73 | apt-get -y --allow-unauthenticated install stack
74 | apt-get -y install gnuplot
75 | apt-get -y install libnl-utils
76 | apt-get -y install apache2-utils
77 | apt-get -y install gcc
78 |
79 | SHELL
80 | end
81 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Zero Downtime Deployment with Warp
2 |
3 | Using [`warp`](https://hackage.haskell.org/package/warp), it is easy to perform zero downtime deployment with
4 | [`SO_REUSEPORT`](https://lwn.net/Articles/542629/). This literate Haskell file creates a server for zero downtime deploys and the repo has utilities that restart it without failed requests.
5 |
6 | ## Outline
7 | - [`SO_REUSEPORT`](#so_reuseport)
8 | - [Start Warp with `SO_REUSEPORT`](#start)
9 | - [Reloading](#reloading)
10 | - [Setup](#setup)
11 | - [Performance](#performance)
12 | - [Design Analysis](#design)
13 | - [Thanks](#thanks)
14 |
15 | ## `SO_REUSEPORT`
16 |
17 | `SO_REUSEPORT` is a socket option available on newer versions of Linux and BSD (avoid OS X) that allows multiple sockets to bind to the same port. Additionally, Linux will load balance connections across the sockets.
18 |
19 | There is a downside to `SO_REUSEPORT`: when the number of sockets bound to a
20 | port changes, there is the possibility that packets for a single TCP connection will get routed to two different sockets. This will lead to a failed request. The likelihood is very low, but to protect against this, we use a technique developed by [Yelp](https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html).
21 |
22 | #### Boring Haskell Import Statements
23 |
24 | ```haskell
25 | import Control.Exception
26 | import Data.ByteString.Lazy.Char8 (pack)
27 | import Network.HTTP.Types
28 | import Network.Socket
29 | import Network.Wai
30 | import Network.Wai.Handler.Warp
31 | import System.Posix.Process
32 | import System.Posix.Signals
33 | import System.Posix.Types
34 | ```
35 |
36 | ## Start Warp with `SO_REUSEPORT`
37 |
38 | First, we need to create a socket with the `SO_REUSEPORT` flag. We repurpose some `streaming-commons` code.
39 |
40 | This code opens a socket on localhost and crucially sets the `SO_REUSEPORT` flag.
41 |
42 | ```haskell
43 | bindSocketReusePort :: PortNumber -> IO Socket
44 | bindSocketReusePort p =
45 | bracketOnError (socket AF_INET Stream defaultProtocol) close $ \sock -> do
46 | mapM_ (uncurry $ setSocketOption sock)
47 | [ (NoDelay , 1)
48 | , (ReuseAddr, 1)
49 | , (ReusePort, 1) -- <-- Here we add the SO_REUSEPORT flag.
50 | ]
51 | bind sock $ SockAddrInet p $ tupleToHostAddress (127, 0, 0, 1)
52 | listen sock (max 2048 maxListenQueue)
53 | return sock
54 | ```
55 |
56 | The server code takes advantage of two aspects of `warp`'s design.
57 | 1. `warp` lets you start the server with a socket you have previously created.
58 | 2. When the socket is closed, `warp` shuts down gracefully after completing the outstanding requests.
59 |
60 | ```haskell
61 | main :: IO ()
62 | main = do
63 | -- Before we shutdown an old version of the server, we need to run health
64 | -- checks on the new version. The minimal health check is ensuring the new
65 | -- server is responding to requests. We return the PID in every request to
66 | -- verify the server is running. We grab the PID on load here.
67 | CPid processId <- getProcessID
68 | -- We create our socket and setup a handler to close the socket on
69 | -- termination, premature or otherwise.
70 | bracket
71 | (bindSocketReusePort 7000)
72 | close
73 | $ \sock -> do
74 | -- Before we start the server, we install a signal handler for
75 | -- SIGTERM to close the socket. This will start the graceful
76 | -- shutdown.
77 | installHandler sigTERM (CatchOnce $ close sock) Nothing
78 | -- Start the server with the socket we created earlier.
79 | runSettingsSocket defaultSettings sock $ \_ responder ->
80 | -- Finally, we create a single request for testing if the server is
81 | -- running by returning the PID.
82 | responder $ responseLBS status200 [] $ pack $ show processId
83 | ```
84 |
85 | In a real server, we would have many endpoints. We could return the PID either in a header or via a dedicated health endpoint. However, our test server returns the PID regardless of the URL path or the HTTP verb.
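
Polling such a PID-returning endpoint is simple enough to sketch in shell. This is an illustration, not code from the repo: `curl`, the port, and the helper name `wait_for_pid` are assumptions.

```bash
# Hypothetical helper: block until the server on :7000 answers
# with the PID we expect, i.e., the new version is serving requests.
wait_for_pid() {
  want=$1
  until [ "$(curl -s http://localhost:7000/)" = "$want" ]; do
    sleep 0.1
  done
}
```

The repo's `reload` executable performs the same check in Haskell; see `waitForNewServer` in [`reload/Main.hs`](reload/Main.hs).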
86 |
87 | ## Reloading
88 |
89 | #### Queuing Disciplines
90 |
91 | Before we can reload, we need to set up a `plug` queuing discipline. This will let us temporarily pause `SYN` packets, i.e., new connections, while we bind or close a socket.
92 |
93 | The following code was copied from the Yelp blog post ([link](https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)). It is also in the repo in the [`bin/setup-qdiscs`](bin/setup-qdiscs) file. All operations require `sudo`.
94 |
95 | ```bash
96 | # Set up the queuing discipline
97 | tc qdisc add dev lo root handle 1: prio bands 4
98 | tc qdisc add dev lo parent 1:1 handle 10: pfifo limit 1000
99 | tc qdisc add dev lo parent 1:2 handle 20: pfifo limit 1000
100 | tc qdisc add dev lo parent 1:3 handle 30: pfifo limit 1000
101 |
102 | # Create a plug qdisc with 1 meg of buffer
103 | nl-qdisc-add --dev=lo --parent=1:4 --id=40: plug --limit 1048576
104 | # Release the plug
105 | nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite
106 |
107 | # Set up the filter, any packet marked with "1" will be
108 | # directed to the plug
109 | tc filter add dev lo protocol ip parent 1:0 prio 1 handle 1 fw classid 1:4
110 |
111 | iptables -t mangle -I OUTPUT -p tcp -s 127.0.0.1 --syn -j MARK --set-mark 1
112 | ```
113 |
114 | #### Reload
115 |
116 | Reloading a new version in production requires a dance with your process supervisor. However, the principle is similar even if the details are different.
117 |
118 | 1. Stop additional `SYN` from being delivered using
119 |
120 | ```bash
121 | sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --buffer
122 | ```
123 | 1. Start a new version of the server and save the PID.
124 | 1. Release the plug and let the `SYN` packets flow.
125 |
126 | ```bash
127 | sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite
128 | ```
129 | 1. Make requests to the health endpoint until the new PID is returned.
130 | 1. Stop `SYN` packets again.
131 | 1. Send `SIGTERM` to the other server processes so they will gracefully
132 | shutdown.
133 | 1. Release the `plug` again.
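
The steps above can be condensed into one shell function. This is a dry-runnable sketch, not code from the repo: `start_new_server`, `wait_for_new_pid`, and `kill_old_servers` are hypothetical placeholders for steps 2, 4, and 6.

```bash
# With DRY_RUN=1 the commands are printed instead of executed,
# so the sequence can be inspected without sudo or qdiscs.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

plug()    { run sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --buffer; }
release() { run sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite; }

reload_dance() {
  plug                  # 1. hold new SYN packets
  run start_new_server  # 2. start the new version (placeholder)
  release               # 3. let SYN packets flow to both versions
  run wait_for_new_pid  # 4. poll the health endpoint (placeholder)
  plug                  # 5. hold SYN packets again
  run kill_old_servers  # 6. SIGTERM the old instances (placeholder)
  release               # 7. release the plug
}
```

Note that new connections are held only briefly during steps 5-7, which is where the small latency overhead in the measurements below comes from.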
134 |
135 | An example for demonstrating this process can be found in [`reload/Main.hs`](reload/Main.hs). The `reload` app creates a new server and shuts down all other instances. This is for demonstration purposes. In production, you will want to integrate reloading with your process supervisor.
136 |
137 | ## Setup
138 |
139 | This repo includes a Vagrantfile for running a performance test.
140 |
141 | The setup requires Linux with `stack`, `ab`, and `libnl-utils`, and optionally `gnuplot`. The easiest way to test it is to use the Vagrantfile, which will create a VM with everything installed. Vagrant can be downloaded [here](https://www.vagrantup.com/downloads.html).
142 |
143 | Run with the following steps:
144 |
145 | ```bash
146 | $ git clone https://github.com/jfischoff/reuse-port-example
147 | $ cd reuse-port-example
148 | $ vagrant up
149 | $ vagrant ssh
150 | $ cd /vagrant
151 | $ stack setup
152 | $ bin/test
153 | ```
154 |
155 | [`bin/test`](bin/test) will:
156 |
157 | 1. Build the project.
158 | 1. Set up the queuing disciplines.
159 | 1. Start the server.
160 | 1. Run `ab`.
161 | 1. Stop the server.
162 | 1. Start `reload`, which reloads the server every 100 ms.
163 | 1. Run `ab` again.
164 |
165 | ## Performance
166 |
167 | Below are the times without constant reloading and with constant reloading. Crucially, no connections are dropped and there are no request failures. 100,000 requests were run.
168 |
169 | #### Without Constant Reloading (Baseline)
170 | - mean: 0.367 ms
171 | - stddev: 0.1 ms
172 | - 99%: 1 ms
173 | - max: 19 ms
174 |
175 | 
176 |
177 | #### With Constant Reloading
178 | - mean: 0.492 ms
179 | - stddev: 1.1 ms
180 | - 99%: 1 ms
181 | - max: 45 ms
182 |
183 | 
184 |
185 | ## Design Analysis
186 |
187 | #### Advantages
188 | - It is relatively self-contained
189 | - Reloading is fast
190 |
191 | #### Disadvantages
192 | - The new version of the server is responding to requests **before health checks** are performed. If something is wrong with the new version, say the executable was corrupted and it segfaults, clients will be affected.
193 | - There is a small amount of overhead because requests are blocked during the reload.
194 |
195 | ### Immutable Alternative
196 |
197 | One alternative would be to use an immutable blue/green deployment strategy, using the load balancer to shift traffic to the new version.
198 |
199 | #### Advantages
200 | * The new machines receive traffic only **after health checks** pass.
201 | * It doesn't necessarily add any overhead (depends on the load balancer, YMMV).
202 |
203 | #### Disadvantages
204 | * It can be slow to start up new VMs, which could prevent crucial fixes from
205 | being deployed.
206 | * Requires modifying the load balancer's config, which is an easy way to cause
207 | catastrophic failure.
208 | * Arguably more work to set up.
209 |
210 | ### `huptime`
211 |
212 | [`huptime`](https://github.com/amscanne/huptime) was mentioned on the Yelp blog, but it was not appropriate for their problem. However, I think it might cover many Haskell web server use cases. If anyone has experience using `huptime`, let me know through the GitHub issues of this repo or directly at [@jfischoff](https://twitter.com/jfischoff).
213 |
214 | ### Future Work
215 |
216 | An alternative design for utilizing `SO_REUSEPORT` is to create a parent process that keeps a pool of sockets which its workers inherit on reload. This is essentially the design that `nginx` ([nginx reload](http://nginx.org/en/docs/control.html?utm_source=socket-sharding-nginx-release-1-9-1&utm_medium=blog&_ga=1.38701153.370685645.1475165126#upgrade), [nginx and `SO_REUSEPORT`](https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/)) has. The primary advantage is that, in the typical case, we would not need to utilize the `plug` queuing discipline.
217 |
218 | Changing the number of child processes will still be problematic, but we could use the `plug` queuing discipline approach Yelp developed.
219 |
220 | ## Thanks
221 |
222 | I learned about `SO_REUSEPORT` from Imran Hamead, and the Yelp post was invaluable for preventing errors.
223 |
--------------------------------------------------------------------------------
/src/Main.lhs:
--------------------------------------------------------------------------------
1 | ## Zero Downtime Deployment with Warp
2 |
3 | Using [`warp`](https://hackage.haskell.org/package/warp), it is easy to perform zero downtime deployment with
4 | [`SO_REUSEPORT`](https://lwn.net/Articles/542629/). This literate Haskell file creates a server for zero downtime deploys and the repo has utilities that restart it without failed requests.
5 |
6 | ## Outline
7 | - [`SO_REUSEPORT`](#so_reuseport)
8 | - [Start Warp with `SO_REUSEPORT`](#start)
9 | - [Reloading](#reloading)
10 | - [Setup](#setup)
11 | - [Performance](#performance)
12 | - [Design Analysis](#design)
13 | - [Thanks](#thanks)
14 |
15 | ## `SO_REUSEPORT`
16 |
17 | `SO_REUSEPORT` is a socket option available on newer versions of Linux and BSD (avoid OS X) that allows multiple sockets to bind to the same port. Additionally, Linux will load balance connections across the sockets.
18 |
19 | There is a downside to `SO_REUSEPORT`: when the number of sockets bound to a
20 | port changes, there is the possibility that packets for a single TCP connection will get routed to two different sockets. This will lead to a failed request. The likelihood is very low, but to protect against this, we use a technique developed by [Yelp](https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html).
21 |
22 | #### Boring Haskell Import Statements
23 |
24 | ```haskell
25 | import Control.Exception
26 | import Data.ByteString.Lazy.Char8 (pack)
27 | import Network.HTTP.Types
28 | import Network.Socket
29 | import Network.Wai
30 | import Network.Wai.Handler.Warp
31 | import System.Posix.Process
32 | import System.Posix.Signals
33 | import System.Posix.Types
34 | ```
35 |
36 | ## Start Warp with `SO_REUSEPORT`
37 |
38 | First, we need to create a socket with the `SO_REUSEPORT` flag. We repurpose some `streaming-commons` code.
39 |
40 | This code opens a socket on localhost and crucially sets the `SO_REUSEPORT` flag.
41 |
42 | ```haskell
43 | bindSocketReusePort :: PortNumber -> IO Socket
44 | bindSocketReusePort p =
45 | bracketOnError (socket AF_INET Stream defaultProtocol) close $ \sock -> do
46 | mapM_ (uncurry $ setSocketOption sock)
47 | [ (NoDelay , 1)
48 | , (ReuseAddr, 1)
49 | , (ReusePort, 1) -- <-- Here we add the SO_REUSEPORT flag.
50 | ]
51 | bind sock $ SockAddrInet p $ tupleToHostAddress (127, 0, 0, 1)
52 | listen sock (max 2048 maxListenQueue)
53 | return sock
54 | ```
55 |
56 | The server code takes advantage of two aspects of `warp`'s design.
57 | 1. `warp` lets you start the server with a socket you have previously created.
58 | 2. When the socket is closed, `warp` shuts down gracefully after completing the outstanding requests.
59 |
60 | ```haskell
61 | main :: IO ()
62 | main = do
63 | -- Before we shutdown an old version of the server, we need to run health
64 | -- checks on the new version. The minimal health check is ensuring the new
65 | -- server is responding to requests. We return the PID in every request to
66 | -- verify the server is running. We grab the PID on load here.
67 | CPid processId <- getProcessID
68 | -- We create our socket and setup a handler to close the socket on
69 | -- termination, premature or otherwise.
70 | bracket
71 | (bindSocketReusePort 7000)
72 | close
73 | $ \sock -> do
74 | -- Before we start the server, we install a signal handler for
75 | -- SIGTERM to close the socket. This will start the graceful
76 | -- shutdown.
77 | installHandler sigTERM (CatchOnce $ close sock) Nothing
78 | -- Start the server with the socket we created earlier.
79 | runSettingsSocket defaultSettings sock $ \_ responder ->
80 | -- Finally, we create a single request for testing if the server is
81 | -- running by returning the PID.
82 | responder $ responseLBS status200 [] $ pack $ show processId
83 | ```
84 |
85 | In a real server, we would have many endpoints. We could return the PID either in a header or via a dedicated health endpoint. However, our test server returns the PID regardless of the URL path or the HTTP verb.
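
Polling such a PID-returning endpoint is simple enough to sketch in shell. This is an illustration, not code from the repo: `curl`, the port, and the helper name `wait_for_pid` are assumptions.

```bash
# Hypothetical helper: block until the server on :7000 answers
# with the PID we expect, i.e., the new version is serving requests.
wait_for_pid() {
  want=$1
  until [ "$(curl -s http://localhost:7000/)" = "$want" ]; do
    sleep 0.1
  done
}
```

The repo's `reload` executable performs the same check in Haskell; see `waitForNewServer` in [`reload/Main.hs`](reload/Main.hs).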
86 |
87 | ## Reloading
88 |
89 | #### Queuing Disciplines
90 |
91 | Before we can reload, we need to set up a `plug` queuing discipline. This will let us temporarily pause `SYN` packets, i.e., new connections, while we bind or close a socket.
92 |
93 | The following code was copied from the Yelp blog post ([link](https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)). It is also in the repo in the [`bin/setup-qdiscs`](bin/setup-qdiscs) file. All operations require `sudo`.
94 |
95 | ```bash
96 | # Set up the queuing discipline
97 | tc qdisc add dev lo root handle 1: prio bands 4
98 | tc qdisc add dev lo parent 1:1 handle 10: pfifo limit 1000
99 | tc qdisc add dev lo parent 1:2 handle 20: pfifo limit 1000
100 | tc qdisc add dev lo parent 1:3 handle 30: pfifo limit 1000
101 |
102 | # Create a plug qdisc with 1 meg of buffer
103 | nl-qdisc-add --dev=lo --parent=1:4 --id=40: plug --limit 1048576
104 | # Release the plug
105 | nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite
106 |
107 | # Set up the filter, any packet marked with "1" will be
108 | # directed to the plug
109 | tc filter add dev lo protocol ip parent 1:0 prio 1 handle 1 fw classid 1:4
110 |
111 | iptables -t mangle -I OUTPUT -p tcp -s 127.0.0.1 --syn -j MARK --set-mark 1
112 | ```
113 |
114 | #### Reload
115 |
116 | Reloading a new version in production requires a dance with your process supervisor. However, the principle is similar even if the details are different.
117 |
118 | 1. Stop additional `SYN` from being delivered using
119 |
120 | ```bash
121 | sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --buffer
122 | ```
123 | 1. Start a new version of the server and save the PID.
124 | 1. Release the plug and let the `SYN` packets flow.
125 |
126 | ```bash
127 | sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite
128 | ```
129 | 1. Make requests to the health endpoint until the new PID is returned.
130 | 1. Stop `SYN` packets again.
131 | 1. Send `SIGTERM` to the other server processes so they will gracefully
132 | shutdown.
133 | 1. Release the `plug` again.
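
The steps above can be condensed into one shell function. This is a dry-runnable sketch, not code from the repo: `start_new_server`, `wait_for_new_pid`, and `kill_old_servers` are hypothetical placeholders for steps 2, 4, and 6.

```bash
# With DRY_RUN=1 the commands are printed instead of executed,
# so the sequence can be inspected without sudo or qdiscs.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

plug()    { run sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --buffer; }
release() { run sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite; }

reload_dance() {
  plug                  # 1. hold new SYN packets
  run start_new_server  # 2. start the new version (placeholder)
  release               # 3. let SYN packets flow to both versions
  run wait_for_new_pid  # 4. poll the health endpoint (placeholder)
  plug                  # 5. hold SYN packets again
  run kill_old_servers  # 6. SIGTERM the old instances (placeholder)
  release               # 7. release the plug
}
```

Note that new connections are held only briefly during steps 5-7, which is where the small latency overhead in the measurements below comes from.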
134 |
135 | An example for demonstrating this process can be found in [`reload/Main.hs`](reload/Main.hs). The `reload` app creates a new server and shuts down all other instances. This is for demonstration purposes. In production, you will want to integrate reloading with your process supervisor.
136 |
137 | ## Setup
138 |
139 | This repo includes a Vagrantfile for running a performance test.
140 |
141 | The setup requires Linux with `stack`, `ab`, and `libnl-utils`, and optionally `gnuplot`. The easiest way to test it is to use the Vagrantfile, which will create a VM with everything installed. Vagrant can be downloaded [here](https://www.vagrantup.com/downloads.html).
142 |
143 | Run with the following steps:
144 |
145 | ```bash
146 | $ git clone https://github.com/jfischoff/reuse-port-example
147 | $ cd reuse-port-example
148 | $ vagrant up
149 | $ vagrant ssh
150 | $ cd /vagrant
151 | $ stack setup
152 | $ bin/test
153 | ```
154 |
155 | [`bin/test`](bin/test) will:
156 |
157 | 1. Build the project.
158 | 1. Set up the queuing disciplines.
159 | 1. Start the server.
160 | 1. Run `ab`.
161 | 1. Stop the server.
162 | 1. Start `reload`, which reloads the server every 100 ms.
163 | 1. Run `ab` again.
164 |
165 | ## Performance
166 |
167 | Below are the times without constant reloading and with constant reloading. Crucially, no connections are dropped and there are no request failures. 100,000 requests were run.
168 |
169 | #### Without Constant Reloading (Baseline)
170 | - mean: 0.367 ms
171 | - stddev: 0.1 ms
172 | - 99%: 1 ms
173 | - max: 19 ms
174 |
175 | 
176 |
177 | #### With Constant Reloading
178 | - mean: 0.492 ms
179 | - stddev: 1.1 ms
180 | - 99%: 1 ms
181 | - max: 45 ms
182 |
183 | 
184 |
185 | ## Design Analysis
186 |
187 | #### Advantages
188 | - It is relatively self-contained
189 | - Reloading is fast
190 |
191 | #### Disadvantages
192 | - The new version of the server is responding to requests **before health checks** are performed. If something is wrong with the new version, say the executable was corrupted and it segfaults, clients will be affected.
193 | - There is a small amount of overhead because requests are blocked during the reload.
194 |
195 | ### Immutable Alternative
196 |
197 | One alternative would be to use an immutable blue/green deployment strategy, using the load balancer to shift traffic to the new version.
198 |
199 | #### Advantages
200 | * The new machines receive traffic only **after health checks** pass.
201 | * It doesn't necessarily add any overhead (depends on the load balancer, YMMV).
202 |
203 | #### Disadvantages
204 | * It can be slow to start up new VMs, which could prevent crucial fixes from
205 | being deployed.
206 | * Requires modifying the load balancer's config, which is an easy way to cause
207 | catastrophic failure.
208 | * Arguably more work to set up.
209 |
210 | ### `huptime`
211 |
212 | [`huptime`](https://github.com/amscanne/huptime) was mentioned on the Yelp blog, but it was not appropriate for their problem. However, I think it might cover many Haskell web server use cases. If anyone has experience using `huptime`, let me know through the GitHub issues of this repo or directly at [@jfischoff](https://twitter.com/jfischoff).
213 |
214 | ### Future Work
215 |
216 | An alternative design for utilizing `SO_REUSEPORT` is to create a parent process that keeps a pool of sockets which its workers inherit on reload. This is essentially the design that `nginx` ([nginx reload](http://nginx.org/en/docs/control.html?utm_source=socket-sharding-nginx-release-1-9-1&utm_medium=blog&_ga=1.38701153.370685645.1475165126#upgrade), [nginx and `SO_REUSEPORT`](https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/)) has. The primary advantage is that, in the typical case, we would not need to utilize the `plug` queuing discipline.
217 |
218 | Changing the number of child processes will still be problematic, but we could use the `plug` queuing discipline approach Yelp developed.
219 |
220 | ## Thanks
221 |
222 | I learned about `SO_REUSEPORT` from Imran Hamead, and the Yelp post was invaluable for preventing errors.
223 |
--------------------------------------------------------------------------------