├── Setup.hs
├── bin
│   ├── make-plots
│   ├── reload
│   ├── test
│   ├── setup-qdiscs
│   ├── plotBaseline.p
│   └── plotReloading.p
├── test
│   └── Spec.hs
├── baseline.png
├── reloading.png
├── .gitignore
├── stack.yaml
├── LICENSE
├── reuse-port-example.cabal
├── reload
│   └── Main.hs
├── Vagrantfile
├── README.md
└── src
    └── Main.lhs

/Setup.hs:
--------------------------------------------------------------------------------
import Distribution.Simple
main = defaultMain
--------------------------------------------------------------------------------
/bin/make-plots:
--------------------------------------------------------------------------------
gnuplot bin/plotBaseline.p
gnuplot bin/plotReloading.p
--------------------------------------------------------------------------------
/test/Spec.hs:
--------------------------------------------------------------------------------
main :: IO ()
main = putStrLn "Test suite not yet implemented"
--------------------------------------------------------------------------------
/baseline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jfischoff/reuse-port-example/HEAD/baseline.png
--------------------------------------------------------------------------------
/reloading.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jfischoff/reuse-port-example/HEAD/reloading.png
--------------------------------------------------------------------------------
/bin/reload:
--------------------------------------------------------------------------------
while true; do
    .stack-work/install/x86_64-linux/lts-7.3/8.0.1/bin/reload .stack-work/install/x86_64-linux/lts-7.3/8.0.1/bin/reuse-server
    sleep 0.1
done
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.stack-work
reloading.tsv
*.log
baseline.tsv
.DS_Store
.stack
error.log
.cabal-sandbox/
cabal.sandbox.config
dist
*.hp
*.ps
*.prof
*.aux
.vagrant
~*
*~
.osx-cabal-sandbox/
--------------------------------------------------------------------------------
/bin/test:
--------------------------------------------------------------------------------
stack build

sudo ./bin/setup-qdiscs

stack exec reuse-server > /dev/null &
server=$!

trap "kill $server || true" SIGTERM SIGINT

while ! nc -q 1 localhost 7000 /dev/null &
reloader=$!

trap "kill $reloader || true" SIGTERM SIGINT

while ! nc -q 1 localhost 7000
--------------------------------------------------------------------------------
/stack.yaml:
--------------------------------------------------------------------------------
= 1.0.0

# Override the architecture used by stack, especially useful on Windows
# arch: i386
# arch: x86_64

# Extra directories used by stack for building
# extra-include-dirs: [/path/to/dir]
# extra-lib-dirs: [/path/to/dir]

# Allow a newer minor version of GHC than the snapshot specifies
# compiler-check: newer-minor
# compiler: ghc-8.0
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright Author name here (c) 2016

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above
      copyright notice, this list of conditions and the following
      disclaimer in the documentation and/or other materials provided
      with the distribution.

    * Neither the name of Author name here nor the names of other
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/reuse-port-example.cabal:
--------------------------------------------------------------------------------
name:                reuse-port-example
version:             0.1.0.0
synopsis:            SO_REUSEPORT example
description:         A simple example of using SO_REUSEPORT for zero-downtime deploys
homepage:            https://github.com/jfischoff/reuse-port-example#readme
license:             BSD3
license-file:        LICENSE
author:              Jonathan Fischoff
maintainer:          Jonathan Fischoff
copyright:           2016 Jonathan Fischoff
category:            Web
build-type:          Simple
-- extra-source-files:
cabal-version:       >=1.10

executable reload
  main-is:             reload/Main.hs
  ghc-options:         -Wall
                       -fno-warn-unused-do-bind
                       -threaded
                       -rtsopts
                       "-with-rtsopts=-N -I0 -qg"
                       -pgmL markdown-unlit
  build-depends:       base
                     , process
                     , unix
                     , wreq
                     , lens
                     , bytestring
  default-language:    Haskell2010

executable reuse-server
  main-is:             src/Main.lhs
  ghc-options:         -Wall
                       -fno-warn-unused-do-bind
                       -threaded
                       -rtsopts
                       "-with-rtsopts=-N -I0 -qg"
                       -pgmL markdown-unlit
  build-depends:       base
                     , warp
                     , wai
                     , http-types
                     , unix
                     , network
                     , bytestring
                     , markdown-unlit
  default-language:    Haskell2010
--------------------------------------------------------------------------------
/reload/Main.hs:
--------------------------------------------------------------------------------
{-# LANGUAGE OverloadedStrings #-}
import Network.Wreq
import Control.Lens
import Control.Concurrent
import Data.Function
import System.Process
import System.Process.Internals
import Control.Monad (unless)
import System.Posix.Types
import Data.List
import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy.Char8 as BSC
import System.Posix.Signals
import Data.Monoid ((<>))
import System.Environment
import Control.Exception

pidToBS :: CPid -> ByteString
pidToBS (CPid pid) = BSC.pack $ show pid

-- Poll the health endpoint until the new server's PID is returned.
waitForNewServer :: CPid -> IO ()
waitForNewServer pid = fix $ \next -> do
  resp <- view responseBody <$> get "http://localhost:7000/"
  let pidAsBS = pidToBS pid
  unless (resp == pidAsBS) $ do
    BSC.putStrLn $ "looking for new pid " <> pidAsBS
    BSC.putStrLn $ "got " <> resp
    threadDelay 100000
    next

-- Find the PIDs of all processes whose name contains the given string.
findProcesses :: String -> IO [CPid]
findProcesses name = do
  processes <- readProcess "ps" ["-e"] []
  return $ map (CPid . read . head . words)
         $ filter (name `isInfixOf`)
         $ lines processes

withPid :: ProcessHandle -> (CPid -> IO ()) -> IO ()
withPid han f = withProcessHandle han $ \secretHandle ->
  case secretHandle of
    OpenHandle pid -> f pid
    ClosedHandle _ -> return ()

-- Run an action with SYN packets plugged, releasing the plug afterwards.
pauseSYN :: IO a -> IO a
pauseSYN = bracket_ (system $  "sudo nl-qdisc-add --dev=lo --parent=1:4 "
                            ++ "--id=40: --update plug --buffer"
                    )
                    (system $  "sudo nl-qdisc-add --dev=lo --parent=1:4 "
                            ++ "--id=40: --update plug --release-indefinite"
                    )

reload :: String -> IO ()
reload cmdPath = do
  processHandle <- pauseSYN $ runProcess cmdPath [] Nothing Nothing Nothing
                                Nothing Nothing

  withPid processHandle $ \newPid -> do
    waitForNewServer newPid

    oldPids <- filter (newPid /=)
           <$> findProcesses "reuse-server"

    pauseSYN $ mapM_ (signalProcess sigTERM) oldPids

main :: IO ()
main = reload . head =<< getArgs
--------------------------------------------------------------------------------
/Vagrantfile:
--------------------------------------------------------------------------------
# -*- mode: ruby -*-
# vi: set ft=ruby :

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://atlas.hashicorp.com/search.
  config.vm.box = "ubuntu/xenial64"

  # Disable automatic box update checking. If you disable this, then
  # boxes will only be checked for updates when the user runs
  # `vagrant box outdated`. This is not recommended.
  # config.vm.box_check_update = false

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine. In the example below,
  # accessing "localhost:8080" will access port 80 on the guest machine.
  # config.vm.network "forwarded_port", guest: 80, host: 8080

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  # config.vm.network "private_network", ip: "192.168.33.10"

  # Create a public network, which is generally matched to a bridged network.
  # Bridged networks make the machine appear as another physical device on
  # your network.
  # config.vm.network "public_network"

  # Share an additional folder to the guest VM. The first argument is
  # the path on the host to the actual folder. The second argument is
  # the path on the guest to mount the folder. And the optional third
  # argument is a set of non-required options.
  # config.vm.synced_folder "../data", "/vagrant_data"

  # Provider-specific configuration so you can fine-tune various
  # backing providers for Vagrant. These expose provider-specific options.
  # Example for VirtualBox:
  config.vm.provider "virtualbox" do |vb|
    # Display the VirtualBox GUI when booting the machine
    # vb.gui = true

    # Customize the amount of memory and CPUs on the VM:
    vb.memory = "8192"
    vb.cpus = 7
  end

  # View the documentation for the provider you are using for more
  # information on available options.

  # Define a Vagrant Push strategy for pushing to Atlas. Other push strategies
  # such as FTP and Heroku are also available. See the documentation at
  # https://docs.vagrantup.com/v2/push/atlas.html for more information.
  # config.push.define "atlas" do |push|
  #   push.app = "YOUR_ATLAS_USERNAME/YOUR_APPLICATION_NAME"
  # end

  # Enable provisioning with a shell script. Additional provisioners such as
  # Puppet, Chef, Ansible, Salt, and Docker are also available. Please see the
  # documentation for more information about their specific syntax and use.
  config.vm.provision "shell", inline: <<-SHELL
    apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 575159689BEFB442
    echo 'deb http://download.fpcomplete.com/ubuntu xenial main' | sudo tee /etc/apt/sources.list.d/fpco.list

    apt-get update
    apt-get -y --allow-unauthenticated install stack
    apt-get -y install gnuplot
    apt-get -y install libnl-utils
    apt-get -y install apache2-utils
    apt-get -y install gcc
  SHELL
end
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Zero Downtime Deployment with Warp

Using [`warp`](https://hackage.haskell.org/package/warp), it is easy to perform zero-downtime deployment with
[`SO_REUSEPORT`](https://lwn.net/Articles/542629/).
This literate Haskell file creates a server suitable for zero-downtime deploys, and the repo has utilities that restart it without any failed requests.

## Outline
- [`SO_REUSEPORT`](#so_reuseport)
- [Start Warp with `SO_REUSEPORT`](#start)
- [Reloading](#reloading)
- [Setup](#setup)
- [Performance](#performance)
- [Design Analysis](#design)
- [Thanks](#thanks)

## `SO_REUSEPORT`

`SO_REUSEPORT` is an extension on newer versions of Linux and BSD (avoid OS X) that allows multiple sockets to bind to the same port. Additionally, Linux will load balance connections across the sockets.

There is a downside to `SO_REUSEPORT`: when the number of sockets bound to a port changes, there is a chance that packets for a single TCP connection will get routed to two different sockets, which leads to a failed request. The likelihood is very low, but to protect against it we use a technique developed by [Yelp](https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html).

#### Boring Haskell Import Statements

```haskell
import Control.Exception
import Data.ByteString.Lazy.Char8 (pack)
import Network.HTTP.Types
import Network.Socket
import Network.Wai
import Network.Wai.Handler.Warp
import System.Posix.Process
import System.Posix.Signals
import System.Posix.Types
```

## Start Warp with `SO_REUSEPORT`

First, we need to create a socket with the `SO_REUSEPORT` flag set. We repurpose some `streaming-commons` code.

This code opens a socket on localhost and, crucially, sets the `SO_REUSEPORT` flag.

```haskell
bindSocketReusePort :: PortNumber -> IO Socket
bindSocketReusePort p =
  bracketOnError (socket AF_INET Stream defaultProtocol) close $ \sock -> do
    mapM_ (uncurry $ setSocketOption sock)
      [ (NoDelay  , 1)
      , (ReuseAddr, 1)
      , (ReusePort, 1) -- <-- Here we add the SO_REUSEPORT flag.
      ]
    bind sock $ SockAddrInet p $ tupleToHostAddress (127, 0, 0, 1)
    listen sock (max 2048 maxListenQueue)
    return sock
```

The server code takes advantage of two aspects of `warp`'s design.

1. `warp` lets you start the server with a socket you have created yourself.
2. When the socket is closed, `warp` shuts down gracefully after completing the outstanding requests.

```haskell
main :: IO ()
main = do
  -- Before we shut down an old version of the server, we need to run health
  -- checks on the new version. The minimal health check is ensuring the new
  -- server is responding to requests. We return the PID in every response so
  -- we can verify which server is running. We grab the PID on load here.
  CPid processId <- getProcessID
  -- We create our socket and set up a handler to close the socket on
  -- termination, premature or otherwise.
  bracket
    (bindSocketReusePort 7000)
    close
    $ \sock -> do
      -- Before we start the server, we install a signal handler for
      -- SIGTERM to close the socket. This will start the graceful
      -- shutdown.
      installHandler sigTERM (CatchOnce $ close sock) Nothing
      -- Start the server with the socket we created earlier.
      runSettingsSocket defaultSettings sock $ \_ responder ->
        -- Finally, we answer every request with the PID, which is enough
        -- to test whether this server instance is running.
        responder $ responseLBS status200 [] $ pack $ show processId
```

In a real server, we would have many endpoints. We could either return the PID in a header or from a dedicated health endpoint. Our test server, however, returns the PID regardless of the URL path or the HTTP verb.

## Reloading

#### Queuing Disciplines

Before we can reload, we need to set up a `plug` queuing discipline. This lets us temporarily pause `SYN` packets, i.e., new connections, while we bind or close a socket.

The following code was copied from the Yelp blog post ([link](https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)). It is also in the repo in the [`bin/setup-qdiscs`](bin/setup-qdiscs) file. All operations require `sudo`.

```bash
# Set up the queuing discipline
tc qdisc add dev lo root handle 1: prio bands 4
tc qdisc add dev lo parent 1:1 handle 10: pfifo limit 1000
tc qdisc add dev lo parent 1:2 handle 20: pfifo limit 1000
tc qdisc add dev lo parent 1:3 handle 30: pfifo limit 1000

# Create a plug qdisc with 1 meg of buffer
nl-qdisc-add --dev=lo --parent=1:4 --id=40: plug --limit 1048576
# Release the plug
nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite

# Set up the filter: any packet marked with "1" will be
# directed to the plug
tc filter add dev lo protocol ip parent 1:0 prio 1 handle 1 fw classid 1:4

iptables -t mangle -I OUTPUT -p tcp -s 127.0.0.1 --syn -j MARK --set-mark 1
```

#### Reload

Reloading a new version in production requires a dance with your process supervisor. The details vary, but the principle is the same.

1. Stop additional `SYN` packets from being delivered:

   ```bash
   sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --buffer
   ```
1. Start a new version of the server and save the PID.
1. Release the plug and let the `SYN` packets flow:

   ```bash
   sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite
   ```
1. Make requests to the health endpoint until the new PID is returned.
1. Stop `SYN` packets again.
1. Send `SIGTERM` to the other server processes so they shut down gracefully.
1. Release the `plug` again.

An example demonstrating this process can be found in [`reload/Main.hs`](reload/Main.hs). The `reload` app starts a new server and shuts down all other instances. This is for demonstration purposes; in production, you will want to integrate reloading with your process supervisor.

## Setup

This repo includes a Vagrantfile for running a performance test.

The setup requires Linux, `stack`, `ab`, `libnl-utils`, and optionally `gnuplot`. The easiest way to test it is to use the Vagrantfile, which will create a VM with everything installed. Vagrant can be downloaded [here](https://www.vagrantup.com/downloads.html).

Run with the following steps:

```bash
$ git clone https://github.com/jfischoff/reuse-port-example
$ cd reuse-port-example
$ vagrant up
$ vagrant ssh
$ cd /vagrant
$ stack setup
$ bin/test
```

[`bin/test`](bin/test) will:

1. Build the project.
1. Set up the queuing disciplines.
1. Start the server.
1. Run `ab`.
1. Stop the server.
1. Run `reload` every 100 ms.
1. Run `ab`.

## Performance

Below are the request times without constant reloading and with constant reloading, over 100,000 requests each. Crucially, no connections are dropped and there are no request failures.
#### Without Constant Reloading (Baseline)
- mean: 0.367 ms
- stddev: 0.1 ms
- 99%: 1 ms
- max: 19 ms

![Baseline Scatter Plot](/baseline.png)

#### With Constant Reloading
- mean: 0.492 ms
- stddev: 1.1 ms
- 99%: 1 ms
- max: 45 ms

![Reloading Scatter Plot](/reloading.png)

## Design Analysis

#### Advantages
- It is relatively self-contained.
- Reloading is fast.

#### Disadvantages
- The new version of the server is responding to requests **before health checks** are performed. If there is something wrong with the new code, say it segfaults because the executable was corrupted, clients will be affected.
- There is a small amount of overhead because requests are blocked during the reload.

### Immutable Alternative

One alternative would be to use an immutable blue/green deployment strategy and have the load balancer weight in a new version.

#### Advantages
* We weight in the machine **after health checks**.
* It doesn't necessarily have any overhead (depends on the load balancer; YMMV).

#### Disadvantages
* It can be slow to start up new VMs, which could prevent crucial fixes from
  being deployed.
* It requires modifying the load balancer's config, which is an easy way to cause
  catastrophic failure.
* It is arguably more work to set up.

### `huptime`

[`huptime`](https://github.com/amscanne/huptime) was mentioned on the Yelp blog, but was not appropriate for their problem. However, I think it might suit many Haskell web server use cases. If anyone has any experience using `huptime`, let me know through the GitHub issues of this repo or directly at [@jfischoff](https://twitter.com/jfischoff).
### Future Work

An alternative design for utilizing `SO_REUSEPORT` is to create a parent process that keeps a pool of sockets which its workers inherit on reload. This is essentially the design that `nginx` uses ([nginx reload](http://nginx.org/en/docs/control.html#upgrade), [nginx and `SO_REUSEPORT`](https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/)). The primary advantage is that, in the typical case, we would not need the `plug` queuing discipline.

Changing the number of child processes would still be problematic, but for that we could fall back on the `plug` queuing discipline approach Yelp developed.

## Thanks

I learned about `SO_REUSEPORT` from Imran Hamead, and the Yelp post was invaluable for preventing errors.
--------------------------------------------------------------------------------
/src/Main.lhs:
--------------------------------------------------------------------------------
## Zero Downtime Deployment with Warp

Using [`warp`](https://hackage.haskell.org/package/warp), it is easy to perform zero-downtime deployment with
[`SO_REUSEPORT`](https://lwn.net/Articles/542629/). This literate Haskell file creates a server suitable for zero-downtime deploys, and the repo has utilities that restart it without any failed requests.

## Outline
- [`SO_REUSEPORT`](#so_reuseport)
- [Start Warp with `SO_REUSEPORT`](#start)
- [Reloading](#reloading)
- [Setup](#setup)
- [Performance](#performance)
- [Design Analysis](#design)
- [Thanks](#thanks)

## `SO_REUSEPORT`

`SO_REUSEPORT` is an extension on newer versions of Linux and BSD (avoid OS X) that allows multiple sockets to bind to the same port. Additionally, Linux will load balance connections across the sockets.
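The effect of the flag is easy to sanity-check outside Haskell. Here is a minimal sketch using Python's standard `socket` module (an illustration only; it assumes a Linux kernel with `SO_REUSEPORT` support and that port 7000 is free) that binds two listening sockets to the same address:

```python
import socket

def bind_reuse_port(port):
    # Create a TCP socket and set SO_REUSEPORT before binding,
    # the same order of operations as bindSocketReusePort.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)
    return s

# Without SO_REUSEPORT the second bind would fail with "Address already in use".
a = bind_reuse_port(7000)
b = bind_reuse_port(7000)
both_bound = a.getsockname() == b.getsockname()
print("both sockets bound to the same port:", both_bound)
a.close()
b.close()
```

With both sockets listening, the kernel distributes incoming connections between them.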
There is a downside to `SO_REUSEPORT`: when the number of sockets bound to a port changes, there is a chance that packets for a single TCP connection will get routed to two different sockets, which leads to a failed request. The likelihood is very low, but to protect against it we use a technique developed by [Yelp](https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html).

#### Boring Haskell Import Statements

```haskell
import Control.Exception
import Data.ByteString.Lazy.Char8 (pack)
import Network.HTTP.Types
import Network.Socket
import Network.Wai
import Network.Wai.Handler.Warp
import System.Posix.Process
import System.Posix.Signals
import System.Posix.Types
```

## Start Warp with `SO_REUSEPORT`

First, we need to create a socket with the `SO_REUSEPORT` flag set. We repurpose some `streaming-commons` code.

This code opens a socket on localhost and, crucially, sets the `SO_REUSEPORT` flag.

```haskell
bindSocketReusePort :: PortNumber -> IO Socket
bindSocketReusePort p =
  bracketOnError (socket AF_INET Stream defaultProtocol) close $ \sock -> do
    mapM_ (uncurry $ setSocketOption sock)
      [ (NoDelay  , 1)
      , (ReuseAddr, 1)
      , (ReusePort, 1) -- <-- Here we add the SO_REUSEPORT flag.
      ]
    bind sock $ SockAddrInet p $ tupleToHostAddress (127, 0, 0, 1)
    listen sock (max 2048 maxListenQueue)
    return sock
```

The server code takes advantage of two aspects of `warp`'s design.

1. `warp` lets you start the server with a socket you have created yourself.
2. When the socket is closed, `warp` shuts down gracefully after completing the outstanding requests.

```haskell
main :: IO ()
main = do
  -- Before we shut down an old version of the server, we need to run health
  -- checks on the new version. The minimal health check is ensuring the new
  -- server is responding to requests. We return the PID in every response so
  -- we can verify which server is running. We grab the PID on load here.
  CPid processId <- getProcessID
  -- We create our socket and set up a handler to close the socket on
  -- termination, premature or otherwise.
  bracket
    (bindSocketReusePort 7000)
    close
    $ \sock -> do
      -- Before we start the server, we install a signal handler for
      -- SIGTERM to close the socket. This will start the graceful
      -- shutdown.
      installHandler sigTERM (CatchOnce $ close sock) Nothing
      -- Start the server with the socket we created earlier.
      runSettingsSocket defaultSettings sock $ \_ responder ->
        -- Finally, we answer every request with the PID, which is enough
        -- to test whether this server instance is running.
        responder $ responseLBS status200 [] $ pack $ show processId
```

In a real server, we would have many endpoints. We could either return the PID in a header or from a dedicated health endpoint. Our test server, however, returns the PID regardless of the URL path or the HTTP verb.

## Reloading

#### Queuing Disciplines

Before we can reload, we need to set up a `plug` queuing discipline. This lets us temporarily pause `SYN` packets, i.e., new connections, while we bind or close a socket.

The following code was copied from the Yelp blog post ([link](https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html)). It is also in the repo in the [`bin/setup-qdiscs`](bin/setup-qdiscs) file. All operations require `sudo`.
```bash
# Set up the queuing discipline
tc qdisc add dev lo root handle 1: prio bands 4
tc qdisc add dev lo parent 1:1 handle 10: pfifo limit 1000
tc qdisc add dev lo parent 1:2 handle 20: pfifo limit 1000
tc qdisc add dev lo parent 1:3 handle 30: pfifo limit 1000

# Create a plug qdisc with 1 meg of buffer
nl-qdisc-add --dev=lo --parent=1:4 --id=40: plug --limit 1048576
# Release the plug
nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite

# Set up the filter: any packet marked with "1" will be
# directed to the plug
tc filter add dev lo protocol ip parent 1:0 prio 1 handle 1 fw classid 1:4

iptables -t mangle -I OUTPUT -p tcp -s 127.0.0.1 --syn -j MARK --set-mark 1
```

#### Reload

Reloading a new version in production requires a dance with your process supervisor. The details vary, but the principle is the same.

1. Stop additional `SYN` packets from being delivered:

   ```bash
   sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --buffer
   ```
1. Start a new version of the server and save the PID.
1. Release the plug and let the `SYN` packets flow:

   ```bash
   sudo nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite
   ```
1. Make requests to the health endpoint until the new PID is returned.
1. Stop `SYN` packets again.
1. Send `SIGTERM` to the other server processes so they shut down gracefully.
1. Release the `plug` again.

An example demonstrating this process can be found in [`reload/Main.hs`](reload/Main.hs). The `reload` app starts a new server and shuts down all other instances. This is for demonstration purposes; in production, you will want to integrate reloading with your process supervisor.
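The health-check step of the dance, polling until the new PID shows up, is the same idea as `waitForNewServer` in [`reload/Main.hs`](reload/Main.hs). A self-contained Python sketch of the polling loop (the in-process HTTP server here is a stand-in for `reuse-server`; all names are illustrative):

```python
import os
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class PidHandler(BaseHTTPRequestHandler):
    # Like reuse-server, answer every request with this process's PID.
    def do_GET(self):
        body = str(os.getpid()).encode()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def wait_for_new_server(pid, url, delay=0.1):
    # Poll the health endpoint until it reports the expected PID.
    while True:
        with urllib.request.urlopen(url) as resp:
            if resp.read().decode() == str(pid):
                return
        time.sleep(delay)

server = HTTPServer(("127.0.0.1", 0), PidHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
wait_for_new_server(os.getpid(), "http://127.0.0.1:%d/" % server.server_port)
healthy = True
print("new server is up")
server.shutdown()
```

In the real dance the expected PID belongs to the freshly started server process rather than our own, but the loop is identical.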
## Setup

This repo includes a Vagrantfile for running a performance test.

The setup requires Linux, `stack`, `ab`, `libnl-utils`, and optionally `gnuplot`. The easiest way to test it is to use the Vagrantfile, which will create a VM with everything installed. Vagrant can be downloaded [here](https://www.vagrantup.com/downloads.html).

Run with the following steps:

```bash
$ git clone https://github.com/jfischoff/reuse-port-example
$ cd reuse-port-example
$ vagrant up
$ vagrant ssh
$ cd /vagrant
$ stack setup
$ bin/test
```

[`bin/test`](bin/test) will:

1. Build the project.
1. Set up the queuing disciplines.
1. Start the server.
1. Run `ab`.
1. Stop the server.
1. Run `reload` every 100 ms.
1. Run `ab`.

## Performance

Below are the request times without constant reloading and with constant reloading, over 100,000 requests each. Crucially, no connections are dropped and there are no request failures.

#### Without Constant Reloading (Baseline)
- mean: 0.367 ms
- stddev: 0.1 ms
- 99%: 1 ms
- max: 19 ms

![Baseline Scatter Plot](/baseline.png)

#### With Constant Reloading
- mean: 0.492 ms
- stddev: 1.1 ms
- 99%: 1 ms
- max: 45 ms

![Reloading Scatter Plot](/reloading.png)

## Design Analysis

#### Advantages
- It is relatively self-contained.
- Reloading is fast.

#### Disadvantages
- The new version of the server is responding to requests **before health checks** are performed. If there is something wrong with the new code, say it segfaults because the executable was corrupted, clients will be affected.
- There is a small amount of overhead because requests are blocked during the reload.
### Immutable Alternative

One alternative would be to use an immutable blue/green deployment strategy and have the load balancer weight in a new version.

#### Advantages
* We weight in the machine **after health checks**.
* It doesn't necessarily have any overhead (depends on the load balancer; YMMV).

#### Disadvantages
* It can be slow to start up new VMs, which could prevent crucial fixes from
  being deployed.
* It requires modifying the load balancer's config, which is an easy way to cause
  catastrophic failure.
* It is arguably more work to set up.

### `huptime`

[`huptime`](https://github.com/amscanne/huptime) was mentioned on the Yelp blog, but was not appropriate for their problem. However, I think it might suit many Haskell web server use cases. If anyone has any experience using `huptime`, let me know through the GitHub issues of this repo or directly at [@jfischoff](https://twitter.com/jfischoff).

### Future Work

An alternative design for utilizing `SO_REUSEPORT` is to create a parent process that keeps a pool of sockets which its workers inherit on reload. This is essentially the design that `nginx` uses ([nginx reload](http://nginx.org/en/docs/control.html#upgrade), [nginx and `SO_REUSEPORT`](https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/)). The primary advantage is that, in the typical case, we would not need the `plug` queuing discipline.

Changing the number of child processes would still be problematic, but for that we could fall back on the `plug` queuing discipline approach Yelp developed.

## Thanks

I learned about `SO_REUSEPORT` from Imran Hamead, and the Yelp post was invaluable for preventing errors.
--------------------------------------------------------------------------------