├── images
│   ├── es-cluster-01.png
│   ├── es-cluster-02.png
│   └── es-cluster-03.png
├── article.md
└── README.md

/images/es-cluster-01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yoanisgil/docker-elasticsearch-cluster/HEAD/images/es-cluster-01.png
--------------------------------------------------------------------------------
/images/es-cluster-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yoanisgil/docker-elasticsearch-cluster/HEAD/images/es-cluster-02.png
--------------------------------------------------------------------------------
/images/es-cluster-03.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yoanisgil/docker-elasticsearch-cluster/HEAD/images/es-cluster-03.png
--------------------------------------------------------------------------------
/article.md:
--------------------------------------------------------------------------------
# Logging with Docker - Part 1.1

So it's been a while since [I started this blog series on logging with Docker](https://medium.com/@yoanis_gil/logging-with-docker-part-1-b23ef1443aac), and though I said the next article would be about *gelf* and *fluentd*, I wanted to take the time to provide a more realistic example which illustrates what it takes to integrate logging into your web application. Because let's face it, this:

```python
import sys
import time

while True:
    sys.stderr.write('Error\n')
    sys.stdout.write('All Good\n')
    time.sleep(1)
```

hardly counts as an application, nor do you get paid for writing such code ;). That said, we will create a very simple [Flask](http://flask.pocoo.org/) application for creating [Countdowns](https://en.wikipedia.org/wiki/Countdown). The application code is already hosted on [GitHub](https://github.com/yoanisgil/countdown-python).

The examples given throughout the article were created and tested using Docker v1.9.

# Quick Recap: Docker logging 101

How logging works:

- Anything your application writes to its `stdout/stderr` will be shipped to the configured logging driver.
- The logging driver can be configured globally at the Docker daemon level, when the daemon is launched.
- When creating a new container, one can specify the logging driver to be used, which effectively overrides the daemon's configuration.

As of Docker 1.9 there are six logging driver implementations:

- `json-file`: This is the default driver. Everything gets logged to a JSON-structured file.
- `syslog`: Ships logging information to a syslog server.
- `journald`: Writes log messages to journald (journald is a logging service which comes with [systemd](http://www.freedesktop.org/wiki/Software/systemd/)).
- `gelf`: Writes log messages to a [GELF](https://www.graylog.org/resources/gelf/) endpoint like Graylog or Logstash.
- `fluentd`: Writes log messages to [fluentd](http://www.fluentd.org/).
- `awslogs`: [Amazon CloudWatch Logs](https://aws.amazon.com/about-aws/whats-new/2014/07/10/introducing-amazon-cloudwatch-logs/) logging driver for Docker.

# The Application

As stated before, we will be using a countdown application for illustration purposes.
The application itself is quite simple:

![Home page](https://raw.githubusercontent.com/yoanisgil/docker-logging-tutorial/master/part-01.1/images/countdown-1.png)

After providing a description for tracking down the time to your most beloved and anxiously awaited event, you will be presented with:

![ticker](https://raw.githubusercontent.com/yoanisgil/docker-logging-tutorial/master/part-01.1/images/countdown-2.png)

Simple, right? So let's get to work and break this thing into pieces.

# Docker ... always Docker

As you might have already guessed, this application was developed to work 100% with Docker, so a [Dockerfile](https://github.com/yoanisgil/countdown-python/blob/master/Dockerfile) has been created for that purpose, which runs a Python [WSGI application](https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface) using [Nginx](https://www.nginx.com/) and [Gunicorn](http://gunicorn.org/). The file itself is quite a piece of work and probably deserves its own post (**NOTE to self**: let's blog about it!). It's exactly 42 lines (I swear it was not my intention to do it like that), and it takes care of:

- Installing Nginx.
- Installing application dependencies from `requirements.txt`.
- Providing a default Nginx site configuration which makes Nginx and Gunicorn work together.
- Providing a set of utility scripts to be able to launch the application from Gunicorn.
- Configuring Supervisord to launch Nginx and Gunicorn.

If you take a closer look at the Dockerfile you will notice that the application entry point looks like this:

```
CMD ["supervisord", "-n", "-c", "/etc/supervisord/supervisord.conf"]
```

and this is because we're using [Supervisord](http://supervisord.org/), a process control system, which acts as the [init process](https://en.wikipedia.org/wiki/Init). If you're wondering why a process control system like Supervisord is required, please take a look at this [excellent article](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/) which explains the PID 1 zombie reaping problem.

Let's take a look at the life cycle of incoming requests to the application. Each HTTP request will follow the path below:

HTTP Request <---> Nginx <-- Proxy Pass --> Gunicorn <-- WSGI --> Flask Application

As we already know, the application needs to send all logging statements to its `stdout/stderr` output, but bear in mind that Docker will only collect the output from the process with PID 1. That said, a few tweaks are required to Supervisord, Nginx and Gunicorn so that logging statements are transparently captured. Let's go over them.

## Supervisord configuration

Since we're using Supervisord to launch our application, this is where the Docker daemon will be collecting information from. We need to make sure that everything that gets sent to Gunicorn's/Nginx's `stdout/stderr` output is actually forwarded to Supervisord's `stdout/stderr`.
Below are the relevant pieces of configuration:

```
[program:gunicorn]
command=/usr/local/bin/gunicorn_start.sh
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

[program:nginx]
command=/usr/sbin/nginx
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
```

The important bits of config here are **stdout_logfile=/dev/stdout** and **stderr_logfile=/dev/stderr**, since they instruct Supervisord to capture the processes' stdout/stderr output and forward it to Supervisord's own stdout/stderr, which in turn will be collected by the Docker daemon and shipped to the configured logging driver.

# Nginx/Gunicorn configuration

Since Supervisord collects `stdout/stderr` from the daemons it controls, we need to instruct them to log everything to the expected descriptors. Below are the relevant pieces of configuration for Nginx and Gunicorn:

#### Nginx configuration
```
error_log /dev/stderr info;
daemon off;

events {
    worker_connections 1024;
}

http {
    ...
    access_log /dev/stdout main;
    ...
}
```

#### Gunicorn configuration
```
exec gunicorn ${WSGI_MODULE}:${WSGI_APP} \
    --name $NAME \
    --workers $NUM_WORKERS \
    --user=$USER --group=$GROUP \
    --bind=unix:$SOCKFILE \
    --log-level=info \
    --log-file=/dev/stdout
```

# Logging (finally)

With all the little pieces of configuration described above, we can finally focus on logging from an application point of view. As an example, I've added a logging statement for each 404 error on the website:

```
@app.route('/v/<countdown_id>')
def countdown(countdown_id):
    count_down = Countdown.query.filter_by(id=countdown_id).first()

    if not count_down:
        app.logger.error('404 on countdown with id %s' % countdown_id)
        return render_template('404.html', countdown_id=countdown_id), 404

    return render_template('countdown.html', countdown=count_down)
```

At this point it's really up to you to configure your framework's logging handlers and make sure they log to `stdout/stderr`. In the case of a Flask application there is already a logger object, which is a standard Python [Logger object](https://docs.python.org/2/library/logging.html).

Back to our Countdown application, let's see it in action. To make things easier, I've crafted a Docker Compose file which will take care of building and running the application. After you clone [the repo](https://github.com/yoanisgil/countdown-python), and before you launch the application, make sure you edit the `docker-compose.yml` file and change the value of **syslog-address** to match your Docker setup. For instance, if you're running Docker on OS X with Machine, then you need to update **syslog-address: "udp://192.168.99.101:5514"** to use the IP address of your virtual machine. With that in mind, let's launch the application:

```
$ docker-compose build countdown # Build the application image
$ docker-compose up -d countdown
$ docker exec rsyslog tail -f /var/log/messages
```

Then visit http://localhost:5000 (or your Docker Machine VM's IP) and you should see the application welcome page. Let's force a 404 error by visiting http://localhost:5000/v/42.
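If you'd rather generate a handful of these errors from a script instead of clicking around (handy when you want a steady stream of log lines to watch), here is a minimal sketch using only the Python standard library. It assumes the application is reachable on port 5000 as published in `docker-compose.yml`; the host and the range of ids are arbitrary, so adjust them to your setup:

```python
try:
    from urllib2 import urlopen, HTTPError  # Python 2
except ImportError:
    from urllib.request import urlopen      # Python 3
    from urllib.error import HTTPError

BASE_URL = 'http://localhost:5000'  # or your Docker Machine VM IP

# Request a few countdown ids that (most likely) don't exist, so the
# application logs a 404 for each of them.
for countdown_id in range(40, 45):
    try:
        urlopen('%s/v/%d' % (BASE_URL, countdown_id))
    except HTTPError as e:
        print('GET /v/%d -> %d' % (countdown_id, e.code))
```

Each failed request should produce both an application log entry and an Nginx access log line, just like in the output shown next.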
Go back to the console where the `docker exec` command was launched and you should see something like this:

```
2015-12-24T04:35:45Z default docker/countdown[1012]: -----------
2015-12-24T04:35:45Z default docker/countdown[1012]: ERROR in app [/srv/app/app.py:102]:
2015-12-24T04:35:45Z default docker/countdown[1012]: 404 on countdown with id 42
2015-12-24T04:35:45Z default docker/countdown[1012]: -----------
2015-12-24T04:35:45Z default docker/countdown[1012]: 192.168.99.1 - - [24/Dec/2015:04:35:45 +0000] "GET /v/42 HTTP/1.1" 404 394 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36" "-"
```

which shows a log statement from the application and one from Nginx as well. How was this possible? Let's take a look at the [Docker Compose file](https://github.com/yoanisgil/countdown-python/blob/master/docker-compose.yml):

```
countdown:
  build: .
  environment:
    - NUM_WORKERS=1
    - APP_NAME=countdown
    - WSGI_MODULE=app
    - WSGI_APP=app
    - PYTHONPATH=/srv/app
    - APP_LISTEN_ON=0.0.0.0
    - APP_DEBUG=True
    - PYTHONUNBUFFERED=1
    - DB_DIR=/var/lib/countdown
  ports:
    - "5000:80/tcp"
  volumes:
    - ./data:/var/lib/countdown
  links:
    - syslog
  log_driver: syslog
  log_opt:
    syslog-address: "udp://192.168.99.101:5514"
    syslog-tag: "countdown"
syslog:
  image: voxxit/rsyslog
  ports:
    - "5514:514/udp"
  container_name: rsyslog
```

As you can see, we're linking the container powering the Countdown application with the syslog container. This is not strictly required, since it's the Docker daemon that takes care of forwarding logging output to the syslog container, but it guarantees that the syslog container is always started as a dependency of the application. The rest is the same as what was covered in Part 1: we configure our container to use the **syslog** driver and we make sure to add a tag.
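If you ever want to double-check that the rsyslog container is reachable from your host before blaming the logging driver, you can push a test message at the same UDP endpoint configured under **syslog-address**. This is just an optional convenience check, not something the application needs; the address below is the example one from the Compose file and the logger name is made up, so replace them with your own values:

```python
import logging
import logging.handlers

# Same host/port as the syslog-address entry in docker-compose.yml
# (e.g. your Docker Machine VM IP and the published 5514/udp port).
handler = logging.handlers.SysLogHandler(address=('192.168.99.101', 5514))

logger = logging.getLogger('countdown-test')
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info('hello rsyslog, this is a test message from the host')
```

If the endpoint is reachable, the message shows up in the same `docker exec rsyslog tail -f /var/log/messages` stream as the container logs.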
# What's next?

By now you should have a better idea of how to effectively let Docker capture your logging information without prior knowledge of your application architecture, and have it sent to a configurable backend.

The reason I like this approach is the degree of freedom it gives to both developers and sysadmins. Logging to `stdout/stderr` during the development process is a breeze, since you don't need to tail/grep files located elsewhere. From a sysadmin/operations point of view, it makes it easy to configure how logging statements are archived, where they are sent, etc., and most importantly without performing any application-specific configuration.

In the next article of this series I will finally be covering the `fluentd`, `gelf` and `awslogs` drivers.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Running an Elasticsearch cluster on Docker 1.9 with Swarm and Compose

## Intro

I recently came across [this excellent article](http://nathanleclaire.com/blog/2015/11/17/seamless-docker-multihost-overlay-networking-on-digitalocean-with-machine-swarm-and-compose-ft.-rethinkdb/) by Nathan LeClaire which goes about spinning up a [RethinkDB cluster](https://www.rethinkdb.com/) in the cloud, powered by Docker 1.9 and Swarm 1.0. It illustrates how, using tools like [Docker Compose](https://docs.docker.com/compose/) and [Docker Machine](https://docs.docker.com/machine/), one can launch a cluster in just a few minutes. This is all possible thanks to one of Docker's coolest features ever: Multi-Host Networking.

Multi-Host Networking was labeled production grade in November 2015 and you can read more about it [here](https://blog.docker.com/2015/11/docker-multi-host-networking-ga/). This blog post assumes you're somewhat familiar with this particular feature, but if you're not, take a look at [this link](https://docs.docker.com/engine/userguide/networking/get-started-overlay/) and maybe [this video](https://www.youtube.com/watch?time_continue=45&v=B2wd_UigNxU) and you should be good to go.

There are two main parts to this article:

1. Launching a Swarm cluster.
2. Launching an [Elasticsearch](https://www.elastic.co/products/elasticsearch) cluster.

In the first part we will power up a Swarm cluster comprising one master and three nodes, which we will later use in part two to launch our Elasticsearch cluster. Before we move on to the fun stuff, make sure that:

- You're running Docker >= 1.9 (verify with *docker version*).
- You have Docker Compose >= 1.5.0 (verify with *docker-compose --version*).
- You have Docker Machine >= 0.5.0 (verify with *docker-machine --version*).
- You have a valid/active [Digital Ocean](https://www.digitalocean.com/) account. If you don't, you can get one [here](https://www.digitalocean.com/?refcode=b868d5213417).

The "source code" of this article is available [on GitHub](https://github.com/yoanisgil/docker-elasticsearch-cluster). PRs are welcomed ;)

## The Swarm Cluster

Before we can do anything else we need a few machines for our cluster, which can be done in no time with Docker Machine. One thing I really love about Machine is the number of cloud providers [it supports](https://docs.docker.com/machine/drivers/) and how easy it is to launch a fully working Docker environment from the command line and have it running in a matter of minutes.

So, go ahead and grab your Digital Ocean access token. If you're new to Digital Ocean, take a look at the *How To Generate a Personal Access Token* section of [this tutorial](https://www.digitalocean.com/community/tutorials/how-to-use-the-digitalocean-api-v2) first.
We will need to set up five environment variables:

```bash
export DIGITALOCEAN_ACCESS_TOKEN=YOUR_DIGITAL_OCEAN_TOKEN_GOES_HERE
export DIGITALOCEAN_IMAGE=debian-8-x64
export DIGITALOCEAN_PRIVATE_NETWORKING=true
export DIGITALOCEAN_REGION=sfo1
export DIGITALOCEAN_SIZE=1gb
```

Each of these environment variables will be used later by Docker Machine when creating the servers that will run Docker in the cloud. Let's take a look at the meaning of each of them:

- DIGITALOCEAN_ACCESS_TOKEN: This is the token which identifies your account, so Digital Ocean knows on behalf of whom API calls are made (and so it can identify the account/user that will be billed ;)).
- DIGITALOCEAN_IMAGE: This indicates which Linux distro/OS we want to run on the server. We went for Debian 8 here since Multi-Host Networking requires a kernel >= 3.16.
- DIGITALOCEAN_PRIVATE_NETWORKING: This enables communication between servers in the same data center over a private network (i.e. a network that is not visible/accessible to the "outside" world). Take a look [here](https://www.digitalocean.com/company/blog/introducing-private-networking/) if you want to learn more about Digital Ocean's private networking feature.
- DIGITALOCEAN_REGION: This is the region where servers will be created. I went for the San Francisco region because it is located in North America, but feel free to choose a different one, especially if it happens to be closer to your geographical location. For a list of available regions take a look at Digital Ocean's [regions API](https://developers.digitalocean.com/documentation/v2/#regions).
- DIGITALOCEAN_SIZE: This is the size (amount of RAM) to be allocated for the server. Even though you can go for 512mb, which is less expensive, I strongly suggest that you use 1gb, since the price difference is not that significant and you will certainly appreciate the performance gain.

For a comprehensive list of available options related to the Digital Ocean driver for Docker Machine, take a look at the [online documentation](https://docs.docker.com/machine/drivers/digital-ocean/).
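Before creating any droplets, you can quickly sanity-check the token you just exported. This is optional and purely illustrative; it hits the `/v2/account` endpoint of Digital Ocean's public API using only the Python standard library:

```python
import os
try:
    from urllib2 import Request, urlopen  # Python 2
except ImportError:
    from urllib.request import Request, urlopen  # Python 3

# Uses the same token exported above; an HTTP 401 error raised by urlopen
# means the token is missing or invalid.
req = Request('https://api.digitalocean.com/v2/account')
req.add_header('Authorization', 'Bearer ' + os.environ['DIGITALOCEAN_ACCESS_TOKEN'])
print(urlopen(req).read().decode('utf-8'))
```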
Swarm requires access to a [Key-Value store](https://en.wikipedia.org/wiki/Key-value_database) so that nodes can be discovered and added to the cluster (or removed when a node goes down). So let's launch a server for the sole purpose of running the Key-Value store, which in our case will be [Consul](https://www.consul.io/):

    docker-machine create -d digitalocean kvstore
    eval $(docker-machine env kvstore)
    export KV_IP=$(docker-machine ssh kvstore 'ifconfig eth1 | grep "inet addr:" | cut -d: -f2 | cut -d" " -f1')
    docker run -d -p ${KV_IP}:8500:8500 -h consul --restart=always progrium/consul -server -bootstrap

Wait a few minutes for the machine to come online and make sure the Consul server is up and running by running:

    docker-machine ssh kvstore curl -I http://$KV_IP:8500 2>/dev/null

which should produce an output like this:

    HTTP/1.1 301 Moved Permanently
    Location: /ui/
    Date: Tue, 19 Jan 2016 04:48:34 GMT
    Content-Type: text/plain; charset=utf-8

With the Key-Value store in place, let's launch the Swarm master:

    docker-machine create -d digitalocean --swarm --swarm-master --swarm-discovery="consul://${KV_IP}:8500" --engine-opt="cluster-store=consul://${KV_IP}:8500" --engine-opt="cluster-advertise=eth1:2376" swarm-master

and now let's summon those minions:

    export NUM_WORKERS=3;
    for i in $(seq 1 $NUM_WORKERS); do
        docker-machine create -d digitalocean --digitalocean-size=1gb --swarm --swarm-discovery="consul://${KV_IP}:8500" --engine-opt="cluster-store=consul://${KV_IP}:8500" --engine-opt="cluster-advertise=eth1:2376" swarm-node-${i} &
    done;

This operation should take about 15-20 minutes, so go get a cup of coffee ;). I wanted this to be run in sequence so that you can easily keep track of what's happening.

I know there is quite a bit to digest here, but if you find it overwhelming, please do take the time to read [Nathan's post](http://nathanleclaire.com/blog/2015/11/17/seamless-docker-multihost-overlay-networking-on-digitalocean-with-machine-swarm-and-compose-ft.-rethinkdb/), and I'm sure that by the time you're done everything in this article will make perfect sense.

## The Elasticsearch Cluster

With our infrastructure cluster in place we can now focus on the fun stuff: creating an Elasticsearch cluster. But before that, let's take a very quick look at Elasticsearch (ES from now on). ES is a high-availability/multi-tenant full-text search server based on [Lucene](https://lucene.apache.org/core/). Yes, I know that was a mouthful, but spend a couple of minutes on the [ES home page](https://www.elastic.co/products/elasticsearch) and you'll get a better idea of what this remarkable piece of software can do for you.

It might seem a bit late, but there is one question I'd like to address before moving on. Why an ES cluster? Aren't there a few cloud solutions already available at affordable prices? Yes, there are some, like [Amazon Elasticsearch Service](https://aws.amazon.com/elasticsearch-service/) and [QBox](https://qbox.io/), but you still need at least $50 if you want a decent setup. Yes, it is true that Amazon Elasticsearch Service is cloud based and that you pay only for what you use, but it takes some time to get used to Amazon's terms and concepts (and the interface, and the lots of clicks ;)).

There is also another reason why I think being able to run an ES cluster of your own can save you some time and money. A few months ago I was tasked with evaluating a suitable [API Gateway](http://microservices.io/patterns/apigateway.html) to be put in front of our HTTP API(s).
After a few weeks of testing solutions from major providers, it was clear to me that [Kong](https://getkong.org/) would suit our needs to a large degree. I won't go into the details as to why Kong is such a good solution, but if you're in the business of API(s) and microservices do take a look at it. So, once it was clear that Kong was the way to go, I needed to be sure that we wouldn't incur any performance penalties once we started routing API calls through the Kong API Gateway. To make the story short, I ended up creating a small Python script to replay API calls from log files, along with [this project](https://github.com/yoanisgil/locust-grafana).

That little project of mine integrates [Locust](http://locust.io/), [Statsd](https://github.com/etsy/statsd), [Grafana](http://grafana.org/) and [InfluxDB](https://github.com/influxdata/influxdb) in order to visualize, in real time, the number of requests a given site/URL can handle over a short period of time. It is mostly about gluing all those pieces of software into a reusable stack, of course with a lot of help from Docker and Docker Compose. If I had only known about ES around that time, it would have saved me **a lot** of time. Why? You will see in no time.

So, time to launch the ES cluster. Grab this [docker-compose file](https://gist.github.com/yoanisgil/047256dbe21622c1a10a) and save it somewhere in your filesystem:

    $ mkdir -p ~/tmp/es-cluster
    $ cd ~/tmp/es-cluster
    $ curl https://gist.githubusercontent.com/yoanisgil/047256dbe21622c1a10a/raw/90bffe5dd2ee2940594ee1019458741f2594acd3/docker-compose.yml > docker-compose.yml

With the file saved, let's make sure newly launched containers will run on our recently created Swarm cluster:

    $ eval $(docker-machine env --swarm swarm-master)

and we can confirm that the nodes are ready to take on work by running:

    $ docker info
    Containers: 5
    Images: 4
    Role: primary
    Strategy: spread
    Filters: health, port, dependency, affinity, constraint
    Nodes: 4
     swarm-master: 107.170.200.211:2376
      └ Status: Healthy
      └ Containers: 2
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 1.026 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), provider=digitalocean, storagedriver=aufs
     swarm-node-1: 107.170.203.99:2376
      └ Status: Healthy
      └ Containers: 1
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 1.026 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), provider=digitalocean, storagedriver=aufs
     swarm-node-2: 107.170.194.118:2376
      └ Status: Healthy
      └ Containers: 1
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 1.026 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), provider=digitalocean, storagedriver=aufs
     swarm-node-3: 107.170.247.233:2376
      └ Status: Healthy
      └ Containers: 1
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 1.026 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=3.16.0-4-amd64, operatingsystem=Debian GNU/Linux 8 (jessie), provider=digitalocean, storagedriver=aufs
    CPUs: 4
    Total Memory: 4.103 GiB
    Name: 6546401664f0

and here comes the fun.
First we will launch an ES master node:

    docker-compose --x-networking --x-network-driver overlay up -d master

The Docker Compose service relevant to this command looks like this:

    master:
      image: elasticsearch:2
      ports:
        - "9200:9200"
      restart: always
      container_name: es_master

Most of these options will look familiar to you, and I know that providing a *container_name* means only one container with that name can run at a time, but it will make things a bit easier when running certain actions against the ES master node later on.

The *es_master* container will act as a contact/entry point for the other nodes to learn about the cluster topology. Before we launch more nodes, let's install the [Elasticsearch-HQ plugin](https://github.com/royrusso/elasticsearch-HQ), a plugin that will provide us with some very useful monitoring/configuration details about the cluster:

    docker exec es_master plugin install royrusso/elasticsearch-HQ

Once the plugin has been installed, let's take a look at what sort of information you can get from it. Run this command:

    $ echo "http://$(docker inspect --format='{{(index (index .NetworkSettings.Ports "9200/tcp") 0).HostIp}}' es_master):9200/_plugin/hq/"

grab the resulting URL and enter it in your browser. You should see something like this:

![Elasticsearch HQ Plugin - First Load](https://raw.githubusercontent.com/yoanisgil/docker-elasticsearch-cluster/master/images/es-cluster-01.png)

and, as the screenshot indicates, do not forget to click the **Connect** button when the page loads for the first time.

We can clearly see that we have 1 node running, so let's launch 3 more:

    docker-compose --x-networking --x-network-driver overlay scale es-node=3

And this is it! It does not get any easier than this. We have requested Docker Compose to summon 3 containers, each of them running ES, and by specifying **--x-networking --x-network-driver overlay** they will all be connected and reachable within a private network of their own. This all works, of course, because we're running Docker 1.9 on top of Swarm. Let's take a look at the section of the Docker Compose file which takes care of defining the ES nodes:

    es-node:
      image: elasticsearch:2
      command: elasticsearch --discovery.zen.ping.unicast.hosts=es_master
      restart: always
      environment:
        - "affinity:container!=*master*"
        - "affinity:container!=*es-node*"

There are two important pieces here:

- **discovery.zen.ping.unicast.hosts=es_master**: We're instructing ES to use [Zen Discovery](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html) in order to find out the cluster topology, and we're telling nodes to contact the *es_master* host as the entry point for the gossiping protocol. Notice that *es_master* is the name we gave to our container running the ES master node.
- **"affinity:container!=*master*"** / **"affinity:container!=*es-node*"**: We're instructing Swarm to make sure that ES node containers cannot run where the ES master container is running, and that we only want to run one ES node per Swarm node.
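Besides the HQ UI, you can also confirm from the command line that the new nodes joined, by querying the standard `_cluster/health` and `_cat/nodes` endpoints of the ES HTTP API on the master's published port 9200. A minimal sketch, standard library only; replace `ES_MASTER_IP` with the address printed by the `docker inspect ... es_master` command used for the HQ URL above:

```python
import json
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

# Replace ES_MASTER_IP with the address of the ES master node.
ES_MASTER = 'http://ES_MASTER_IP:9200'

# Cluster-wide health: status, number of nodes, shard counts, etc.
health = json.loads(urlopen(ES_MASTER + '/_cluster/health').read().decode('utf-8'))
print('status: %s, nodes: %d' % (health['status'], health['number_of_nodes']))

# Human-readable list of the nodes that have joined the cluster.
print(urlopen(ES_MASTER + '/_cat/nodes?v').read().decode('utf-8'))
```

With the master plus the three new nodes up, `number_of_nodes` should reach 4.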
So let's go back and take a look at the HQ plugin page. It should look like this:

![Elasticsearch HQ plugin - After nodes added](https://raw.githubusercontent.com/yoanisgil/docker-elasticsearch-cluster/master/images/es-cluster-02.png)

Finally, let's launch a Kibana container which is able to connect to our ES cluster. First open the *docker-compose.yml* file, look for the string **ES_MASTER_IP** and replace it with the IP address of the ES master node. You can get the IP address by running:

    docker inspect --format='{{(index (index .NetworkSettings.Ports "9200/tcp") 0).HostIp}}' es_master

With that done, we can launch a Kibana container:

    docker-compose --x-networking --x-network-driver overlay up -d kibana

and then grab the URL where Kibana is running and enter it in your browser:

    echo "http://$(docker inspect --format='{{(index (index .NetworkSettings.Ports "5601/tcp") 0).HostIp}}' kibana):5601"

You should see something like this:

![Kibana](https://raw.githubusercontent.com/yoanisgil/docker-elasticsearch-cluster/master/images/es-cluster-03.png)

And that's it! You can start using your ES cluster with [Logstash](https://www.elastic.co/products/logstash)/[Filebeat](https://www.elastic.co/products/beats/filebeat). Just make sure you properly configure the [Elasticsearch output plugin](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html) and make it point to the ES master node (any other node from the cluster will do as well).

Go ahead and have fun! Try adding new nodes, removing existing ones, and see it all in action.

And hey, do not forget to turn those servers off, because, you know, you're billed as long as they're running ;). You can do that with docker-machine:

    $ docker-machine rm kvstore swarm-master swarm-node-1 swarm-node-2 swarm-node-3

# Finito

By now it should be clear to you how easy it is to run an Elasticsearch cluster on a cloud provider of your own choosing. Moreover, you have full control over how and when this is done. I'm not saying that by following the instructions provided here you will end up with a production-grade cluster. Far from it. However, for short-lived tasks, like real-time data analysis for making short-term decisions, I believe this presents some very interesting and unique options (and challenges as well).

One thing I did not cover in this article was data persistence. That's certainly something to keep in mind if you need data to stick around beyond the life cycle of your cluster. I must be honest and say that I tried to achieve that by using [Openstorage](https://github.com/libopenstorage/openstorage), but unfortunately I ran into [this issue](https://github.com/libopenstorage/openstorage/issues/109). I will try to follow up on that and see if I can finally get data persistence working within a Swarm cluster, but that will for sure go into another blog post ;).

For those asking themselves why ES would have saved me a lot of time, the answer is easy: I only needed to log any performance information to a location (file, syslog, etc.) which Logstash/Filebeat knows how to read. The rest was about getting ES to index the information and spending some time toying around with Kibana to produce the reports/graphs I needed.
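To give a concrete flavor of that workflow, here is a minimal, standard-library-only sketch of pushing one performance sample straight into the cluster and asking ES how many documents it holds. The `perf` index, the `measurement` type and the field names are invented for this example, and `ES_MASTER_IP` is the same placeholder as above; in practice Logstash/Filebeat would be doing the shipping for you:

```python
import json
try:
    from urllib2 import Request, urlopen  # Python 2
except ImportError:
    from urllib.request import Request, urlopen  # Python 3

# Replace ES_MASTER_IP with the address of the ES master node, as before.
ES_MASTER = 'http://ES_MASTER_IP:9200'

# Index one (made-up) performance sample; POSTing to /index/type lets ES
# auto-generate the document id and create the index on the fly.
doc = json.dumps({'endpoint': '/v/42', 'status': 404, 'response_time_ms': 12}).encode('utf-8')
print(urlopen(Request(ES_MASTER + '/perf/measurement', data=doc)).read().decode('utf-8'))

# Refresh so the document is immediately searchable, then count the documents.
urlopen(Request(ES_MASTER + '/perf/_refresh', data=b''))
print(urlopen(ES_MASTER + '/perf/_count').read().decode('utf-8'))
```

From there it's all Kibana: point it at the `perf` index and start building the graphs you need.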
# Big Thanks

Yes, that's absolutely something I need to do. Here it goes, in no particular order of priority:

- To Nathan LeClaire for his [excellent post](http://nathanleclaire.com/blog/2015/11/17/seamless-docker-multihost-overlay-networking-on-digitalocean-with-machine-swarm-and-compose-ft.-rethinkdb/). Can't thank you enough, mate; your post was the starting point.
- To Elasticsearch for crafting such a magical piece of software.
- To Roy Russo for his excellent ElasticSearch HQ plugin.
- To Docker, and I think we all know why :).

--------------------------------------------------------------------------------