├── 001-getting-started
│   ├── 001-getting-started.md
│   └── images
│       ├── image1.svg
│       └── image2.svg
├── 002-elasticsearch-and-the-jvm
│   └── 002-elasticsearch-and-the-jvm.md
├── 003-about-lucene
│   ├── 003-about-lucene.md
│   └── images
│       └── image2.svg
├── 004-cluster-design
│   ├── 004-cluster-design.md
│   └── images
│       ├── image1.svg
│       ├── image4.png
│       └── image5.png
├── 005-design-event-logging
│   ├── 005-design-event-logging.md
│   └── images
│       └── image6.svg
├── 006-operating-daily
│   └── 006-operating-daily.md
├── 007-monitoring-es
│   ├── 007-monitoring-es.md
│   └── images
│       ├── image10.png
│       ├── image7.png
│       ├── image8.png
│       └── image9.png
├── 100-use-cases-reindexing-36-billion-docs
│   ├── 100-use-cases-reindexing-36-billion-docs.md
│   └── images
│       ├── image10.png
│       ├── image11.png
│       ├── image12.png
│       ├── image13.png
│       ├── image14.png
│       ├── image15.png
│       ├── image3.png
│       ├── image7.svg
│       ├── image8.svg
│       └── image9.png
├── 101-use-case-migrating-cluster-over-ocean
│   ├── 101-use-case-migrating-cluster-over-ocean.md
│   └── images
│       ├── image2.png
│       ├── image2.svg
│       └── image3.png
├── 102-use-case-advanced-architecture-high-volume-reindexing
│   ├── 102-use-case-advanced-architecture-high-volume-reindexing.md
│   └── images
│       ├── image1.png
│       ├── image2.svg
│       ├── image3.svg
│       ├── image4.svg
│       ├── image5.svg
│       └── image6.svg
├── 103-use-case-migrating-130tb-cluster-without-downtime
│   ├── 103-use-case-migrating-130tb-cluster-without-downtime.md
│   └── images
│       ├── image16.svg
│       ├── image17.svg
│       ├── image18.svg
│       ├── image19.png
│       ├── image20.svg
│       ├── image21.svg
│       └── image22.svg
├── LICENSE
├── README.md
├── ZH-CN
│   ├── 001-入门
│   │   ├── 001-入门.md
│   │   └── images
│   │       ├── image1.svg
│   │       └── image2.svg
│   ├── 002-Elasticsearch和JVM
│   │   └── 002-elasticsearch-and-the-jvm.md
│   └── 003-关于Lucene
│       ├── 003-关于Lucene.md
│       └── images
│           └── image2.svg
├── _config.yml
└── images
    └── image1.png
/001-getting-started/001-getting-started.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Getting Started with Elasticsearch
6 |
7 | This chapter is for people who have not used Elasticsearch yet. It covers Elasticsearch basic concepts and guides you through deploying and using your first single node cluster. Every concept explained here is detailed further in this book.
8 |
9 | In this introduction chapter you will learn:
10 |
11 | - The basic concepts behind Elasticsearch
12 | - What's an Elasticsearch cluster
13 | - How to deploy your first, single node Elasticsearch cluster on the most common operating systems
14 | - How to use Elasticsearch to index documents and find content
15 | - Elasticsearch configuration basics
16 | - What's an Elasticsearch plugin and how to use them
17 |
18 | ---
19 |
20 | ## Prerequisites
21 |
22 | In order to read this book and perform the operations described along its chapters, you need:
23 |
24 | - A machine or virtual machine running one of the popular Linux or Unix environments: Debian / Ubuntu, RHEL / CentOS or FreeBSD. Running Elasticsearch on Mac OS or Windows is not covered in this book
25 | - A basic knowledge of UNIX command line and the use of a terminal
26 | - Your favorite text editor
27 |
28 | If you have never used Elasticsearch before, I recommend creating a virtual machine so you won't harm your main system in case of a mistake. You can either run it locally using a virtualization tool like [Virtualbox](https://www.virtualbox.org/) or on your favorite cloud provider.
29 |
30 | ---
31 |
32 | ## Elasticsearch basic concepts
33 |
34 | Elasticsearch is a distributed, scalable, fault tolerant open source search engine written in Java. It provides a powerful REST API both for adding or searching data and updating the configuration. Elasticsearch is led by Elastic, a company created by Shay Banon, who started the project on top of Lucene.
35 |
36 | ### REST APIs
37 |
38 | A REST API is an application program interface (API) that uses HTTP requests to `GET`, `PUT`, `POST` and `DELETE` data. An API for a website is code that allows two software programs to communicate with each other. The API spells out the proper way for a developer to write a program requesting services from an operating system or another application. REST is the Web counterpart of a database's CRUD operations (Create, Read, Update, Delete).
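
As a quick illustration, here is what those four verbs look like against a local Elasticsearch node; the `library` index and `book` document below are made up for this example:

```bash
# Create (or replace) a document with id 1
curl -XPUT "localhost:9200/library/book/1" -H 'Content-Type: application/json' -d '
{ "title": "Running Elasticsearch for Fun and Profit" }
'

# Read the document back
curl -XGET "localhost:9200/library/book/1?pretty"

# Delete the document
curl -XDELETE "localhost:9200/library/book/1"
```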
39 |
40 | ### Open Source
41 |
42 | Open source means that Elasticsearch source code, the recipe to build the software, is public, free, and that anyone can contribute to the project by adding missing features or documentation, or by fixing bugs. If accepted by the project, their work is then available to the whole community. Because Elasticsearch is open source, the company behind it can go bankrupt or stop maintaining the project without killing it. Someone else will be able to take it over and keep the project alive.
43 |
44 | ### Java
45 |
46 | Java is a programming language created in 1995 by Sun Microsystems. Java applications run on top of the Java Virtual Machine (JVM), which means they are independent of the platform they were written on. Java is most well known for its Garbage Collector (GC), a powerful way to manage memory.
47 |
48 | Java is not Javascript, which was developed in the mid 90s by Netscape. Despite having very similar names, Java and Javascript are two different languages with different purposes.
49 |
50 | > Javascript is to Java what hamster is to ham. – Jeremy Keith
51 |
52 | ### Distributed
53 |
54 | Elasticsearch runs on as many hosts as required by the workload or the amount of data. Hosts communicate and synchronize using messages over the network. A networked machine running Elasticsearch is called a node, and the whole group of nodes sharing the same cluster name is called a cluster.
55 |
56 | ### Scalable
57 |
58 | Elasticsearch scales horizontally. Horizontal scaling means that the cluster can grow by adding new nodes. When adding more machines, you don't need to restart the whole cluster. When a new node joins the cluster, it gets a part of the existing data. Horizontal scaling is the opposite of vertical scaling, where the only way to grow is running a software on a bigger machine.
59 |
60 | ### Fault tolerant
61 |
62 | Elasticsearch ensures the data is replicated at least once, unless specified otherwise, across 2 separate nodes. When a node leaves the cluster, Elasticsearch rebuilds the missing replicas on the remaining nodes, unless there are no more nodes to replicate to.
63 |
64 | ---
65 |
66 | ## What's an Elasticsearch cluster?
67 |
68 | A cluster is a host or a group of hosts running Elasticsearch and configured with the same `cluster name`. The default `cluster name` is `elasticsearch` but using it in production is not recommended.
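
Once a node is running, you can check which cluster it belongs to by querying its root endpoint; the `cluster_name` field of the response holds the configured name:

```bash
# The root endpoint returns the node name, the cluster name and the Elasticsearch version
curl -XGET "localhost:9200/?pretty"
```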
69 |
70 | Each host in an Elasticsearch cluster can fulfill one or more of the following roles:
71 |
72 | ### Master node
73 |
74 | The master nodes control the cluster. They give joining nodes information about the cluster, decide where to move the data, and reallocate the missing data when a node leaves. When multiple nodes can handle the master role, Elasticsearch elects an acting master. The acting master is called the `elected master`. When the elected master leaves the cluster, another master node takes over the role of elected master.
75 |
76 | ### Ingest nodes
77 |
78 | An ingest node pre-processes documents before the actual document indexing happens. The ingest node intercepts bulk and index requests, applies its transformations, and then passes the documents back to the index or bulk APIs.
79 |
80 | All nodes enable ingest by default, so any node can handle ingest tasks. You can also create dedicated ingest nodes.
81 |
82 | ### Data Nodes
83 |
84 | Data nodes store the indexed data. They are responsible for managing stored data, and performing operations on that data when queried.
85 |
86 | ### Tribe Nodes
87 |
88 | Tribe nodes connect to multiple Elasticsearch clusters and perform operations such as searches across every connected cluster.
89 |
90 | ### A Minimal, Fault Tolerant Elasticsearch Cluster
91 |
92 | 
93 |
94 | A minimal fault tolerant Elasticsearch cluster should be composed of:
95 |
96 | * 3 master nodes
97 | * 2 ingest nodes
98 | * 2 data nodes
99 |
100 | Having 3 master nodes is important to make sure that the cluster won't end up in a split brain state in case of a network partition, by making sure that at least 2 master-eligible nodes are present in the cluster. If the number of master-eligible nodes falls below 2, the cluster will refuse any new indexing until the problem is fixed.
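
With 3 master-eligible nodes, this is enforced by requiring a quorum of 2 masters before the cluster can operate. A minimal sketch, assuming the configuration file lives at `/etc/elasticsearch/elasticsearch.yml`:

```bash
# Require 2 of the 3 master-eligible nodes to be present before electing a master
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
discovery.zen.minimum_master_nodes: 2
EOF
```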
101 |
102 | ---
103 |
104 | ## What's an Elasticsearch index
105 |
106 | An `index` is a group of documents with similar characteristics. It is identified by a name, which is used when performing operations against the stored documents or the `index` structure itself. An `index` structure is defined by a `mapping`, a `JSON` file describing both the document characteristics and the `index` options such as the replication factor. In an Elasticsearch cluster, you can define as many `indexes` as you want.
107 |
108 | An Elasticsearch `index` is composed of 1 or multiple `shards`. A `shard` is a Lucene index, and the number of `shards` is defined at `index` creation time. Elasticsearch allocates an `index`'s `shards` across the cluster, either automatically or according to user defined rules.
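
For illustration, here is how an index could be created with an explicit number of shards and replicas and a minimal mapping; the index name and the `message` field are made up for this example:

```bash
curl -XPUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "message": { "type": "text" }
      }
    }
  }
}
'
```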
109 |
110 | Lucene is the name of the search engine that powers Elasticsearch. It is an open source project from the Apache Foundation. You will most probably never have to deal with Lucene directly when operating an Elasticsearch cluster, but this book covers the basics you need to know.
111 |
112 | A `shard` is made of one or multiple `segments`, which are binary files where Lucene indexes the stored documents.
113 |
114 | 
115 |
116 | If you're familiar with relational databases such as MySQL, then an `index` is a database, the `mapping` is the database schema, and the shards represent the database data. Due to the distributed nature of Elasticsearch, and the specificities of Lucene, the comparison with a relational database stops here.
117 |
118 | ---
119 |
120 | ## Deploying your first Elasticsearch cluster
121 |
122 | ### Deploying Elasticsearch on Debian
123 |
124 | TODO [issue #9](https://github.com/fdv/running-elasticsearch-fun-profit/issues/9)
125 |
126 | ### Deploying Elasticsearch on RHEL / CentOS
127 |
128 | TODO [issue #9](https://github.com/fdv/running-elasticsearch-fun-profit/issues/9)
129 |
130 | ---
131 |
132 | ## First step using Elasticsearch
133 |
134 | TODO [issue #10](https://github.com/fdv/running-elasticsearch-fun-profit/issues/10)
135 |
136 | ---
137 |
138 | ## Elasticsearch Configuration
139 |
140 | TODO [issue #10](https://github.com/fdv/running-elasticsearch-fun-profit/issues/10)
141 |
142 | ## Elasticsearch Plugins
143 |
144 | TODO [issue #10](https://github.com/fdv/running-elasticsearch-fun-profit/issues/10)
145 |
--------------------------------------------------------------------------------
/001-getting-started/images/image1.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/001-getting-started/images/image2.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/002-elasticsearch-and-the-jvm/002-elasticsearch-and-the-jvm.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Elasticsearch and the Java Virtual Machine
6 |
7 | Elasticsearch is written in Java. It requires the Java Runtime Environment (JRE) to be deployed on the same host to run. The currently supported versions of Elasticsearch run on the operating systems, distributions, and Java versions listed below.
8 |
9 | ## Supported JVM and operating systems / distributions
10 |
11 | The following matrices present the various operating systems and Java Virtual Machines (JVM) officially supported by Elastic for the 5.x and 6.x versions. Every operating system or JVM not mentioned here is not supported by Elastic and therefore should not be used.
12 |
13 | ### Operating system matrix
14 |
15 | | | CentOS/RHEL 6.x/7.x | Oracle Enterprise Linux 6/7 with RHEL Kernel only | Ubuntu 14.04 | Ubuntu 16.04 | **Ubuntu 18.04** | SLES 11 SP4\*\*/12 | SLES 12 | openSUSE Leap 42 |
16 | | --- |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
17 | | **ES 5.0.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
18 | | **ES 5.1.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
19 | | **ES 5.2.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
20 | | **ES 5.3.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
21 | | **ES 5.4.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
22 | | **ES 5.5.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
23 | | **ES 6.0.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
24 | | **ES 6.1.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
25 | | **ES 6.2.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
26 | | **ES 6.3.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
27 | | **ES 6.4.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
28 | | **ES 6.5.x** | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
29 |
30 |
31 | | | Windows Server 2012/R2 | Windows Server 2016 | Debian 7 | Debian 8 | Debian 9 | **Solaris / SmartOS** | Amazon Linux |
32 | | --- |:---:|:---:|:---:|:---:|:---:|:---:|:---:|
33 | | **ES 5.0** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
34 | | **ES 5.1.x** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
35 | | **ES 5.2.x** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
36 | | **ES 5.3.x** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
37 | | **ES 5.4.x** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
38 | | **ES 5.5.x** | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
39 | | **ES 6.0.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
40 | | **ES 6.1.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
41 | | **ES 6.2.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
42 | | **ES 6.3.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
43 | | **ES 6.4.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
44 | | **ES 6.5.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
45 |
46 | Elasticsearch runs on both OpenSolaris and FreeBSD. FreeBSD 11.1 provides an Elasticsearch 6.4.2 package maintained by [Mark Felder](mailto:feld@freebsd.org), but neither of these operating systems is officially supported by Elastic.
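
On FreeBSD, the package is available from the ports collection; a sketch of the installation, assuming the 6.x package name used at the time of writing:

```bash
# Look up the available Elasticsearch packages, then install the 6.x one
pkg search elasticsearch
pkg install elasticsearch6
```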
47 |
48 | ### Java Virtual Machine matrix
49 |
50 | | | Oracle/OpenJDK 1.8.0u111+ | Oracle/OpenJDK 9 | OpenJDK 10 | OpenJDK 11 | Azul Zing 16.01.9.0+ | IBM J9 |
51 | | --- |:---:|:---:|:---:|:---:|:---:| --- |
52 | | **ES 5.0.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
53 | | **ES 5.1.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
54 | | **ES 5.2.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
55 | | **ES 5.3.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
56 | | **ES 5.4.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
57 | | **ES 5.5.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
58 | | **ES 5.6.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
59 | | **ES 6.0.x** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
60 | | **ES 6.1.x** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
61 | | **ES 6.2.x** | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
62 | | **ES 6.3.x** | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
63 | | **ES 6.4.x** | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
64 | | **ES 6.5.x** | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
65 |
66 |
67 |
68 | ## Memory management
69 |
70 | TODO
71 |
72 | ## Garbage collection
73 |
74 | Java is a garbage collected language. The developer does not have to manage the memory allocation. The Java Virtual Machine periodically runs a specific system thread called GC Thread that takes care of the various garbage collection activities. One of them is reclaiming the memory occupied by objects that are no longer in use by the program.
75 |
76 | Java 1.8 comes with 3 different garbage collector families, each with its own features.
77 |
78 | The *Serial Collector* uses a single thread to perform the whole garbage collection process. It is efficient on single processor machines, as it suppresses the overhead implied by the communication between threads, but it is not suitable for most real world use today. It was designed for heaps managing small datasets, on the order of 100MB.
79 |
80 | The *Parallel Collector* runs small garbage collections in parallel. Running parallel collections reduces the garbage collection overhead. It was designed for medium to large datasets running on multi threaded hosts.
81 |
82 | The *Mostly Concurrent Collectors* perform most of their work concurrently to keep garbage collection pauses short. They are designed for large datasets, when response time matters, because the technique used to minimise pauses can affect the application performance. Java 1.8 offers two Mostly Concurrent Collectors: the *Concurrent Mark & Sweep Garbage Collector* (CMS), and the *Garbage First Garbage Collector*, also known as G1GC.
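
You can check which collectors your Elasticsearch nodes actually run through the nodes info API; the `gc_collectors` field of the JVM section lists them:

```bash
# The JVM section of the nodes info API lists the garbage collectors in use on each node
curl -XGET "localhost:9200/_nodes/jvm?pretty" | grep -A 3 gc_collectors
```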
83 |
84 | ### Concurrent Mark & Sweep Garbage Collector
85 |
86 | TODO
87 |
88 | ### Garbage First Garbage Collector
89 |
90 | TODO
91 |
--------------------------------------------------------------------------------
/003-about-lucene/003-about-lucene.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # A few things you need to know about Lucene
6 |
7 | Before you start to think about choosing the right hardware, there are a few things you need to know about [Lucene](http://lucene.apache.org/).
8 |
9 | Lucene is the name of the search engine that powers Elasticsearch. It is an open source project from the Apache Foundation. There's no need to interact with Lucene directly, at least most of the time, when running Elasticsearch. But there's a few important things to know before choosing the cluster storage and file system.
10 |
11 | ## Lucene segments
12 |
13 | Each Elasticsearch index is divided into shards. Shards are both logical and physical divisions of an index. Each Elasticsearch shard is a Lucene index. The maximum number of documents you can have in a Lucene index is 2,147,483,519. The Lucene index is divided into smaller files called segments. A segment is a small Lucene index. Lucene searches in all segments sequentially.
14 |
15 | 
16 |
17 | Lucene creates a segment when a new writer is opened, and when a writer commits or is closed. It means segments are immutable. When you add new documents into your Elasticsearch index, Lucene creates a new segment and writes it. Lucene can also create more segments when the indexing throughput is high.
18 |
19 | From time to time, Lucene merges smaller segments into a larger one. The merge can also be triggered manually from the Elasticsearch API.
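
On recent Elasticsearch versions the manual merge is exposed as the force merge API (older releases called it `_optimize`); for example, to merge a hypothetical index down to a single segment per shard:

```bash
# Force merge the segments of my-index down to one segment per shard
curl -XPOST "localhost:9200/my-index/_forcemerge?max_num_segments=1"
```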
20 |
21 | This behavior has a few consequences from an operational point of view.
22 |
23 | The more segments you have, the slower the search. This is because Lucene has to search through all the segments in sequence, not in parallel. Having a small number of segments improves search performance.
24 |
25 | Lucene merges have a cost in terms of CPU and I/O. It means they might slow your indexing down. When performing bulk indexing, for example an initial indexing, it is recommended to disable the merges completely.
26 |
27 | If you plan to host lots of shards and segments on the same host, you might choose a filesystem that copes well with lots of small files and does not have a restrictive inode limitation. This is something we'll deal with in detail in the part about choosing the right file system.
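
To see how many segments a shard is actually made of, the cat segments API gives a per-segment breakdown, here for a hypothetical index:

```bash
# List every Lucene segment of my-index with its size and document counts
curl -XGET "localhost:9200/_cat/segments/my-index?v"
```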
28 |
29 | ## Lucene deletes and updates
30 |
31 | Lucene performs copy on write when updating and deleting a document. It means the document is never immediately removed from the index. Instead, Lucene marks the document as deleted and creates a new one when an update is triggered.
32 |
33 | This copy on write has an operational consequence. As you update or delete documents, your indices will grow on disk unless you delete them completely. One solution to actually remove the marked documents is to force Lucene segment merges.
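
The force merge API has a dedicated option for that case: instead of fully merging the index, it only rewrites the segments containing deleted documents (again with a made-up index name):

```bash
# Rewrite only the segments holding deleted documents to reclaim disk space
curl -XPOST "localhost:9200/my-index/_forcemerge?only_expunge_deletes=true"
```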
34 |
35 | During a merge, Lucene takes 2 segments and moves their content into a third, new one. Then the old segments are deleted from the disk. It means Lucene needs enough free space on the disk to create a segment the size of both segments it needs to merge.
36 |
37 | A problem can arise when force merging a huge shard. If the shard size is > half of the disk size, you probably won't be able to fully merge it, unless most of the data is made of deleted documents.
38 |
--------------------------------------------------------------------------------
/003-about-lucene/images/image2.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/004-cluster-design/images/image4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/004-cluster-design/images/image4.png
--------------------------------------------------------------------------------
/004-cluster-design/images/image5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/004-cluster-design/images/image5.png
--------------------------------------------------------------------------------
/005-design-event-logging/005-design-event-logging.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Design for Event Logging
6 |
7 | Elasticsearch has made a splash in the event analysis world thanks to --- or because of --- the famous Elasticsearch / Logstash / Kibana (ELK) trinity. In this specific use case, Elasticsearch acts as a hot storage that makes normalized events searchable.
8 |
9 | The usual topology of an event analysis infrastructure is more or less the same whatever the technical stack.
10 |
11 | Heterogeneous events are pushed from various locations into a queue. Queuing has 2 purposes: make sure the data processing won't act as a bottleneck in case of an unexpected spike, and make sure no event is lost if the data processing stack crashes.
12 |
13 | A data processing tool normalizes the events. There is zero chance you'll get homogeneous events in an event analysis infrastructure. Events can be logs, metrics, or whatever you can think of, and they need to be normalized to be searchable.
14 |
15 | The data processing tool forwards the events to a hot storage where they can be searched. Here, the hot storage is, indeed, Elasticsearch.
16 |
17 | ## Design of an event logging infrastructure cluster
18 |
19 | Event analysis is the typical use case where you can start small, with a single node cluster, and scale when needed. Most of the time, you won't collect all the events you want to analyse from day 1, so it's OK not to over engineer things.
20 |
21 | The event logging infrastructure is the typical tricky use case that might have you pulling your hair out for some time, saying Elasticsearch is the worst software ever. It's extremely heavy on writes, with only a few search queries.
22 |
23 | Writes can easily become the bottleneck of the infrastructure, either from a CPU or storage point of view, one more reason to choose the software upstream of Elasticsearch wisely to avoid losing events.
24 |
25 | Searches are performed on such a large amount of data that a single one might trigger an out of memory error on the Java heap space, or an endless garbage collection.
26 |
27 | Before you start, there are a few things you need to think about. Since we are focusing on designing an Elasticsearch cluster, we'll start from the moment events are normalized and pushed into Elasticsearch.
28 |
29 | ### Throughput: how many events per second (eps) are you going to collect?
30 |
31 | This is not a question you can answer out of the box unless you already have a central events collection platform. It's an important one though, as it will define most of your hardware requirements. Event logging varies a lot according to your platform activity, so I'd recommend tracking them for a week or more before you start building your Elasticsearch cluster.
32 |
33 | One important thing to know is whether you need realtime indexing, or whether you can accept some lag. If the latter is an option, then you can let the backlog be indexed after a spike of events happens, so you don't need to build for the maximum amount of events you can get.
34 |
35 | ### Retention: how long do you want to keep your data, hot and cold?
36 |
37 | Hot data is data you can access immediately, while cold data is data that can be accessed within a reasonable amount of time. Retention depends both on your needs and on national regulation. For example, in France, we're supposed to keep our access logs for a full year, financial transactions need to be kept for 3 to 5 years, etc.
38 |
39 | On Elasticsearch, hot data means opened, accessible indexes. Cold data means closed indexes, or backups of an index snapshot you can easily and quickly transfer and reopen.
40 |
41 | ### Size: what is the average size of a collected event?
42 |
43 | This metric is important as well. Knowing throughput * event size * retention period will help you define the amount and type of storage you need, hence the cost of your event logging platform.
44 |
45 | Storage = throughput * event size * retention period.
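
As a back-of-the-envelope sketch with purely hypothetical figures: 2,000 events per second at 1KB per event kept for 30 days is roughly 5.3TB of hot data, before replication.

```bash
# Rough storage estimate: events per second * bytes per event * seconds of retention
echo "$(( 2000 * 1024 * 86400 * 30 )) bytes" # about 5.3TB, before replication
```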
46 |
47 | Hot data is made of opened, searchable indices. They are the ones you'll search into on a regular basis, for debugging or statistics purpose.
48 |
49 | ### Fault tolerance: can you afford losing your indexed data?
50 |
51 | Most of the time, losing your search backend is acceptable. Lots of people use the ELK stack to store application logs so they are easier to debug, and it is not a critical part of their infrastructure. Logs are also stored somewhere else, for example on a central syslog server, so they are still searchable using some shell skills.
52 |
53 | When you can lose your search backend for a few hours, or don't want to invest in a full cluster, then a single Elasticsearch server is enough, provided your throughput allows it.
54 |
55 | The minimal host configuration is then:
56 |
57 | ```yaml
58 | master: true
59 | data: true
60 | index:
61 |   number_of_replicas: 0
62 | ```
63 |
64 | If you start combining event analysis with alerting, or if you need your events to be searchable in realtime without downtime, then things get a bit more expensive. For example, you might want to correlate your whole platform's auth.log to look for intrusion attempts or port scanning, so you can deploy new firewall rules accordingly. Then you'll have to start with a 3-node cluster. 3 nodes is the minimum since you need 2 active master-eligible nodes to avoid a split brain.
65 |
66 | 
67 |
68 | Here, the minimal hosts configuration for the master / ingest node is:
69 |
70 | ```yaml
71 | master: true
72 | data: false
73 | index:
74 |   number_of_replicas: 1
75 | ```
76 |
77 | And for the data nodes:
78 |
79 | ```yaml
80 | master: true
81 | data: true
82 | index:
83 |   number_of_replicas: 1
84 | ```
85 |
86 | If you decide to go cheap and combine the master and data nodes in a 3-host cluster, never use bulk indexing.
87 |
88 | Bulk indexing can put lots of pressure on the server memory, leading the master to exit the cluster. If you plan to run bulk indexing, then add one or two dedicated ingest nodes.
89 |
90 | The same applies to high memory consuming queries. If you plan to run such queries, then move your master nodes out of the data nodes.
91 |
92 | ### Queries
93 |
94 | The last thing you need to know about is the type of queries that are going to be run against your Elasticsearch cluster. If you only need to run simple queries, like looking for an error message, then memory pressure won't be a real problem, even against a large dataset. Things get more interesting when you need to perform complex filtered queries, or aggregations against a large set of data. Then you'll put lots of pressure on the cluster memory.
95 |
96 | ## Which hardware do I need?
97 |
98 | Once you've gathered all your prerequisites, it's time for hardware selection.
99 |
100 | Unless you're using ZFS as a filesystem to benefit from compression and snapshots, you should not need more than 64GB RAM. ZFS is popular both for managing extremely large file systems and for its features, but it is greedy on memory.
101 |
102 | Choose the CPU depending on both your throughput and your filesystem. ZFS is greedier than ext4, for example. The Elasticsearch index thread pool size is equal to the number of available processors + 1, with a default queue of 200. So if you have a 24 core host, Elasticsearch will be able to manage 25 indexing operations at once, with a queue of 200. Everything else will be rejected.
103 |
104 | You can choose to use bulk indexing, which will allow you to index more events at the same time. The default thread pool and queue size are the same as the index thread pool.
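
You can check the pool and queue sizes your nodes actually report with the cat thread pool API:

```bash
# Display the size and queue capacity of the index and bulk thread pools on every node
curl -XGET "localhost:9200/_cat/thread_pool/index,bulk?v&h=node_name,name,size,queue_size"
```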
105 |
106 | The storage part will usually be your bottleneck.
107 |
108 | Indeed, local storage and SSD are preferred, but lots of people will choose spinning disks or an external storage with fiberchannel to have more space.
109 |
110 | Whatever you choose, the more disks, the better. More disks provide more spindles, hence faster indexing. If you go with RAID10, then choose smaller disks, as very large disks such as 4TB+ spinning disks will take ages to rebuild.
111 |
112 | On a single node infrastructure, my favorite setup for a huge host is a RAID10 with as many 3.8TB SSD disks as possible. Some servers can host up to 36 of them, which makes 18 available spindles for more or less 55TB of usable space.
113 |
114 | On a multiple node infrastructure, I prefer to multiply smaller hosts with a RAID0 and 8TB to 10TB of space. This works great with 8 data nodes or more, since rebuilding takes lots of time.
115 |
116 | ## How to design my indices?
117 |
118 | As usual, it depends on your needs, but this is the time to play with aliases and timestamped indexes. For example, if you're storing the output of your infrastructure auth.log, your indices can be:
119 |
120 | ```bash
121 | auth-$(date +%Y-%m-%d)
122 | ```
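
Aliases then let your queries target a stable name while the underlying timestamped indexes rotate; a minimal sketch following the naming above, with a made-up date:

```bash
# Point the "auth" alias at today's index so queries don't need to know the date
curl -XPOST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "add": { "index": "auth-2019-01-01", "alias": "auth" } }
  ]
}
'
```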
123 |
124 | You'll probably want to have 1 index for each type of event you want to index, so you can build various, better adapted mappings. Event collection for syslog does not require the same index topology as application event tracing, or even some temperature metrics you might want to put in a TSDB.
125 |
126 | While doing it, remember that too many indexes and too many shards might put lots of pressure on a single host. Constant writing creates lots of Lucene segments, so make sure Elasticsearch won't have "too many open files" issues.
127 |
128 | ## What about some tuning?
129 |
130 | Here starts the fun part.
131 |
132 | Depending on your throughput, you might need a large [indexing buffer](https://www.elastic.co/guide/en/elasticsearch/reference/current/indexing-buffer.html). The indexing buffer is a bunch of memory that stores the data to index. It differs from the index and bulk thread pools which manage the operations.
133 |
134 | Elasticsearch default index buffer is 10% of the memory allocated to the heap. But for heavy indexing operations, you might want to raise it to 30%, if not 40%.
135 |
136 | ```yaml
137 | indices:
138 |   memory:
139 |     index_buffer_size: "40%"
140 | ```
141 |
142 | Elasticsearch provides a per node [query cache](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-cache.html). Let's put it this way: you don't need caching on an event logging infrastructure. There's a way to disable it completely and that's what we want.
143 |
144 | ```yaml
145 | indices:
146 |   query:
147 |     cache.enabled: false
148 | ```
149 |
150 | You will also want to have a look at the indexing thread pools. I don't recommend changing the thread pool size, but depending on your throughput, changing the queue size might be a good idea in case of indexing spikes.
151 |
152 | ```yaml
153 | thread_pool:
154 |   bulk:
155 |     queue_size: 3000
156 |   index:
157 |     queue_size: 3000
158 | ```
159 |
160 | Finally, you will want to disable the store throttle if you're running on fast enough disks.
161 |
162 | ```yaml
163 | store:
164 |   throttle.type: 'none'
165 | ```
166 |
167 | One more thing: when you don't need data in realtime, but can afford waiting a bit, you can cut your cluster a little slack by raising the indices refresh interval.
168 |
169 | ```yaml
170 | index:
171 |   refresh_interval: "1m"
172 | ```
173 |
--------------------------------------------------------------------------------
/005-design-event-logging/images/image6.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/006-operating-daily/006-operating-daily.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Operating Daily
6 |
7 | ## Elasticsearch most common operations
8 |
9 | ### Mass index deletion with pattern
10 |
11 | I often have to delete hundreds of indexes at once. Their names usually follow some pattern, which makes batch deletion easier.
12 |
13 | ```bash
14 | for index in $(curl -XGET esmaster:9200/_cat/indices | awk '/pattern/ {print $3}'); do
15 | curl -XDELETE "localhost:9200/${index}?master_timeout=120s"
16 | done
17 | ```
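
When the pattern is a simple prefix, you can also delete by wildcard in a single call, as long as `action.destructive_requires_name` hasn't been enabled on the cluster:

```bash
# Delete every index whose name starts with "pattern-" in one request
curl -XDELETE "localhost:9200/pattern-*?master_timeout=120s"
```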
18 |
19 | ### Mass optimize, indexes with the most deleted docs first
20 |
21 | Lucene, which powers Elasticsearch, has a specific behavior when it comes to deleting or updating documents. Instead of actually deleting or overwriting the data, it flags the document as deleted and writes a new one. The only way to get rid of a deleted document is to run an *optimize* on your indexes.
22 |
23 | This snippet sorts your existing indexes by the number of deleted documents before it runs the optimize.
24 |
25 | ```bash
26 | for indice in $(curl -XGET esmaster:9200/_cat/indices | sort -rk 7 | awk '{print $3}'); do
27 | curl -XPOST "localhost:9200/${indice}/_optimize?max_num_segments=1"
28 | done
29 | ```
30 |
31 | ### Restart a cluster using rack awareness
32 |
33 | Using rack awareness allows you to split your replicated data evenly between hosts or data centers. It's convenient to restart half of your cluster at once instead of host by host.
34 |
35 | ```bash
36 | curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
37 | {
38 | "transient" : {
39 | "cluster.routing.allocation.enable": "none"
40 | }
41 | }
42 | '
43 |
44 | for host in $(curl -XGET esmaster:9200/_cat/nodeattrs?attr | awk '/rack_id/ {print $2}'); do
45 | ssh ${host} service elasticsearch restart
46 | done
47 |
48 | sleep 60
49 |
50 | curl -XPUT -H 'Content-Type: application/json' "localhost:9200/_cluster/settings" -d '
51 | {
52 | "transient" : {
53 | "cluster.routing.allocation.enable": "all
54 | }
55 | }
56 | '
57 | ```
58 |
59 | ### Optimize your cluster restart
60 |
61 | There's a simple way to accelerate your cluster restart. Once you've brought your masters back, run this snippet. Most of the options are self explanatory:
62 |
63 | ```bash
64 | curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
65 | {
66 | "transient" : {
67 | "cluster.routing.allocation.cluster_concurrent_rebalance": 20,
68 | "indices.recovery.concurrent_streams": 20,
69 | "cluster.routing.allocation.node_initial_primaries_recoveries": 20,
70 | "cluster.routing.allocation.node_concurrent_recoveries": 20,
71 | "indices.recovery.max_bytes_per_sec": "2048mb",
72 | "cluster.routing.allocation.disk.threshold_enabled" : true,
73 | "cluster.routing.allocation.disk.watermark.low" : "90%",
74 | "cluster.routing.allocation.disk.watermark.high" : "98%",
75 | "cluster.routing.allocation.enable": "primary"
76 | }
77 | }
78 | '
79 | ```
80 |
81 | Then, once your cluster is back to yellow, run that one:
82 |
83 | ```bash
84 | curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
85 | {
86 | "transient" : {
87 | "cluster.routing.allocation.enable": "all"
88 | }
89 | }
90 | '
91 | ```
92 |
93 | ### Remove data nodes from a cluster the safe way
94 |
95 | ```bash
96 | curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
97 | {
98 | "transient" : {
99 | "cluster.routing.allocation.exclude._ip" : ",,"
100 | }
101 | }
102 | '
103 | ```
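
Once the exclusion is set, shards start draining off the excluded nodes; you can watch the relocations and check that the nodes are empty before actually stopping them:

```bash
# Shards currently moving off the excluded nodes
curl -XGET "localhost:9200/_cat/shards?v" | grep RELOCATING

# Per node disk usage and shard count; the excluded nodes should drop to 0 shards
curl -XGET "localhost:9200/_cat/allocation?v"
```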
104 |
105 | ## Get useful information about your cluster
106 |
107 | ### Nodes information
108 |
109 | This snippet gets the most useful information from your Elasticsearch nodes:
110 |
111 | * hostname
112 | * role (master, data, nothing)
113 | * free disk space
114 | * heap used
115 | * ram used
116 | * file descriptors used
117 | * load
118 |
119 | ```bash
120 | curl -XGET "localhost:9200/_cat/nodes?v&h=host,r,d,hc,rc,fdc,l"
121 | ```
122 |
123 | Output:
124 | ```
125 | host r d hc rc fdc l
126 |
127 | 192.168.1.139 d 1tb 9.4gb 58.2gb 20752 0.20
128 | 192.168.1.203 d 988.4gb 16.2gb 59.3gb 21004 0.12
129 | 192.168.1.146 d 1tb 14.1gb 59.2gb 20952 0.18
130 | 192.168.1.169 d 1tb 14.3gb 58.8gb 20796 0.10
131 | 192.168.1.180 d 1tb 16.1gb 60.5gb 21140 0.17
132 | 192.168.1.188 d 1tb 9.5gb 59.4gb 20928 0.19
133 | ```
134 |
135 | Then, it's easy to sort the output to get interesting information.
136 |
137 | Sort by free disk space
138 |
139 | ```bash
140 | curl -XGET "localhost:9200/_cat/nodes?h=host,r,d,hc,rc,fdc,l" | sort -hrk 3
141 | ```
142 |
143 | Sort by heap occupancy:
144 |
145 | ```bash
146 | curl -XGET "localhost:9200/_cat/nodes?h=host,r,d,hc,rc,fdc,l" | sort -hrk 4
147 | ```
148 |
149 | And so on.
150 |
151 | ### Monitor your search queues
152 |
153 | It's sometimes useful to know what happens in your data nodes' search queues. Beyond the search thread pool (whose default size is ((CPU * 3) / 2) + 1 on each data node), queries get stacked into the search queue, a 1,000-slot buffer.
154 |
155 | ```bash
156 | while true; do
157 | curl -XGET "localhost:9200/_cat/thread_pool?v&h=host,search.queue,search.active,search.rejected,search.completed" | sort -unk 2,3
158 | sleep 5
159 | done
160 | ```
161 |
162 | That code snippet only displays the data nodes running active search queries, so it's easier to read on a large cluster.
163 |
164 | ### Indices information
165 |
166 | This snippet gets most information you need about your indices. You can then grep on what you need to know: open, closed, green / yellow / red...
167 |
168 | ```bash
169 | curl -XGET "localhost:9200/_cat/indices?v"
170 | ```
171 |
172 | ### Shard allocation information
173 |
174 | Shard movements have lots of impact on your cluster performance. This snippet allows you to get the most critical information about your shards.
175 |
176 | ```bash
177 | curl -XGET "localhost:9200/_cat/shards?v"
178 | ```
179 |
180 | Output:
181 |
182 | ```
183 | 17_20140829 4 r STARTED 2894319 4.3gb 192.168.1.208 esdata89
184 | 17_20140829 10 p STARTED 2894440 4.3gb 192.168.1.206 esdata87
185 | 17_20140829 10 r STARTED 2894440 4.3gb 192.168.1.199 esdata44
186 | 17_20140829 3 p STARTED 2784067 4.1gb 192.168.1.203 esdata48
187 | ```
188 |
189 | ### Recovery information
190 |
191 | Recovery information comes in the form of a JSON output, but it's still easy to read and understand what happens on your cluster.
192 |
193 | ```bash
194 | curl -XGET "localhost:9200/_recovery?pretty&active_only"
195 | ```
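
For a more compact, tabular view, the cat recovery API gives the same information with one line per shard:

```bash
# Only display the recoveries that are currently running
curl -XGET "localhost:9200/_cat/recovery?v&active_only=true"
```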
196 |
197 | ### Segments information (can be extremely verbose)
198 |
199 | ```bash
200 | curl -XGET "localhost:9200/_cat/segments?v"
201 | ```
202 |
203 | ### Cluster stats
204 |
205 | ```bash
206 | curl -XGET "localhost:9200/_cluster/stats?pretty"
207 | ```
208 |
209 | ### Nodes stats
210 |
211 | ```bash
212 | curl -XGET "localhost:9200/_nodes/stats?pretty"
213 | ```
214 |
215 | ### Index stats
216 |
217 | ```bash
218 | curl -XGET "localhost:9200/someindice/_stats?pretty"
219 | ```
220 |
221 | ### Index mapping
222 |
223 | ```bash
224 | curl -XGET "localhost:9200/someindice/_mapping"
225 | ```
226 |
227 | ### Index settings
228 |
229 | ```bash
230 | curl -XGET "localhost:9200/someindice/_settings"
231 | ```
232 |
233 | ### Cluster dynamic settings
234 |
235 | ```bash
236 | curl -XGET "localhost:9200/_cluster/settings"
237 | ```
238 |
239 | ### All the cluster settings (can be extremely verbose)
240 |
241 | ```bash
242 | curl -XGET "localhost:9200/_settings"
243 | ```
244 |
--------------------------------------------------------------------------------
/007-monitoring-es/007-monitoring-es.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Monitoring Elasticsearch
6 |
7 | Is your cluster healthy for real?
8 |
9 | Monitoring Elasticsearch is the most important and most difficult part of deploying a cluster. The elements to monitor are countless, and not all of them are worth raising an alert. There are some common points though, but fine-grained monitoring really depends on your workload and use case.
10 |
11 | This chapter is divided into 3 different parts, covering the 3 most important environments to monitor:
12 |
13 | * monitoring at the cluster level,
14 | * monitoring at the host level,
15 | * monitoring at the index level.
16 |
17 | Each part extensively covers the critical things to have a look at, and gives you an overview of the little things that might be worth checking when troubleshooting.
18 |
19 | ## Tools
20 |
21 | Elastic provides an extensive monitoring system through the X-Pack plugin. X-Pack has a free license with some functional limitations. The free license only lets you manage a single cluster, a limited amount of nodes, and has a limited data retention. X-Pack documentation is available at [https://www.elastic.co/guide/en/x-pack/index.html](https://www.elastic.co/guide/en/x-pack/index.html)
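
If the plugin is installed, the X-Pack info API lists the available features and the current license level:

```bash
# List the X-Pack features available on the cluster and the license they run under
curl -XGET "localhost:9200/_xpack?pretty"
```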
22 |
23 | 
24 |
25 | I have released 3 Grafana dashboards to monitor Elasticsearch clusters using the data pushed by the X-Pack monitoring plugin. They provide much more information than the X-Pack monitoring interface, and are meant to be used when you need to gather data from various sources. They are not meant to replace X-Pack since they don't provide security, alerting or machine learning features.
26 |
27 | Monitoring at the cluster level: [https://grafana.com/dashboards/3592](https://grafana.com/dashboards/3592)
28 |
29 | 
30 |
31 | Monitoring at the node level: [https://grafana.com/dashboards/3595](https://grafana.com/dashboards/3595)
32 |
33 | 
34 |
35 | Monitoring at the index level: [https://grafana.com/dashboards/3598](https://grafana.com/dashboards/3598)
36 |
37 | 
38 |
39 | These dashboards are meant to provide a look at everything Elasticsearch sends to the monitoring node. It doesn't mean you'll actually need this data.
40 |
41 | ## Monitoring at the host level
42 |
43 | TODO
44 |
45 | ## Monitoring at the node level
46 |
47 | TODO
48 |
49 | ## Monitoring at the cluster level
50 |
51 | TODO
52 |
53 | ## Monitoring at the index level
54 |
55 | TODO
56 |
--------------------------------------------------------------------------------
/007-monitoring-es/images/image10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/007-monitoring-es/images/image10.png
--------------------------------------------------------------------------------
/007-monitoring-es/images/image7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/007-monitoring-es/images/image7.png
--------------------------------------------------------------------------------
/007-monitoring-es/images/image8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/007-monitoring-es/images/image8.png
--------------------------------------------------------------------------------
/007-monitoring-es/images/image9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/007-monitoring-es/images/image9.png
--------------------------------------------------------------------------------
/100-use-cases-reindexing-36-billion-docs/images/image10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/100-use-cases-reindexing-36-billion-docs/images/image10.png
--------------------------------------------------------------------------------
/100-use-cases-reindexing-36-billion-docs/images/image11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/100-use-cases-reindexing-36-billion-docs/images/image11.png
--------------------------------------------------------------------------------
/100-use-cases-reindexing-36-billion-docs/images/image12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/100-use-cases-reindexing-36-billion-docs/images/image12.png
--------------------------------------------------------------------------------
/100-use-cases-reindexing-36-billion-docs/images/image13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/100-use-cases-reindexing-36-billion-docs/images/image13.png
--------------------------------------------------------------------------------
/100-use-cases-reindexing-36-billion-docs/images/image14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/100-use-cases-reindexing-36-billion-docs/images/image14.png
--------------------------------------------------------------------------------
/100-use-cases-reindexing-36-billion-docs/images/image15.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/100-use-cases-reindexing-36-billion-docs/images/image15.png
--------------------------------------------------------------------------------
/100-use-cases-reindexing-36-billion-docs/images/image3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/100-use-cases-reindexing-36-billion-docs/images/image3.png
--------------------------------------------------------------------------------
/100-use-cases-reindexing-36-billion-docs/images/image9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/100-use-cases-reindexing-36-billion-docs/images/image9.png
--------------------------------------------------------------------------------
/101-use-case-migrating-cluster-over-ocean/101-use-case-migrating-cluster-over-ocean.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Use Case: Migrating a Cluster Across the Ocean Without Downtime
6 |
7 | I had to migrate a whole cluster from Canada to France without downtime.
8 |
9 | With only 1.8TB of data, the cluster was quite small. However, crossing the ocean on an unreliable network made the process long and hazardous.
10 |
11 | My main concern was about downtime: it was not an option. Otherwise I would have shut down the whole cluster, rsynced the data and restarted the Elasticsearch processes.
12 |
13 |
14 | To avoid downtime, I decided to connect both clusters and rely on Elasticsearch elasticity. It was made possible because this (rather small) cluster relied on unicast for discovery. With unicast discovery, you add a list of nodes to your Elasticsearch configuration, and you let each node discover its peers. This is something I had done before, but never across continents!
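
For reference, this is roughly what the unicast discovery configuration looks like on an Elasticsearch 5.x node; the IP addresses are made up:

```bash
# Declare the remote master nodes as unicast hosts so this node can discover its peers
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
discovery.zen.ping.unicast.hosts: ["10.1.0.1", "10.1.0.2", "10.1.0.3"]
EOF
```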
15 |
16 | The first step was to connect both clusters using unicast. To do this, I added the IP addresses of the Canadian master nodes to one of the French cluster nodes' configuration. I updated both machines' firewall rules so they were able to communicate on port 9300, then restarted the Elasticsearch process.
17 |
18 | At first, I only launched one French node, the one I planned to use as a gateway to communicate with the Canadian ones. After a few hours of shard relocation, everything was green again, and I was able to shut down the first Canadian data node.
19 |
20 | That's when I launched the 2 other French nodes. They only knew about each other and the *gateway* node. They did not know anything about the Canadian ones, but it worked like a charm.
21 |
22 | If for some reason you can't expose your new Elasticsearch cluster, what you can do is add an HTTP-only node that you will use as a bridge. Just ensure it can communicate with both clusters by adding one IP from each of their nodes; it works quite well, even with one public and one private subnet. This gateway provides another advantage: you don't need to update your clusters' configuration to make them discover each other.
23 |
24 | Once again, it took a few hours to relocate the shards within the cluster, but it was still working like a charm, getting its load of reads and writes from the application.
25 |
26 | Once the cluster was all green, I could shutdown the second Canadian node, then the third after some relocation madness.
27 |
28 | You may have noticed that at that time, routing nodes were still in Canada, and data in France.
29 |
30 | 
31 |
32 | That's right. The latest part of it was playing with DNS.
33 |
34 | 
35 |
36 | The main ES hostname the application accesses is managed using Amazon Route 53. Route 53 provides a nice round-robin feature so the same A record can point to many IPs or CNAMEs with a weight system. It's pretty cool even though it does not provide failover: if one of your nodes crashes, it needs to unregister itself from Route 53.
37 |
38 | As soon as the data transfer was done, I was able to update Route 53, adding 3 new records. Then, I deleted the old records and removed the routing nodes from the cluster. Mission successful.
39 |
--------------------------------------------------------------------------------
/101-use-case-migrating-cluster-over-ocean/images/image2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/101-use-case-migrating-cluster-over-ocean/images/image2.png
--------------------------------------------------------------------------------
/101-use-case-migrating-cluster-over-ocean/images/image3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/101-use-case-migrating-cluster-over-ocean/images/image3.png
--------------------------------------------------------------------------------
/102-use-case-advanced-architecture-high-volume-reindexing/102-use-case-advanced-architecture-high-volume-reindexing.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Use Case: An Advanced Elasticsearch Architecture for High-volume Reindexing
6 |
7 | I've found a new and funny way to play with [Elasticsearch](http://elastic.co/) to reindex a production cluster without disturbing our clients. If you haven't already, you might enjoy what we did last summer [reindexing 36 billion documents in 5 days within the same cluster](https://thoughts.t37.net/how-we-reindexed-36-billions-documents-in-5-days-within-the-same-elasticsearch-cluster-cd9c054d1db8#.5lw3khgtb).
8 |
9 | Reindexing that cluster was easy because it was not on production yet. Reindexing a whole cluster where regular clients expect to get their data in real time offers new challenges and more problems to solve.
10 |
11 | As you can see on the screenshot below, our main bottleneck the first time we reindexed Blackhole, the aptly named, was the CPU. Having the whole cluster at 100% and a load of 20 was not an option, so we needed to find a workaround.
12 |
13 | 
14 |
15 | This time, we won't reindex Blackhole but Blink. Blink stores the data we display in our clients' dashboards. We need to reindex them every time we change the mapping to enrich that data and add new features our clients and colleagues love.
16 |
17 | ---
18 |
19 | ## A glimpse at our infrastructure
20 |
21 | Blink is a group of 3 clusters built around 27 physical hosts each, with 64GB RAM and a 4 core / 8 thread Xeon D-1520. They are small, affordable and disposable hosts. The topology is the same for each cluster:
22 |
23 | * 3 master nodes (2 in our main data center and 1 in our backup data center plus a virtual machine ready to launch in case of major outage)
24 | * 4 http query nodes (2 in each data center)
25 | * 20 data nodes (10 in each data center)
26 |
27 | The data nodes have 4*800GB SSD drives in RAID0, about 58TB per cluster. The data nodes are configured with Elasticsearch zone awareness. With 1 replica for each index, that makes sure we have 100% of the data in each data center, so we're crash proof.
28 |
29 | 
30 |
31 | We didn't allocate the http query nodes to a specific zone for a reason: we want to use the whole cluster when possible, at the cost of 1.2ms of network latency. From [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html):
32 |
33 | > When executing search or GET requests, with shard awareness enabled, Elasticsearch will prefer using local shards – shards in the same awareness group – to execute the request. This is usually faster than crossing racks or awareness zones.
34 |
35 | In front of the clusters, we have a layer 7 load balancer made of 2 servers, each running Haproxy and holding various virtual IP addresses (VIP). A keepalived ensures the active load balancer holds the VIP. Each load balancer runs in a different data center for fault tolerance. Haproxy uses the allbackups configuration directive, so we access the query nodes in the second data center only when the first two are down.
36 |
37 | ```haproxy
38 | frontend blink_01
39 | bind 10.10.10.1:9200
40 | default_backend be_blink01
41 |
42 | backend be_blink01
43 | balance leastconn
44 | option allbackups
45 | option httpchk GET /_cluster/health
46 | server esnode01 10.10.10.2:9200 check port 9200 inter 3s fall 3
47 | server esnode02 10.10.10.3:9200 check port 9200 inter 3s fall 3
48 | server esnode03 10.10.10.4:9200 check port 9200 inter 3s fall 3 backup
49 | server esnode04 10.10.10.5:9200 check port 9200 inter 3s fall 3 backup
50 | ```
51 |
52 | So our infrastructure diagram becomes:
53 | 
54 |
55 | In front of the Haproxy, we have an applicative layer called Baldur. Baldur was developed by my colleague [Nicolas Bazire](https://github.com/nicbaz) to handle multiple versions of the same Elasticsearch index and route queries amongst multiple clusters.
56 |
57 | There's a reason why we had to split the infrastructure into multiple clusters even though they all run the same version of Elasticsearch, the same plugins, and they do exactly the same things. Each cluster supports about 10,000 indices, and 30,000 shards. That's a lot, and Elasticsearch master nodes have a hard time dealing with so many indexes and shards.
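
You can get those numbers for your own cluster in one line each, which is handy to see how close you are to that kind of limit:

```bash
# Number of indices and number of shards currently held by the cluster
curl -XGET "localhost:9200/_cat/indices" | wc -l
curl -XGET "localhost:9200/_cat/shards" | wc -l
```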
58 |
59 | Baldur is both an API and an applicative load balancer built on Nginx with the Lua plugin. It connects to a MySQL database and has a local memcache based cache. Baldur was built for 2 reasons:
60 |
61 | * to tell our API the active index for a dashboard
62 |
63 | * to tell our indexers which indexes they should write to, since we manage multiple versions of the same index.
64 |
65 | In Elasticsearch, each index has a defined naming: `_`
66 |
67 | In Baldur, we have 2 tables:
68 |
69 | The first one is the indexes table with the triplet
70 |
71 | ```
72 | id / cluster id / mapping id
73 | ```
74 |
75 | That's how we manage to index the ongoing data into multiple versions of the same index during the migration process from one mapping to another.
76 |
77 | The second table is the reports table with the triplet
78 |
79 | ```
80 | client id / report id / active index id
81 | ```
82 |
83 | So the API knows which index it should use as active.
84 |
85 | Just like the load balancers, Baldur holds a VIP managed by another Keepalived, for failover.
86 | 
87 |
88 | ---
89 |
90 | ## Using Elasticsearch for fun and profit
91 |
92 | Since you know everything you need about our infrastructure, let's talk about playing with our Elasticsearch cluster the smart way for fun and, indeed, profit.
93 |
94 | Elasticsearch and our index naming scheme allow us to be lazy, so we can watch more cute kitten videos on Youtube. To create an index with the right mapping and settings, we use Elasticsearch templates and auto create index patterns.
95 |
96 | Every node in the cluster has the following configuration:
97 |
98 | ```yaml
99 | action:
100 | auto_create_index: "+_*,+_*,-*"
101 | ```
102 |
103 | And we create a template in Elasticsearch for every mapping we need.
104 |
105 | ```bash
106 | curl -XPUT "localhost:9200/_template/template_" -H 'Content-Type: application/json' -d '
107 | {
108 | "template": "_*",
109 | "settings": {
110 | "number_of_shards": 1
111 | },
112 | "mappings": {
113 | "add some json": "here"
114 |   }
118 | }
119 | '
120 | ```
121 |
122 | Every time the indexer tries to write into a not yet existing index, Elasticsearch creates it with the right mapping. That's the magic.
123 |
124 | Except this time, we don't want to create empty indexes with a single shard as we're going to copy existing data.
125 |
126 | After playing with Elasticsearch for years, we've noticed that the best size per shard was about 10GB. This allows faster reallocation and recovery, at the cost of more Lucene segments during heavy writing and more frequent optimizations.
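
To check whether your shards stay around that size, the `_cat/shards` API gives the answer in one call; a quick sketch (the column selection is only one possible choice):

```bash
# Index name, shard number, primary/replica flag and size in GB for every shard
curl -s "localhost:9200/_cat/shards?v&h=index,shard,prirep,store&bytes=gb"
```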
127 |
128 | On Blink, 1,000,000 documents weigh about 2GB, so we create indexes with 1 shard per 5 million documents, plus 1 extra shard when the dashboard already has more than 5 million documents.
129 |
130 | Before reindexing a client, we run a small script to create the new indexes with the right number of shards. Here's a simplified version, without error management, for your eyes only.
131 |
132 | ```bash
133 | curl -XPUT "localhost:9200/_" -H 'Content-Type: application/json' -d '
134 | { "settings.index.number_of_shards" : '$(( $(curl -XGET "localhost:9200/_/_count" | cut -f 2 -d : | cut -f 1 -d ",") / 5000000 + 1 ))'
135 | }
136 | '
137 | ```
138 |
139 | Now we're able to reindex, except we didn't solve the CPU issue. That's where fun things start.
140 |
141 | What we're going to do is leverage Elasticsearch zone awareness to dedicate a few data nodes to the writing process. You can also add some new nodes if you can't afford removing a few from your existing cluster; it works exactly the same way.
142 |
143 | First, let's kick out all the indexes from those nodes.
144 |
145 | ```bash
146 | curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
147 | {
148 | "transient" : {
149 | "cluster.routing.allocation.exclude._ip" : ",,"
150 | }
151 | }
152 | '
153 | ```
154 |
155 | Elasticsearch then moves all the data from these nodes to the remaining ones. You could also shut down those nodes and wait for the indexes to recover, but you might lose data.
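
You can follow the relocation progress with a couple of API calls; a possible sketch:

```bash
# Shards currently moving away from the excluded nodes
curl -s "localhost:9200/_cat/shards" | grep RELOCATING

# The cluster is done once relocating_shards is back to 0
curl -s "localhost:9200/_cluster/health?pretty" | grep relocating_shards
```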
156 |
157 | Then, for each node, we edit the Elasticsearch configuration to assign it to a new zone called *envrack* (fucked up in French). We put all these machines in the secondary data center to use the spare http query nodes for the indexing process.
158 |
159 | ```yaml
160 | node:
161 | zone: 'envrack'
162 | ```
163 |
164 | Then restart Elasticsearch so it runs with the new configuration.
165 |
166 | We don't want Elasticsearch to allocate the existing indexes to the new zone when we bring back these nodes online, so we update these index settings accordingly.
167 |
168 | ```bash
169 | curl -XPUT "localhost:9200/_*/_settings" -H 'Content-Type: application/json' -d '
170 | {
171 | "routing.allocation.exclude.zone" : "envrack"
172 | }
173 | '
174 | ```
175 |
176 | In the same way, we don't want the new indexes to be allocated to the production zones, so we update the creation script.
177 |
178 | ```bash
179 | #!/bin/bash
180 |
181 | shards=1
182 | counter=$(curl -XGET "http://esnode01:9200/_/_count" | cut -f 2 -d : | cut -f 1 -d ",")
183 |
184 | if [ $counter -gt 5000000 ]; then
185 | shards=$(( $counter / 5000000 + 1 ))
186 | fi
187 |
188 | curl -XPUT "localhost:9200/_" -H 'Content-Type: application/json' -d '
189 | {
190 | "settings" : {
191 |     "index.number_of_shards" : '$shards',
192 |     "index.number_of_replicas" : 0,
193 | "routing.allocation.exclude.zone" : "barack,chirack"
194 | }
195 | }
196 | '
197 | ```
198 |
199 | More readable than a one-liner, isn't it?
200 |
201 | We don't add a replica for 2 reasons:
202 |
203 | * The cluster is zone aware and we only have one zone for the reindexing
204 | * Indexing with a replica means indexing twice, so using twice as much CPU. Adding a replica after indexing is just transferring the data from one host to another.
205 |
206 | Indeed, losing a data node then means losing data. If you can't afford reindexing an index multiple times in case of a crash, don't do this: either add another zone or allow your new indexes to use the data from the existing zone in the backup data center.
207 |
208 | There's one more thing we want to do before we start indexing.
209 |
210 | Since we've set up the new zone in the secondary data center, we update the http query nodes configuration to make them zone aware, so they read the local shards in priority. We do the same with the active nodes so they read their own zone first. That way, we can query the passive http query nodes during the reindexing process with little impact on what the clients access.
211 |
212 | In the main data center:
213 |
214 | ```yaml
215 | node:
216 | zone: "barack"
217 | ```
218 |
219 | And in the secondary:
220 |
221 | ```yaml
222 | node:
223 | zone: "chirack"
224 | ```
225 |
226 | Here's what our infrastructure looks like now.
227 |
228 | 
229 |
230 | It's now time to reindex.
231 |
232 | We first tried to reindex by taking the data from our database clusters, but it brought them to their knees. We have large databases, and our dashboards are made of documents crawled over time, which means large queries on a huge dataset, with random accesses only. In a word: sluggish.
233 |
234 | What we do instead is copy the existing data from the old indexes to the new ones, then add the stuff that makes our data richer.
235 |
236 | To copy the content of an existing index into a new one, [Logstash](https://www.elastic.co/products/logstash) from Elastic is a convenient tool. It takes the data from a source, transforms it if needed and pushes it into a destination.
237 |
238 | Our Logstash configuration is pretty straightforward:
239 |
240 | ```bash
241 | input {
242 | elasticsearch {
243 | hosts => [ "esnode0{3,4}" ]
244 | index => "_INDEX_ID"
245 | size => 1000
246 | scroll => "5m"
247 | docinfo => true
248 | }
249 |
250 | }
251 |
252 | output {
253 | elasticsearch {
254 | host => "esdataXX"
255 | index => "_INDEX_ID"
256 | protocol => "http"
257 |     index_type => "%{[@metadata][_type]}"
258 |     document_id => "%{[@metadata][_id]}"
259 | workers => 10
260 | }
261 |
262 | stdout {
263 | codec => dots
264 | }
265 | }
266 | ```
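
Assuming that configuration is saved in a file (the file name below is ours to pick), running the copy is a one-liner:

```bash
# Copy one index; -f points to the configuration file, -w sets the number of workers
logstash -f copy_blink_index.conf -w 4
```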
267 |
268 | We can now run Logstash from a host inside the secondary data center.
269 |
270 | Here, we:
271 |
272 | * read from the passive http query nodes. Since they're zone aware, they query the data in the same zone in priority
273 | * write on the data nodes inside the indexing zone so we won't load the nodes accessed by our clients
274 |
275 | 
276 |
277 | Once we're done reindexing a client, we update Baldur to change the active indexes for that client. Then, we add a replica and move the freshly baked indexes inside the production zones.
278 |
279 | ```bash
280 | curl -XPUT "localhost:9200/_" -H 'Content-Type: application/json' -d '
281 | {
282 | "settings" : {
283 |     "index.number_of_replicas" : 1,
284 | "routing.allocation.exclude.zone" : "envrack",
285 | "routing.allocation.include.zone" : "barack,chirack"
286 | }
287 | }
288 | '
289 | ```
290 |
291 | Now, we're ready to delete the old indexes for that client.
292 |
293 | ```bash
294 | curl -XDELETE "localhost:9200/_"
295 | ```
296 |
297 | ---
298 |
299 | ## Conclusion
300 |
301 | This post doesn't deal with cluster optimization for massive indexing on purpose. The Web is full of articles on that topic so I decided it didn't need another one.
302 |
303 | What I wanted to show is how we managed to isolate the data within the same cluster so we didn't disturb our clients. Considering our current infrastructure, building 3 more clusters might have been easier, but it has a double cost we didn't want to pay.
304 |
305 | First, it means doubling the infrastructure, so buying even more servers you won't use anymore once the reindexing process is over. Second, it means buying these servers 1 or 2 months upfront to make sure they're delivered in time.
306 |
307 | I hope you enjoyed reading that post as much as I enjoyed sharing my experience on the topic. If you did, please share it around you, it might be helpful to someone!
308 |
--------------------------------------------------------------------------------
/102-use-case-advanced-architecture-high-volume-reindexing/images/image1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/102-use-case-advanced-architecture-high-volume-reindexing/images/image1.png
--------------------------------------------------------------------------------
/103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours with 0 Downtime and a Rollback Strategy
6 |
7 | Do you remember [Blackhole, the 36 billion documents Elasticsearch cluster](https://thoughts.t37.net/how-we-reindexed-36-billions-documents-in-5-days-within-the-same-elasticsearch-cluster-cd9c054d1db8) we had to reindex a while ago? Blackhole is now a 130TB grownup with 100 billion documents, and my last task before I left Synthesio was migrating the little boy to Elasticsearch 5.1. This post is a more detailed version of the talk I gave on November 23rd at the ElasticFR meetup in Paris.
8 |
9 | There were many reasons for upgrading Blackhole: features, performance, better exposed monitoring data. But for me, the main reason to do it before I left was **for the lulz**. I love running large clusters, whatever the software, I love [performing huge migrations](https://thoughts.t37.net/how-we-upgraded-a-22tb-mysql-cluster-from-5-6-to-5-7-in-9-months-cc41b391895d), and the bigger, the better.
10 |
11 | ---
12 |
13 | ## Elasticsearch @Synthesio, November 2017
14 |
15 | At [Synthesio](https://www.synthesio.com/), we're using Elasticsearch pretty much everywhere we need hot storage. Cold storage is provided by MySQL, and queuing by a bit more than 100TB of Apache Kafka.
16 |
17 | There are 8 clusters running in production, with a bit more than 600 bare metal servers, 1.7PB of storage and 37.5TB of RAM. The clusters are hosted in 3 data centers. One of them is dedicated to running each cluster's third master node, to avoid split brains when we lose a whole data center, which happens from time to time.
18 |
19 | The servers are mostly 6-core, 12-thread Xeon E5-1650v3s with 64GB RAM and 4*800GB SSD or 2*1.2TB NVMe drives in RAID0. Some clusters have dual 12-core Xeon E5-2687Wv4s with 256GB RAM.
20 |
21 | The average cluster stats are 85k writes / second, with peaks at 1.5M, and 800 reads / second, some clusters handling a continuous 25k searches / second. Doc size varies from 150kB to 200MB.
22 |
23 | ---
24 |
25 | ## The Blackhole Cluster
26 |
27 | Blackhole is a 77 node cluster, with 200TB storage, 4.8TB RAM, 2.4TB being allocated to Java, and 924 CPU cores. It is made of 3 master nodes, 6 ingest nodes, and 68 data nodes. The cluster holds 1137 indices, with 13613 primary shards, and 1 replica, for 201 billion documents. It gets about 7000 new documents / second, with an average of 800 searches / second on the whole dataset.
28 |
29 | Blackhole data nodes are spread between 2 data centers. By using rack awareness, we make sure that each data center holds 100% of the data, for high availability. Ingest nodes are rack aware as well, to leverage Elasticsearch prioritizing nodes within the same rack when running a query, which minimizes query latency. A Haproxy checks the ingest nodes health and load balances the queries amongst them.
30 |
31 | 
32 |
33 | Feeding Blackhole is only a small part of a larger processing chain. After multiple enrichments and transformations, the data is pushed into a large Kafka queue. A working unit reads the Kafka queue and pushes the data into Blackhole.
34 |
35 | 
36 |
37 | This has many pros, the first one being the ability to replay a whole part of the process in case of error. The only con is having enough disk space for the data retention, but in 2017 disk space is not a problem anymore, even at the scale of tens of TB.
38 |
39 | ---
40 |
41 | ## Migration Strategies: Cluster restart VS Reindex API VS Logstash VS the Fun Way
42 |
43 | There are many ways to migrate an Elasticsearch cluster from one major version to another.
44 |
45 | ### The Cluster Restart Strategy
46 |
47 | Elasticsearch's regular upgrade path from 2.x to 5.x requires closing every index using the [`_close` API endpoint](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-open-close.html), upgrading the software, starting the nodes, then opening the indexes again using the `_open` API endpoint.
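
A minimal sketch of that path, assuming a full cluster shutdown is acceptable (the synced flush is optional but speeds up the recovery):

```bash
# Flush and close every index before the upgrade
curl -XPOST "localhost:9200/_flush/synced"
curl -XPOST "localhost:9200/_all/_close"

# ... stop Elasticsearch, upgrade the packages on every node, start Elasticsearch again ...

# Reopen the indexes once the cluster has formed again
curl -XPOST "localhost:9200/_all/_open"
```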
48 |
49 | Relying on the cluster restart strategy means keeping indexes created with Elasticsearch 2. This has no immediate consequence, except being unable to upgrade to Elasticsearch 6 without a full reindex. As this is something we do from time to time anyway, it was not a blocking problem.
50 |
51 | On the cons side, the cluster restart strategy requires shutting down the whole cluster for a while, which was not acceptable.
52 |
53 | Someone once said there's a Chinese proverb for everything, and if it doesn't exist yet, you can make it a Chinese proverb anyway.
54 |
55 | > When migration requires downtime, throwing more hardware solves all your problems.
56 | > --- Traditional Chinese proverb.
57 |
58 | Throwing hardware at our problems meant we could rely on 2 more migration strategies.
59 |
60 | ### The Reindex API Strategy
61 |
62 | The first one is using [Elasticsearch reindex API](https://www.elastic.co/guide/en/elasticsearch/reference/6.0/docs-reindex.html). We have already used it to migrate some clusters from Elasticsearch 1.7 to 5.1. It has many cons though, so we decided not to use it this time.
63 |
64 | Error handling is suboptimal, and an error on a bulk index means we will lose documents in the process without knowing it.
65 |
66 | It is slow. The Elasticsearch reindex API relies on scrolling, and [sliced scrolls](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html) are not available until version 6.0.
67 |
68 | There's also another problem on live indexes. And a huge one: losing data consistency.
69 |
70 | To ensure data consistency between the source and destination indexes, either you never update your data and it's OK, or you decide that all your indexes are write only during the whole reindexing, which implies an application downtime. Otherwise, you risk a race condition between your ongoing updates and the reindex process: if a document is updated in the source index after it has already been copied to the destination, that update is lost. The risk is small but it exists.
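
For reference, a reindex API call looks roughly like this (the index names are placeholders):

```bash
curl -XPOST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "old_index",
    "size": 1000
  },
  "dest": {
    "index": "new_index"
  }
}
'
```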
71 |
72 | ### The Logstash Strategy
73 |
74 | We've been using Logstash a lot on Elasticsearch 1.7, as there was no reindex API yet. Logstash is faster than the reindex API, and you can use it inside a script which makes failure management easier.
75 |
76 | Logstash has many cons as well, besides the race condition problem. The biggest one is that it is unreliable: the risk of losing data in the process, without even noticing it, is too high. The Logstash console output also makes it difficult to troubleshoot errors, as it is either too verbose or not verbose enough.
77 |
78 | ### The Fun Way
79 |
80 | The fun way mixes the Cluster Restart Strategy and throwing hardware at problems, with the added benefit of being able to roll back anytime, even after the migration is over. But I don't want to spoil it yet 😈.
81 |
82 | ---
83 |
84 | ## Migrating Blackhole for Real
85 |
86 | The Blackhole migration took place on a warm, sunny Saturday. The birds were singing, the sun was shining, and the coffee was flowing in my cup.
87 |
88 | ### Migration Prerequisites
89 |
90 | Before starting the migration, we had a few prerequisites to fulfill:
91 |
92 | * Making sure our mapping templates were compatible with Elasticsearch 5.
93 | * Using the [Elasticsearch Migration Helper](https://github.com/elastic/elasticsearch-migration/tree/2.x) plugin on Blackhole, just in case.
94 | * Creating the next 10 daily indexes, just in case we missed something with the mapping template.
95 | * Telling our hosting provider that we would transfer more than 130TB on the network in the coming hours.
96 |
97 | ### Expanding Blackhole
98 |
99 | The first migration step was throwing more hardware at Blackhole.
100 |
101 | We added 90 new servers, split between the 2 data centers. Each server has a 6-core Xeon E5-1650v3 CPU, 64GB RAM, and 2 * 1.2TB NVMe drives set up as RAID0. These servers were set up to use a dedicated network range, as we planned to use them to replace the old Blackhole cluster and didn't want to mess with the existing IP addresses.
102 |
103 | These servers were deployed with Debian Stretch and Elasticsearch 2.3. We had some issues as the Elasticsearch 2 systemd scripts don't work on Stretch, so we had to start the service manually. We configured Elasticsearch to use 2 new racks, Barack and Chirack. Then, we updated the replication factor to 3.
104 | 
105 |
106 | ```bash
107 | curl -XPUT "localhost:9200/*/_settings" -H 'Content-Type: application/json' -d '{
108 | "index" : {
109 | "number_of_replicas" : 3
110 | }
111 | }
112 | '
113 | ```
114 |
115 | On the vanity metrics level, Blackhole had:
116 |
117 | * 167 servers,
118 | * 53626 shards,
119 | * 279TB of data for 391TB of storage,
120 | * 10.84TB RAM, 5.42TB being allocated to Java,
121 | * 2004 cores.
122 |
123 | 
124 |
125 | If you're wondering why we didn't save time by raising the replication factor to only 2: try it, lose a data node, enjoy the result, and read the basics of distributed systems before you run one in production.
126 |
127 | While expanding Blackhole, we had to change a few dynamic settings for allocation and recoveries.
128 |
129 | Blackhole initial settings were:
130 |
131 | ```yaml
132 | cluster:
133 | routing:
134 | allocation:
135 | disk:
136 | threshold_enabled: true
137 | watermark:
138 | low: "78%"
139 | high: "79%"
140 | node_initial_primaries_recoveries: 50
141 | node_concurrent_recoveries: 20
142 |       allow_rebalance: "always"
143 | cluster_concurrent_rebalance: 50
144 | rebalance.enable: "all"
145 |
146 | indices:
147 | recovery:
148 | max_bytes_per_sec: "2048mb"
149 | concurrent_streams: 30
150 | ```
151 |
152 | We decided to speed up the cluster recovery a bit, and to disable rebalancing completely to avoid mixing both of them until the migration was over. To make sure the cluster would use as much disk space as possible without problems, we raised the watermark thresholds to the maximum.
153 |
154 | ```yaml
155 | cluster:
156 | routing:
157 | allocation:
158 | disk:
159 | watermark.low : "98%"
160 | watermark.high : "99%"
161 | rebalance.enable: "none"
162 |
163 | indices:
164 | recovery:
165 | max_bytes_per_sec: "4096mb"
166 | concurrent_streams: 50
167 | ```
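
These are dynamic settings, so they can also be pushed to the running cluster through the cluster settings API instead of editing files and restarting nodes; a sketch with the same values, using the flat setting names the API expects:

```bash
curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
{
  "transient" : {
    "cluster.routing.allocation.disk.watermark.low" : "98%",
    "cluster.routing.allocation.disk.watermark.high" : "99%",
    "cluster.routing.rebalance.enable" : "none",
    "indices.recovery.max_bytes_per_sec" : "4096mb"
  }
}
'
```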
168 |
169 | ### Then Came the Problems
170 |
171 | Transferring 130TB of data at up to 4Gb/s puts lots of pressure on the hardware.
172 |
173 | The load on most machines was up to 40, with 99% of the CPU in use. Iowait went from 0 to 60% on most of our servers. As a result, the Elasticsearch [bulk thread pool](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html) queue started to fill up dangerously despite being configured at 4000, with a risk of rejected data.
174 |
175 | Thankfully, there's a trick for that.
176 |
177 | Elasticsearch provides a concept of zone, which can be combined with rack awareness for a finer allocation granularity. For example, you can dedicate lots of hardware to the freshest, most frequently accessed content, less hardware to content accessed less frequently, and even less hardware to content that is never accessed. Zones are configured at the host level.
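
On a node meant for the freshest data, that configuration boils down to a single attribute in `elasticsearch.yml`; a sketch using the zone name we picked:

```yaml
node:
  zone: "fresh"
```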
178 |
179 | 
180 |
181 | We decided to create a zone that would only hold the data of the day, so the hardware would be less stressed by the migration.
182 |
183 | To do it cleanly, we decided to disable the shard allocation before we forced the indices allocation.
184 |
185 | ```bash
186 | curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
187 | {
188 | "transient" : {
189 | "cluster.routing.allocation.enable" : "none"
190 | }
191 | }
192 | '
193 |
194 | curl -XPUT "localhost:9200/*/_settings" -H 'Content-Type: application/json' -d '
195 | {
196 | "index.routing.allocation.exclude.zone" : "fresh"
197 | }
198 | '
199 |
200 | curl -XPUT "localhot:9200/latest/_settings" -H 'Content-Type: application/json' -d '
201 | {
202 | "index.routing.allocation.exclude.zone" : "",
203 | "index.routing.allocation.include.zone" : "fresh"
204 | }
205 | '
206 | ```
207 |
208 | After a few minutes, the cluster was quiet and we were able to resume the migration.
209 |
210 | ```bash
211 | curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
212 | {
213 | "transient" : {
214 | "cluster.routing.allocation.enable" : "all"
215 | }
216 | }'
217 | ```
218 |
219 | Another way to do it is playing with the `_ip` exclusion, but when you have more than 150 data nodes, it becomes a bit complicated. Also, you need to know that include and exclude are mutually exclusive, which can lead to some headaches the first time you use them.
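
For the record, the `_ip` based variant looks like this (the addresses are placeholders):

```bash
curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "10.0.0.1,10.0.0.2"
  }
}
'
```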
220 |
221 | ### Splitting Blackhole in 2
222 |
223 | The next step of the migration was creating a full clone of Blackhole. To clone a cluster, all you need is:
224 |
225 | * love
226 | * a bunch of data nodes with 100% of the data
227 | * a master node from the cluster to clone
228 |
229 | Before doing anything, we disabled the shard allocation globally.
230 |
231 | ```bash
232 | curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
233 | {
234 | "transient" : {
235 | "cluster.routing.allocation.enable" : "none"
236 | }
237 | }
238 | '
239 | ```
240 |
241 | Then, we shut down Elasticsearch on Barack, Chirack and one of the cluster master nodes.
242 | 
243 |
244 | Removing nodes to create a new Blackhole
245 |
246 | Then, we reduced the replica number on Blackhole to 1, and enabled allocation.
247 |
248 | ```bash
249 | curl -XPUT "localhost:9200/*/_settings" -H 'Content-Type: application/json' -d '
250 | {
251 | "index" : {
252 | "number_of_replicas" : 1
253 | }
254 | }'
255 |
256 | curl -XPUT "localhost;9200/_cluster/settings" -H 'Content-Type: application/json' -d
257 | '{
258 | "transient" : {
259 | "cluster.routing.allocation.enable" : "all"
260 | }
261 | }
262 | '
263 | ```
264 |
265 | **The following steps were performed with Elasticsearch stopped on the removed hosts.**
266 |
267 | We changed the excluded master node's IP address to move it to the new Blackhole02 cluster network range, as well as its `discovery.zen.ping.unicast.hosts` setting so it was unable to talk to the old cluster anymore. We didn't change the `cluster.name` since we wanted to reuse all the existing information.
268 |
269 | We also reconfigured the nodes within the Barack and Chirack racks to talk to that new master, then added 2 other fresh masters to respect the `discovery.zen.minimum_master_nodes: 2` setting.
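
The resulting configuration on the Blackhole02 master nodes looked roughly like this (host names and the cluster name are placeholders; the point is that `cluster.name` stays identical to the old cluster's):

```yaml
cluster:
  name: "blackhole"

discovery:
  zen:
    minimum_master_nodes: 2
    ping:
      unicast:
        hosts: ["blackhole02-master01", "blackhole02-master02", "blackhole02-master03"]
```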
270 |
271 | Then, we started Elasticsearch first on the master taken from Blackhole, then on the 2 new master nodes. We had a new cluster without data nodes, but with all the index and shard information. This was done on purpose, so we could close all the indexes without losing time with the data nodes being there, trying to reallocate shards or whatever.
272 |
273 | We then closed all the existing indexes:
274 |
275 | ```bash
276 | curl -XPOST "localhost:9200/*/_close"
277 | ```
278 |
279 | It was time to upgrade Elasticsearch on that new cluster. This was done in a few minutes by running our [Ansible](https://ansible.org/) playbook.
280 |
281 | We launched Elasticsearch on the master nodes first, to upgrade the cluster from 2 to 5. It took less than 20 seconds. I was shocked, as I expected the process to take a few hours. Had I known, I would have asked for a maintenance window, but we would have lost the ability to rollback.
282 |
283 | Then, we started the data nodes, enabled allocation again, and 30 minutes later, the cluster was green.
284 |
285 | The last thing was to add a work unit to feed that Blackhole02 cluster and catch up with the data. This was made possible by saving the Kafka offset before we shut down the Barack and Chirack data nodes.
286 |
287 | ---
288 |
289 | ## Conclusion
288 |
289 | The whole migration took less than 20 hours, including transferring 130TB of data on a dual data center setup.
290 |
291 | 
292 | The most important point here was that we were able to rollback at any time, including after the migration if something went wrong at the application level.
293 |
294 | Deciding to double the cluster for a while was mostly a financial debate, but it had lots of pros, starting with the safety it brought, as well as the opportunity to replace hardware that had been running for 2 years.
295 |
--------------------------------------------------------------------------------
/103-use-case-migrating-130tb-cluster-without-downtime/images/image17.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/103-use-case-migrating-130tb-cluster-without-downtime/images/image19.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/103-use-case-migrating-130tb-cluster-without-downtime/images/image19.png
--------------------------------------------------------------------------------
/103-use-case-migrating-130tb-cluster-without-downtime/images/image22.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 Fred de Villamil
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Operating Elasticsearch
6 | ## for Fun and Profit
7 |
8 | ---
9 |
10 | 
11 |
12 |
13 | ## [Fred de Villamil](https://thoughts.t37.net)
14 |
15 | ---
16 |
17 | ## [Read online](https://fdv.github.io/running-elasticsearch-fun-profit)
18 |
19 | ---
20 |
21 | ## Code of Conduct
22 |
23 | - Behave like normal, friendly, welcoming human beings or get the hell out.
24 | - Any reference to a non scientific, verifiable element is irrelevant.
25 |
26 | ---
27 |
28 | ## TOC
29 |
30 | - [Getting Started with Elasticsearch](001-getting-started/001-getting-started.md/#getting-started-with-elasticsearch)
31 | * [Prerequisites](001-getting-started/001-getting-started.md/#prerequisites)
32 | * [Elasticsearch basic concepts](001-getting-started/001-getting-started.md/#elasticsearch-basic-concepts)
33 | + [REST APIs](001-getting-started/001-getting-started.md/#rest-apis)
34 | + [Open Source](001-getting-started/001-getting-started.md/#open-source)
35 | + [Java](001-getting-started/001-getting-started.md/#java)
36 | + [Distributed](001-getting-started/001-getting-started.md/#distributed)
37 | + [Scalable](001-getting-started/001-getting-started.md/#scalable)
38 | + [Fault tolerant](001-getting-started/001-getting-started.md/#fault-tolerant)
39 | * [What's an Elasticsearch cluster?](001-getting-started/001-getting-started.md/#whats-an-elasticsearch-cluster)
40 | + [Master node](001-getting-started/001-getting-started.md/#master-node)
41 | + [Ingest nodes](001-getting-started/001-getting-started.md/#ingest--nodes)
42 | + [Data Nodes](001-getting-started/001-getting-started.md/#data-nodes)
43 | + [Tribe Nodes](001-getting-started/001-getting-started.md/#tribe-nodes)
44 | + [A Minimal, Fault Tolerant Elasticsearch Cluster](001-getting-started/001-getting-started.md/#a-minimal-fault-tolerant-elasticsearch-cluster)
45 | * [What's an Elasticsearch index](001-getting-started/001-getting-started.md/#whats-an-elasticsearch-index)
46 | * [Deploying your first Elasticsearch cluster](001-getting-started/001-getting-started.md/#deploying-your-first-elasticsearch-cluster)
47 | + [Deploying Elasticsearch on Debian](001-getting-started/001-getting-started.md/#deploying-elasticsearch-on-debian)
48 | + [Deploying Elasticsearch on RHEL / CentOS](001-getting-started/001-getting-started.md/#deploying-elasticsearch-on-rhel--centos)
49 | * [First step using Elasticsearch](001-getting-started/001-getting-started.md/#first-step-using-elasticsearch)
50 | * [Elasticsearch Configuration](001-getting-started/001-getting-started.md/#elasticsearch-configuration)
51 | * [Elasticsearch Plugins](001-getting-started/001-getting-started.md/#elasticsearch-plugins)
52 |
53 | - [Elasticsearch and the Java Virtual Machine](002-elasticsearch-and-the-jvm/002-elasticsearch-and-the-jvm.md/#elasticsearch-and-the-java-virtual-machine)
54 | * [Supported JVM and operating systems / distributions](002-elasticsearch-and-the-jvm/002-elasticsearch-and-the-jvm.md/#supported-jvm-and-operating-systems--distributions)
55 | + [Operating system matrix](002-elasticsearch-and-the-jvm/002-elasticsearch-and-the-jvm.md/#operating-system-matrix)
56 | + [Java Virtual Machine matrix](002-elasticsearch-and-the-jvm/002-elasticsearch-and-the-jvm.md/#java-virtual-machine-matrix)
57 | * [Memory management](002-elasticsearch-and-the-jvm/002-elasticsearch-and-the-jvm.md/#memory-management)
58 | * [Garbage collection](002-elasticsearch-and-the-jvm/002-elasticsearch-and-the-jvm.md/#garbage-collection)
59 | + [Concurrent Mark & Sweep Garbage Collector](002-elasticsearch-and-the-jvm/002-elasticsearch-and-the-jvm.md/#concurrent-mark--sweep-garbage-collector)
60 | + [Garbage First Garbage Collector](002-elasticsearch-and-the-jvm/002-elasticsearch-and-the-jvm.md/#garbage-first-garbage-collector)
61 |
62 | - [A few things you need to know about Lucene](003-about-lucene/003-about-lucene.md#a-few-things-you-need-to-know-aboutlucene)
63 | * [Lucene segments](003-about-lucene/003-about-lucene.md#lucene-segments)
64 | * [Lucene deletes and updates](003-about-lucene/003-about-lucene.md#lucene-deletes-andupdates)
65 |
66 | - [Designing the Perfect Elasticsearch Cluster](004-cluster-design/004-cluster-design.md/#designing-the-perfect-elasticsearch-cluster)
67 | * [Elasticsearch is elastic, for real](004-cluster-design/004-cluster-design.md/#elasticsearch-is-elastic-for-real)
68 | * [Design for failure](004-cluster-design/004-cluster-design.md/#design-for-failure)
69 | * [Hardware](004-cluster-design/004-cluster-design.md/#hardware)
70 | + [CPU](004-cluster-design/004-cluster-design.md/#cpu)
71 | + [Memory](004-cluster-design/004-cluster-design.md/#memory)
72 | + [Network](004-cluster-design/004-cluster-design.md/#network)
73 | + [Storage](004-cluster-design/004-cluster-design.md/#storage)
74 | * [Software](004-cluster-design/004-cluster-design.md/#software)
75 | + [The Linux (or FreeBSD) kernel](004-cluster-design/004-cluster-design.md/#the-linux-or-freebsd-kernel)
76 | + [The Java Virtual Machine](004-cluster-design/004-cluster-design.md/#the-java-virtualmachine)
77 | + [The filesystem](004-cluster-design/004-cluster-design.md/#the-filesystem)
78 | * [Designing your indices](004-cluster-design/004-cluster-design.md/#designing-your-indices)
79 | + [Sharding](004-cluster-design/004-cluster-design.md/#sharding)
80 | + [Replication](004-cluster-design/004-cluster-design.md/#replication)
81 | * [Optimising allocation](004-cluster-design/004-cluster-design.md/#optimising-allocation)
82 | * [Troubleshooting and scaling](004-cluster-design/004-cluster-design.md/#troubleshooting-and-scaling)
83 | + [CPU](004-cluster-design/004-cluster-design.md/#cpu-1)
84 | + [Memory](004-cluster-design/004-cluster-design.md/#memory-1)
85 |
86 | - [Design for Event Logging](005-design-event-logging/005-design-event-logging.md/#design-for-event-logging)
87 | * [Design of an event logging infrastructure cluster](005-design-event-logging/005-design-event-logging.md/#design-of-an-event-logging-infrastructure-cluster)
88 | + [Throughput: how many events per second (005-design-event-logging/005-design-event-logging.md//eps) are you going to collect?](005-design-event-logging/005-design-event-logging.md/#throughput--how-many-events-per-second--eps--are-you-going-to-collect-)
89 | + [Retention: how long do you want to keep your data, hot and cold?](005-design-event-logging/005-design-event-logging.md/#retention--how-long-do-you-want-to-keep-your-data--hot-and-cold-)
90 | + [Size: what is the average size of a collected event?](005-design-event-logging/005-design-event-logging.md/#size--what-is-the-average-size-of-a-collected-event-)
91 | + [Fault tolerance: can you afford losing your indexed data?](005-design-event-logging/005-design-event-logging.md/#fault-tolerance--can-you-afford-losing-your-indexed-data-)
92 | + [Queries](005-design-event-logging/005-design-event-logging.md/#queries)
93 | * [Which hardware do I need?](005-design-event-logging/005-design-event-logging.md/#which-hardware-do-i-need-)
94 | * [How to design my indices?](005-design-event-logging/005-design-event-logging.md/#how-to-design-my-indices-)
95 | * [What about some tuning?](005-design-event-logging/005-design-event-logging.md/#what-about-some-tuning-)
96 |
97 | - [Operating Daily](006-operating-daily/006-operating-daily.md/#operating-daily)
98 | * [Elasticsearch most common operations](006-operating-daily/006-operating-daily.md/#elasticsearch-most-common-operations)
99 | + [Mass index deletion with pattern](006-operating-daily/006-operating-daily.md/#mass-index-deletion-with-pattern)
100 | + [Mass optimize, indexes with the most deleted docs first](006-operating-daily/006-operating-daily.md/#mass-optimize--indexes-with-the-most-deleted-docs-first)
101 | + [Restart a cluster using rack awareness](006-operating-daily/006-operating-daily.md/#restart-a-cluster-using-rack-awareness)
102 | + [Optimize your cluster restart](006-operating-daily/006-operating-daily.md/#optimize-your-cluster-restart)
103 | + [Remove data nodes from a cluster the safe way](006-operating-daily/006-operating-daily.md/#remove-data-nodes-from-a-cluster-the-safe-way)
104 | * [Get useful information about your cluster](006-operating-daily/006-operating-daily.md/#get-useful-information-about-your-cluster)
105 | + [Nodes information](006-operating-daily/006-operating-daily.md/#nodes-information)
106 | + [Monitor your search queues](006-operating-daily/006-operating-daily.md/#monitor-your-search-queues)
107 | + [Indices information](006-operating-daily/006-operating-daily.md/#indices-information)
108 | + [Shard allocation information](006-operating-daily/006-operating-daily.md/#shard-allocation-information)
109 | + [Recovery information](006-operating-daily/006-operating-daily.md/#recovery-information)
110 | + [Segments information (006-operating-daily/006-operating-daily.md//can be extremely verbose)](006-operating-daily/006-operating-daily.md/#segments-information--can-be-extremely-verbose-)
111 | + [Cluster stats](006-operating-daily/006-operating-daily.md/#cluster-stats)
112 | + [Nodes stats](006-operating-daily/006-operating-daily.md/#nodes-stats)
113 | + [Indice stats](006-operating-daily/006-operating-daily.md/#indice-stats)
114 | + [Indice mapping](006-operating-daily/006-operating-daily.md/#indice-mapping)
115 | + [Indice settings](006-operating-daily/006-operating-daily.md/#indice-settings)
116 | + [Cluster dynamic settings](006-operating-daily/006-operating-daily.md/#cluster-dynamic-settings)
117 | + [All the cluster settings (006-operating-daily/006-operating-daily.md//can be extremely verbose)](006-operating-daily/006-operating-daily.md/#all-the-cluster-settings--can-be-extremely-verbose-)
118 |
119 |
120 | - [Monitoring Elasticsearch](007-monitoring-es/007-monitoring-es.md/#monitoring-elasticsearch)
121 | * [Tools](007-monitoring-es/007-monitoring-es.md/#tools)
122 | * [Monitoring at the host level](007-monitoring-es/007-monitoring-es.md/#monitoring-at-the-host-level)
123 | * [Monitoring at the node level](007-monitoring-es/007-monitoring-es.md/#monitoring-at-the-node-level)
124 | * [Monitoring at the cluster level](007-monitoring-es/007-monitoring-es.md/#monitoring-at-the-cluster-level)
125 | * [Monitoring at the index level](007-monitoring-es/007-monitoring-es.md/#monitoring-at-the-index-level)
126 |
127 | - [How we reindexed 36 billion documents in 5 days within the same Elasticsearch cluster](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#how-we-reindexed-36-billion-documents-in-5-days-within-the-same-elasticsearch-cluster)
128 | * [The "Blackhole" cluster](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#the--blackhole--cluster)
129 | * [Elasticsearch configuration](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#elasticsearch-configuration)
130 | * [Tuning the Java virtual machine](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#tuning-the-java-virtual-machine)
131 | + [Blackhole Initial indexing](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#blackhole-initial-indexing)
132 | * [Blackhole initial migration](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#blackhole-initial-migration)
133 | * [Blackhole reindexing](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#blackhole-reindexing)
134 | + [The reindexing process](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#the-reindexing-process)
135 | + [Logstash configuration](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#logstash-configuration)
136 | + [Reindexing Elasticsearch configuration](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#reindexing-elasticsearch-configuration)
137 | + [Introducing Yoko and Moulinette](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#introducing-yoko-and-moulinette)
138 | * [Reindexing in 5 days](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#reindexing-in-5-days)
139 | * [Conclusion](100-use-cases-reindexing-36-billion-docs/100-use-cases-reindexing-36-billion-docs.md/#conclusion)
140 |
141 | - [Use Case: Migrating a Cluster Across the Ocean Without Downtime](101-use-case-migrating-cluster-over-ocean/101-use-case-migrating-cluster-over-ocean.md/#use-case--migrating-a-cluster-across-the-ocean-without-downtime)
142 |
143 | - [Use Case: An Advanced Elasticsearch Architecture for High-volume Reindexing](102-use-case-advanced-architecture-high-volume-reindexing/102-use-case-advanced-architecture-high-volume-reindexing.md/#use-case--an-advanced-elasticsearch-architecture-for-high-volume-reindexing)
144 | * [A glimpse at our infrastructure](102-use-case-advanced-architecture-high-volume-reindexing/102-use-case-advanced-architecture-high-volume-reindexing.md/#a-glimpse-at-our-infrastructure)
145 | * [Using Elasticsearch for fun and profit](102-use-case-advanced-architecture-high-volume-reindexing/102-use-case-advanced-architecture-high-volume-reindexing.md/#using-elasticsearch-for-fun-and-profit)
146 | * [Conclusion](102-use-case-advanced-architecture-high-volume-reindexing/102-use-case-advanced-architecture-high-volume-reindexing.md/#conclusion)
147 |
148 | - [Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours with 0 Downtime and a Rollback Strategy](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#migrating-a-130tb-cluster-from-elasticsearch-2-to-5-in-20-hours-with-0-downtime-and-a-rollback-strategy)
149 | * [Elasticsearch @Synthesio, November 2017](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#elasticsearch--synthesio--november-2017)
150 | * [The Blackhole Cluster](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#the-blackhole-cluster)
151 | * [Migration Strategies: Cluster restart VS Reindex API VS Logstash VS the Fun Way](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#migration-strategies--cluster-restart-vs-reindex-api-vs-logstash-vs-the-fun-way)
152 | + [The Cluster Restart Strategy](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#the-cluster-restart-strategy)
153 | + [The Reindex API Strategy](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#the-reindex-api-strategy)
154 | + [The Logstash Strategy](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#the-logstash-strategy)
155 | + [The Fun Way](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#the-fun-way)
156 | * [Migrating Blackhole for Real](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#migrating-blackhole-for-real)
157 | + [Expanding Blackhole](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#expanding-blackhole)
158 | + [Splitting Blackhole in 2](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#splitting-blackhole-in-2)
159 | * [Conclusion](103-use-case-migrating-130tb-cluster-without-downtime/103-use-case-migrating-130tb-cluster-without-downtime.md/#conclusion)
160 |
161 | ---
162 |
163 | ## Styling
164 |
165 | This is the Markdown styling used in this book. If you plan to contribute, please use it.
166 |
167 | ### Chapter title
168 |
169 | ```markdown
170 | # This is a chapter title
171 |
172 | ```
173 |
174 | ### Chapter part
175 |
176 | ```markdown
177 | ---
178 |
179 | ## A chapter part title is preceded by an horizontal line
180 | ```
181 |
182 | ### Chapter subpart
183 |
184 | ```markdown
185 | ### A level 1 subpart
186 | #### A level 2 subpart
187 | ```
188 |
189 | ### Images
190 |
191 | ```markdown
192 | 
193 | ```
194 |
195 | ### Code:
196 |
197 | ```markdown
198 | An `inline code block` goes like this
199 | ```
200 |
201 | API calls go the Curl way
202 |
203 | ```bash
204 | curl -X POST "localhost:9200/_search" -H 'Content-Type: application/json' -d'
205 | {
206 | "query" : {
207 | "match_all" : {}
208 | },
209 | "stats" : ["group1", "group2"]
210 | }
211 | '
212 | ```
213 |
214 | Yaml code is expanded for more readability
215 | ```yaml
216 | ---
217 | some:
218 | value:
219 | goes: "like this"
220 | ```
221 |
222 | ### Links
223 |
224 | ```markdown
225 | [An internal link](has/a/relative.path)
226 | [An external link](https://has.an.absolute/path)
227 | ```
228 |
229 | ### Lists
230 |
231 | Unordered lists:
232 |
233 | ```markdown
234 | Only one line break between a paragraph and
235 |
236 | * An
237 | * unordered
238 | * list
239 | * with
240 | * subitems
241 | ```
242 |
243 | Ordered lists:
244 |
245 | ```markdown
246 | 1. An
247 | 2. Ordered
248 | 3. List
249 | 1. With
250 | 2. subitems
251 | ```
252 |
253 |
--------------------------------------------------------------------------------
/ZH-CN/001-入门/001-入门.md:
--------------------------------------------------------------------------------
1 | ```
2 | 涵盖ELASTICSEARCH 5.5.x,正在升级到ES 6.5.x
3 | ```
4 |
5 | # Elasticsearch入门
6 |
7 | 这一章是写给还没有使用过Elasticsearch的人,它涵盖了Elasticsearch基本的概念,并指引你部署和使用你的第一个单节点的集群。每个在这里提及的概念在后文都会有详细的解释。
8 |
9 | 在这个章节你将学习到:
10 |
11 | - Elasticsearch背后的基础概念
12 | - 什么是Elasticsearch集群
13 | - 如何在最常用的操作系统中部署你的第一个、单节点的Elasticsearch集群
14 | - 如何使用Elasticsearch索引文档和查找内容
15 | - Elasticsearch基础配置
16 | - 什么是Elasticsearch插件和如何使用他们
17 |
18 | ---
19 |
20 | ## 读前须知
21 |
22 | 为了阅读这本书和进行这个章节描述的操作,你需要:
23 |
24 | - 一台运行着主流Linux或Unix环境机器或者虚拟机,如Debian / Ubuntu,RHEL / CentOS 或 FreeBSD. 在Mac OS和Windows上运行Elasticsearch不在这本书的涵盖范围内
25 | - 一些基础的UNIX命令行知识和终端的使用
26 | - 你最喜欢的文本编辑器
27 |
28 | 如果你之前还没有使用过Elasticsearch,我建议你创建一个虚拟机,防止你意外损坏你的宿主机系统。你也可以使用一个虚拟化工具如[Virtualbox](https://www.virtualbox.org/)来运行它,或着在你最喜欢的云服务提供商机器上。
29 |
30 | ---
31 |
32 | ## Elasticsearch基础概念
33 |
34 | Elasticsearch是一个使用Java编写的分布式的、可扩展的、具备容错能力的开源的搜索引擎。它提供了一个强大的REST API,用于添加、搜索数据和更新配置。Elasticsearch由Elastic公司开发,公司创建者Shay Banon基于Lucene开发了这个项目
35 |
36 | ### REST API
37 |
38 | REST API是使用HTTP请求来`GET`,`PUT`,`POST`和`DELETE`数据的应用程序接口。一个网站提供的API是允许两个软件程序交互的代码,API提供了一种很好的方式,使得开发者可以在一个操作系统或者其他应用程序编写一段程序来请求网站服务。REST是互联网中与数据库的CRUD(Create,Read,Update,Delete)相对应的概念。
39 |
40 | ### 开源
41 |
42 | 开源意味着Elasticsearch的源代码,也就是构建软件的“配方”,是公开的、免费的并且每个人都可以通过添加缺失的功能、文档或者修复漏洞来对代码做出贡献。如果贡献被项目所接受,那么他们的工作就会被整个社区所知。因为Elasticsearch是开源的,所以无论它背后的公司破产倒闭或者不再维护它,它都不会消亡,因为其他人将会接管它并且使它一直存活。
43 |
44 | ### Java
45 |
46 | Java是一种编程语言,在1995由Sun Microsystems创建。Java应用程序运行在Java虚拟机(JVM)上,这意味着它不依赖于它所运行的平台。Java最让人熟知的是他的垃圾回收器(GC),一种管理内存强有力的方式。
47 |
48 | Java不是Javascript, 后者是在90年代中期由Netscape INC开发的编程语言。两者除了相似的名字,完全是两种不同的语言,使用的目的也不同。
49 |
50 | > Javascript is to Java what hamster is to ham. – Jeremy Keith
51 |
52 | ### 分布式
53 |
54 | Elasticsearch运行在许多机器上,机器的数量由工作负载和数据量而定。机器间使用网络消息相互通信和同步。一台联网的、运行着Elasticsearch的机器称为一个节点,整个共享这相同集群名字的节点群成为集群。
55 |
56 | ### 可扩展的
57 |
58 | Elasticsearch是水平扩展的。水平扩展意味着集群可以通过添加新的节点来扩大规模。当添加机器时,你不需要重启整个集群。当一个新的节点加入集群,它将会获得已有数据中的一部分。和水平扩展相对的是垂直扩展,它扩展的唯一方式就是让软件运行在配置更高的机器上。
59 |
60 | ### 容错的
61 |
62 | 除非指定副本数,Elasticsearch确保了数据至少在两个不同的节点被备份了一次。当其中一个节点离开集群时,Elasticsearch会重新在剩余的节点中构建副本,除非没有剩余的可以备份的节点了。
63 |
64 | ---
65 |
66 | ## 什么是Elasticsearch集群?
67 |
68 | 一个集群可以是运行着Elasticsearch的一台机器或者配置了相同`cluster name`的一群机器。默认的`cluster name`是`elasticsearch`,但是不推荐使用在生产环境中。
69 |
70 | 在Elasticsearch集群中每台机器将承担下面一个或者多个角色:
71 |
72 | ### Master节点
73 |
74 | Master节点控制整个集群。它将集群信息传递给正在加入集群的节点,决定数据如何移动,当一个节点离开集群时重新分配丢失的数据等。当多个节点都能够承担Master节点,Elasticsearch将通过选举产生一个活动的Master。这个活动的Master被称为`elected master`,当这个被选举的Master离开集群时,其他Master节点将接管`elected master`的角色。
75 |
76 | ### Ingest节点
77 |
78 | Ingest节点在文档实际被索引之前对它们进行预处理。Ingest节点会截取bulk和index请求,先对它们进行预处理,再将预处理后的文档发回给index或者bulk API。
79 |
80 | 默认所有节点都开启了Ingest,所以任何节点都能够处理Ingest任务。你也可以创建专用的Ingest节点。
81 |
82 | ### Data节点
83 |
84 | Data节点存储索引好的数据。它们负责管理存储的数据,并且在数据被查询的时候对数据执行操作。
85 |
86 | ### Tribe节点
87 |
88 | Tribe节点连接了多个Elasticsearch集群,它在每个被连接的集群上执行诸如搜索等操作。
89 |
90 | ### 一个最小的、具有容错能力的Elasticsearch集群
91 |
92 | 
93 |
94 | 一个最小的、具有容错能力的Elasticsearch集群应该包括:
95 |
96 | * 3个master节点
97 | * 2个ingest节点
98 | * 2个data节点
99 |
100 | 拥有3个master节点能够确保集群中存在至少2个有选举权的master节点,保证在出现网络分区时不出现脑裂的状态。如果有选举权的master节点小于2个,集群将会拒绝任何新的索引请求直到问题被修复。
101 |
102 | ---
103 |
104 | ## 什么是Elasticsearch索引
105 |
106 | 索引是一系列的拥有相同特征的文档集合。索引由它的名称所确定,名称在对存储的文档或者索引结构本身执行操作时使用。索引结构由映射定义,它是描述了文档特征和索引选项例如“replication factor”的一个`JSON`文件。在Elasticsearch集群中,你可以根据需要定义任意数量的索引。
107 |
108 | Elasticsearch索引由1个或多个分片组成。分片是一个Lucene索引,它的数量在索引被创建的时候就确定了。Elasticsearch在整个集群中分配一个索引的所有分片,可以自动分配或根据用户定义的规则。
109 |
110 | Lucene是Elasticsearch底层的搜索引擎,它是Apache基金会的开源项目。你很可能在操作Elasticsearch集群时不会意识到Lucene,但是这本书将涵盖所有你需要知道的基础知识。
111 |
112 | 一个分片由1个或多个数据段组成,这些数据段是二进制文件,也是Lucene索引存储的文档的地方。
113 |
114 | 
115 |
116 | 如果你熟悉关系型数据库如MySQL,那么索引对应于数据库中的库,映射对应于库的schema,分片对应于数据库中的数据。由于Elasticsearch的分布式特性和Lucene的特异性,不再将它们和关系型数据库进行对比。
117 |
118 | ---
119 |
120 | ## 部署你的第一个Elasticsearch集群
121 |
122 | ### 在Debian上部署Elasticsearch
123 |
124 | TODO [issue #9](https://github.com/fdv/running-elasticsearch-fun-profit/issues/9)
125 |
126 | ### 在RHEL / CentOS上部署Elasticsearch
127 |
128 | TODO [issue #9](https://github.com/fdv/running-elasticsearch-fun-profit/issues/9)
129 |
130 | ---
131 |
132 | ## 使用Elasticsearch的第一步
133 |
134 | TODO [issue #10](https://github.com/fdv/running-elasticsearch-fun-profit/issues/10)
135 |
136 | ---
137 |
138 | ## Elasticsearch配置
139 |
140 | TODO [issue #10](https://github.com/fdv/running-elasticsearch-fun-profit/issues/10)
141 |
142 | ## Elasticsearch插件
143 |
144 | TODO [issue #10](https://github.com/fdv/running-elasticsearch-fun-profit/issues/10)
--------------------------------------------------------------------------------
/ZH-CN/001-入门/images/image1.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/ZH-CN/001-入门/images/image2.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/ZH-CN/002-Elasticsearch和JVM/002-elasticsearch-and-the-jvm.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # Elasticsearch和Java虚拟机
6 |
7 | Elasticsearch是使用Java编写的软件,它需要部署在同一台机器的Java运行时环境(JRE)去运行。目前支持的Elasticsearch版本可以在以下操作系统/发行版和Java上运行。
8 |
9 | ## 支持的JVM和操作系统/发行版
10 |
11 | 下面的表格展示了Elastic为2.4.x和5.5.x版本官方所支持的各种的操作系统和Java虚拟机。下面没有提及的操作系统或者JVM是不被Elastic所支持的,因此不应该使用。
12 |
13 | ### 操作系统
14 |
15 | | | CentOS/RHEL 6.x/7.x | Oracle Enterprise Linux 6/7 with RHEL Kernel only | Ubuntu 14.04 | Ubuntu 16.04 | **Ubuntu 18.04** | SLES 11 SP4\*\*/12 | SLES 12 | openSUSE Leap 42 |
16 | | --- |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
17 | | **ES 5.0.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
18 | | **ES 5.1.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
19 | | **ES 5.2.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
20 | | **ES 5.3.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
21 | | **ES 5.4.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
22 | | **ES 5.5.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
23 | | **ES 6.0.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
24 | | **ES 6.1.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
25 | | **ES 6.2.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
26 | | **ES 6.3.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
27 | | **ES 6.4.x** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
28 | | **ES 6.5.x** | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
29 |
30 |
31 | | | Windows Server 2012/R2 | Windows Server 2016 | Debian 7 | Debian 8 | Debian 9 | **Solaris / SmartOS** | Amazon Linux |
32 | | --- |:---:|:---:|:---:|:---:|:---:|:---:|:---:|
33 | | **ES 5.0** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
34 | | **ES 5.1.x** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
35 | | **ES 5.2.x** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
36 | | **ES 5.3.x** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
37 | | **ES 5.4.x** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
38 | | **ES 5.5.x** | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
39 | | **ES 6.0.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
40 | | **ES 6.1.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
41 | | **ES 6.2.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
42 | | **ES 6.3.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
43 | | **ES 6.4.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
44 | | **ES 6.5.x** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
45 |
46 | Elasticsearch可以运行在OpenSolaris和FreeBSD上。FreeBSD 11.1提供了一个由[Mark Felder](mailto:feld@freebsd.org)维护的Elasticsearch 6.4.2版本包,但是这些操作系统都不被Elastic官方支持。
47 |
48 | ### Java虚拟机
49 |
50 | | | Oracle/OpenJDK 1.8.0u111+ | Oracle/OpenJDK 9 | OpenJDK 10 | OpenJDK 11 | Azul Zing 16.01.9.0+ | IBM J9 |
51 | | --- |:---:|:---:|:---:|:---:|:---:| --- |
52 | | **ES 5.0.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
53 | | **ES 5.1.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
54 | | **ES 5.2.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
55 | | **ES 5.3.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
56 | | **ES 5.4.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
57 | | **ES 5.5.x** | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
58 | | **ES 5.6**.x | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
59 | | **ES 6.0.x** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
60 | | **ES 6.1.x** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
61 | | **ES 6.2.x** | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
62 | | **ES 6.3.x** | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
63 | | **ES 6.4.x** | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
64 | | **ES 6.5.x** | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
65 |
66 |
67 |
68 | ## 内存管理
69 |
70 | TODO
71 |
72 | ## 垃圾回收
73 |
74 | Java是一个垃圾收集语言。开发者不必管理内存的分配,Java虚拟机周期性地运行一个称为GC线程的特定系统线程来负责不同的垃圾收集活动。这些活动之一是回收不再被程序使用的对象占用的内存。
75 |
76 | Java 1.8 拥有3个不同的垃圾收集器家族,每个都拥有自己的特性。
77 |
78 | *Single Collector* 使用一个单线程来执行整个垃圾回收过程。它在单处理器的机器上非常高效,因为它消除了线程之间通信所隐含的开销,但是它不适合大部分今天现实世界中的用途。它被设计用于管理堆中小的100M数量级的数据集。
79 |
80 | *Parallel Collector* 并行地运行多个小规模的垃圾回收器。运行并行的收集器减少了垃圾收集的开销。它专为在多线程主机上运行的中型到大型数据集而设计。
81 |
82 | *Mostly Concurrent Collector* 同时执行其大部分工作,以防止垃圾收集暂停。它适用于大型数据集且响应时间很重要的情况,因为用于最小化暂停的技术会影响应用程序性能。Java 1.8提供了两个主要的并发收集器:*Concurrent Mark & Sweep Garbage Collector* 和 *Garbage First Garbage Collector*,也被称为G1GC。
83 |
84 | ### 并发标记和扫描垃圾收集器
85 |
86 | TODO
87 |
88 | ### G1垃圾收集器
89 |
90 | TODO
91 |
--------------------------------------------------------------------------------
/ZH-CN/003-关于Lucene/003-关于Lucene.md:
--------------------------------------------------------------------------------
1 | ```
2 | WIP, COVERS ELASTICSEARCH 5.5.x, UPDATING TO ES 6.5.x
3 | ```
4 |
5 | # 关于Lucene你需要知道的
6 |
7 | 在你开始考虑选择合适的硬件前,有一些关于[Lucene](http://lucene.apache.org/)你需要知道的。
8 |
9 | Lucene是Elasticsearch所使用的搜索引擎的名字,它是一个来自Apache基金会的开源项目。当运行Elasticsearch的时候,在大部分情况下,我们不需要直接和Lucene交互。但是有一些在我们选择集群存储和文件系统前需要知道的重要的事情。
10 |
11 | ## Lucene段
12 |
13 | 每个Elasticsearch索引都被分为分片。分片既是一个索引的逻辑也是物理划分。每个Elasticsearch分片都是一个Lucene索引。在一个Lucene索引中你可以拥有的最大文档数是2,147,483,519。Lucene索引被分为更小的称为段的文件,一个段是一个小的Lucene索引。Lucene按顺序搜索所有段。
14 |
15 | 
16 |
17 | 当一个新的writer被打开,以及一个writer被提交或者被关闭时,Lucene会创建一个段。这意味着段是不可变的。当你向Elasticsearch索引中加入新的文档,Lucene创建一个新的段并且写入它。当索引的吞吐量很重要时,Lucene也能创建更多的段。
18 |
19 | Lucene不时地将较小的段合并为较大的段。 也可以使用Elasticsearch API手动触发合并。
20 |
21 | 从操作的角度来看,这种行为会产生一些影响。
22 |
23 | 集群中拥有的段越多,搜索的速度也越慢。这是因为Lucene需要顺序地搜索这些所有的段,而不是并行的。因此拥有少量的段可以加快搜索速度。
24 |
25 | Lucene的合并操作需要CPU和I/O开销,这意味着它们可能减慢你的索引速度。当执行一个bulk索引时,比如初始化索引,推荐完全地停止合并操作。
26 |
27 | 如果你计划在同一台机器上存储许多分片和段,您可能需要选择一个能够很好地处理大量小文件的文件系统,并且没inode限制。关于选择正确的文件系统的部分我们将详细讨论。
28 |
29 | ## Lucene删除和更新
30 |
31 | 在更新和删除文档时,Lucene会执行写时拷贝。这意味着文档永远不会从索引中被删除。相反,Lucene将文档标记为已删除,并在触发更新文档时创建另一个文档。
32 |
33 | 写时拷贝带来的操作后果是,当你更新或删除文档时,除非你完全删除它们,否则磁盘上索引空间将不断增长。一种实际删除被标记的文档的解决方案是强制Lucene进行段合并。
34 |
35 | 在合并时,Lucene将2个段的内容移动到第三个新段,然后从磁盘中删除旧段。这意味着Lucene需要足够的可用空间来创建一个和需要合并的两个段大小相同的段。
36 |
37 | 当强制合并一个巨大的分片时可能会出现问题。如果这个分片大小\>磁盘大小的一半,那么你可能无法完全合并它,除非分片中大多数数据都是由已删除的文档组成的。
38 |
--------------------------------------------------------------------------------
/ZH-CN/003-关于Lucene/images/image2.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-midnight
--------------------------------------------------------------------------------
/images/image1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fdv/running-elasticsearch-fun-profit/9e6814b88cdff4263de742a8a810a01f312df15e/images/image1.png
--------------------------------------------------------------------------------