├── _config.yml
├── index.md
└── README.md


/_config.yml:
--------------------------------------------------------------------------------
title: Elasticsearch-hints
description: Some useful links regarding Elasticsearch
google_analytics:
show_downloads: true
theme: jekyll-theme-minimal

gems:
  - jekyll-mentions

--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
# elasticsearch-hints

These links are the outcome of 4+ years of tuning/running our ES clusters (on-premises and in the cloud). To be continued...

## Internals

- [A Dive into the Elasticsearch Storage](https://www.found.no/foundation/dive-into-elasticsearch-storage/)

  In this article we'll investigate the files written to the data directory by various parts of Elasticsearch. We will look at node, index and shard level files and give a short explanation of their contents in order to establish an understanding of the data written to disk by Elasticsearch.

## Java tuning

- [Elasticsearch Java Virtual Machine settings explained](http://jprante.github.io/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html)
- [Tuning Garbage Collection for Mission-Critical Java Applications](http://blog.mgm-tp.com/2013/03/garbage-collection-tuning/)
- [G1: One Garbage Collector To Rule Them All](http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All)
- [Use Lucene’s MMapDirectory on 64bit platforms, please!](http://blog.thetaphi.de/)
- [Black Magic cookbook](http://product.hubspot.com/blog/g1gc-tuning-your-hbase-cluster)
- [G1GC Fundamentals: Lessons from Taming Garbage Collection](http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection)
- [JVM Garbage Collector settings investigation](https://tigase.tech/attachments/download/4808/GC-result.pdf)

  A comparison of JVM garbage collectors. Fantastic job!

How to start using G1:

```
#ES_JAVA_OPTS=""
ES_JAVA_OPTS="-XX:-UseParNewGC -XX:-UseConcMarkSweepGC -XX:+UseG1GC"
```

## Durability & reliability

- [Call me maybe: Elasticsearch](https://aphyr.com/posts/317-call-me-maybe-elasticsearch)

  In this post, we’ll explore Elasticsearch’s behavior under various types of network failure.

- [Call me maybe: Elasticsearch 1.5.0](https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0)

  Data-loss scenarios

- [Transactions in Elasticsearch](https://blog.codecentric.de/en/2014/10/transactions-elasticsearch/)

  How to achieve transactions in Elasticsearch?

## Performance

- [Elasticsearch Refresh Interval vs Indexing Performance](http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/)

  Because refreshing is expensive, one way to improve indexing throughput is by increasing refresh_interval.
  Less refreshing means less load, and more resources can go to the indexing threads. How does all this translate into performance? Below is what our benchmarks revealed when we looked at it

- [A-Z Guide on Scaling Elasticsearch](https://qbox.io/blog/a-z-guide-on-scaling-elasticsearch)

  In this article we will discuss the system settings in detail. This will guide you on the parameters and values to be considered in various levels including the operating system (we are considering the Unix-based systems here). Focus will also be given to the memory settings in Elasticsearch, and we will look even deeper into the heap memory management and fine tuning of the same.

## Monitoring

- [Top 10 Elasticsearch Metrics to Watch](http://blog.sematext.com/2015/05/05/top-10-elasticsearch-metrics-to-watch/)

  This should be especially helpful to those readers new to Elasticsearch, and also to experienced users who want a quick start into performance monitoring of Elasticsearch.

- [How to monitor Elasticsearch performance](https://www.datadoghq.com/blog/monitor-elasticsearch-performance-metrics/)

  Very good article from Datadog

- [What should you monitor](https://support.lucidworks.com/hc/en-us/articles/201298247-What-should-you-monitor)

  Good checklist (with explanations)

## Best practices

- [Elasticsearch Indexing Performance Cheatsheet](https://blog.codecentric.de/en/2014/05/elasticsearch-indexing-performance-cheatsheet/)
- [Six Ways to Crash Elasticsearch](https://www.found.no/foundation/crash-elasticsearch/)
- [Playing HTTP Tricks with Nginx](https://www.elastic.co/blog/playing-http-tricks-nginx)
- [Elasticsearch Tuning Plan](https://gist.github.com/mrflip/5366376)

  Nice checklist

- [Choosing a fast unique identifier (UUID) for Lucene](http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html)

  If you have your own natural ID for each document, try to pick an ID that is friendly to Lucene.

## Books

*In order of my personal preferences*

- [Relevant Search](https://www.manning.com/books/relevant-search) - best book available on the market
- [Deep Learning for search](https://www.manning.com/books/deep-learning-for-search)
- [Mastering Elasticsearch - Second Edition](http://www.amazon.co.uk/Mastering-Elasticsearch-Second-Rafal-Kuc/dp/1783553790)
- [ElasticSearch Cookbook Second Edition](http://www.amazon.co.uk/ElasticSearch-Cookbook-Second-Edition-Alberto/)
- [Elasticsearch Server Second Edition](http://www.amazon.co.uk/Elasticsearch-Server-Second-Edition-Rogozi/dp/1783980524/)

## Video

- ["Surviving Elasticsearch"](https://www.youtube.com/watch?v=gT-L6r37SPA)
- https://berlinbuzzwords.de/18/session/elasticsearch-index-management-paas-style-logging-system

## Reading

- https://medium.com/airbnb-engineering/listing-embeddings-for-similar-listing-recommendations-and-real-time-personalization-in-search-601172f7603e
- https://www.elastic.co/blog/modeling-data-for-fast-aggregations
- https://hackernoon.com/learning-to-rank-for-flight-itinerary-search-8594761eb867

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# elasticsearch-hints

These links are the outcome of 4+ years of tuning/running our ES clusters (on-premises and in the cloud).

> This list is no longer under active development. Partially merged into https://github.com/dzharii/awesome-elasticsearch

## Internals

- [A Dive into the Elasticsearch Storage](https://www.found.no/foundation/dive-into-elasticsearch-storage/)

  In this article we'll investigate the files written to the data directory by various parts of Elasticsearch.
  We will look at node, index and shard level files and give a short explanation of their contents in order to establish an understanding of the data written to disk by Elasticsearch.

## Java tuning

- [Elasticsearch Java Virtual Machine settings explained](http://jprante.github.io/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html)
- [Tuning Garbage Collection for Mission-Critical Java Applications](http://blog.mgm-tp.com/2013/03/garbage-collection-tuning/)
- [G1: One Garbage Collector To Rule Them All](http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All)
- [Use Lucene’s MMapDirectory on 64bit platforms, please!](http://blog.thetaphi.de/)
- [Black Magic cookbook](http://product.hubspot.com/blog/g1gc-tuning-your-hbase-cluster)
- [G1GC Fundamentals: Lessons from Taming Garbage Collection](http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection)
- [JVM Garbage Collector settings investigation](https://tigase.tech/attachments/download/4808/GC-result.pdf)

  A comparison of JVM garbage collectors. Fantastic job!

- [Garbage Collection Settings for Elasticsearch Master Nodes](https://dzone.com/articles/garbage-collection-settings-for-elasticsearch-mast)

  Fine-tuning your garbage collector

- [Understanding G1 GC Log Format](https://dzone.com/articles/understanding-g1-gc-log-format)

  To tune and troubleshoot G1 GC enabled JVMs, one must have a proper understanding of G1 GC log format. This article walks through key things that one should know about the G1 GC log format.

How to start using G1

```
#ES_JAVA_OPTS=""
ES_JAVA_OPTS="-XX:-UseParNewGC -XX:-UseConcMarkSweepGC -XX:+UseG1GC"
```

## Durability & reliability

- [Call me maybe: Elasticsearch](https://aphyr.com/posts/317-call-me-maybe-elasticsearch)

  In this post, we’ll explore Elasticsearch’s behavior under various types of network failure.

- [Call me maybe: Elasticsearch 1.5.0](https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0)

  Data-loss scenarios

- [Transactions in Elasticsearch](https://blog.codecentric.de/en/2014/10/transactions-elasticsearch/)

  How to achieve transactions in Elasticsearch?

## Performance

- [Elasticsearch Refresh Interval vs Indexing Performance](http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/)

  Because refreshing is expensive, one way to improve indexing throughput is by increasing refresh_interval. Less refreshing means less load, and more resources can go to the indexing threads. How does all this translate into performance? Below is what our benchmarks revealed when we looked at it

- [A-Z Guide on Scaling Elasticsearch](https://qbox.io/blog/a-z-guide-on-scaling-elasticsearch)

  In this article we will discuss the system settings in detail. This will guide you on the parameters and values to be considered in various levels including the operating system (we are considering the Unix-based systems here). Focus will also be given to the memory settings in Elasticsearch, and we will look even deeper into the heap memory management and fine tuning of the same.
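
The refresh interval advice above comes down to a single index setting. As a rough sketch (not taken from the linked articles), the Python below builds the `PUT <index>/_settings` request that raises `index.refresh_interval`. The host `http://localhost:9200`, the index name `logs-demo`, the value `30s`, and the helper name `refresh_interval_request` are all assumptions chosen for illustration; benchmark your own workload before picking a value.

```python
import json
from urllib.request import Request


def refresh_interval_request(base_url: str, index: str, interval: str) -> Request:
    """Build (but do not send) a PUT <index>/_settings request.

    Hypothetical helper: it only prepares the HTTP request that changes
    the dynamic index.refresh_interval setting of one index.
    """
    body = {"index": {"refresh_interval": interval}}
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Assumed host, index name, and interval; adjust to your cluster.
    req = refresh_interval_request("http://localhost:9200", "logs-demo", "30s")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To apply the change, the request object can be passed to `urllib.request.urlopen`. A value of `-1` disables refresh entirely, which is sometimes done for the duration of a bulk load and then reverted.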

## Monitoring

- [Top 10 Elasticsearch Metrics to Watch](http://blog.sematext.com/2015/05/05/top-10-elasticsearch-metrics-to-watch/)

  This should be especially helpful to those readers new to Elasticsearch, and also to experienced users who want a quick start into performance monitoring of Elasticsearch.

- [How to monitor Elasticsearch performance](https://www.datadoghq.com/blog/monitor-elasticsearch-performance-metrics/)

  Very good article from Datadog

- [What should you monitor](https://support.lucidworks.com/hc/en-us/articles/201298247-What-should-you-monitor)

  Good checklist (with explanations)

## Best practices

- [Elasticsearch Indexing Performance Cheatsheet](https://blog.codecentric.de/en/2014/05/elasticsearch-indexing-performance-cheatsheet/)
- [Six Ways to Crash Elasticsearch](https://www.found.no/foundation/crash-elasticsearch/)
- [Playing HTTP Tricks with Nginx](https://www.elastic.co/blog/playing-http-tricks-nginx)
- [Elasticsearch Tuning Plan](https://gist.github.com/mrflip/5366376)

  Nice checklist

- [Choosing a fast unique identifier (UUID) for Lucene](http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html)

  If you have your own natural ID for each document, try to pick an ID that is friendly to Lucene.

## Books

*In order of my personal preferences*

- [Relevant Search](https://www.manning.com/books/relevant-search) - best book available on the market
- [Deep Learning for search](https://www.manning.com/books/deep-learning-for-search)
- [Mastering Elasticsearch - Second Edition](http://www.amazon.co.uk/Mastering-Elasticsearch-Second-Rafal-Kuc/dp/1783553790)
- [ElasticSearch Cookbook Second Edition](http://www.amazon.co.uk/ElasticSearch-Cookbook-Second-Edition-Alberto/)
- [Elasticsearch Server Second Edition](http://www.amazon.co.uk/Elasticsearch-Server-Second-Edition-Rogozi/dp/1783980524/)

## Video

- ["Surviving Elasticsearch"](https://www.youtube.com/watch?v=gT-L6r37SPA)
- https://berlinbuzzwords.de/18/session/elasticsearch-index-management-paas-style-logging-system

## Reading

- https://medium.com/airbnb-engineering/listing-embeddings-for-similar-listing-recommendations-and-real-time-personalization-in-search-601172f7603e
- https://www.elastic.co/blog/modeling-data-for-fast-aggregations
- https://hackernoon.com/learning-to-rank-for-flight-itinerary-search-8594761eb867

--------------------------------------------------------------------------------