Hi all again!!
This is the third tech-paper of this series dedicated to the optimization of an Elasticsearch cluster.
Here you will find how to configure your searches in order to make them faster and more efficient.
Avoid big HTTP requests
Every search needs to fetch the _id of the documents it returns, and this operation has a high performance cost for big documents due to the way the filesystem cache works.
To keep this under control, you can configure:
http.max_content_length (default value 100 MB): ES will refuse to accept HTTP requests bigger than the size configured in this value.
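As a rough client-side guard, you can measure a request body before sending it. This is only a sketch: the helper names are made up for illustration, and the limit is assumed to be the 100 MB default (in elasticsearch.yml the setting would be written as `http.max_content_length: 100mb`).

```python
import json

# Assumption: the cluster keeps the default http.max_content_length of 100 MB.
MAX_CONTENT_LENGTH_BYTES = 100 * 1024 * 1024


def body_size_bytes(body: dict) -> int:
    """Return the size in bytes of the JSON-encoded request body."""
    return len(json.dumps(body).encode("utf-8"))


def fits_http_limit(body: dict, limit: int = MAX_CONTENT_LENGTH_BYTES) -> bool:
    """True if the encoded body is not larger than the configured limit."""
    return body_size_bytes(body) <= limit


small_query = {"query": {"match": {"title": "elasticsearch"}}}
print(fits_http_limit(small_query))
```

A body that fails this check would be rejected by the node anyway, so it is cheaper to split it on the client side.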
Model your data
Denormalize your data (create redundant copies of the most searched data in different parts of the index itself). The goal of this modeling is to decrease search time by making the data easier to reach for the queries you execute.
In Elasticsearch there are 2 approaches at configuration level that deserve a look:
- Nested documents and nested queries
- Parent & child relationships
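The two approaches can be sketched as request bodies built in Python. The index layout (`comments`, `author`, `question`, `answer`, `body`) is hypothetical, and the typeless mapping format assumes Elasticsearch 7 or later.

```python
# Nested approach: each comment is indexed as its own hidden document,
# so its fields are matched together instead of being flattened.
nested_mapping = {
    "mappings": {
        "properties": {
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "keyword"},
                    "text": {"type": "text"},
                },
            }
        }
    }
}

nested_query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {"term": {"comments.author": "alice"}},
        }
    }
}

# Parent & child approach: a join field relates documents of the same index.
join_mapping = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"question": "answer"}}
        }
    }
}

# Find the questions that have at least one answer matching the inner query.
has_child_query = {
    "query": {
        "has_child": {
            "type": "answer",
            "query": {"match": {"body": "cache"}},
        }
    }
}
```

As a rule of thumb, nested documents are cheaper to query but the whole parent must be reindexed when one of them changes, while children can be updated independently at a higher query cost.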
Cache is your friend
To enable the cache in your requests (query context):
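One way to do this is the `request_cache=true` parameter of the search API, which opts a request in to the shard request cache. The sketch below only builds the path and body; the index name, the aggregation and the helper are assumptions for illustration. The request cache stores results such as hit counts and aggregations, which is why the example uses `size: 0`.

```python
from typing import Tuple
from urllib.parse import urlencode


def cached_search_request(index: str, body: dict) -> Tuple[str, dict]:
    """Return (path, body) for a search that opts in to the request cache."""
    path = f"/{index}/_search?" + urlencode({"request_cache": "true"})
    return path, body


path, body = cached_search_request(
    "my-index",  # hypothetical index name
    {"size": 0, "aggs": {"by_status": {"terms": {"field": "status"}}}},
)
print(path)  # /my-index/_search?request_cache=true
```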
To enable the cache in your queries (filter context):
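Clauses placed under `filter` in a `bool` query run in filter context: they do not contribute to the score, so Elasticsearch can cache their results. A minimal sketch, with hypothetical field names (`title`, `status`, `publish_date`):

```python
def filtered_search(text: str, status: str, since: str) -> dict:
    """Build a bool query: scoring in `must`, cacheable yes/no checks in `filter`."""
    return {
        "query": {
            "bool": {
                # Query context: relevance is computed for this clause.
                "must": [{"match": {"title": text}}],
                # Filter context: no scoring, results can be cached.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"publish_date": {"gte": since}}},
                ],
            }
        }
    }


query = filtered_search("elasticsearch", "published", "2020-01-01")
print(query["query"]["bool"]["filter"])
```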
Also, after a cluster restart, it is highly recommended to allow a “warm up time”: the filesystem cache will be empty, so give it enough time to load enough documents to make search operations fast again.
Make the most of the different cache options in Elasticsearch (the filesystem cache, the request cache and the query cache). When you execute the same search several times in a row, each execution can go to different shard copies of the cluster, so one of the searches may land on a copy that does not hold the cached data, and you lose search consistency. To avoid this behaviour you can set up:
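The search API accepts a `preference` parameter: requests that send the same custom string are routed to the same shard copies, so repeated searches keep hitting warm caches. The sketch below derives that string from a session identifier; the helper name and the hashing choice are assumptions, any stable string that does not start with `_` would do.

```python
import hashlib
from urllib.parse import urlencode


def sticky_search_path(index: str, session_id: str) -> str:
    """Build a search path whose preference value is stable per session."""
    # Hashing keeps the value short and avoids leaking the raw session id.
    token = hashlib.sha256(session_id.encode("utf-8")).hexdigest()[:16]
    return f"/{index}/_search?" + urlencode({"preference": token})


first = sticky_search_path("my-index", "user-42")
second = sticky_search_path("my-index", "user-42")
print(first == second)  # True
```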
* I will dive deeper into the cache options in the next tech-papers.
After rebooting your Elasticsearch cluster you do not have any cached data, but you have the option to tell it to load into memory, while the cluster is starting, the documents/indexes that you consider important.*
* Do not overuse this configuration; set it according to your hardware and OS configuration.
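The setting that matches this description is `index.store.preload`, a static index setting that lists the file extensions to load into the filesystem cache when the index is opened. The sketch below builds the settings body for index creation; the helper and its deliberately conservative whitelist are assumptions made for illustration (the setting itself also accepts a `*` wildcard, which the helper rejects to discourage overuse).

```python
# Assumption: only these extensions are allowed by this helper
# (norms, doc values, terms dictionaries, postings lists, points).
ALLOWED_EXTENSIONS = {"nvd", "dvd", "tim", "doc", "dim"}


def preload_settings(extensions: list) -> dict:
    """Return an index-creation body that preloads the given file extensions."""
    unknown = set(extensions) - ALLOWED_EXTENSIONS
    if unknown:
        raise ValueError(f"refusing to preload: {sorted(unknown)}")
    return {"settings": {"index.store.preload": sorted(set(extensions))}}


print(preload_settings(["nvd", "dvd"]))
```

If the preloaded files do not fit in the memory left for the filesystem cache, searches get slower rather than faster, which is the reason for the warning above.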
Limit your borders
It is always better to search a specific subset of the data than to search all of it.
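For example, with time-based indices you can search only the days you need instead of every index. This is a sketch assuming a hypothetical daily naming scheme `logs-YYYY.MM.DD`; the search API accepts a comma-separated list of index names in the path.

```python
from datetime import date, timedelta


def daily_indices(prefix: str, start: date, end: date) -> list:
    """List the daily index names between start and end, both included."""
    days = (end - start).days
    return [f"{prefix}-{start + timedelta(days=i):%Y.%m.%d}" for i in range(days + 1)]


def search_path(indices: list) -> str:
    """Build a search path limited to the given indices."""
    return "/" + ",".join(indices) + "/_search"


indices = daily_indices("logs", date(2024, 1, 30), date(2024, 2, 1))
print(search_path(indices))
# /logs-2024.01.30,logs-2024.01.31,logs-2024.02.01/_search
```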
Keep an eye on the DNS cache
By default the JVM that runs Elasticsearch can keep positive hostname resolutions in cache indefinitely (this is the JVM default when a security manager is in place). In architectures that rely on node-to-node hostname resolution, the DNS resolution might change, so the nodes can end up using stale addresses.
The best option is to configure a TTL in your JVM options:
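Elasticsearch reads the `es.networkaddress.cache.ttl` and `es.networkaddress.cache.negative.ttl` system properties for this. The sketch below renders the corresponding lines for the JVM options file; the helper name and the 60 and 10 second values are only illustrative choices.

```python
def dns_cache_jvm_options(positive_ttl_s: int = 60, negative_ttl_s: int = 10) -> list:
    """Render JVM option lines that bound the DNS cache lifetime, in seconds."""
    # A negative TTL means "cache forever" for the JVM, which is what we want to avoid.
    if positive_ttl_s < 0 or negative_ttl_s < 0:
        raise ValueError("TTL values must not be negative")
    return [
        f"-Des.networkaddress.cache.ttl={positive_ttl_s}",
        f"-Des.networkaddress.cache.negative.ttl={negative_ttl_s}",
    ]


print("\n".join(dns_cache_jvm_options()))
```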
Monitor your cache
You can check the current status of the cluster caches thanks to the indices stats API (it returns quite a few different statistics about what is going on in your cluster).
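A quick health signal is the hit ratio of each cache. The sketch below works on a dictionary shaped like the response of `GET /_stats/query_cache,request_cache`, which reports `hit_count` and `miss_count` for both caches; the sample numbers are invented for illustration.

```python
def hit_ratio(cache_stats: dict) -> float:
    """Return hits / (hits + misses), or 0.0 when the cache was never queried."""
    hits = cache_stats.get("hit_count", 0)
    misses = cache_stats.get("miss_count", 0)
    total = hits + misses
    return hits / total if total else 0.0


def cache_report(stats: dict) -> dict:
    """Compute the hit ratio of each cache found in an indices stats response."""
    total = stats["_all"]["total"]
    return {
        name: hit_ratio(total[name])
        for name in ("query_cache", "request_cache")
        if name in total
    }


# Invented sample, trimmed to the fields used above.
sample = {
    "_all": {
        "total": {
            "query_cache": {"hit_count": 30, "miss_count": 10},
            "request_cache": {"hit_count": 5, "miss_count": 15},
        }
    }
}
print(cache_report(sample))  # {'query_cache': 0.75, 'request_cache': 0.25}
```

A ratio that stays low after the warm up time usually means the searches are not repeating, or that they are not written in a cacheable way.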
SysAdmin who landed happily in Datio after a few years managing physical and cloud platforms. Words like performance optimization, control, availability, reliability, scalability and system integrity are hardcoded in my IT-DNA.
When I am not in front of a black screen, I love to spend time with my 2 little khaleesis and also to run in the mountains where I live, which fills me up with pure energy.