In the second chapter of this series, we are going to face one of the main issues a SysAdmin always has to deal with when setting up a production cluster in any technology:
- How many resources are we going to need?
First of all, our Elasticsearch cluster is going to be deployed in a virtual environment running on OpenStack or VMware (not on bare-metal servers, and not as a framework on Mesos).
We did our “numbers” by testing in a development environment, using tools like:
- Rally: Official benchmarking tool for Elasticsearch. https://www.elastic.co/blog/announcing-rally-benchmarking-for-elasticsearch
- JMeter: Tool for load testing and measuring performance. http://jmeter.apache.org/
Once we “really know” the load we expect the cluster to handle, it is time to decide the number of nodes and the role each one will assume. In our case:
- master nodes: 3
- data nodes: 3
- coordinator node: 1
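As a rough illustration of why an odd number of master nodes such as 3 is a common choice, the sketch below assumes that master election needs a majority of the master-eligible nodes. The `TOPOLOGY` dictionary and the helper names are hypothetical and only mirror the node counts listed above.

```python
# Hypothetical description of the cluster layout listed above.
TOPOLOGY = {"master": 3, "data": 3, "coordinator": 1}


def master_quorum(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes (assumed election rule)."""
    if master_eligible < 1:
        raise ValueError("at least one master-eligible node is required")
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """How many master nodes can be lost while a majority is still reachable."""
    return master_eligible - master_quorum(master_eligible)


if __name__ == "__main__":
    masters = TOPOLOGY["master"]
    print("total nodes:", sum(TOPOLOGY.values()))
    print("master quorum:", master_quorum(masters))
    print("master failures tolerated:", tolerated_master_failures(masters))
```

Under this assumption, 2 master nodes tolerate no failure at all, while 3 tolerate the loss of one.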
Our cluster performs tasks that consume a lot of network capacity (data transfer, shard allocation), so we need at least 1 Gbps of bandwidth in our network, or even better 10 Gbps. We must also focus on getting the lowest latency possible.
A multi-datacenter architecture is not recommended, although Elasticsearch allows it. High network latency is one of the worst enemies inside an Elasticsearch cluster.
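To get a feeling for the difference between the two bandwidth figures mentioned above, here is a back-of-the-envelope sketch. The function name, the 50 GB shard size and the `efficiency` parameter are illustrative assumptions, and the result is a best case that ignores latency, protocol overhead and any throttling configured in the cluster.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Best-case time to move `size_gb` gigabytes over a `link_gbps` link.

    `efficiency` is a hypothetical factor (0 < efficiency <= 1) for the share
    of the nominal bandwidth that is really usable.
    """
    if size_gb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, bandwidth or efficiency")
    size_bits = size_gb * 8e9  # decimal gigabytes converted to bits
    return size_bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    shard_gb = 50  # assumed shard size, only for illustration
    for link in (1, 10):
        seconds = transfer_seconds(shard_gb, link)
        print(f"{shard_gb} GB over {link} Gbps: about {seconds:.0f} s")
```

Relocating several shards at once multiplies these times, which is why the faster link pays off during recoveries and rebalancing.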
Depending on the role inside the cluster, the hardware requirements will change. So keep in mind the following:
- Master nodes: They do not store any data, only metadata, and they do not perform any CRUD operations, so a medium-size instance is more than enough.
- Data nodes: They store and organize all the indexes and shards, and they perform all the CRUD operations.
- Coordinator nodes: They aggregate and serve all the requests.
With this information, we are going to optimize (X) according to the following summary:
If we can afford it, we should go for SSD disks without any doubt. HDD is the second most recommended option.
Avoid network storage like NAS, NFS or SMB. As we already know, network latency has a big impact on the performance of the cluster.
RAID 0 is the recommended storage configuration because it offers high write performance.
Tasks like indexing, analysis, searching and CRUD operations have a high impact on the CPU, so it is a very important point to think about when we are setting up an Elasticsearch cluster.
CPU use is divided among thread pools of different kinds (index, bulk, get, etc.). All of them can be configured (we will show you how in the next tech papers).
Choose CPUs that perform well in horizontal (multi-core) designs, mainly for the data nodes: the more processors we install, the better the performance.
As Elasticsearch runs on Java, from the RAM perspective memory is divided in two: one part is allocated to the Java heap space, and the other part is everything else. This is important to be aware of.
It is recommended to reserve 50% of the total memory for the JVM heap, up to a maximum size: because of JVM issues with heaps over 32 GB, it is not recommended to go over this size.
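The rule above can be written as a small calculation. This is only a sketch: the function name is hypothetical, and the default cap of 31 GB is an assumption that leaves a small margin under the 32 GB limit mentioned in the text.

```python
def recommended_heap_gb(total_ram_gb: float, heap_cap_gb: float = 31.0) -> float:
    """Half of the machine's RAM, capped below the 32 GB limit.

    `heap_cap_gb` defaults to 31 GB, an assumed safety margin under 32 GB.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb / 2, heap_cap_gb)


if __name__ == "__main__":
    for ram in (16, 64, 128):
        heap = recommended_heap_gb(ram)
        rest = ram - heap
        print(f"{ram} GB RAM: {heap:g} GB heap, {rest:g} GB for everything else")
```

On the larger machines the heap stops growing at the cap, so any extra memory goes to the "everything else" part rather than to the JVM.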
In the next tech paper you will find an approach to optimizing the Elasticsearch configuration for search tasks.
A SysAdmin who landed happily at Datio after a few years managing physical and cloud platforms. Words like performance optimization, control, availability, reliability, scalability and system integrity are hardcoded in my IT-DNA.
When I am not in front of a black screen, I love to spend time with my 2 little khaleesis and to run in the mountains where I live, which fills me with pure energy.