Elasticsearch universe: Architectural perspective

Hi Folks!!

This is going to be the first article in a four-part series. The subject I am going to write about is Elasticsearch: how we can set up a proper architecture, and how we can optimize our configuration to improve the performance and reliability of the service.

A few details about Elasticsearch:

  • Elasticsearch is a distributed, open-source search and analytics engine based on Lucene, designed for horizontal scalability, multi-tenancy, and easy management.

One of the most powerful features is the possibility of interacting with the stored data in two different ways:

  • API: a REST interface that lets a client talk to the server through HTTP methods (GET, POST, PUT, DELETE).
  • Elasticsearch DSL: a library built on top of Elasticsearch, which enables us to write and run queries against the stored data.
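To make the API mode concrete, here is a minimal sketch that builds a search request with Python's standard library. The index name `articles` and the endpoint `http://localhost:9200` are assumptions for illustration, not values from this post; the request is only constructed here, not sent, since that would require a running cluster:

```python
import json
import urllib.request

# A simple full-text query (hypothetical index and field names).
query = {"query": {"match": {"title": "elasticsearch"}}}

# Build (but do not send) the HTTP request a client would issue.
request = urllib.request.Request(
    url="http://localhost:9200/articles/_search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(request.get_method())  # POST
print(request.full_url)      # http://localhost:9200/articles/_search
```

Sending it with `urllib.request.urlopen(request)` against a live node would return the matching documents as JSON.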

After this short description of Elasticsearch, the next step is to figure out how an Elasticsearch cluster should be set up in production environments.

My first thoughts on the approach to the architecture can be summarized in two terms:

  • High Availability
  • Fault tolerance

Diving into the configuration options of Elasticsearch, we find that Elastic offers us different roles inside a cluster:

  • Master: an instance with this role enabled holds the cluster state and tracks where all the shards are located in the cluster. When the cluster master goes down, the cluster automatically starts a new master election and picks a new master from the master-eligible nodes.
  • Data: these instances store all the data (the indices in Elasticsearch) and perform all the CRUD, search, and aggregation operations.
  • Coordinator: a node that does not hold data, is not master-eligible, and has ingest disabled. It “only” routes requests and distributes bulk indexing operations, so we can say it works as a “load balancer”.
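As a sketch of how these roles map onto configuration, each node declares its role in its own elasticsearch.yml. The settings below use the pre-7.x `node.*` syntax, consistent with the zen-discovery era this article describes; each YAML document belongs in a different node's configuration file:

```yaml
# Dedicated master-eligible node
node.master: true
node.data: false
node.ingest: false
---
# Data node
node.master: false
node.data: true
node.ingest: false
---
# Coordinator (coordinating-only) node: every role disabled
node.master: false
node.data: false
node.ingest: false
```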

So, once we know the different roles we can configure in an Elasticsearch cluster, let’s keep moving!!!

So, let’s start with how to set up a stable architecture.

First of all, we need to avoid what is known as “split brain”. Let’s say we have a cluster with two instances that both have the master role enabled.

Then something that supposedly never happens does happen: a network failure interrupts the communication between the two masters.

Both masters believe the other master node is unavailable, so each decides to become active:

The implication of this behaviour is that we have two active masters, so both will receive indexing requests and will try to allocate the shards (primary and replica). As a result, there will be two primary copies of the same shard, which makes our cluster inconsistent.

In order to avoid this behaviour, we need to follow these steps:

First of all, we add a setting to the main configuration file, elasticsearch.yml.
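The exact snippet is not reproduced here, but the setting being described is the classic pre-7.x zen-discovery quorum option; a minimal sketch, assuming a cluster with three master-eligible nodes:

```yaml
# elasticsearch.yml on every master-eligible node (pre-7.x zen discovery)
# Quorum for 3 master-eligible nodes: 3/2 + 1 = 2
discovery.zen.minimum_master_nodes: 2
```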


This setting defines how many master-eligible nodes must have connectivity between them before a master can be elected. The rule is N/2 + 1 (where N is the number of master-eligible nodes), so basically we need at least 3 nodes with the master role enabled to achieve fault tolerance and high availability.
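To make the arithmetic concrete, here is a tiny sketch (the function name is hypothetical). With 2 master-eligible nodes the quorum is also 2, so losing either node halts elections; with 3 nodes, a quorum of 2 survives the loss of one node:

```python
def minimum_master_nodes(n_master_eligible: int) -> int:
    # Quorum rule from the article: N/2 + 1, with integer division.
    return n_master_eligible // 2 + 1

# With 2 masters the quorum equals the whole set: no fault tolerance.
print(minimum_master_nodes(2))  # 2
# With 3 masters a quorum of 2 survives the loss of one node.
print(minimum_master_nodes(3))  # 2
```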

Once we have all this information, we can start to figure out what a proper physical architecture should look like, and the picture that appears is as follows:

That’s all for this first approach to the wonderful world of Elasticsearch. See you in the next chapter!!!


Félix Rodríguez

SysAdmin who landed happily in Datio after a few years managing physical and cloud platforms. Words like performance optimization, control, availability, reliability, scalability, and system integrity are hardcoded in my IT-DNA. When I am not in front of a black screen, I love to spend time with my 2 little khaleesis and run in the mountains where I live, which fills me up with pure energy.


2 thoughts on “Elasticsearch universe: Architectural perspective”

  1. Hello Felix and congratulations for your post!

    If you’re kind enough I would like to ask you a question. You said that we need at least 3 Master nodes to achieve HA/fault tolerance, however taking a look into your diagram, looks like coordinator node could be a SPOF. How do you deal with a faulty coordinator node?

    Kind regards

  2. Hi
    First of all, thanks for your words and for your interest in my post. Regarding your question, every node in the cluster behaves like a coordinator node (master and data nodes can route requests, handle the search reduce phase, and do bulk indexing) in addition to their own duties. The main reason to set up a dedicated coordinator node is to offload the master and data nodes.
    However, setting up too many coordinator nodes is not recommended, as the master node needs to track the cluster state on every single node, and this increases the global load.
    Also, if the coordinator node goes down, the cluster stays alive.
    Thanks again

Comments are closed.