Mesos provides an abstraction layer over a cluster of machines with heterogeneous resources and makes it seem like a single big machine, emulating a pool of resources, being able to run different kinds of workloads. With this approach, Mesos removes the need to allocate specific machines for each kind of workload (static partitioning).
Mesos is a cluster resource manager which targets the principal requirements of these kinds of technologies:
- Efficient isolation and sharing of resources across distributed applications
- Optimal usage of the resources of the cluster
- Support a wide range of frameworks
- Being able to deal with hardware heterogeneity
- Support different types of resources
- High scalability, being able to have clusters of tens of thousands of nodes
- Fault tolerance guarantees for the system and the applications
- High availability of the central scheduler component
Let’s take a look into its architecture in order to begin to understand how it works. There are three main components in the Mesos architecture:
- Mesos frameworks: are applications that run on top of mesos. A framework is divided into two parts: the scheduler and the executor, the first one acts as a controller and the second one is responsible for doing the work.
- Mesos master: is the mediator between the agent’s resources and the frameworks and is responsible for the fine-grained sharing of resources among the frameworks.
- Mesos agent (aka Mesos slave in versions previous to 1.0): manages the resources on physical nodes and runs the framework’s executors.
Taking a closer look at the picture there are other relevant terms and elements that deserve a quick explanation:
- Apache Zookeeper is used for electing the leader of the Mesos cluster and also could be used by agents to join the cluster.
- Offers: is a list of some or all of the agent’s node’s available resources (CPU, memory, disk, ports, GPU and other custom resources).
- Tasks: is a unit of work that is scheduled by a framework and executed on an agent node.
Now that we have big picture overview of the Mesos architecture let’s dive into the roles and responsibilities of each one of the main components:
- Mesos masters are the core of the cluster, they serve the following main purposes:
- Guarantee the HA of the cluster, Mesos uses a fail-fast approach in combination with one active and several standby masters coordinated through Zookeeper. Mesos leverage Paxos protocols for replicating the status of the cluster in a durable way.
- Hosting the primary UI which provides information about available resources in the cluster, registered frameworks and their tasks, links with the slave UIs, offers, …
- They act as the central source of true for all running tasks, the masters store in memory the metadata relate to tasks, for the completed ones they make a best effort (there is only a fixed amount of memory available for completed tasks), this allow masters to serve the UI and data about the tasks with the minimal latency.
- Offering resources to the frameworks using the master’s allocator module, which contains the logic for deciding which resources to offer to each registered framework and when. This module is pluggable and Mesos provides a default algorithm called DRF (Dominant Resource Fairness) and a set of mechanisms for tuning the framework’s SLA: roles, weights, reservations, quotas, …
- Mesos agents responsibilities include:
- Launching and managing the containers that host the executors (everything in Mesos runs inside a container).
- Providing a UI for accessing data inside the containers (sandboxes): files related to dependencies resolved by Mesos fetcher, output of executors (stdout, stderr), …
- Managing the communication between their local executors and Mesos masters acting as intermediaries.
- Publishing information related to the host they are running in, including information about running tasks and executors, available resources of the host and other metadata.
- Managing status update of tasks: it is responsibility of the Mesos agent to guarantee the delivery of the status update of the tasks to the schedulers.
- Checkpoints of his state in order to support rolling updates of the cluster.
- Frameworks. As we have said a framework has two parts the scheduler and the executor.
- Scheduler has the following main responsibilities:
- Registering itself in the Mesos master, and as a result gets a unique framework id.
- Handling the offers: launch tasks when the resource requirements and constraints match with the received offers from the Mesos master.
- Handling the lifecycle of the task, being responsible for handling task failures and errors.
- Persisting the state in order to accomplish high availability and fault tolerance
- Receive requests from the clients (users of the framework), it is usual to expose a REST API, but it is something which is up to each framework.
- Executor has three responsibilities:
- Executing the task launched by the scheduler.
- Notify the scheduler with the status of each task.
- Handling other kind request (no tasks) from the scheduler.
- Scheduler has the following main responsibilities:
From the previous points you can deduce that Mesos provides two-level resource scheduling, the first level happens at the Mesos master that is responsible for deciding which resources to offer to each framework and when; the second level happens at the framework’s scheduler level which is responsible for accepting or rejecting the offers received from the master.
It is important to understand what a task is. A task represents a job to do and physically it can be anything: it can be a new thread inside the executor, or separate physical process, it depends on each framework and it is up to you if you are building your own. Two popular frameworks leverage different approaches of task, in the Kafka framework a task is a broker of a Kafka cluster, which physically is a different process than the executor; in the Spark framework the tasks are threads spawned inside the executor process. There can be any number of tasks associated with an executor but it is usual to find one task per executor.
As we have said, everything in Mesos runs inside a container: the schedulers, the executors with their tasks. Mesos supports Docker and its own container technologies, based on Linux control groups, and provides the mechanism for adding other container technologies. It is not within the scope of this post to talk about the containers inside Mesos, but it is relevant to understand that these are the basis under which some of the main features of Mesos rely, such as sharing and isolating resources and it can also help us to understand what is happening under the hood when we take a look at the physical processes running in the cluster.
Well, we have a scheduler that is able to handle the failures of the task sent to the executors, but who cares about the scheduler, who is responsible for keeping it up and running? If you have a good implementation of your scheduler and you are persisting the state on a regular basis there is only one more thing that you need: someone else taking care of you. A usual pattern that you will find when you get involved in Mesos is launching the scheduler of your frameworks via Marathon, which is another framework specialized in “long running tasks”. Marathon will do the work of keeping your scheduler up and running for you, you’ll only have to worry about recovering the persisted state when you get a failure back from the framework’s scheduler. From this point of view, Marathon could be seen as the framework of frameworks. But Marathon is able to run any kind of long running task, called apps in his argot.
To close this brief overview of Apache Mesos architecture we are going to introduce a built-in Mesos executor: the CommandExecutor. This executor allows us to execute shell commands and Docker containers on behalf of any framework scheduler. It is good practice to leverage the features of the CommandExecutor if you are going to build a framework and your executor has not got very complicated requirements.
And lastly, our big finale, leveraging the CommandScheduler, a built-in Mesos scheduler, and the previously mentioned CommandExecutor via mesos-execute:
mesos-execute --master=127.0.1.1:5050 --name="FarewellMesosTask" --command='echo "Thats all folks!!!";sleep 15'
I’m a technical engineer in computer systems with more than 14 years experience as a specialist in Real Time and Event Driven Architectures, having a vast background in technologies related to stream processing, complex event processing, messaging middelwares, rule engines and integration solutions. Now I’m facing new challenges in leading the technical architecture area in Datio.