WHAT IS DRF AND HOW DOES IT WORK?
As we said in the previous post, Mesos provides two-level resource scheduling. The first level happens at the Mesos master, which decides what resources are offered to each framework and when. The second level happens at the framework’s scheduler, which accepts or rejects the offers received from the master.
The Mesos master uses the allocation module to decide how many resources to offer to each framework and in what order. This module is pluggable, and Mesos provides a default algorithm called DRF (Dominant Resource Fairness).
The main idea behind DRF is to maximize the minimum dominant share across all users (frameworks). The dominant resource of a framework (cpu, memory, disk, …) is the one it demands most as a percentage of the cluster total. Imagine a cluster with a total of 30 CPUs and 512GB of memory, framework A demanding 5 CPUs and 16GB of memory, and framework B demanding 4 CPUs and 128GB of memory. Expressed as a share of the whole cluster, framework A demands 16.7% of the CPU and 3.125% of the memory, while framework B demands 13.3% of the CPU and 25% of the memory. Framework A therefore has CPU as its dominant resource, and framework B has memory.
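The arithmetic of this example can be checked with a small sketch. The function name dominant_share and its argument order are our own choices for illustration and are not part of Mesos; the sketch only considers CPU and memory.

```shell
# Dominant share of a framework: the largest of its per-resource shares.
# Arguments: demanded CPUs, demanded memory, total CPUs, total memory
# (both memory values in the same unit, GB in this example).
dominant_share() {
  LC_ALL=C awk -v cpu="$1" -v mem="$2" -v tcpu="$3" -v tmem="$4" 'BEGIN {
    cpu_share = 100 * cpu / tcpu
    mem_share = 100 * mem / tmem
    if (cpu_share >= mem_share) printf "cpu %.1f%%\n", cpu_share
    else                        printf "mem %.1f%%\n", mem_share
  }'
}

dominant_share 5 16 30 512    # framework A, prints: cpu 16.7%
dominant_share 4 128 30 512   # framework B, prints: mem 25.0%
```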
At each round of resource offering, the allocation module applies DRF to identify the dominant shares of the frameworks and offers the resources first to the one with the smallest dominant share, then to the one with the second smallest, and so on.
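As a rough illustration of that ordering, the sketch below sorts frameworks by their dominant share. It only shows the sorting idea: offer_order is a hypothetical helper, the shares are passed in already computed, and the real allocator also takes into account features such as roles and weights that are not modeled here.

```shell
# Order frameworks for one offer round: smallest dominant share first.
# Input on stdin: one "<framework> <dominant share in %>" pair per line.
offer_order() {
  LC_ALL=C sort -k2,2n | awk '{ print $1 }'
}

# Frameworks A (16.7%) and B (25%) from the example: A is offered first.
printf '%s\n' 'B 25.0' 'A 16.7' | offer_order
```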
Mesos provides an additional set of features to guarantee and tune the SLAs of the frameworks’ resource allocations (roles, quotas, reservations, oversubscription, weights, etc.); however, they are out of the scope of this brief introduction.
TWO EASY STEPS TO MONITOR THE OFFERS AND OTHER RELEVANT EVENTS
Now that we understand the basics of DRF, we are going to introduce a way to monitor how Mesos agents publish their available resources, how the Mesos master manages resource offers, how the frameworks’ schedulers reject offers or accept them and launch tasks, and some other interesting events related to the lifecycle of a framework and its tasks.
The first step is to toggle the Mesos master log level to 3 for the amount of time that we want to monitor the frameworks, offers and tasks. To do so, you only have to make the following POST request:
http://:5050/logging/toggle?level=3&duration=1mins
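One possible way to send this request is with curl. This is only a sketch: toggle_url and toggle_log_level are helper names we made up, and the master host is a parameter because it depends on your cluster (MASTER_HOST below is a placeholder).

```shell
# Build the toggle URL from host, port, log level and duration.
toggle_url() {
  printf 'http://%s:%s/logging/toggle?level=%s&duration=%s\n' "$1" "$2" "$3" "$4"
}

# Send the POST request with curl: -X POST selects the method, -f makes
# curl fail on HTTP errors, -sS hides the progress bar but keeps errors.
toggle_log_level() {
  curl -fsS -X POST "$(toggle_url "$1" "$2" "$3" "$4")"
}

# Example call (MASTER_HOST is a placeholder for your master's address):
# toggle_log_level "$MASTER_HOST" 5050 3 1mins
```

The call is left commented out so that the snippet does nothing until you point it at a real master.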
At this log level we’ll start seeing some key log lines from the allocation module (the hierarchical Dominant Resource Fairness allocator) and from the master itself. For more information about Mesos logging you can check it here.
As a second step, we’ll go to the directory containing the Mesos master logs and follow them with the command below, which filters the output so that we don’t get overwhelmed by the amount of log activity:
tail -f mesos-master.INFO | grep -Ei "added|removed|allocating|accept|decline|adding|launching|forwarding"
INTERPRETING THE IDENTIFIED LOG PATTERNS
Let’s explain each state or event identified in the log pattern including one sample.
Two main events of the framework lifecycle
- added: represents the moment when a framework is registered with the Mesos master. The log includes the unique id assigned to the framework. This unique id is fundamental because it allows us to correlate all the traces related to a specific framework.
I1027 00:41:36.338513 1166 hierarchical.cpp:271] Added framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
- removed: represents the moment when a framework is unregistered from the Mesos master, for example when the framework’s scheduler stops.
I1027 00:42:01.495795 1163 hierarchical.cpp:333] Removed framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
Three main events of an offer lifecycle
- allocate: the Mesos master sends an offer to a framework’s scheduler. Note that the log shows in detail the resources of an agent that the Mesos master offers to a specific framework, identified by the framework’s unique id.
I1027 00:41:36.338580 1166 hierarchical.cpp:1511] Allocating cpus(*):4; mem(*):14922; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
- accept: the framework’s scheduler accepts the offer. This log line does not tell us how many resources have been taken; we’ll find that out when a task is launched with the accepted resources. Remember that the scheduler should only take the resources that it needs, not necessarily all the resources included in the offer.
I1027 00:41:36.341940 1163 master.cpp:3342] Processing ACCEPT call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O645 ] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 at slave(1)@127.0.1.1:5051 (irodriguez) for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080 (mesos-execute instance)
- decline: the framework’s scheduler declines the offer.
I1027 00:41:42.217736 1160 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O647 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080 (mesos-execute instance)
Four main events of the task lifecycle
- adding: it could correspond with the TASK_STAGING state, in which the Mesos master has received the framework’s scheduler request to launch a task but the task hasn’t yet started to run.
I1027 00:41:36.342283 1163 master.cpp:7447] Adding task F1-TASK with resources cpus(*):2; mem(*):128 on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 (irodriguez)
- launching: it could correspond with the TASK_STARTING state, in which the executor has learned about the existence of the task and prepares to run it.
I1027 00:41:36.342308 1163 master.cpp:3831] Launching task F1-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080 (mesos-execute instance) with resources cpus(*):2; mem(*):128 on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 at slave(1)@127.0.1.1:5051 (irodriguez)
- forwarding: allows us to identify the different task status updates. The first of the following traces shows the TASK_RUNNING status update, indicating that the task has begun running successfully. The second shows the TASK_FINISHED status update, indicating that the task has completed successfully.
I1027 00:41:36.426267 1166 master.cpp:5199] Forwarding status update TASK_RUNNING (UUID: 09444748-8f44-430e-9f8e-2ba9b4f47963) for task F1-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
…
I1027 00:42:01.493413 1164 master.cpp:5199] Forwarding status update TASK_FINISHED (UUID: b90c09e4-80c8-4f75-8610-d46f56c3af76) for task F1-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
If you are interested in the complete lifecycle of a task, it is described here: “The Life Cycle of a Task”
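The patterns above can be turned into a small filter that tags each log line with the event it represents. This is a sketch under our own assumptions: classify_events and the event labels (framework_added, offer_declined and so on) are names invented for this example, and the patterns are taken only from the sample lines shown in this post.

```shell
# Tag each master log line with the lifecycle event it represents.
# Reads log lines on stdin; prints "<time> <event> <framework id or ->".
classify_events() {
  awk '
    /Added framework/          { ev = "framework_added" }
    /Removed framework/        { ev = "framework_removed" }
    /Allocating /              { ev = "offer_allocated" }
    /Processing ACCEPT call/   { ev = "offer_accepted" }
    /Processing DECLINE call/  { ev = "offer_declined" }
    /Adding task/              { ev = "task_added" }
    /Launching task/           { ev = "task_launching" }
    /Forwarding status update/ { ev = "status_" $8 }
    ev != "" {
      fw = "-"                              # some lines carry no framework id
      if (match($0, /framework [0-9a-f-]+/))
        fw = substr($0, RSTART + 10, RLENGTH - 10)
      print $2, ev, fw
      ev = ""
    }
  '
}

# Example with one of the sample lines from this post:
printf '%s\n' 'I1027 00:41:36.338513 1166 hierarchical.cpp:271] Added framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080' | classify_events
```

Against a real log it could be used as classify_events < mesos-master.INFO.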
PUTTING EVERYTHING TOGETHER: 3 FRAMEWORKS FIGHTING FOR RESOURCES
We are going to introduce an easy example simulating 3 frameworks (F1, F2 and F3) which will arrive at the Mesos cluster at different times demanding different amounts of resources.
Framework | Arrival time | Requested CPU | Requested memory | Duration |
F1 | t0 | 2 CPUs | 128 MB | 25 secs |
F2 | t0+5 | 1.5 CPUs | 400 MB | 25 secs |
F3 | t0+11 | 1.8 CPUs | 300 MB | 20 secs |
The first framework will request 2 CPUs and 128MB of memory and will run for 25 seconds. The second framework will arrive at the cluster 5 seconds later, will request 1.5 CPUs and 400MB of memory and will run for 25 seconds. Finally, the third framework will arrive 6 seconds after the second, requesting 1.8 CPUs and 300MB of memory, and will run for 20 seconds. Each framework will run only one task, which will last the aforementioned time in each case.
The following script uses a built-in Mesos scheduler, the CommandScheduler, via the mesos-execute command and represents the scenario described above:
#!/bin/sh
# Each mesos-execute call runs until its task ends, so we start it in the background (&)
mesos-execute --master=127.0.1.1:5050 --resources="cpus:2;mem:128" --name="F1-TASK" --command='echo "Task 1 of Framework 1";sleep 25' &
sleep 5
mesos-execute --master=127.0.1.1:5050 --resources="cpus:1.5;mem:400" --name="F2-TASK" --command='echo "Task 1 of Framework 2";sleep 25' &
sleep 6
mesos-execute --master=127.0.1.1:5050 --resources="cpus:1.8;mem:300" --name="F3-TASK" --command='echo "Task 1 of Framework 3";sleep 20' &
# Wait for the three frameworks to finish
wait
You can execute the above script in your Mesos cluster and try to analyze the logs with the guidelines provided.
The following images show the execution of the above script in a Mesos cluster with one agent with 4 CPUs and 14922MB of memory available. They include the evolution over time of the max share of each framework; it is very interesting to see how DRF orders the offers from the smallest share to the largest. Two screenshots of the Mesos UI are also included, showing the active frameworks with their assigned resources, number of tasks and max share at different moments of the execution.
Presented scenario: frameworks, offer and tasks lifecycle I/II
Mesos UI t0+11
Presented scenario: frameworks, offer and tasks lifecycle II/II
Mesos UI t0+Y
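To follow the numbers in this scenario, it can help to keep track of the resources left unallocated on the single agent as tasks are launched and finished. The sketch below does that bookkeeping for CPUs and memory only; free_resources and its launch/finish input format are our own invention for this example.

```shell
# Track the unallocated resources of the single agent (4 CPUs, 14922 MB).
# Input on stdin: "<launch|finish> <task> <cpus> <mem MB>" per line.
free_resources() {
  LC_ALL=C awk -v cpus=4 -v mem=14922 '
    $1 == "launch" { cpus -= $3; mem -= $4 }
    $1 == "finish" { cpus += $3; mem += $4 }
    { printf "%s %s: cpus=%.1f mem=%d\n", $1, $2, cpus, mem }
  '
}

free_resources <<'EOF'
launch F1 2 128
launch F2 1.5 400
finish F1 2 128
launch F3 1.8 300
finish F2 1.5 400
EOF
```

The values printed after each step (2 CPUs and 14794 MB, then 0.5 CPUs and 14394 MB, and so on) are the ones that appear in the Allocating lines of the master log. The 0.5 CPUs left after F2 starts are not enough for the 1.8 CPUs that F3 needs, which is consistent with F3 declining offers until F1 finishes.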
Finally here is the result of applying our pattern to the log of the Mesos master, having previously toggled the log level to 3:
I1027 00:41:36.338513 1166 hierarchical.cpp:271] Added framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
I1027 00:41:36.338580 1166 hierarchical.cpp:1511] Allocating cpus(*):4; mem(*):14922; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
I1027 00:41:36.341940 1163 master.cpp:3342] Processing ACCEPT call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O645 ] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 at slave(1)@127.0.1.1:5051 (irodriguez) for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080 (mesos-execute instance)
I1027 00:41:36.342283 1163 master.cpp:7447] Adding task F1-TASK with resources cpus(*):2; mem(*):128 on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 (irodriguez)
I1027 00:41:36.342308 1163 master.cpp:3831] Launching task F1-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080 (mesos-execute instance) with resources cpus(*):2; mem(*):128 on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 at slave(1)@127.0.1.1:5051 (irodriguez)
I1027 00:41:36.426267 1166 master.cpp:5199] Forwarding status update TASK_RUNNING (UUID: 09444748-8f44-430e-9f8e-2ba9b4f47963) for task F1-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
I1027 00:41:41.164577 1159 hierarchical.cpp:271] Added framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081
I1027 00:41:41.164649 1159 hierarchical.cpp:1511] Allocating cpus(*):2; mem(*):14794; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081
I1027 00:41:41.166434 1163 master.cpp:3342] Processing ACCEPT call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O646 ] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 at slave(1)@127.0.1.1:5051 (irodriguez) for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081 (mesos-execute instance)
I1027 00:41:41.166695 1163 master.cpp:7447] Adding task F2-TASK with resources cpus(*):1.5; mem(*):400 on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 (irodriguez)
I1027 00:41:41.166718 1163 master.cpp:3831] Launching task F2-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081 (mesos-execute instance) with resources cpus(*):1.5; mem(*):400 on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 at slave(1)@127.0.1.1:5051 (irodriguez)
I1027 00:41:41.243779 1160 master.cpp:5199] Forwarding status update TASK_RUNNING (UUID: c050e32b-99a3-4b07-a792-6d539717d8a8) for task F2-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081
I1027 00:41:42.216320 1166 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
I1027 00:41:42.217736 1160 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O647 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080 (mesos-execute instance)
I1027 00:41:46.219952 1166 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081
I1027 00:41:46.221310 1166 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O648 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081 (mesos-execute instance)
I1027 00:41:47.220943 1159 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
I1027 00:41:47.222218 1171 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O649 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080 (mesos-execute instance)
I1027 00:41:47.784749 1159 hierarchical.cpp:271] Added framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:41:47.784827 1159 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:41:47.787749 1163 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O650 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082 (mesos-execute instance)
I1027 00:41:51.223963 1163 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081
I1027 00:41:51.225608 1166 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O651 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081 (mesos-execute instance)
I1027 00:41:52.225154 1159 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
I1027 00:41:52.228976 1161 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O652 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080 (mesos-execute instance)
I1027 00:41:53.227172 1171 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:41:53.231391 1166 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O653 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082 (mesos-execute instance)
I1027 00:41:56.230902 1169 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081
I1027 00:41:56.233460 1164 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O654 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081 (mesos-execute instance)
I1027 00:41:57.231640 1164 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
I1027 00:41:57.235337 1163 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O655 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080 (mesos-execute instance)
I1027 00:41:59.232581 1161 hierarchical.cpp:1511] Allocating cpus(*):0.5; mem(*):14394; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:41:59.233829 1161 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O656 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082 (mesos-execute instance)
I1027 00:42:01.493413 1164 master.cpp:5199] Forwarding status update TASK_FINISHED (UUID: b90c09e4-80c8-4f75-8610-d46f56c3af76) for task F1-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
I1027 00:42:01.495795 1163 hierarchical.cpp:333] Removed framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0080
I1027 00:42:02.234576 1169 hierarchical.cpp:1511] Allocating cpus(*):2.5; mem(*):14522; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:42:02.236279 1163 master.cpp:3342] Processing ACCEPT call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O657 ] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 at slave(1)@127.0.1.1:5051 (irodriguez) for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082 (mesos-execute instance)
I1027 00:42:02.236750 1160 master.cpp:7447] Adding task F3-TASK with resources cpus(*):1.8; mem(*):300 on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 (irodriguez)
I1027 00:42:02.236784 1160 master.cpp:3831] Launching task F3-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082 (mesos-execute instance) with resources cpus(*):1.8; mem(*):300 on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 at slave(1)@127.0.1.1:5051 (irodriguez)
I1027 00:42:02.343415 1160 master.cpp:5199] Forwarding status update TASK_RUNNING (UUID: 41a5806f-6c35-4ea7-ac10-a4b25141e2b0) for task F3-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:42:03.235754 1160 hierarchical.cpp:1511] Allocating cpus(*):0.7; mem(*):14222; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081
I1027 00:42:03.236974 1160 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O658 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081 (mesos-execute instance)
I1027 00:42:06.308914 1164 master.cpp:5199] Forwarding status update TASK_FINISHED (UUID: 0eb9e99f-0bdd-4d54-afa3-2c49dac5ff6e) for task F2-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081
I1027 00:42:06.312667 1166 hierarchical.cpp:333] Removed framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0081
I1027 00:42:07.239861 1171 hierarchical.cpp:1511] Allocating cpus(*):2.2; mem(*):14622; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:42:07.242084 1166 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O659 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082 (mesos-execute instance)
I1027 00:42:12.252518 1163 hierarchical.cpp:1511] Allocating cpus(*):2.2; mem(*):14622; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:42:12.253914 1166 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O660 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082 (mesos-execute instance)
I1027 00:42:17.257071 1163 hierarchical.cpp:1511] Allocating cpus(*):2.2; mem(*):14622; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:42:17.258216 1159 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O661 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082 (mesos-execute instance)
I1027 00:42:22.263494 1164 hierarchical.cpp:1511] Allocating cpus(*):2.2; mem(*):14622; disk(*):925185; ports(*):[31000-32000] on agent 9ecb851f-3841-4dbf-b167-fe608fb11a86-S0 to framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:42:22.267007 1171 master.cpp:3951] Processing DECLINE call for offers: [ 9ecb851f-3841-4dbf-b167-fe608fb11a86-O662 ] for framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082 (mesos-execute instance)
I1027 00:42:22.389632 1159 master.cpp:5199] Forwarding status update TASK_FINISHED (UUID: c97786e7-d711-468e-9957-da7777b6208a) for task F3-TASK of framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
I1027 00:42:22.391718 1166 hierarchical.cpp:333] Removed framework 9ecb851f-3841-4dbf-b167-fe608fb11a86-0082
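As an example of what can be extracted from a log like this one, the following sketch measures how long each task ran, using the time between its TASK_RUNNING and TASK_FINISHED status updates. The task_durations function is hypothetical, it relies on the field positions of the sample lines above, and it assumes all lines fall on the same day.

```shell
# Print "<task> <seconds>" for every task that reported TASK_RUNNING and
# later TASK_FINISHED. Assumes no rollover past midnight between the two.
task_durations() {
  LC_ALL=C awk '
    /Forwarding status update/ {
      split($2, t, ":")                      # $2 is HH:MM:SS.microseconds
      secs = t[1] * 3600 + t[2] * 60 + t[3]
      task = $13                             # word after "for task"
      if ($8 == "TASK_RUNNING") start[task] = secs
      if ($8 == "TASK_FINISHED" && (task in start))
        printf "%s %.1f\n", task, secs - start[task]
    }
  '
}

# Usage: task_durations < mesos-master.INFO
```

Applied to the log above, it should report roughly 25 seconds for F1-TASK and F2-TASK and roughly 20 seconds for F3-TASK.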
CONCLUSIONS
In this post we have briefly explained what the default Mesos allocation module is and how it works. We have then introduced a simple approach for monitoring the offers and some key events in the lifecycle of a framework and its tasks. The described method is only valid for the default Mesos allocation algorithm, the hierarchical Dominant Resource Fairness. It is a simple but powerful approach, since it gives us a cluster-wide view of the lifecycle of offers, tasks and frameworks. It could also be used as the basis for building a graphical dashboard or some other kind of tool, based on the log events described here and on any others that may be of interest but are not covered in this post.

I’m a technical engineer in computer systems with more than 14 years of experience as a specialist in Real Time and Event Driven Architectures, with a vast background in technologies related to stream processing, complex event processing, messaging middleware, rule engines and integration solutions. Now I’m facing new challenges leading the technical architecture area at Datio.