Improving VM performance in OpenStack: NUMA and CPU Pinning

Today we are going to see how to improve the performance of a VM running in OpenStack.

Memory has a large impact on the performance of a workload. This is especially true when the workload runs inside a VM, so you need to be careful with memory and with NUMA if the machine supports it.


But wait! What is NUMA?

In the past, processors were designed as Symmetric Multi-Processing (SMP) or Uniform Memory Access (UMA) machines, which means that all processors shared the same access to all the memory available in the system.


However, that changed a few years ago with the AMD Opteron and Intel Nehalem processors. They implemented a new architecture called Non-Uniform Memory Access (NUMA), or more correctly Cache-Coherent NUMA (ccNUMA). In this architecture each processor has a “local” bank of memory, to which it has much closer (lower latency) access. A processor can still access the whole memory available in the system, but at a potentially higher latency and lower performance.


As you can see in the diagram, each processor has a “local” bank of memory. If data resides in local memory, access is very fast; however, if data resides in remote memory, access is slower and you take a performance hit.


CPU Pinning and OpenStack

In OpenStack, if you create a VM using a flavor with 2 or more vCPUs, those vCPUs could be mapped to different physical memory zones (NUMA node0 and NUMA node1), which would mean your vCPUs need to access two different memory zones. This is a major problem if you want to squeeze out every bit of performance. Let’s see how we can deal with this problem.

First of all, you should check if your machine supports NUMA using the following command:

lscpu | grep NUMA 
NUMA node(s): 2 
NUMA node0 CPU(s): 0-17,36-53 
NUMA node1 CPU(s): 18-35,54-71

As you can see in the example, this machine has 2 NUMA nodes. Cores 0 to 17 belong to the first NUMA node, and cores 18 to 35 belong to the second. Don’t take cores 36 to 71 into consideration: they are the additional hardware threads from Hyper-Threading, siblings of the physical cores. For CPU pinning it’s very important to know which core IDs are physical and which are Hyper-Threading siblings. Use the lscpu tool to identify them:

lscpu 
Architecture: x86_64 
CPU op-mode(s): 32-bit, 64-bit 
Byte Order: Little Endian 
CPU(s): 72 
On-line CPU(s) list: 0-71 
Thread(s) per core: 2 
Core(s) per socket: 18 
Socket(s): 2 
NUMA node(s): 2
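When deciding what to pin, it helps to expand lscpu’s range lists into individual core IDs. Below is a small pure-shell helper; it is a sketch, and the sample ranges are the NUMA node0 lists from the output above:

```shell
# Expand an lscpu-style range list (e.g. "0-17,36-53") into individual
# CPU IDs, so you can build pin sets that contain only physical cores
# and leave out the Hyper-Threading siblings.
expand_cpus() {
    echo "$1" | tr ',' '\n' | while IFS='-' read -r lo hi; do
        seq "$lo" "${hi:-$lo}"
    done | xargs
}

expand_cpus "0-17,36-53"   # all CPUs of NUMA node0, siblings included
expand_cpus "0-17"         # physical cores of node0 only
```

To confirm which IDs are Hyper-Threading siblings of a given core, you can also read /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.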

Lastly, check that your hypervisor is aware of NUMA topology.

virsh nodeinfo 
CPU model: x86_64 
CPU(s): 72 
CPU frequency: 2099 MHz 
CPU socket(s): 2 
Core(s) per socket: 18 
Thread(s) per core: 2 
NUMA cell(s): 2 
Memory size: 150582772 KiB

On each compute node where pinning of virtual machines will be allowed, we need to edit the nova.conf file and set the following option:

vcpu_pin_set=0-30

This option specifies a list or range of physical CPU cores to reserve for VMs. OpenStack will ensure that your VMs are pinned to these CPU cores.

Now, restart nova-compute on each compute node (the name of the service could be different if you’re using Ubuntu):

systemctl restart openstack-nova-compute

We have configured our VMs to be pinned to cores 0 to 30, so we need to ensure that host processes do not run on these cores. We can achieve that with the **isolcpus** kernel argument.

On Red Hat 7 and derivatives you can edit boot options using grubby:

grubby --update-kernel=ALL --args="isolcpus=0-30"

Update the boot record after that:

grub2-install your_boot_device

And, of course, reboot the machine.
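The value passed to isolcpus must stay in sync with vcpu_pin_set in nova.conf; if the two lists drift apart, pinned guests and host processes can end up sharing cores. Below is a minimal consistency check using this post’s example values (on a real host you would read them from nova.conf and /proc/cmdline instead of hard-coding them):

```shell
# Both lists use this post's example value, 0-30. Keeping them identical
# ensures nova only pins guests to cores the host scheduler leaves alone.
vcpu_pin_set="0-30"   # from nova.conf
isolcpus="0-30"       # from the kernel command line
if [ "$vcpu_pin_set" = "$isolcpus" ]; then
    echo "pin set and isolcpus match"
else
    echo "WARNING: pin set and isolcpus differ" >&2
fi
```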

 

NUMA and OpenStack Scheduler: How to?

We’re very close to being able to launch VMs pinned to our physical cores. However, we need to set up a few more things.

First of all, we need to edit the nova-scheduler filters, adding the NUMATopologyFilter and AggregateInstanceExtraSpecsFilter values to the list of scheduler_default_filters. We’ll use these filters to segregate compute nodes that can be used for CPU pinning from those that cannot, and to apply NUMA-aware scheduling rules when launching instances.

Your scheduler_default_filters should look similar to this one:

scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter

Now, create the “numa_nodes” host aggregate. These hosts will host our pinned VMs:

nova aggregate-create numa_nodes

We need to create some metadata for the “numa_nodes” aggregate. This metadata will be matched against the flavors used to instantiate the VMs pinned to physical cores:

nova aggregate-set-metadata 1 pinned=true

Also, we’re going to create another host aggregate for hosts which will not host pinned VMs:

nova aggregate-create normal 
nova aggregate-set-metadata 2 pinned=false

Update the existing flavors so that their extra specs match them to the compute hosts in the “normal” aggregate:

for FLAVOR in `nova flavor-list | cut -f 2 -d ' ' | grep -o "[0-9]*"`; do 
    nova flavor-key ${FLAVOR} set "aggregate_instance_extra_specs:pinned"="false" 
done

Create a new flavor for our pinned VMs (the arguments are name, ID, memory in MB, disk in GB, and number of vCPUs):

nova flavor-create m1.small.numa 6 2048 20 2

We need to set the *hw:cpu_policy* flavor extra specification to **dedicated**. This option specifies that all instances created using this flavor will require dedicated compute resources and will be pinned to physical cores accordingly.

nova flavor-key flavor_id set hw:cpu_policy=dedicated

Set the *aggregate_instance_extra_specs:pinned* flavor extra specification to **true**:

nova flavor-key flavor_id set aggregate_instance_extra_specs:pinned=true

Lastly, add some compute hosts to our “numa_nodes” aggregate. Compute nodes which are not intended to be targets for pinned instances should be added to our “normal” aggregate:

nova aggregate-add-host 1 compute-node-1
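To verify that pinning actually took effect, boot an instance with the new flavor and, on its compute node, inspect the guest with `virsh vcpupin <domain>`. In the colon-separated output format (which varies between libvirt versions, so this is an assumption) each line maps one vCPU to its host CPU affinity, and a pinned guest shows a single core per vCPU rather than a range. A sketch of such a check, fed here with assumed sample output:

```shell
# Reads `virsh vcpupin` style output on stdin and reports whether every
# vCPU is pinned to exactly one host core (no "0-71" style ranges).
check_pinned() {
    awk -F': *' 'NF == 2 && $2 ~ /[-,]/ { bad = 1 }
                 END { print (bad ? "floating" : "pinned") }'
}

# Assumed sample output; on a real host pipe `virsh vcpupin <domain>` instead.
printf 'VCPU: CPU Affinity\n----------\n 0: 3\n 1: 21\n' | check_pinned   # prints "pinned"
```

Find the domain name with `virsh list` on the compute node; a result of “floating” means the guest’s vCPUs are still allowed to roam over a range of host cores.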

And, that’s all!  Happy “OpenStacking” :)!


Carlos Gimeno

On my last job at the Institute for Biocomputation and Physics of Complex Systems (BIFI) I grew up with OpenStack, Docker, and a lot of different technologies related to the cloud environment. I also had to deal with a lot of mad scientists! A lover of automation with Ansible, I keep OpenStack up and running at Datio, where I also have to deal with a lot of mad people! 🙂
