Achieving high-density deployments with NUMA
There are many factors that can affect the performance of Virtual Machines (VMs) running on host hardware. One of these is how the VM interacts with NUMA.
This section provides an overview of NUMA and how it applies to Pexip Infinity Conferencing Nodes. It summarizes our recommendations and suggests best practices for maximizing performance.
NUMA stands for non-uniform memory access. It is an architecture that divides the computer into a number of nodes, each containing one or more processor cores and associated memory. A core can access its local memory faster than it can access the rest of the memory on that machine. In other words, it can access memory allocated to its own NUMA node faster than it can access memory allocated to another NUMA node on the same machine.
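To make the idea of a node (a group of cores plus its local memory) concrete, the following sketch lists which logical CPUs belong to each NUMA node. It assumes a Linux machine that exposes its topology under /sys/devices/system/node; the function names are illustrative and are not part of Pexip Infinity. Other hypervisors and operating systems report the topology through their own tools.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux CPU list such as "0-3,8-11" into individual CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node number to its logical CPUs.

    Returns an empty dict when the sysfs directory does not exist
    (for example, on a non-Linux machine).
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    topology: dict[int, list[int]] = {}
    for node_dir in base.glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            # Directory names look like "node0", "node1", ...
            topology[int(node_dir.name[4:])] = parse_cpulist(cpulist.read_text())
    return topology
```

Calling numa_topology() on a two-socket Linux host would typically return two entries, one per socket, each listing the logical CPUs that share local memory.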
We strongly recommend that a Pexip Infinity Conferencing Node is deployed on a single NUMA node in order to avoid the loss of performance incurred when a core accesses memory outside its own node.
In practice, with modern servers, each socket represents a NUMA node. We therefore recommend that:
- one Pexip Infinity Conferencing Node VM is deployed per socket of the host server
- the number of vCPUs that the Conferencing Node VM is configured to use is the same as or less than the number of physical cores (or logical threads, if NUMA affinity is enabled) available in that socket
- all VMs are pinned to their respective sockets within the hypervisor if you want to utilize the logical threads of a socket (hyperthreading). For more information, see Achieving additional performance with VMware NUMA affinity and hyperthreading.
You can deploy smaller Conferencing Nodes over fewer cores/threads than are available in a single socket, but this will reduce capacity.
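The sizing rule above can be sketched as a small calculation. This is a minimal illustration only: the function name, parameters and the 12-core example are assumptions, not values taken from Pexip Infinity.

```python
def max_vcpus_per_node(cores_per_socket: int,
                       threads_per_core: int = 1,
                       pinned: bool = False) -> int:
    """Largest vCPU count that keeps one VM inside a single socket (NUMA node).

    Without pinning, only physical cores are counted. With all VMs pinned to
    their sockets, the logical threads (hyperthreading) are counted instead.
    """
    if cores_per_socket < 1 or threads_per_core < 1:
        raise ValueError("core and thread counts must be at least 1")
    if pinned:
        return cores_per_socket * threads_per_core
    return cores_per_socket


# Hypothetical host: 12 physical cores per socket, 2 threads per core.
unpinned_limit = max_vcpus_per_node(12, threads_per_core=2, pinned=False)  # 12
pinned_limit = max_vcpus_per_node(12, threads_per_core=2, pinned=True)     # 24
```

A VM configured with fewer vCPUs than this limit still fits within the socket, but offers less capacity.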
Deploying a Conferencing Node over more cores (or threads, when pinned) than a single socket provides will reduce performance whenever remote memory is accessed. This must be taken into account when moving Conferencing Node VMs between host servers with different hardware configurations: if an existing VM is moved to a socket that contains fewer cores/threads than the VM is configured to use, the VM will span two sockets, and therefore two NUMA nodes, which impacts performance.
To prevent this from occurring, ensure that either:
- you deploy Conferencing Nodes only on servers with a large number of cores per processor, or
- the number of vCPUs used by each Conferencing Node is the same as (or less than) the number of cores/threads available on each NUMA node of even your smallest hosts.
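As a rough sketch of the second option, the check below finds the largest vCPU count that fits on one NUMA node of every host, and lists the hosts on which a given VM would span two NUMA nodes. The host names and capacities are hypothetical; each capacity is the number of cores (or threads, when pinned) available on one NUMA node of that host.

```python
def largest_portable_vcpus(node_capacity_by_host: dict[str, int]) -> int:
    """Largest vCPU count that fits on a single NUMA node of every host."""
    if not node_capacity_by_host:
        raise ValueError("at least one host is required")
    return min(node_capacity_by_host.values())


def hosts_forcing_span(vcpus: int, node_capacity_by_host: dict[str, int]) -> list[str]:
    """Hosts on which a VM with this many vCPUs would span two NUMA nodes."""
    return [name for name, capacity in node_capacity_by_host.items()
            if vcpus > capacity]


# Hypothetical fleet: usable cores/threads per NUMA node on each host.
fleet = {"host-a": 16, "host-b": 10, "host-c": 12}

safe_size = largest_portable_vcpus(fleet)      # limited by the smallest host
problem_hosts = hosts_forcing_span(12, fleet)  # a 12-vCPU VM does not fit on host-b
```

Sizing every Conferencing Node at or below safe_size means that a VM can be moved to any host in the fleet without spanning sockets.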
As well as the physical restrictions discussed above, the hypervisor can also impose restrictions. By default, VMware provides virtual NUMA nodes on VMs that are configured with more than 8 vCPUs. This default threshold can be altered by setting numa.vcpu.min in the VM's configuration file.
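The threshold behaviour can be modelled in a few lines. This is a simplified sketch: the default of 9 is our reading of "more than 8", the function name is illustrative, and any other conditions that the hypervisor applies before exposing virtual NUMA are ignored.

```python
# Assumption: "more than 8 vCPUs" corresponds to a numa.vcpu.min value of 9.
ASSUMED_DEFAULT_NUMA_VCPU_MIN = 9


def exposes_virtual_numa(vcpus: int,
                         numa_vcpu_min: int = ASSUMED_DEFAULT_NUMA_VCPU_MIN) -> bool:
    """Return True if a VM of this size reaches the virtual NUMA threshold."""
    return vcpus >= numa_vcpu_min


small_vm = exposes_virtual_numa(8)                      # below the default threshold
large_vm = exposes_virtual_numa(12)                     # at or above the default threshold
lowered = exposes_virtual_numa(8, numa_vcpu_min=4)      # threshold lowered for this VM
```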
We are constantly optimizing our use of the host hardware and expect that some of this advice will change in later releases of our product. However, our current recommendations are:
- Prefer processors with a high core count.
- Prefer a smaller number of large Conferencing Nodes rather than a larger number of smaller Conferencing Nodes.
- Deploy one Conferencing Node per NUMA node (i.e. per socket).
- Configure one vCPU per physical core on that NUMA node (without hyperthreading and NUMA pinning), or one vCPU per logical thread (with hyperthreading and all VMs pinned to a socket in the hypervisor).
- Populate memory equally across all NUMA nodes on a single host server.
- Do not over-commit resources on hardware hosts.
See Resource allocation case study for examples.
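The recommendations above can also be expressed as a simple checklist for a planned host layout. The sketch below is illustrative only: the HostPlan fields, the warning texts and the example figures are assumptions, not part of any Pexip Infinity tool.

```python
from dataclasses import dataclass


@dataclass
class HostPlan:
    sockets: int                   # one NUMA node per socket is assumed
    cores_per_socket: int
    threads_per_core: int
    pinned: bool                   # True if all VMs are pinned to a socket
    memory_gb_per_node: list[int]  # memory populated on each NUMA node
    vm_vcpus: list[int]            # vCPUs of each planned Conferencing Node VM


def check_plan(plan: HostPlan) -> list[str]:
    """Return a warning for each recommendation that the plan does not follow."""
    warnings: list[str] = []
    # Logical threads only count when every VM is pinned to its socket.
    per_node = plan.cores_per_socket * (plan.threads_per_core if plan.pinned else 1)

    if len(plan.vm_vcpus) > plan.sockets:
        warnings.append("more Conferencing Nodes than NUMA nodes")
    for index, vcpus in enumerate(plan.vm_vcpus):
        if vcpus > per_node:
            warnings.append(
                f"VM {index} uses {vcpus} vCPUs but a NUMA node offers {per_node}")
    if len(set(plan.memory_gb_per_node)) > 1:
        warnings.append("memory is not populated equally across NUMA nodes")
    if sum(plan.vm_vcpus) > plan.sockets * per_node:
        warnings.append("vCPUs over-commit the host")
    return warnings


# Hypothetical two-socket host, 12 cores per socket, no pinning.
good = HostPlan(2, 12, 2, False, [64, 64], [12, 12])
bad = HostPlan(2, 12, 2, False, [64, 32], [16, 12])

good_warnings = check_plan(good)  # no warnings expected
bad_warnings = check_plan(bad)    # spanning VM, unequal memory, over-commit
```

An empty list means that the plan follows each of the checks modelled here; it does not guarantee performance on real hardware.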