Capacity planning

The capacity of each Transcoding Conferencing Node in your Pexip Infinity deployment — in terms of the number of connections* that can be handled simultaneously — depends on a variety of factors including:

The server capacity and hardware configuration.
The type of call (Full HD, HD, SD, or audio-only), the codec, and whether a presentation stream is included.
The number of unique VMRs being used (and thus the number of backplanes being reserved).
The type of gateway call — whether the inbound and outbound legs are on the same transcoding node or not, and the types of client involved in the call.

* A connection can be a call or presentation from an endpoint to a Virtual Meeting Room or Virtual Auditorium, a backplane between Transcoding Conferencing Nodes, or a call into or out of the Infinity Gateway. In this context, a connection is analogous to a port. Note that a Skype for Business client may require two connections (one for the video call, and one for presentation content).

When a connection is proxied via a Proxying Edge Node, the proxying node also consumes connection resources in order to forward the media streams on to a Transcoding Conferencing Node. A transcoding node always consumes the same amount of connection resources regardless of whether it has a direct connection to an endpoint, or it is receiving the media streams via a proxying node.

In all cases, you must also have sufficient concurrent call licenses available.

The following sections explain each of these factors. For some comprehensive examples showing how these different factors can combine to influence capacity, see Resource allocation examples.

Server capacity

The capacity and configuration of the server on which the Conferencing Node is running determines the number of calls that can be handled. This is influenced by a number of factors, including processor generation, number of cores, processor speed, hypervisor and BIOS settings.

For more information, see Server design recommendations and Example Conferencing Node server configurations.

When deploying on a cloud service, see our recommended instance types and call capacity guidelines for Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform.

Call types and resource requirements

The type of call (Full HD, HD, SD, or audio-only) affects the amount of resource required by a Transcoding Conferencing Node to handle the call.

In general, when compared to a single HD (720p) call:

a Full HD (1080p) call uses twice the resource
an SD (standard definition) call uses half the resource
an audio-only call uses one sixteenth of the resource.
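
As an illustration of the ratios above, the following minimal Python sketch converts a mix of calls into HD-call equivalents. The function and dictionary names are hypothetical (they are not part of Pexip Infinity), and the result is only a rough guide because it ignores the codec and layout adjustments described in the rest of this section.

```python
from fractions import Fraction

# Relative cost of each call type, in units of one HD (720p) call.
# Values are taken from the general ratios listed above.
HD_EQUIVALENTS = {
    "full_hd": Fraction(2),
    "hd": Fraction(1),
    "sd": Fraction(1, 2),
    "audio": Fraction(1, 16),
}


def hd_equivalent_load(call_counts):
    """Return the total load, in HD-call equivalents, for a mix of call types.

    call_counts maps a call type ("full_hd", "hd", "sd", "audio") to a count.
    """
    return sum(HD_EQUIVALENTS[kind] * count for kind, count in call_counts.items())


# 2 Full HD + 5 HD + 4 SD + 16 audio-only calls:
# 2*2 + 5*1 + 4*0.5 + 16/16 = 12 HD equivalents
mix = {"full_hd": 2, "hd": 5, "sd": 4, "audio": 16}
print(float(hd_equivalent_load(mix)))  # 12.0
```

Exact fractions are used so that the one-sixteenth audio ratio does not accumulate rounding error when many calls are summed.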

However, note that:

A WebRTC call using the VP8 codec uses the same amount of resource as H.264, and the VP9 codec uses around 25% more resource. Therefore:
- VP9 at 720p uses the equivalent of 1.25 HD resources
- VP9 at 1080p uses the equivalent of 2.5 HD resources
- the maximum number of calls a given Conferencing Node can support will be lower for VP9 calls. For example, a node that supports 39 H.264/VP8 SD calls will support 31 VP9 SD calls (39 / 1.25, rounded down).
Note that within the same conference some participants may use VP9 (if they are connected to a Conferencing Node using the AVX2 or later instruction set) while other participants may use VP8 (if they are connected to a Conferencing Node on older hardware).
Conferences or gateway calls that use the Adaptive Composition layout, and Teams conferences that use the Teams-like layout, consume additional Conferencing Node resources. The actual amount of additional resource depends on many factors, but as a guide, each conference uses an additional 1 HD of resource for up to 3 other video participants, plus approximately another 0.5 HD for each additional (4th, 5th, etc.) video participant that is on stage. This applies regardless of the call quality or resolution of the conference itself and of each individual participant's connection (codec, bandwidth and so on).
H.323 audio-only calls are treated the same as video calls for resource usage purposes.
Connections to a Media Playback Service use 1.2 times as much resource as a connection to a VMR.
When transferring a participant, the transferee can temporarily take two sets of resources while in the process of being transferred. This should not normally last more than a few seconds.
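
The VP9 and Adaptive Composition adjustments in the list above can be sketched in Python as follows. This is an illustrative estimate only: the function names are hypothetical, the 1.25 factor is the approximate figure quoted above, and the overhead function assumes that the base 1 HD applies to any conference with up to 3 on-stage video participants, which is one reading of the guidance.

```python
import math

VP9_FACTOR = 1.25  # VP9 uses around 25% more resource than H.264/VP8


def vp9_call_capacity(h264_capacity):
    """Estimate how many VP9 calls fit on a node that supports
    h264_capacity H.264/VP8 calls of the same resolution."""
    return math.floor(h264_capacity / VP9_FACTOR)


def adaptive_composition_overhead(on_stage_video_participants):
    """Approximate extra HD resource per conference for Adaptive Composition.

    Assumption: 1 HD covers up to 3 on-stage video participants, and each
    additional on-stage participant (4th, 5th, ...) adds about 0.5 HD.
    """
    extra = max(0, on_stage_video_participants - 3)
    return 1.0 + 0.5 * extra


print(vp9_call_capacity(39))             # 31, matching the SD example above
print(adaptive_composition_overhead(5))  # 2.0 (1 HD + 2 * 0.5 HD)
```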

If you want to limit video calls to specific resolutions (and limit the transcoding node resources that are reserved for calls), you should use the Maximum call quality setting (see Setting and limiting call quality for more information).

On startup, each Conferencing Node runs an internal capacity check. This capacity is translated into an estimated maximum number of Full HD, HD, SD or audio-only calls, and can be viewed on the status page (Status > Conferencing Nodes) for each Conferencing Node. The status also shows the current media load on each Conferencing Node as a percentage of its total capacity.

Proxying Edge Node resource requirements

When a connection is proxied via a Proxying Edge Node, the proxying node also consumes connection resources in order to forward the media streams on to a Transcoding Conferencing Node.

A proxying node uses approximately the equivalent of 3 audio-only resources to proxy a video call (of any resolution), and 1 audio-only resource to proxy an audio call.
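
As a rough sizing aid, the rule above can be expressed as a short Python sketch. The function name is hypothetical, and converting audio-only resources into HD equivalents at one sixteenth of an HD call each is an assumption carried over from the general call-type ratios, not a figure stated specifically for proxying nodes.

```python
AUDIO_RESOURCES_PER_HD = 16  # assumption: one audio-only resource = 1/16 of an HD call


def proxy_load(video_calls, audio_calls):
    """Estimate the load on a Proxying Edge Node.

    Returns a tuple of (audio-only resource units, approximate HD equivalents).
    Each proxied video call costs about 3 audio-only resources, regardless of
    resolution, and each proxied audio call costs about 1.
    """
    audio_units = 3 * video_calls + 1 * audio_calls
    return audio_units, audio_units / AUDIO_RESOURCES_PER_HD


# 20 proxied video calls and 4 proxied audio calls:
# 3*20 + 4 = 64 audio-only resources, roughly 4 HD equivalents
print(proxy_load(20, 4))  # (64, 4.0)
```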

We recommend allocating 4 vCPU and 4 GB RAM (both of which must be dedicated resources) to each Proxying Edge Node, with a maximum of 8 vCPU and 8 GB RAM for large or busy deployments.

Backplane reservation

Each conference instance on each Transcoding Conferencing Node reserves a backplane connection at a resource level corresponding to the conference's Maximum call quality setting, to allow the conference to become geographically distributed if required. The exceptions to this are:

Deployments with a single Conferencing Node. In such cases, no backplanes will ever be required, so capacity is not reserved.
Conferences that are audio-only (in other words, where the conference has its Conference capabilities set to Audio-only). In such cases, capacity equivalent to one audio connection is reserved for the backplane.

For some reservation examples, see Resource allocation examples.
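
The reservation rule and its two exceptions can be summarized in a minimal Python sketch. The function and parameter names are hypothetical, and the HD-equivalent values assume the general call-type ratios (Full HD = 2, HD = 1, SD = 0.5, audio-only = 1/16).

```python
# HD-equivalent cost per quality level (assumed from the general call-type ratios)
HD_EQUIVALENTS = {"full_hd": 2.0, "hd": 1.0, "sd": 0.5, "audio": 1 / 16}


def backplane_reservation(max_call_quality, node_count, audio_only_conference=False):
    """HD-equivalent resource reserved for a backplane by one conference
    instance on one Transcoding Conferencing Node."""
    if node_count <= 1:
        # Single-node deployment: no backplane is ever needed.
        return 0.0
    if audio_only_conference:
        # Audio-only conference: one audio connection is reserved.
        return HD_EQUIVALENTS["audio"]
    # Otherwise reserve at the conference's Maximum call quality level.
    return HD_EQUIVALENTS[max_call_quality]


print(backplane_reservation("hd", node_count=3))                              # 1.0
print(backplane_reservation("full_hd", node_count=1))                         # 0.0
print(backplane_reservation("hd", node_count=3, audio_only_conference=True))  # 0.0625
```

Under these assumptions, a conference with three HD participants on one node of a multi-node deployment would account for about 4 HD of resource on that node: 3 HD for the calls plus 1 HD for the reserved backplane.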

Gateway calls

Gateway calls (person-to-person calls, or calls to an externally-hosted conference, such as a Microsoft Teams or Skype for Business meeting, or Google Meet) require sufficient capacity for both the inbound leg and the outbound leg. In general, this means that each gateway call consumes resources equivalent to two connections.

Non-distributed gateway calls (where the inbound and outbound legs are on the same transcoding node) involving only SIP or H.323 clients do not use any additional ports/connections. However, in other scenarios, additional ports may be used as described below (assuming that Maximum call quality is HD unless otherwise specified):

Distributed gateway calls (where the outbound leg is on a different transcoding node to the inbound leg) consume backplane ports: 1 HD video + 1 backplane for participant A, plus 1 HD video + 1 backplane for participant B. This typically occurs when calling registered endpoints (where the outbound call to the registered endpoint will originate from the node the endpoint is registered to), or when using Call Routing Rules with an Outgoing location set to something other than Automatic.
For a gateway call to a Microsoft Teams meeting, the connection to Teams uses 1.5 HD of resource if Maximum call quality is SD or HD, otherwise it uses 1.5 Full HD resources. The resources required for the VTC leg of the connection depend upon the Maximum call quality setting. If any participant presents content, additional resources (typically 0.5 HD) would be required, either on the Teams backplane (when an endpoint presents) or on the node handling the endpoint's media connection (when a Teams client presents). The exact amount of resource used depends on the codec, resolution and frame rate of the presentation stream.
For a gateway call to Google Meet, the connection to Google Meet always uses 1 HD resource (it uses VP8) for main video. The resources required for the VTC leg of the connection depend upon the type of endpoint and the Maximum call quality setting. If the VTC endpoint starts to present content then an extra 1 HD resource is used for the connection from Pexip Infinity to Google Meet. If presentation content is sent from Google Meet, no additional resources are required on the Google Meet leg, but 0.5 HD of additional resource would typically be required for each endpoint receiving the presentation.
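
The gateway scenarios above can be approximated with the following Python sketch. All names are hypothetical, the figures are the guideline values quoted above, and the distributed case assumes that each backplane costs the same as a call leg at the Maximum call quality level (the text above states this only for HD). Presentation overheads on the VTC leg are not modelled.

```python
# HD-equivalent cost per quality level (assumed from the general call-type ratios)
HD_EQUIVALENTS = {"full_hd": 2.0, "hd": 1.0, "sd": 0.5}


def sip_h323_gateway_call(max_call_quality="hd", distributed=False):
    """Total HD-equivalent resource for a gateway call between SIP/H.323 clients."""
    leg = HD_EQUIVALENTS[max_call_quality]
    legs = 2 * leg  # inbound leg + outbound leg
    # A distributed call adds one backplane on each of the two nodes (assumed
    # to cost the same as a call leg at this quality level).
    backplanes = 2 * leg if distributed else 0.0
    return legs + backplanes


def teams_connection(max_call_quality="hd"):
    """HD-equivalent resource for the connection to a Microsoft Teams meeting."""
    if max_call_quality in ("sd", "hd"):
        return 1.5
    return 1.5 * HD_EQUIVALENTS["full_hd"]  # 1.5 Full HD resources


def google_meet_connection(vtc_presenting=False):
    """HD-equivalent resource for the connection to Google Meet."""
    return 1.0 + (1.0 if vtc_presenting else 0.0)


print(sip_h323_gateway_call("hd"))                    # 2.0
print(sip_h323_gateway_call("hd", distributed=True))  # 4.0
print(teams_connection("full_hd"))                    # 3.0
print(google_meet_connection(vtc_presenting=True))    # 2.0
```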

Call protocols and presentation content

The various call protocols have different behavior and limitations for sending and receiving video and presentation content, in addition to what Pexip Infinity will request based on its configured maximum call quality and bandwidth settings. As the endpoints ultimately decide what to send to Pexip Infinity, the following information should be seen as a guide only.

WebRTC

A Connect app / WebRTC client sends or receives presentation content (files or screen sharing) via the existing call connection used for audio and video media. WebRTC may send resolutions up to 1080p to Pexip Infinity, depending on the camera capabilities and available bandwidth. Incoming presentation content is viewed by Connect app clients in full motion HD video by default.

H.323 and SIP

H.323 and SIP video endpoints may send resolutions up to 1080p to Pexip Infinity at whatever bandwidth and frame rate they decide. Presentation content shares the bandwidth of the main video channel, so when a participant starts presenting, the resolution of the main video may decrease. Most H.323 and SIP endpoints prioritize motion (higher frame rate) for main video and sharpness (higher resolution) for content.

Licenses

The total number of concurrent calls that can be made in your deployment (regardless of whether those calls are HD, SD or audio-only) is limited by your license. For more information, see Pexip Infinity license installation and usage.

Scaling up capacity, media overflow and dynamic bursting

You can easily scale up a deployment by creating several Conferencing Nodes in the same location (i.e. the same datacenter). Capacity can even be added "on the fly": Conferencing Nodes can be added in a couple of minutes if more capacity is needed. Alternatively, each location can be configured to overflow to another location if it reaches its capacity, including bursting to temporary resources on a cloud service.

For more information on media overflow and dynamic bursting, see: