Viewing alarms

When there are active alarms on your Pexip Infinity deployment, a flashing blue triangle appears at the top right of each page of the Administrator interface. To view details of the current alarms, click on this icon or go to the Alarms page (Status > Alarms).

  • Alarms remain in place for as long as the issue exists. After the issue has been resolved (for example, if a conference ends, therefore freeing up licenses) the associated alarm will automatically disappear from the Alarms page.
  • Multiple instances of the same type of alarm can be raised. For example, if two Conferencing Nodes are not correctly synchronized to an NTP server, you will see an alarm for each node.
  • You can select individual alarms and view the associated documentation (this guide) for suggested causes and resolutions.

The History & Logs > Alarm history page shows the details of all historic alarms, including the severity level and the times at which each alarm was raised and lowered.

An alarm is raised in each of the following situations:

Alarm | ID | Logged as (Alarm =) | Level | Cause | Suggested resolutions
The Management Node does not have a TLS certificate 20

tls_certificate_missing_management

Critical The Management Node has no associated TLS certificate. Upload a TLS certificate and associate it with the Management Node.
A Conferencing Node does not have a TLS certificate 9

tls_certificate_missing

Critical A Conferencing Node has no associated TLS certificate.

Upload a TLS certificate and associate it with the Conferencing Node.

Alternatively, and if appropriate for your deployment, associate an existing certificate with your Conferencing Node. When doing this, the existing certificate should already contain a SAN (Subject Alternative Name) that matches your Conferencing Node's FQDN.

See Managing a node's TLS server certificate for more information.

CPU instruction set not supported 10

cpu_not_supported

Critical

A Conferencing Node has gone into maintenance mode because it was deployed on a server with an unsupported processor instruction set (e.g. SSE4.1).

This could also be caused by setting the EVC mode on a VMware cluster to too low a level, such as Westmere (see Enhanced vMotion Compatibility (EVC) for more information).

Deploy the Conferencing Node on a server with AVX or later.
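As a rough pre-deployment check, the CPU flags reported by a Linux host can be inspected for AVX. The sketch below is a minimal illustration, not a Pexip tool: it assumes you can read /proc/cpuinfo on an x86 Linux system that sees the same CPU features as the Conferencing Node, and the function names cpu_flags and has_avx are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; all cores normally report the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if the exact flag 'avx' is present (not just 'avx2' or 'sse4_2')."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    # Hypothetical sample text; on a real host, read /proc/cpuinfo instead.
    sample = "processor\t: 0\nflags\t\t: fpu sse4_1 sse4_2 avx avx2\n"
    print(has_avx(sample))
```

On a real host you would pass in the contents of /proc/cpuinfo. Note that a VMware EVC mode set too low can mask AVX from the guest even when the physical processor supports it.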
Eventsink Reached Maximum Backoff 36

eventsink_maximum_backoff

Critical

If an event cannot be delivered to an event sink, the node tries again after 1 second. If that attempt fails, it tries again after 2 seconds, then 4, 8, 16 seconds and so on, doubling the timeout each time. In this case, events are not necessarily sent in sequence number (seq field) order.

If the timeout exceeds 30 minutes, the node instead raises an "Eventsink Reached Maximum Backoff" alarm and stops the event sink publisher for that particular event sink. (The retry/timeout parameters are not configurable.)
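The doubling retry schedule described above can be sketched as follows. This is only an illustration of the arithmetic, not Pexip's implementation: the names are hypothetical, and it assumes the alarm is raised as soon as the next doubled timeout would exceed 30 minutes (the guide does not state the exact boundary behavior).

```python
MAX_BACKOFF_SECONDS = 30 * 60  # 30 minutes, from the description above


def backoff_schedule(initial=1, limit=MAX_BACKOFF_SECONDS):
    """Return the retry delays (in seconds) used before the limit is exceeded."""
    delays = []
    delay = initial
    while delay <= limit:
        delays.append(delay)
        delay *= 2  # each failed attempt doubles the timeout
    return delays


if __name__ == "__main__":
    schedule = backoff_schedule()
    print(schedule)       # the delays before the alarm would be raised
    print(sum(schedule))  # total seconds spent waiting across all retries
```

Under these assumptions the last delay is 1024 seconds, because the next doubling (2048 seconds) is greater than 30 minutes (1800 seconds). The publisher would therefore stop after a little over 34 minutes of cumulative waiting.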

  1. Check support.event logs for the reason for the event sink failures.
  2. Take the appropriate action to resolve the failures.
  3. Restart the event sink process:

    1. Remove the event sink from its associated system locations.
    2. Wait for the removal changes to synchronize to the Conferencing Nodes.
    3. Reassociate the event sink with its original system locations.
NTP not synchronized 11

ntp_not_synchronised

Error A node has failed to synchronize with the configured NTP servers. Ensure that NTP is enabled on the Management Node, and that NTP servers are assigned to, and accessible from, each location. See Syncing with NTP servers for more information.
Configuration not synchronized 18

configuration_sync_failure

Error This alarm is raised if the Conferencing Node status “Last contacted” time has not been updated within the last 2 expected replication intervals (typically no contact within the last 3 minutes).

In typical deployments, configuration replication is performed approximately once per minute. However, in very large deployments (more than 60 Conferencing Nodes), configuration replication intervals are extended, and it may take longer for configuration changes to be applied to all Conferencing Nodes (the administrator log shows when each node has been updated).

If configuration synchronization fails, this may indicate network connectivity or routing issues between the Management Node and the Conferencing Node, which could be due to a malfunction or misconfiguration of devices such as routers or firewalls.

Ensure that all of the appropriate Pexip nodes are fully routable to each other in both directions. See General network requirements.

MS Exchange Connection Failure 22

scheduling_connection_failure

Error The Management Node cannot connect to the Exchange server. Check that the details entered in the EWS URL (System > VMR scheduling for Exchange integrations) are correct and the Exchange server is online.
Automatic backup upload failed 25

autobackup_upload_failed

Error The Management Node cannot connect to the FTP server to upload a backup file. Check that the Upload URL (supported schemes are FTPS and FTP) and the Username and Password credentials of the FTP server are correct (Utilities > Automatic backups) and that the Management Node can reach the FTP server.
LDAP sync failed 28

ldap_sync_failure

Error An LDAP template synchronization process has failed. This alarm duplicates the information shown for the error listed at Status > LDAP sync.

See Troubleshooting LDAP server connections for help with resolving LDAP connection issues.

The alarm is lowered when you resync the template (although it will get re-raised if the issue has not been resolved).

License limit reached 2

licenses_exhausted

Warning A Conferencing Node is unable to accept a call because there are not enough concurrent licenses available on the system at this time. For more information, see Pexip Infinity license installation and usage.
  • Wait until one or more of the existing conferences have finished and the licenses have been returned to the pool.
  • Contact your Pexip authorized support representative to purchase more licenses.

Note that when a license subsequently becomes available (e.g. because a participant leaves a conference, or because the administrator adds more licenses), the alarm is not cleared immediately; the alarm is cleared after the next participant successfully joins a conference.

Licenses expiring 3

licenses_expiring

Warning One or more of your licenses is due to expire within the next 60 days. Contact your Pexip authorized support representative to renew your licenses.
Call capacity limit reached 1

capacity_exhausted

Warning

A call has not been accepted because all Conferencing Nodes that are able to take the media for this call are at capacity. It could be either Proxying Edge Nodes or Transcoding Conferencing Nodes that are out of capacity.

Note: to understand how often this issue is occurring in your deployment, search the Administrator log for "out of proxying resource" or "out of transcoding resource".
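If you have exported the Administrator log to a text file, a short script can count how often each phrase occurs. The sketch below is a minimal example under that assumption; the function name and the sample log lines are hypothetical and do not reflect the exact Pexip log format.

```python
PHRASES = ("out of proxying resource", "out of transcoding resource")


def count_capacity_events(lines):
    """Count log lines that mention each capacity phrase (case-insensitive)."""
    counts = {phrase: 0 for phrase in PHRASES}
    for line in lines:
        lowered = line.lower()
        for phrase in PHRASES:
            if phrase in lowered:
                counts[phrase] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, iterate over an exported log file.
    sample = [
        'Message="Call rejected" Reason="Out of transcoding resource"',
        'Message="Call rejected" Reason="Out of proxying resource"',
        'Message="Participant joined"',
        'Message="Call rejected" Reason="Out of transcoding resource"',
    ]
    print(count_capacity_events(sample))
```

Comparing the two counts gives a first indication of whether proxying or transcoding capacity is the more frequent bottleneck.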

This alarm clears either when an existing call is disconnected or the next time a new call is successfully placed.

  • Deploy more Conferencing Nodes in either the proxying or transcoding location as appropriate.
  • Move existing Conferencing Nodes onto more powerful servers.
  • Allocate more virtual CPUs for Conferencing Nodes on existing servers (if there are sufficient CPU cores). Note that the Conferencing Node will have to be rebooted for this to take effect.
  • Configure each location with a primary and secondary overflow location.
  • If a call is received in a location that contains Proxying Edge Nodes, that location must be configured with a Transcoding location that contains your Transcoding Conferencing Nodes.

Note that some types of call consume more resources than other calls. Thus, for example, if you are at full capacity and an audio-only call disconnects, there may still not be sufficient free resource to connect a new HD video call. For further information on capacity and how calls consume resources, see Hardware resource allocation rules.

Management Node limit reached 5

management_node_exhausted

Warning The Management Node does not have sufficient resources for the current deployment size (number of Conferencing Nodes).

Increase the amount of RAM and the number of virtual CPUs assigned to the Management Node.

See the recommended hardware requirements in Server design recommendations.

Trusted CA certificates expiring 6

trustedca_expiring

Warning One or more of your trusted CA certificates is due to expire within the next 30 days, or has already expired. Obtain and upload an updated certificate for the certificate authority.
TLS certificates expiring 7

tls_certificate_expiring

Warning One or more of your TLS certificates is due to expire within the next 30 days, or has already expired. Obtain and upload an updated TLS certificate. You may also need to delete the old certificate.
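The 30-day expiry window used by the certificate alarms above can be expressed as a simple date comparison. The following is a minimal sketch, not Pexip's logic: the function name is hypothetical, the expiry time is assumed to be already parsed from the certificate's "not after" field, and treating a certificate with exactly 30 days remaining as "expiring" is an assumption.

```python
from datetime import datetime, timedelta, timezone


def expiry_status(not_after, now, warn_days=30):
    """Classify a certificate as 'expired', 'expiring' or 'ok'."""
    if not_after <= now:
        return "expired"
    if not_after - now <= timedelta(days=warn_days):
        return "expiring"  # inside the warning window
    return "ok"


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(expiry_status(datetime(2024, 1, 20, tzinfo=timezone.utc), now))
    print(expiry_status(datetime(2024, 6, 1, tzinfo=timezone.utc), now))
```

Both the "expired" and "expiring" results correspond to conditions under which the alarm is described as being raised.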
Incomplete TLS certificate chains 8

tls_certificate_chains

Warning A TLS certificate has an incomplete chain of trust to the root CA certificate. Obtain and upload the appropriate chain of intermediate CA certificates to the Management Node (the certificate provider normally provides the relevant bundle of intermediate CA certificates).
Syslog server inaccessible 4

syslog_inaccessible

Warning A syslog server has been configured to use TCP or TLS but either is not responding to contact requests, or the connection has dropped.
  • Check your network connectivity.
  • Check that the syslog server is running.
Connectivity lost between nodes 19

connectivity_lost

Warning

Communication to a Pexip Infinity node has been lost.

When a connection is lost, Pexip Infinity tries to contact the node every 5 seconds until the connection is re-established. In large deployments with many connectivity failures, it attempts to re-establish connections to a maximum of 10 nodes at a time.

Intermittent short-lived "Connectivity lost between nodes" alarms may be an indication of an unreliable network.

These alarms may be raised for a short period of time if a node is placed into maintenance mode and Proxying Edge Nodes need to establish new connectivity paths. This is expected behavior.

They may also occur during initial deployment or an upgrade, which is also expected behavior. They automatically clear as each node is upgraded to the new software version, has restarted and is ready to handle calls.

When restricted routing for Proxying Edge Nodes is enabled, you may see these alarms in the following cases (this is expected behavior):

  • When deploying proxying nodes if, for example, the location containing your proxying nodes is configured with a Transcoding location that doesn't yet contain any transcoding nodes. In this case the alarm will be lowered when transcoding nodes are deployed in that location.
  • If all of the nodes in the edge location's Transcoding location (and any configured overflow locations) are in maintenance mode. This applies even if all of the proxying nodes are also in maintenance mode.
Check network connectivity and routing as for "Configuration not synchronized" above, or in the case of a software upgrade, wait for the upgrade process to complete.
Hardware instability detected 21

irregular_pulse

Warning Pexip Infinity has detected that the underlying VM infrastructure has paused the Pexip virtual machine. This usually indicates over-committed hardware, which we do not support. Pexip Infinity is a real-time system and requires dedicated access to the underlying CPU and RAM resources of the hardware host.

Ensure that the Management Node and all Conferencing Nodes have dedicated access to their own RAM and CPU cores.

See the recommended hardware requirements in Server design recommendations.

CPU instruction set is deprecated 23

cpu_deprecated

Warning

The node is deployed on a server that is not using the AVX or later CPU instruction set (e.g. if it uses SSE4.2).

This alarm is raised when a Conferencing Node restarts and is automatically cleared after 48 hours.

Deploy the Conferencing Node on a server with AVX or later.
Hardware IO (input/output) instability detected 24

io_high_latency

Warning Pexip Infinity has recently detected consistent read latency greater than 100ms or write latency greater than 400ms.
  • Avoid having multiple VMs using the same physical hard drive.
  • Check the hard drive for failures.
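If you collect your own disk latency measurements from the host, you can compare them against the same thresholds. The sketch below is only an illustration: the function names are hypothetical, and interpreting "consistent" as "every sample in the window exceeds the threshold" is an assumption, since this guide does not define how Pexip Infinity evaluates it.

```python
READ_LIMIT_MS = 100   # read latency threshold from the alarm description
WRITE_LIMIT_MS = 400  # write latency threshold from the alarm description


def consistently_above(samples_ms, limit_ms):
    """True if there is at least one sample and all samples exceed the limit."""
    return bool(samples_ms) and all(s > limit_ms for s in samples_ms)


def io_latency_alarm(read_samples_ms, write_samples_ms):
    """True if either read or write latency is consistently over its limit."""
    return (consistently_above(read_samples_ms, READ_LIMIT_MS)
            or consistently_above(write_samples_ms, WRITE_LIMIT_MS))


if __name__ == "__main__":
    # Hypothetical measurements in milliseconds.
    print(io_latency_alarm([120, 150, 130], [10, 20, 15]))
    print(io_latency_alarm([120, 50, 130], [10, 20, 15]))
```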
VOIP scanner resistance has detected excessive incorrect aliases being dialed in a short period 26

possible_voip_scanner_ips_blocked

Warning Pexip Infinity's VOIP scanner resistance has detected excessive incorrect aliases being dialed in a short period, and has temporarily blocked access attempts from the suspected VOIP scanner IP addresses. See the administrator log for details of the calls.
PIN brute force resistance has detected excessive incorrect PIN entry attempts in a short period 27

service_access_quarantined

Warning Pexip Infinity's PIN brute force resistance has detected excessive incorrect PIN entry attempts in a short period, and has temporarily blocked access attempts to one or more conferencing services. See the administrator log for details of the calls.

Cloud bursting alarms

The following alarms may be raised in relation to issues with dynamic cloud bursting. See Dynamic bursting to a cloud service for more information about resolving these alarms.

Alarm | ID | Logged as (Alarm =) | Level | Cause | Suggested resolutions
Not authorized to perform this operation 15 & 16

bursting_unauthorized_instance_failure

bursting_unauthorized_region_failure

Error Pexip Infinity is not authorized to view instance data or to start and stop instances in the cloud service.

For AWS, ensure that an appropriate policy document is configured in AWS and is attached to the user that is being used by the Pexip platform.

For Azure, check your Active Directory (AD) application and its associated role/permissions.

For GCP, check your service account and its associated role/permissions.

Authentication failure while trying to communicate with the cloud provider 17

bursting_authentication_failure

Error Pexip Infinity cannot sign in to the cloud service.

Check your cloud bursting settings in Platform > Global settings > Cloud bursting:

  • For AWS, check that the Access Key ID and Secret Access Key match the User Security Credentials for the user you added within Identity and Access Management in the AWS dashboard.
  • For Azure, check that your subscription, client and tenant IDs and secret key are correct for your Active Directory application.
  • For GCP, check that your configured GCP project ID, service account ID and private key are correct for your GCP service account.
Cloud bursting process encountered an unexpected error 12

bursting_error

Error

Pexip Infinity encountered an unexpected error while managing the cloud overflow nodes.

Check the status of your cloud bursting nodes within Pexip Infinity (Status > Cloud bursting) and of your instances within your cloud provider.

Also check administrator and support log messages that are tagged with a log module name of administrator.alarm to see additional error message information.

Cloud-bursting node found, but no corresponding Conferencing Node has been configured 13

bursting_missing_pexip_node

Warning

This occurs when Pexip Infinity detects a bursting instance with a tag matching your system's hostname but there is no corresponding Conferencing Node configured within Pexip Infinity.

This message can occur temporarily during normal operation, when you are deploying a new Conferencing Node and have set up the VM instance in your cloud provider but have not yet deployed the Conferencing Node in Pexip Infinity. In this case, the issue will disappear as soon as the Conferencing Node is deployed.

A location contains cloud bursting nodes, but no other locations are using it for overflow 14

bursting_no_location_overflow

Warning A location contains some cloud overflow nodes, but no other locations are using it as an overflow location. Set the location containing the cloud overflow nodes as the Primary overflow location of the locations containing your "always on" Conferencing Nodes.

One-Touch Join alarms

The following alarms may be raised in relation to issues with One-Touch Join:

Alarm | ID | Logged as (Alarm =) | Level | Instance | Cause | Suggested resolutions
OTJ Google Gatherer Error 29

mjx_google_gatherer_failure

Error Google Connection Test Failure The connection test to G Suite has failed. This could be because your service account credentials are incorrect. Check your service account details, specifically the service account email and private key.
Google Room Connection Failure OTJ has been unable to connect to one of the rooms you have specified. This could be because the room is misconfigured within G Suite. Check the steps to set up a new room. Is the room resource email correct? Has it been shared with the service account?
OTJ Exchange Gatherer Error 30

mjx_exchange_gatherer_failure

Error Exchange Connection Test Error The connection test to Exchange has failed. This could be because your service account credentials are incorrect. Check your Exchange service account username and password.
Exchange OAuth Error OTJ is unable to use OAuth to sign into Exchange. Check your OAuth credentials.
Exchange Room Connection Error OTJ is unable to connect to the room specified in the alarm description. This could be because the room is misconfigured. Check the room has been correctly set up.
OTJ Endpoint Configurator Error 31

mjx_endpoint_configurator_failure

Error Endpoint Misconfigured Error The OTJ endpoint does not have a username and password configured, and there is no default username and password. Provide a username and password for the OTJ endpoint or for the associated OTJ profile.
Endpoint Request Error OTJ is unable to connect to the endpoint. Check that the endpoint is configured correctly.
Endpoint Non-200 Status Code The endpoint returns a non-200 status code. Check the status code that is given in the logs. This is likely a configuration error with the endpoint.
OTJ Meeting Processor Failure 32

mjx_meeting_processor_failure

Error Meeting Processor Rendering Error, Template Error or Runtime Error The meeting processing rule could not extract a meeting alias. Check and edit the rule using the test tool.
OTJ Poly Endpoint Error 35 mjx_poly_failure Error Poly Endpoint Not Polled A Poly endpoint that has Raise alarms enabled has not made contact with the OTJ calendaring service within the last 10 minutes.

Ensure that the configuration for the endpoint on Pexip Infinity and on the endpoint itself is correct, in particular that the username and password configured on both match.

Ensure that the endpoint is showing as registered to the calendaring service.

Restart the endpoint.