About Pexip Private AI and AIMS

The Pexip Private AI platform allows you to access Pexip's AI-powered features (such as live captions) in a secure environment. It uses Pexip's AI Media Server (AIMS), a self-hosted standalone virtual machine, which you deploy on your own hardware or private cloud environment, giving you complete control of your data.

The Pexip Private AI platform is deployed alongside, but entirely separately from, your Pexip Infinity platform. You configure Pexip Infinity to integrate with Pexip Private AI where required for supported features.

This initial release of Pexip Private AI runs on AIMS v1 and supports Pexip Infinity's live captions feature.

Supported hardware, software and environments

Deployment environments

Pexip provides the AI Media Server (AIMS) software as an OVA template suitable for deployment on VMware ESXi, and as an Amazon Machine Image (AMI) for deployment on Amazon Web Services (AWS).

For step-by-step installation guides for your chosen environment, see:

Pexip Infinity

AIMS v1 requires Pexip Infinity v36 or later.

If you are running Pexip Infinity versions 32 to 35, please contact your Pexip authorized support representative.

NVIDIA GPU

AIMS requires complete control of (exclusive access to) all GPUs on the host server; the GPUs cannot be shared with other workloads.

The following NVIDIA GPU models are supported:

  • NVIDIA L4
  • NVIDIA A100
  • NVIDIA H100

If you are unsure about compatibility with a given GPU, please contact your Pexip authorized support representative.

Host hardware

Host hardware must meet the following minimum specifications:

  • CPU: 8 cores

  • RAM: 32GB

  • Storage: 50GB SSD

These requirements may change in future versions.

Capacity planning

When live captions are enabled for a VMR, AIMS receives the audio stream from Pexip Infinity, which it transcribes and returns as a text stream. Pexip Infinity then provides the text to all users who have enabled live captions. AIMS supports simultaneous transcription of up to the following number of audio streams:

  • L4: 80 streams per GPU

  • A100: 160 streams per GPU

  • H100: 300 streams per GPU

In each case, the maximum number of supported GPUs per server is 8.
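As a rough illustration of the figures above, per-server transcription capacity is simply streams-per-GPU multiplied by the number of GPUs. The sketch below (a hypothetical helper, not part of any Pexip tooling) uses the maximum stream counts listed above; real-world capacity may vary with configuration and load:

```python
# Maximum simultaneous transcription streams per GPU, per the table above.
STREAMS_PER_GPU = {"L4": 80, "A100": 160, "H100": 300}
MAX_GPUS_PER_SERVER = 8


def max_streams(gpu_model: str, gpu_count: int) -> int:
    """Maximum simultaneous audio streams for a server with the given GPUs."""
    if not 1 <= gpu_count <= MAX_GPUS_PER_SERVER:
        raise ValueError(f"GPU count must be between 1 and {MAX_GPUS_PER_SERVER}")
    return STREAMS_PER_GPU[gpu_model] * gpu_count


# Example: a fully populated server with 8 x H100 GPUs.
print(max_streams("H100", 8))  # 2400 streams
```

For example, to support 500 simultaneous captioned streams you would need 7 x L4 GPUs (7 x 80 = 560) but only 2 x H100 GPUs (2 x 300 = 600).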

AI model cards

The table below lists the models used within AIMS, and provides links to NVIDIA's AI model cards (which are documents that provide detailed information about each model, including the training dataset, intended use, and other compliance information).

Acoustic models
  • English: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/parakeet-ctc-riva-0-6b-en-us/explainability
  • Spanish: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechtotext_es_us_conformer/explainability
  • German: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechtotext_de_de_conformer/explainability
  • French: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/models/speechtotext_fr_fr_conformer/explainability

Each language also uses its own Language Model, Punctuation and Capitalization Model, and Inverse Text Normalization Model.

Security considerations

AIMS runs on a standalone server which you can deploy in your own secure environment. All communication between AIMS and Pexip Infinity is over a secure (encrypted and authenticated) link.

When the live captions feature is enabled:

  • The AIMS deployment receives an audio stream from Pexip Infinity, and returns the transcription text stream to Pexip Infinity, over this secure link.

  • The audio, and the corresponding captions generated from it, are stored only temporarily in memory on the AIMS server; that memory is freed immediately when processing is complete.

  • The transcription text received by Pexip Infinity is provided as ephemeral text overlaid on the main video to those meeting participants who have enabled live captions. Some clients (such as Pexip's native Webapp2, or custom web clients) may also record and display a transcript of all captions received while the client is connected to the call.

More information

Release notes

  • v1.0 (released 12 November 2024): Initial release