Active Speaker Detection Model by NVIDIA

Getting Started

NVIDIA Active Speaker Detection NIM uses gRPC APIs for inference requests. The following instructions demonstrate how to use the Active Speaker Detection NIM with the Python client.

Prerequisites

You will need a system with git and Python 3.10+ installed.
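If you want to confirm both prerequisites before continuing, the short Python sketch below checks the interpreter version and looks for git on the PATH. The check_prerequisites helper is hypothetical and is not part of the NIM client repository.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 10)):
    """Return a list of unmet prerequisites (empty if everything is in place).

    Hypothetical helper; it is not part of the NIM client repository.
    """
    problems = []
    # The client requires Python 3.10 or newer.
    if sys.version_info[:2] < tuple(min_version):
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")
    # git must be on the PATH to clone the client repository.
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites found.")
```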

Download the NVIDIA Active Speaker Detection NIM Python client

Download the code by cloning the gRPC client repository:

git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/active-speaker-detection/

Install the dependencies for the NVIDIA Active Speaker Detection NIM Python client. The apt-get command below assumes a Debian-based distribution such as Ubuntu:

sudo apt-get install python3-pip
pip install -r requirements.txt
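To confirm that the install step succeeded, the following sketch reads requirements.txt and reports any listed package that importlib.metadata cannot find. The helper names are hypothetical. The parser assumes plain name-based requirement lines (such as name>=1.0), skips pip options such as -r, and does not check version specifiers.

```python
import re
from importlib import metadata
from pathlib import Path

# Distribution name at the start of a requirement line, e.g. "grpcio" in "grpcio>=1.60".
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(lines):
    """Extract package names from requirements.txt lines (comments and pip options skipped)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and options such as -r or -e
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_distributions(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def missing_requirements(path="requirements.txt"):
    """Return packages listed in the requirements file that are not installed."""
    return missing_distributions(requirement_names(Path(path).read_text().splitlines()))
```

Called from the active-speaker-detection directory, missing_requirements() returns an empty list when every listed package is installed.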

Run Python Client

Navigate to the scripts directory:

cd scripts

Send gRPC request

Open a terminal and run the following command to send a gRPC request. The command reads your API key from the NVIDIA_API_KEY environment variable, so set that variable before running it (for example, export NVIDIA_API_KEY=<your API key>).

python active_speaker_detection.py --preview-mode \
--target grpc.nvcf.nvidia.com:443 \
--function-id f03279b9-0c11-453b-b8d6-157109885cf0 \
--api-key $NVIDIA_API_KEY \
--video-input <input video file path> \
--audio-input <input audio file path> \
--diarization-input <input diarization file path> \
--output <output file path>

Example command with sample input:

python active_speaker_detection.py --preview-mode \
--target grpc.nvcf.nvidia.com:443 \
--function-id f03279b9-0c11-453b-b8d6-157109885cf0 \
--api-key $NVIDIA_API_KEY \
--video-input ../assets/sample_video_streamable.mp4 \
--audio-input ../assets/sample_audio.wav \
--diarization-input ../assets/sample_diarization.json \
--output out.mp4
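If you need to process several files, or call the client from other Python code, you can wrap the command above. The sketch below is a hypothetical wrapper around active_speaker_detection.py. It reuses only the flags, target, and function ID shown in the commands above, reads the key from NVIDIA_API_KEY as the shell command does, checks that the input files exist, and runs the script with subprocess. It assumes it is run from the scripts directory with the client's dependencies installed in the current interpreter.

```python
import os
import subprocess
import sys
from pathlib import Path

# Target and function ID copied from the commands above.
TARGET = "grpc.nvcf.nvidia.com:443"
FUNCTION_ID = "f03279b9-0c11-453b-b8d6-157109885cf0"


def build_command(video, audio, diarization, output, api_key=None,
                  script="active_speaker_detection.py"):
    """Return the argument list for one run of the client script (hypothetical helper)."""
    # Fall back to the NVIDIA_API_KEY environment variable, like the shell command.
    api_key = api_key or os.environ.get("NVIDIA_API_KEY")
    if not api_key:
        raise ValueError("set NVIDIA_API_KEY or pass api_key")
    return [
        sys.executable, script, "--preview-mode",
        "--target", TARGET,
        "--function-id", FUNCTION_ID,
        "--api-key", api_key,
        "--video-input", str(video),
        "--audio-input", str(audio),
        "--diarization-input", str(diarization),
        "--output", str(output),
    ]


def run_client(video, audio, diarization, output, **kwargs):
    """Check that the input files exist, then run the client (raises on a non-zero exit)."""
    for path in (video, audio, diarization):
        if not Path(path).is_file():
            raise FileNotFoundError(f"input file not found: {path}")
    subprocess.run(build_command(video, audio, diarization, output, **kwargs), check=True)
```

For example, run_client("../assets/sample_video_streamable.mp4", "../assets/sample_audio.wav", "../assets/sample_diarization.json", "out.mp4") reproduces the sample command. Like the shell command, it passes the API key as a command-line argument.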

Note the following requirements for input files:

The supported file type is MP4 with the H.264 codec.
The size limit for an input file is 500 MB.
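The sketch below is a hypothetical pre-flight check for the video input, based on those requirements. It checks the .mp4 extension, the file size, and the 'ftyp' box that ISO base media (MP4) files normally start with. It treats 500 MB as 500 * 1024 * 1024 bytes, which is an assumption about how the limit is counted. Confirming the H.264 codec requires a media probe, so probe_video_codec shells out to ffprobe (from FFmpeg) if it is installed and returns None otherwise.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: "500 MB" means 500 * 1024 * 1024 bytes; use 500 * 10**6 if decimal megabytes apply.
MAX_BYTES = 500 * 1024 * 1024


def check_video_input(path, max_bytes=MAX_BYTES):
    """Return a list of problems with a video input file (empty if none were found)."""
    path = Path(path)
    problems = []
    if path.suffix.lower() != ".mp4":
        problems.append(f"{path.name}: expected an .mp4 file")
    size = path.stat().st_size
    if size > max_bytes:
        problems.append(f"{path.name}: {size} bytes exceeds the {max_bytes}-byte limit")
    # MP4 files normally begin with an 'ftyp' box: a 4-byte box size, then b"ftyp".
    with path.open("rb") as f:
        header = f.read(8)
    if header[4:8] != b"ftyp":
        problems.append(f"{path.name}: no 'ftyp' box at the start; may not be an MP4 file")
    return problems


def probe_video_codec(path):
    """Return the first video stream's codec name via ffprobe, or None if unavailable."""
    if shutil.which("ffprobe") is None:
        return None
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

For an H.264 video stream, ffprobe reports the codec name as h264.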

Refer to the NVIDIA Active Speaker Detection NIM documentation for more information.