
Detect and track speaker identities across video frames.
NVIDIA Active Speaker Detection NIM uses gRPC APIs for inference requests. The following instructions demonstrate how to use the Active Speaker Detection NIM with the Python client.
You will need a system with git and Python 3.10+ installed.
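The prerequisites above can be checked before cloning. The following is a minimal sketch using only the Python standard library; the function name `check_prerequisites` is hypothetical and is not part of the NIM client.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `version` and `which` are injectable only to make the check easy to test.
    """
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.10+ required")
    if which("git") is None:
        problems.append("git not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

If the script prints nothing, both git and a suitable Python interpreter were found.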
Download the code by cloning the gRPC client repository:
git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/active-speaker-detection/
Install the dependencies for the NVIDIA Active Speaker Detection NIM Python client:
sudo apt-get install python3-pip
pip install -r requirements.txt
cd scripts
Open a terminal and run the following command to send a gRPC request. If you have generated an API key, it is auto-populated in the command.
python active_speaker_detection.py --preview-mode \
--target grpc.nvcf.nvidia.com:443 \
--function-id None \
--api-key $NVIDIA_API_KEY \
--video-input <input videofile path> \
--audio-input <input audio file path> \
--diarization-input <input diarization file path> \
--output <output file path>
For example, using the sample files in the repository's assets directory:
python active_speaker_detection.py --preview-mode \
--target grpc.nvcf.nvidia.com:443 \
--function-id None \
--api-key $NVIDIA_API_KEY \
--video-input ../assets/sample_video_streamable.mp4 \
--audio-input ../assets/sample_audio.wav \
--diarization-input ../assets/sample_diarization.json \
--output out.mp4
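The same invocation can be driven from Python, which makes it easy to fail early when the API key or an input file is missing. This is a minimal sketch, not part of the client: the helpers `build_command`, `missing_inputs`, and `main` are hypothetical names, and the only flags used are the ones shown in the commands above, with `--function-id None` copied verbatim.

```python
import os
import subprocess
import sys
from pathlib import Path

SCRIPT = "active_speaker_detection.py"  # assumes the scripts directory is the cwd


def build_command(video, audio, diarization, output, api_key,
                  target="grpc.nvcf.nvidia.com:443", function_id="None"):
    """Assemble the argument list using only the flags shown above."""
    return [
        sys.executable, SCRIPT, "--preview-mode",
        "--target", target,
        "--function-id", function_id,
        "--api-key", api_key,
        "--video-input", str(video),
        "--audio-input", str(audio),
        "--diarization-input", str(diarization),
        "--output", str(output),
    ]


def missing_inputs(*paths):
    """Return the input paths that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


def main():
    """Validate the environment and inputs, then run the client script."""
    api_key = os.environ.get("NVIDIA_API_KEY")
    if not api_key:
        sys.exit("NVIDIA_API_KEY is not set")
    inputs = ("../assets/sample_video_streamable.mp4",
              "../assets/sample_audio.wav",
              "../assets/sample_diarization.json")
    absent = missing_inputs(*inputs)
    if absent:
        sys.exit("missing input files: " + ", ".join(absent))
    # check=True raises CalledProcessError if the client exits with an error.
    subprocess.run(build_command(*inputs, "out.mp4", api_key), check=True)
```

Calling `main()` from the scripts directory sends the same request as the sample command. Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.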
Refer to the documentation for more information.