Converts streamed audio into facial blendshapes for real-time lip-syncing and facial performances.
Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.
Pull and run the NVIDIA NIM with the command below.
GPU-specific optimized models are available for select GPUs; refer to Supported Models for the list. These profiles are selected automatically when you run the Audio2Face-3D NIM. To list the available profiles, or to pin one manually, see the commands further below.
```bash
export NGC_API_KEY=<PASTE_API_KEY_HERE>
export CONTAINER_NAME=audio2face

docker run -it --rm --name=$CONTAINER_NAME \
  --gpus all \
  --network=host \
  -e NGC_API_KEY=$NGC_API_KEY \
  nvcr.io/nim/nvidia/audio2face-3d:1.3
```
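You can follow the container's startup progress by tailing its logs with the standard Docker CLI; the container name matches the one set above:

```bash
# Follow the NIM container's logs to watch model download and startup progress
docker logs -f $CONTAINER_NAME
```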
This command starts the NIM container, which listens on port 52000 for client requests.
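Once the service reports ready, you can confirm that the gRPC endpoint is reachable. As a minimal sketch, assuming the server exposes gRPC reflection (not confirmed here; the call fails if reflection is disabled), the grpcurl tool can enumerate the available services:

```bash
# List gRPC services on the NIM endpoint. This assumes the server enables
# gRPC reflection; if it does not, this command will return an error.
grpcurl -plaintext localhost:52000 list
```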
If a pre-generated profile is found for your GPU, you will see messages similar to this in the logs:
"timestamp": "XXXX-XX-XX XX:XX:XX", "level": "INFO", "message": "Matched profile_id in manifest from TagsBasedProfileSelector"
To list available model profiles:
```bash
docker run -it --rm --network=host --gpus all \
  --entrypoint nim_list_model_profiles \
  nvcr.io/nim/nvidia/audio2face-3d:1.3
```
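If you want to override the automatic selection, NIM containers generally accept a `NIM_MODEL_PROFILE` environment variable naming a profile ID from the list above. Treat the variable name as an assumption here and verify it against the Audio2Face-3D documentation before relying on it:

```bash
# Sketch: pin a specific model profile instead of relying on automatic
# selection. NIM_MODEL_PROFILE is a common NIM convention; confirm this NIM
# supports it. <profile_id> comes from the nim_list_model_profiles output.
docker run -it --rm --name=$CONTAINER_NAME \
  --gpus all \
  --network=host \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<profile_id> \
  nvcr.io/nim/nvidia/audio2face-3d:1.3
```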
From the time the container is started, it may take up to 30 minutes, depending on your network speed, for the service to become ready and start accepting requests. During startup you might see warnings labeled GStreamer-WARNING; these are safe to ignore, as the components that emit them are not used by Audio2Face-3D.
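Rather than watching the logs manually, you can block until the port starts accepting connections. Below is a minimal sketch using only bash built-ins, assuming the default port 52000 from above; note that a successful TCP connect indicates the server is up, not that every model is fully loaded:

```bash
# Poll until something is listening on port 52000, checking every 10 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are required.
until (exec 3<>/dev/tcp/localhost/52000) 2>/dev/null; do
  echo "Waiting for Audio2Face-3D NIM on port 52000..."
  sleep 10
done
echo "Port 52000 is accepting connections."
```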
For more details on getting started with this NIM, visit the NVIDIA Audio2Face-3D Docs.