nvidia

audio2face-2d

Run Anywhere

Create facial animations using a portrait photo and synchronize mouth movement with audio.

Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Generate API Key

Pull and Run the NIM

NVIDIA Maxine Audio2Face-2D NIM uses gRPC APIs for inferencing requests.

An NGC API key is required to download the appropriate models and resources when starting the NIM. Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable, as indicated.

If you are not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:

export NGC_API_KEY=<PASTE_API_KEY_HERE>

Run one of the following commands to make the key available at startup:

# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc

# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc

Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
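As an illustration of the file-based option, the following Python sketch resolves the key either from NGC_API_KEY directly or from the file named by NGC_API_KEY_FILE. The helper name load_ngc_api_key is hypothetical and not part of the NIM or the client repository; it only mirrors what cat $NGC_API_KEY_FILE does in the shell.

```python
import os
from pathlib import Path


def load_ngc_api_key(env=None):
    """Return the NGC API key from NGC_API_KEY, else from the file in NGC_API_KEY_FILE.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env

    # Prefer a key that is already exported in the environment.
    key = env.get("NGC_API_KEY", "").strip()
    if key:
        return key

    # Otherwise read it from the file, like `cat $NGC_API_KEY_FILE`,
    # dropping the trailing newline most editors add.
    key_file = env.get("NGC_API_KEY_FILE", "").strip()
    if key_file:
        return Path(key_file).expanduser().read_text(encoding="utf-8").strip()

    raise RuntimeError("Set NGC_API_KEY or NGC_API_KEY_FILE before starting the NIM")
```

A launcher script could call load_ngc_api_key() once and place the result in the environment it passes to docker run, so the key never has to be written into a shell profile.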

The following command launches the Maxine Audio2Face-2D NIM container with the gRPC service. For a reference to the container's runtime parameters, see the NVIDIA Maxine Audio2Face-2D NIM documentation.

Run the NIM launch command:

docker run -it --rm --name=maxine-audio2face-2d-nim \
  --runtime=nvidia \
  --gpus all \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_HTTP_API_PORT=8000 \
  -p 8000:8000 \
  -p 8001:8001 \
  nvcr.io/nim/nvidia/maxine-audio2face-2d:latest

The --gpus all flag assigns all available GPUs to the NIM container. To assign specific GPUs to the NIM container (when your machine has multiple GPUs), use --gpus '"device=0,1,2..."'
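The nested quoting in the device form of --gpus is easy to get wrong when the command is generated by a script. The sketch below builds the flag value in Python and renders it with shell quoting; gpus_flag is a hypothetical helper name, and the device indices are examples only.

```python
import shlex


def gpus_flag(device_ids=None):
    """Return the value for docker's --gpus option.

    With no device list, all GPUs are assigned. Otherwise the value is
    "device=<ids>" wrapped in literal double quotes, as in the command above.
    """
    if not device_ids:
        return "all"
    return '"device={}"'.format(",".join(str(i) for i in device_ids))


def render_gpus_option(device_ids=None):
    """Render the option as it would be typed in a shell."""
    # shlex.join adds the outer single quotes that protect the inner double quotes.
    return shlex.join(["--gpus", gpus_flag(device_ids)])
```

For example, render_gpus_option([0, 1]) produces --gpus '"device=0,1"', and render_gpus_option() produces --gpus all. When the argument list is passed directly to a process without a shell, only the gpus_flag value is needed.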

If the NIM launches successfully, the container prints output similar to the following:

I1121 09:59:42.023798 49 grpc_server.cc:2558] "Started GRPCInferenceService at 0.0.0.0:9001"
I1121 09:59:42.024109 49 http_server.cc:4704] "Started HTTPService at 0.0.0.0:9000"
I1121 09:59:42.065331 49 http_server.cc:362] "Started Metrics Service at 0.0.0.0:9002"
Maxine GRPC Service: Listening to 0.0.0.0:8001

By default, the Maxine Audio2Face-2D NIM's gRPC service is hosted on port 8001. Use this port for inference requests.
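Before sending requests, it can help to confirm that something is listening on port 8001. The following sketch polls the port with a plain TCP connection using only the Python standard library. The function name wait_for_port and its timeout values are assumptions for illustration. A successful connection shows only that the port is open, not that the gRPC service is ready to serve requests.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8001, timeout_s=60.0, interval_s=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) on failure.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Calling wait_for_port() with no arguments checks the default gRPC address 127.0.0.1:8001 for up to one minute.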

Test the NIM

We provide a sample client script in our GitHub repo. You can use it to send requests to the NIM running in the Docker container by following the instructions below.

Download the Maxine Audio2Face-2D client code by cloning the NVIDIA Maxine NIM Clients Repository:

git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/audio2face-2d/

Install the dependencies for the NVIDIA Maxine Audio2Face-2D client:

  • For the Python client
sudo apt-get install python3-pip
pip install -r python/requirements.txt
  • For the NodeJS client
# Add the repo and install the latest stable Node.js
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install nodejs
# Install all the required packages using package.json file in nodejs directory
npm install --prefix nodejs/

Running Inference via Python Script

You can use the sample client script in the Maxine Audio2Face-2D GitHub repo to send a gRPC request to the hosted NIM server:

  1. Go to the Python scripts directory:

    cd python/scripts
  2. Run the following command to send a gRPC request:

    python audio2face-2d.py --target <server_ip:port> --audio-input <input audio file path> --portrait-input <input portrait image file path> --output <output file path and the file name>

If a command-line argument is not passed, the script uses the following default value for it:

  • target is 127.0.0.1:8001
  • portrait-input is ../../assets/sample_portrait_image.png
  • audio-input is ../../assets/sample_audio.wav
  • output is output.mp4 in the current directory.
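To show how the defaults and overrides fit together, the sketch below assembles the argument list for audio2face-2d.py in Python. The flag names and default values come from the list above; the helper client_argv and the use of keyword arguments with underscores are hypothetical conveniences, not part of the client repository.

```python
import sys

# Defaults as documented above for the Python client.
DEFAULTS = {
    "target": "127.0.0.1:8001",
    "portrait-input": "../../assets/sample_portrait_image.png",
    "audio-input": "../../assets/sample_audio.wav",
    "output": "output.mp4",
}


def client_argv(**overrides):
    """Build the argv for audio2face-2d.py, e.g. client_argv(audio_input="speech.wav")."""
    opts = dict(DEFAULTS)
    for name, value in overrides.items():
        flag = name.replace("_", "-")  # audio_input -> audio-input
        if flag not in DEFAULTS:
            raise ValueError(f"unknown option: --{flag}")
        opts[flag] = str(value)

    # Use the current interpreter so the installed requirements are found.
    argv = [sys.executable, "audio2face-2d.py"]
    for flag, value in opts.items():
        argv += [f"--{flag}", value]
    return argv
```

The resulting list can be passed to subprocess.run with cwd set to the python/scripts directory, for example subprocess.run(client_argv(output="result.mp4"), cwd="python/scripts", check=True).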

Running Inference via NodeJS Script

Like the Python client, the NodeJS client can be used to exercise the Audio2Face-2D (A2F2D) feature by sending gRPC requests to the hosted NIM server. The audio2face-2d.js NodeJS script takes a portrait image and a wav or pcm audio file (default is wav) and generates an mp4 video output.

Go to the NodeJS scripts folder:

cd nodejs/scripts

Run the following command to send a gRPC request (all command line parameters are optional):

node audio2face-2d.js --target <server_ip:port> --audio-input <input audio file path> --portrait-input <input portrait image file path> --output <output file path and the file name> --format <wav/pcm>

If a command-line argument is not passed, the script uses the following default value for it:

  • target is 127.0.0.1:8001
  • portrait-input is ../../assets/sample_portrait_image.png
  • audio-input is ../../assets/sample_audio.wav
  • output is output.mp4 in the current directory.
  • format is wav
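Because the NodeJS client takes an explicit --format, a wrapper script may want to keep it consistent with the audio file. The sketch below derives the format from the file extension and builds the node command line. It assumes that a .wav or .pcm extension reflects the actual audio encoding, which the source does not state; audio_format and node_client_argv are hypothetical helper names.

```python
from pathlib import Path


def audio_format(audio_path):
    """Map a file extension to the --format value (assumes the extension is accurate)."""
    suffix = Path(audio_path).suffix.lower().lstrip(".")
    if suffix in ("wav", "pcm"):
        return suffix
    raise ValueError(f"unsupported audio format for {audio_path!r}: expected wav or pcm")


def node_client_argv(audio_input, portrait_input,
                     output="output.mp4", target="127.0.0.1:8001"):
    """Build the argv for audio2face-2d.js with a --format matching the audio file."""
    return [
        "node", "audio2face-2d.js",
        "--target", target,
        "--audio-input", str(audio_input),
        "--portrait-input", str(portrait_input),
        "--output", str(output),
        "--format", audio_format(audio_input),
    ]
```

Rejecting unknown extensions up front gives a clearer error than sending an unsupported file to the server.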

For more details on getting started with this NIM, including its configuration parameters, visit the NVIDIA Maxine Audio2Face-2D NIM Docs.