Create facial animations using a portrait photo and synchronize mouth movement with audio.
By running the commands below, you accept the NVIDIA AI Enterprise Terms of Use and the NVIDIA Community Models License.
Pull and run nvidia/audio2face-2d using Docker (this will download the full model and run it in your local environment).
$ docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>
NVIDIA Maxine Audio2Face-2D NIM uses gRPC APIs for inferencing requests.
An NGC API key is required to download the appropriate models and resources when starting the NIM. Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable, as indicated.
If you are not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:
export NGC_API_KEY=<PASTE_API_KEY_HERE>
Run one of the following commands to make the key available at startup:
# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc
# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
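The file-based option above can be sketched in Python for scripts that wrap the launch step. This is a minimal sketch, not part of the NIM tooling: the function name load_ngc_api_key is hypothetical, and it assumes NGC_API_KEY_FILE holds the path to a file that contains only the key.

```python
import os
from pathlib import Path


def load_ngc_api_key(env=os.environ):
    """Return the NGC API key from NGC_API_KEY, else from the file named by NGC_API_KEY_FILE.

    Hypothetical helper for illustration; it is not part of the NIM or its clients.
    """
    key = env.get("NGC_API_KEY", "").strip()
    if key:
        return key
    key_file = env.get("NGC_API_KEY_FILE", "").strip()
    if key_file:
        # Same effect as `cat $NGC_API_KEY_FILE`, with surrounding whitespace removed.
        return Path(key_file).expanduser().read_text().strip()
    raise RuntimeError("Set NGC_API_KEY or NGC_API_KEY_FILE before starting the NIM")
```

Reading the key from a file keeps it out of shell history and shell startup files, which is the main reason to prefer this option over the export lines.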
The following command launches the Maxine Audio2Face-2D NIM container with the gRPC service. A reference for the container's runtime parameters is available in the NVIDIA Maxine Audio2Face-2D NIM Docs.
Run the NIM launch command:
docker run -it --rm --name=maxine-audio2face-2d-nim \
  --runtime=nvidia \
  --gpus all \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_HTTP_API_PORT=8000 \
  -p 8000:8000 \
  -p 8001:8001 \
  nvcr.io/nim/nvidia/maxine-audio2face-2d:latest
The flag --gpus all assigns all available GPUs to the NIM container.
To assign specific GPUs to the NIM container (when multiple GPUs are available on your machine), use --gpus '"device=0,1,2..."'
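The quoting of the device list is easy to get wrong, so here is a minimal Python sketch that assembles the launch command for either case. The helper names gpus_flag and launch_command are hypothetical. The sketch passes `-e NGC_API_KEY` without a value, which tells Docker to forward the variable from the calling environment. That differs slightly from the command above and is a choice made here to keep the key out of the printed command.

```python
import shlex


def gpus_flag(device_ids=None):
    """Return the value for docker's --gpus option (hypothetical helper)."""
    if not device_ids:
        return "all"
    # Docker expects the device list wrapped in literal double quotes.
    return '"device={}"'.format(",".join(str(i) for i in device_ids))


def launch_command(device_ids=None):
    """Build the docker run argument list used to start the NIM container."""
    return [
        "docker", "run", "-it", "--rm",
        "--name=maxine-audio2face-2d-nim",
        "--runtime=nvidia",
        "--gpus", gpus_flag(device_ids),
        "--shm-size=8GB",
        "-e", "NGC_API_KEY",  # value is forwarded from the caller's environment
        "-e", "NIM_HTTP_API_PORT=8000",
        "-p", "8000:8000",
        "-p", "8001:8001",
        "nvcr.io/nim/nvidia/maxine-audio2face-2d:latest",
    ]


if __name__ == "__main__":
    # Print a shell-safe command line that pins the container to GPUs 0 and 1.
    print(shlex.join(launch_command([0, 1])))
```

Passing the list to subprocess.run avoids shell quoting entirely; shlex.join is only needed when the command is printed or pasted into a terminal.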
If the NIM launch is successful, you will get a response similar to the following.
I1121 09:59:42.023798 49 grpc_server.cc:2558] "Started GRPCInferenceService at 0.0.0.0:9001"
I1121 09:59:42.024109 49 http_server.cc:4704] "Started HTTPService at 0.0.0.0:9000"
I1121 09:59:42.065331 49 http_server.cc:362] "Started Metrics Service at 0.0.0.0:9002"
Maxine GRPC Service: Listening to 0.0.0.0:8001
By default, the Maxine Audio2Face-2D NIM's gRPC service is hosted on port 8001. Use this port for inference requests.
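Before sending requests, a script can wait until the gRPC port accepts connections. The following is a minimal sketch using only the Python standard library; the function name wait_for_port and its timeout values are assumptions. A successful TCP connection shows only that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8001, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    ready = wait_for_port()
    print("gRPC port is open" if ready else "gRPC port did not open in time")
```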
We have provided a sample client script in our GitHub repo. You can use the script to send requests to the running Docker container by following the instructions below.
Download the Maxine Audio2Face-2D client code by cloning the NVIDIA Maxine NIM Clients Repository:
git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/audio2face-2d/
Install the dependencies for the NVIDIA Maxine Audio2Face-2D client:
# Python client: install pip and the required packages
sudo apt-get install python3-pip
pip install -r python/requirements.txt
# NodeJS client: add the NodeSource repo and install Node.js 18.x
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install nodejs
# Install all the required packages using the package.json file in the nodejs directory
npm install --prefix nodejs/
You can use the sample client script in the Maxine Audio2Face-2D GitHub repo to send a gRPC request to the hosted NIM server:
Go to the Python scripts directory:
cd python/scripts
Run the following command to send a gRPC request:
python audio2face-2d.py --target <server_ip:port> --audio-input <input audio file path> --portrait-input <input portrait image file path> --output <output file path and the file name>
If the command line arguments are not passed, the script will use the following default values:
target is 127.0.0.1:8001
portrait-input is ../../assets/sample_portrait_image.png
audio-input is ../../assets/sample_audio.wav
output is output.mp4 in the current directory.
The NodeJS client, like the Python client, can also be used to exercise the Audio2Face-2D (A2F2D) feature by sending gRPC requests to the hosted NIM server.
The audio2face-2d.js NodeJS script takes a portrait image and a wav or pcm audio file (default is wav) and generates the mp4 video output.
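When it is unclear which format an audio file uses, its header can be inspected before a value for the format option is chosen. The sketch below is an illustrative helper and not part of the client code. It relies on the fact that WAV files begin with a RIFF/WAVE header, and it assumes that any file without that header is headerless raw PCM, which may not hold for other audio formats.

```python
def guess_audio_format(path):
    """Return "wav" if the file starts with a RIFF/WAVE header, else "pcm".

    Hypothetical helper: anything without a WAV header is assumed to be raw PCM.
    """
    with open(path, "rb") as audio_file:
        header = audio_file.read(12)
    # A WAV file starts with b"RIFF", a 4-byte chunk size, then b"WAVE".
    if len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return "pcm"
```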
Go to the NodeJS scripts directory:
cd nodejs/scripts
Run the following command to send a gRPC request (all command line parameters are optional):
node audio2face-2d.js --target <server_ip:port> --audio-input <input audio file path> --portrait-input <input portrait image file path> --output <output file path and the file name> --format <wav/pcm>
If the command line arguments are not passed, the script will use the following default values:
target is 127.0.0.1:8001
portrait-input is ../../assets/sample_portrait_image.png
audio-input is ../../assets/sample_audio.wav
output is output.mp4 in the current directory.
format is wav
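The defaults listed above can be summarized as a small argparse sketch. It is an illustration of how the options and their defaults fit together, not the code of either sample client. The build_parser name and the help strings are assumptions, and the format option is documented above only for the NodeJS client.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented client options and defaults."""
    parser = argparse.ArgumentParser(
        description="Illustrative Audio2Face-2D client options (not the real client)"
    )
    parser.add_argument("--target", default="127.0.0.1:8001",
                        help="server_ip:port of the NIM gRPC service")
    parser.add_argument("--audio-input", default="../../assets/sample_audio.wav",
                        help="input audio file path")
    parser.add_argument("--portrait-input",
                        default="../../assets/sample_portrait_image.png",
                        help="input portrait image file path")
    parser.add_argument("--output", default="output.mp4",
                        help="output file path and file name")
    # Documented for the NodeJS client only; included here for completeness.
    parser.add_argument("--format", choices=["wav", "pcm"], default="wav",
                        help="audio input format")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Because every option has a default, running the sketch with no arguments yields the same values as the list above.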
For more details on getting started with this NIM, including its configuration parameters, visit the NVIDIA Maxine Audio2Face-2D NIM Docs.