NVIDIA Maxine Studio Voice

Enhance speech by correcting common audio degradations to create studio quality speech output.


Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Prerequisites

WSL2 is required for hosting any NIM. Refer to the official NVIDIA NIM on WSL2 documentation for setup instructions.

To run the NIM, refer to the Studio Voice NIM on WSL2 documentation.

Open the Windows Subsystem for Linux 2 (WSL2) Distro

Install the WSL2 distro, following the setup instructions linked above.

Once installed, open the NVIDIA-Workbench WSL2 distro using the following command in the Windows terminal.

wsl -d NVIDIA-Workbench
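
Optionally, verify that the GPU is visible inside the distro before continuing. This assumes the NVIDIA driver is installed on the Windows host, which makes nvidia-smi available inside WSL2:

# Optional check; requires the NVIDIA driver on the Windows host
nvidia-smi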

Run the Container

Log in to the NVIDIA container registry (nvcr.io) with your NGC API key:

$ podman login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>

An NGC API key is required to download the appropriate models and resources when starting the NIM.

If you are not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:

export NGC_API_KEY=<PASTE_API_KEY_HERE>
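
With the variable exported, you can also log in to nvcr.io non-interactively instead of using the prompts above; this sketch uses podman login's standard --username and --password-stdin options:

# Non-interactive login; single quotes keep the shell from expanding $oauthtoken
echo "$NGC_API_KEY" | podman login nvcr.io --username '$oauthtoken' --password-stdin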

Run one of the following commands to make the key available at startup:

# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc

# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
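
Then reload the configuration file so the variable takes effect in the current session (adjust the file name for your shell):

source ~/.bashrc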

Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
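
A minimal sketch of the file-based approach; the path ~/.ngc_api_key and the NGC_API_KEY_FILE variable name are illustrative choices, not a convention the NIM requires:

# Store the key in a file readable only by you, then load it on demand
echo "<PASTE_API_KEY_HERE>" > ~/.ngc_api_key
chmod 600 ~/.ngc_api_key
export NGC_API_KEY_FILE=~/.ngc_api_key
export NGC_API_KEY=$(cat $NGC_API_KEY_FILE)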

Pull and run the NIM with the command below.

podman run -it --rm --name=studio-voice \
    --device nvidia.com/gpu=all \
    --shm-size=8GB \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e NIM_MODEL_PROFILE=<nim_model_profile> \
    -e FILE_SIZE_LIMIT=36700160 \
    -e STREAMING=false \
    -p 8000:8000 \
    -p 8001:8001 \
    nvcr.io/nim/nvidia/maxine-studio-voice:latest

The above command runs the NIM in transactional mode. To run the NIM in streaming mode, use -e STREAMING=true. Ensure you use the appropriate NIM_MODEL_PROFILE for your GPU. For more information about NIM_MODEL_PROFILE, refer to the NIM Model Profile Table.

Please note, the flag --device nvidia.com/gpu=all assigns all available GPUs to the container. This fails on multi-GPU systems unless all GPUs are identical. To assign specific GPUs to the container (when the machine contains a mix of different GPUs), pass individual device indices instead of all, as shown below.
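
For example, to expose only the first GPU (device index 0, assuming a CDI specification generated by the NVIDIA Container Toolkit), replace the device flag in the command above with:

--device nvidia.com/gpu=0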

If the podman run command succeeds, you will see output ending with lines similar to the following:

I1126 09:22:21.048202 31 grpc_server.cc:2558] "Started GRPCInferenceService at 127.0.0.1:9001"
I1126 09:22:21.048377 31 http_server.cc:4704] "Started HTTPService at 127.0.0.1:9000"
I1126 09:22:21.089295 31 http_server.cc:362] "Started Metrics Service at 127.0.0.1:9002"
Maxine GRPC Service: Listening to 0.0.0.0:8001

By default, the Maxine Studio Voice gRPC service is hosted on port 8001. Use this port for inference requests.
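
If you prefer to run the container detached (replacing -it with -d in the podman run command above), you can confirm the same startup messages by following the container logs; studio-voice is the name set by --name:

podman logs -f studio-voice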

Test the NIM

We have provided a sample client script in our GitHub repo. The script can be used to send requests to the running container using the following instructions.

Download the Maxine Studio Voice Python client code by cloning the NVIDIA Maxine NIM Clients Repository:

git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/studio-voice

Install the dependencies for the NVIDIA Maxine Studio Voice Python client:

sudo apt-get install python3-pip
pip install -r requirements.txt
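
Optionally, install the dependencies into a virtual environment to keep them isolated from the system Python. This is standard Python practice, not a requirement of this client; on Ubuntu it may require the python3-venv package:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt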

Go to the scripts directory:

cd scripts

Assuming the client is on the same machine as the NIM, run the following command to send a gRPC request to the NIM.

By default, the client runs in transactional mode:

python studio_voice.py --target localhost:8001 --input <input_file_path> --output <output_file_path>

For streaming mode:

python studio_voice.py --target localhost:8001 --input <input_file_path> --output <output_file_path> --streaming --model-type 48k-hq

When using --streaming mode, ensure the selected --model-type (48k-hq, 48k-ll, or 16k-hq) aligns with the NIM_MODEL_PROFILE model type configuration to maintain compatibility.

For more advanced usage of the client, refer to this documentation.

To view details of the command-line arguments, run this command:

python studio_voice.py -h

You will get a response similar to the following.

usage: studio_voice.py [-h] [--ssl-mode {MTLS,TLS}] [--ssl-key SSL_KEY] [--ssl-cert SSL_CERT] [--ssl-root-cert SSL_ROOT_CERT]
                       [--target TARGET] [--input INPUT] [--output OUTPUT] [--api-key API_KEY] [--function-id FUNCTION_ID]
                       [--streaming] [--model-type {48k-hq,48k-ll,16k-hq}]

Process wav audio files using gRPC and apply studio-voice.

options:
  -h, --help            show this help message and exit
  --preview-mode        Flag to send request to preview NVCF server on https://build.nvidia.com/nvidia/studiovoice/api.
  --ssl-mode {MTLS,TLS}
                        Flag to set SSL mode, default is None
  --ssl-key SSL_KEY     The path to ssl private key.
  --ssl-cert SSL_CERT   The path to ssl certificate chain.
  --ssl-root-cert SSL_ROOT_CERT
                        The path to ssl root certificate.
  --target TARGET       IP:port of gRPC service, when hosted locally. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF.
  --input INPUT         The path to the input audio file.
  --output OUTPUT       The path for the output audio file.
  --api-key API_KEY     NGC API key required for authentication, utilized when using TRY API ignored otherwise
  --function-id FUNCTION_ID
                        NVCF function ID for the service, utilized when using TRY API ignored otherwise
  --streaming           Flag to enable grpc streaming mode.
  --model-type {48k-hq,48k-ll,16k-hq}
                        Studio Voice model type, default is 48k-hq.

For more details on getting started with this NIM, including configuration parameters, visit the NVIDIA Maxine Studio Voice NIM Docs.