NVIDIA Maxine Studio Voice

Run Anywhere

Enhance speech by correcting common audio degradations to create studio-quality speech output.


Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.

Requirements

  • NVIDIA GeForce RTX 40 series or later (see supported GPUs)

  • The latest NVIDIA GPU driver on Windows (version 570 or later); a quick version check is shown below
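To check the installed driver version, you can run nvidia-smi from a Windows terminal; for example, the following prints just the version string:

nvidia-smi --query-gpu=driver_version --format=csv,noheader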

Step 1
Ensure virtualization is enabled in the system BIOS

In Windows, open Task Manager, select the Performance tab, and check the Virtualization field. If it reads Disabled, enable virtualization in the system BIOS before continuing.
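You can also check from a terminal: systeminfo reports whether virtualization is enabled in firmware under its Hyper-V Requirements section (if Hyper-V is already active, it instead reports that a hypervisor has been detected, which also means you are ready). The following filters for the relevant lines:

systeminfo | findstr /i "virtualization hypervisor"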

Step 2
Open the Windows Subsystem for Linux 2 (WSL2) Distro

Install WSL2. Refer to the official NVIDIA NIM on WSL2 documentation for setup instructions.

Once installed, open the NVIDIA-Workbench WSL2 distro using the following command in the Windows terminal.

wsl -d NVIDIA-Workbench
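Once inside the distro, it is worth confirming that the GPU is visible to WSL2 before proceeding; the Windows driver exposes nvidia-smi to the Linux environment, so the following should list your RTX GPU:

nvidia-smi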

Step 3
Export API Key

Export your personal NGC API key as an environment variable:

export NGC_API_KEY=<PASTE_API_KEY_HERE>
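The variable lasts only for the current shell session. To make it persist across sessions, you can append the export to your shell profile, for example:

echo "export NGC_API_KEY=<PASTE_API_KEY_HERE>" >> ~/.bashrc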

Step 4
Log in to NVIDIA NGC

Log in to NVIDIA NGC so that you can pull the NIM container:

echo "$NGC_API_KEY" | podman login nvcr.io --username '$oauthtoken' --password-stdin

Step 5
Pull and Run NVIDIA NIM

Pull and run the NVIDIA NIM with the command below. This will download the optimized model for your infrastructure.

export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod -R a+w "$LOCAL_NIM_CACHE"
podman run -it --rm --name=studio-voice \
  --device nvidia.com/gpu=all \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -e NIM_MODEL_PROFILE=<nim_model_profile> \
  -e FILE_SIZE_LIMIT=36700160 \
  -e STREAMING=false \
  -p 8000:8000 \
  -p 8001:8001 \
  nvcr.io/nim/nvidia/maxine-studio-voice:latest

The above command runs the NIM in transactional mode. To run the NIM in streaming mode, use -e STREAMING=true. Ensure you use the appropriate NIM_MODEL_PROFILE for your GPU; for more information about NIM_MODEL_PROFILE, refer to the NIM Model Profile Table.

Please note that the --device nvidia.com/gpu=all flag in the command above assigns all available GPUs to the container. On machines with multiple different GPU models this can fail; in that case, assign a specific GPU to the container, for example --device nvidia.com/gpu=0.

If the command runs successfully, the output will end with lines similar to the following:

I1126 09:22:21.048202 31 grpc_server.cc:2558] "Started GRPCInferenceService at 127.0.0.1:9001"
I1126 09:22:21.048377 31 http_server.cc:4704] "Started HTTPService at 127.0.0.1:9000"
I1126 09:22:21.089295 31 http_server.cc:362] "Started Metrics Service at 127.0.0.1:9002"
Maxine GRPC Service: Listening to 0.0.0.0:8001

By default, the Maxine Studio Voice gRPC service is hosted on port 8001. Use this port for inference requests.
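If you are running the NIM in a separate terminal, you can confirm the container is up and the ports are published before sending requests:

podman ps --filter name=studio-voice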

Step 6
Test the NIM

Download the Maxine Studio Voice Python client code by cloning the NVIDIA Maxine NIM Clients Repository:

git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/studio-voice

Install the dependencies for the NVIDIA Maxine Studio Voice Python client:

sudo apt-get install python3-pip
pip install -r requirements.txt
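If you prefer to keep these dependencies isolated from the system Python, a virtual environment works as well (assuming the venv module is available in your distro):

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt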

Go to the scripts directory:

cd scripts

Assuming the client is on the same machine as the NIM, run the following command to send a gRPC request to the NIM.

Transactional mode (the default):

python studio_voice.py --target localhost:8001 --input <input_file_path> --output <output_file_path>

For streaming mode:

python studio_voice.py --target localhost:8001 --input <input_file_path> --output <output_file_path> --streaming --model-type 48k-hq

When using --streaming mode, ensure the selected --model-type (48k-hq, 48k-ll, or 16k-hq) matches the Model Type of the NIM_MODEL_PROFILE the server was started with.
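As a concrete example with hypothetical file names (the default model type is 48k-hq, which, per its name, targets 48 kHz audio):

python studio_voice.py --target localhost:8001 --input input_48k.wav --output enhanced_48k.wav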

For more advanced usage of the client, refer to the NVIDIA Maxine Studio Voice NIM documentation.

To view details of the command-line arguments, run:

python studio_voice.py -h

You will get a response similar to the following:

usage: studio_voice.py [-h] [--ssl-mode {MTLS,TLS}] [--ssl-key SSL_KEY] [--ssl-cert SSL_CERT] [--ssl-root-cert SSL_ROOT_CERT] [--target TARGET]
                       [--input INPUT] [--output OUTPUT] [--api-key API_KEY] [--function-id FUNCTION_ID] [--streaming] [--model-type {48k-hq,48k-ll,16k-hq}]

Process wav audio files using gRPC and apply studio-voice.

options:
  -h, --help            show this help message and exit
  --preview-mode        Flag to send request to preview NVCF server on https://build.nvidia.com/nvidia/studiovoice/api.
  --ssl-mode {MTLS,TLS}
                        Flag to set SSL mode, default is None
  --ssl-key SSL_KEY     The path to ssl private key.
  --ssl-cert SSL_CERT   The path to ssl certificate chain.
  --ssl-root-cert SSL_ROOT_CERT
                        The path to ssl root certificate.
  --target TARGET       IP:port of gRPC service, when hosted locally. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF.
  --input INPUT         The path to the input audio file.
  --output OUTPUT       The path for the output audio file.
  --api-key API_KEY     NGC API key required for authentication, utilized when using TRY API ignored otherwise
  --function-id FUNCTION_ID
                        NVCF function ID for the service, utilized when using TRY API ignored otherwise
  --streaming           Flag to enable grpc streaming mode.
  --model-type {48k-hq,48k-ll,16k-hq}
                        Studio Voice model type, default is 48k-hq.
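The help text above also shows how the same client can target the hosted NVCF endpoint instead of a local NIM. A sketch, assuming you have a function ID for the service (placeholder left as-is; depending on the client version you may also need the --preview-mode flag):

python studio_voice.py --target grpc.nvcf.nvidia.com:443 --api-key $NGC_API_KEY --function-id <function_id> --input input.wav --output enhanced.wav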

For more details on getting started with this NIM, including configuration parameters, visit the NVIDIA Maxine Studio Voice NIM Docs.