Copyright © 2025 NVIDIA Corporation

nvidia

studiovoice

Run Anywhere

Enhance speech by correcting common audio degradations to create studio quality speech output.

digital human, nvidia maxine, run on rtx, speech enhancement, speech-to-speech

Getting Started

NVIDIA Maxine Studio Voice NIM uses gRPC APIs for inference requests. The following instructions demonstrate how to use the Maxine Studio Voice NIM model with the Python client.

Prerequisites

You will need a system with git and Python 3.10+ installed.
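As a quick sanity check before cloning, the prerequisites above can be verified from Python itself. This is a minimal sketch and not part of the NVIDIA client; the function name `missing_prerequisites` is hypothetical, and it only checks the two requirements stated here (Python 3.10+ and git on the PATH).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def missing_prerequisites(python_version, git_found):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    major, minor = python_version[0], python_version[1]
    if (major, minor) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {major}.{minor}"
        )
    if not git_found:
        problems.append("git was not found on PATH")
    return problems


# Check the interpreter running this script and look up git on the PATH.
issues = missing_prerequisites(sys.version_info, shutil.which("git") is not None)
for issue in issues:
    print(issue)
```

If nothing is printed, the system meets both stated requirements.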

Set up the NVIDIA Maxine Studio Voice Python client

Download the Maxine Studio Voice Python client code by cloning the NVIDIA Maxine NIM Clients Repository:

git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/studio-voice

Install the dependencies for the NVIDIA Maxine Studio Voice Python client:

sudo apt-get install python3-pip
pip install -r requirements.txt

Run Python Client

Navigate to the scripts directory.

cd scripts

Send the gRPC requests

python studio_voice.py --preview-mode \
  --ssl-mode TLS \
  --target grpc.nvcf.nvidia.com:443 \
  --function-id 7cf12edb-2181-4947-8b19-2b1c18270588 \
  --api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC \
  --input <input_file_path> \
  --output <output_file_path>

Note the requirements for the input file:

  • The supported format is a 16-bit, mono-channel WAV file.
  • The input file must not exceed 32 MB.
  • The input audio must not exceed 6 minutes.
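The input requirements above can be checked locally before sending a request. The following is a minimal sketch using Python's standard `wave` module; it is not part of the NVIDIA client, the helper name `check_input_wav` is hypothetical, and it assumes "32 MB" means 32 × 1024 × 1024 bytes (the exact byte limit enforced by the service is not stated here).

```python
import os
import wave

MAX_BYTES = 32 * 1024 * 1024  # assumption: "32 MB" interpreted as 32 MiB
MAX_SECONDS = 6 * 60          # 6-minute duration limit


def check_input_wav(path):
    """Return a list of violated requirements; an empty list means the file looks OK."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 32 MB")
    # wave.open raises wave.Error if the file is not a WAV it can parse.
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:  # sample width in bytes; 2 bytes = 16 bits
            problems.append(f"expected 16-bit samples, found {8 * wav.getsampwidth()}-bit")
        rate = wav.getframerate()
        if rate > 0 and wav.getnframes() / rate > MAX_SECONDS:
            problems.append("audio is longer than 6 minutes")
    return problems
```

For example, `check_input_wav("../assets/studio_voice_48k_input.wav")` should return an empty list if the bundled sample file meets the stated limits.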

Command line arguments:

  • --preview-mode - Flag to send the request to the preview NVCF server at https://build.nvidia.com/nvidia/studiovoice/api.
  • --ssl-mode - Set the SSL mode to TLS or MTLS. Defaults to no SSL. When running in preview mode, TLS must be used with the default root certificate.
  • --target <ip:port> - URI of the NIM's gRPC service. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF. (Default: 127.0.0.1:8001)
  • --api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC - NGC API key required for authentication. Used with the Try API; ignored otherwise.
  • --function-id <function_id> - Function ID for the feature.
  • --input <input_file_path> - The path to the input audio file. (Default: ../assets/studio_voice_48k_input.wav)
  • --output <output_file_path> - The path to the output audio file. (Default: ./studio_voice_48k_output.wav)
  • --streaming - Flag to enable gRPC streaming mode.
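When the client is driven from another script, the arguments listed above can be assembled programmatically. The sketch below only builds the argument list for the preview command shown earlier, using the flags, target, and function ID documented on this page; the helper name `build_preview_command` is hypothetical, and whether `--streaming` may be combined with preview mode is an assumption that is not confirmed here.

```python
import shlex

NVCF_TARGET = "grpc.nvcf.nvidia.com:443"              # target documented for NVCF
FUNCTION_ID = "7cf12edb-2181-4947-8b19-2b1c18270588"  # function ID from the command above


def build_preview_command(input_path, output_path, api_key, streaming=False):
    """Build the studio_voice.py argument list for the preview NVCF server."""
    cmd = [
        "python", "studio_voice.py",
        "--preview-mode",
        "--ssl-mode", "TLS",  # preview mode requires TLS
        "--target", NVCF_TARGET,
        "--function-id", FUNCTION_ID,
        "--api-key", api_key,
        "--input", input_path,
        "--output", output_path,
    ]
    if streaming:
        # Assumption: streaming can be combined with preview mode.
        cmd.append("--streaming")
    return cmd


# Print a shell-quoted version with a placeholder key, for inspection only.
print(shlex.join(build_preview_command("in.wav", "out.wav", "<api_key>")))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="scripts", check=True)` from the `nim-clients/studio-voice` directory, with the real key read from an environment variable rather than hard-coded.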

Refer to the Maxine Studio Voice NIM documentation for more information.