nvidia/studiovoice

Enhance speech by correcting common audio degradations to create studio-quality speech output.

Getting Started

NVIDIA Maxine Studio Voice NIM uses gRPC APIs for inference requests. The following instructions demonstrate how to use the Maxine Studio Voice NIM with the Python client.

Prerequisites

You will need a system with git and Python 3.10+ installed.
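As a quick sanity check before cloning, the prerequisites above can be verified from Python. This is a minimal sketch; the helper name missing_prerequisites is hypothetical and is not part of the Maxine client.

```python
import shutil
import sys


def missing_prerequisites():
    """Return the names of prerequisites that were not found on this system."""
    missing = []
    # The client requires Python 3.10 or newer.
    if sys.version_info < (3, 10):
        missing.append("Python 3.10+")
    # git is needed to clone the client repository.
    if shutil.which("git") is None:
        missing.append("git")
    return missing


if __name__ == "__main__":
    print(missing_prerequisites() or "all prerequisites found")
```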

Set up NVIDIA Maxine Studio Voice Python client

Download the Maxine Studio Voice Python client code by cloning the NVIDIA Maxine NIM Clients Repository:

git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/studio-voice

Install the dependencies for the NVIDIA Maxine Studio Voice Python client:

sudo apt-get install python3-pip
pip install -r requirements.txt

Run Python Client

Navigate to the scripts directory.

cd scripts

Send the gRPC request:

python studio_voice.py --use-ssl \
    --target grpc.nvcf.nvidia.com:443 \
    --function-id 7cf12edb-2181-4947-8b19-2b1c18270588 \
    --api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC \
    --input <input_file_path> \
    --output <output_file_path>

Note the requirements for the input file:

  • The supported format is a 16-bit, mono-channel WAV file.
  • The size limit for input file is 32 MB.
  • The duration limit for input file is 6 min.
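The input requirements above can be checked locally before sending a request. The following is a minimal sketch using only the Python standard library; the function name check_input_wav is hypothetical, and reading "32 MB" as 32 × 1024 × 1024 bytes is an assumption, since the exact byte count is not stated here.

```python
import os
import wave

MAX_BYTES = 32 * 1024 * 1024  # assumption: "32 MB" interpreted as 32 MiB
MAX_SECONDS = 6 * 60          # 6-minute duration limit


def check_input_wav(path):
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 32 MB")
    # wave.open raises wave.Error if the file is not a PCM WAV file.
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append("sample width is not 16-bit")
        if wav.getnchannels() != 1:
            problems.append("audio is not mono")
        duration = wav.getnframes() / wav.getframerate()
        if duration > MAX_SECONDS:
            problems.append("duration exceeds 6 minutes")
    return problems
```

Calling check_input_wav("<input_file_path>") before running the client should surface most format problems early; the service may still apply checks that this sketch does not cover.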

Command line arguments:

  • --use-ssl - Flag that enables SSL/TLS encryption.
  • --target <ip:port> - URI of NIM's gRPC service. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF. (Default: 127.0.0.1:8001)
  • --api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC - NGC API key required for authentication. Used with the Try API; ignored otherwise.
  • --function-id <function_id> - Function ID for the feature.
  • --input <input_file_path> - The path to the input audio file. (Default: ../assets/studio_voice_48k_input.wav)
  • --output <output_file_path> - The path to the output audio file. (Default: ./studio_voice_48k_output.wav)
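When the client is driven from another Python program, the arguments listed above can be assembled into an argument list. The sketch below is illustrative only: build_command is a hypothetical helper, not part of the Maxine client, and it uses only the flags documented above. Any option left as None is omitted so the client falls back to its own default.

```python
import sys


def build_command(input_path=None, output_path=None, target=None,
                  function_id=None, api_key=None, use_ssl=False):
    """Build an argv list for studio_voice.py from the documented flags."""
    cmd = [sys.executable, "studio_voice.py"]
    if use_ssl:
        cmd.append("--use-ssl")
    options = (
        ("--target", target),
        ("--function-id", function_id),
        ("--api-key", api_key),
        ("--input", input_path),
        ("--output", output_path),
    )
    for flag, value in options:
        # Skip unset options so the client applies its default value.
        if value is not None:
            cmd += [flag, value]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) from within the scripts directory. Passing a list rather than a single shell string avoids quoting problems with file paths that contain spaces.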

Refer to the Maxine Studio Voice NIM documentation for more information.