nvidia/studiovoice
Enhance speech by correcting common audio degradations to create studio-quality speech output.
Getting Started
NVIDIA Maxine Studio Voice NIM uses gRPC APIs for inference requests. The following instructions show how to use the Maxine Studio Voice NIM with the Python client.
Prerequisites
You will need a system with git and Python 3.10+ installed.
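Before cloning, you can confirm both prerequisites from Python. The following is a minimal sketch. The helper name `check_prerequisites` is hypothetical and is not part of the Maxine client. It uses only the standard library: it compares `sys.version_info` against 3.10 and uses `shutil.which` to look for `git` on `PATH`.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Hypothetical helper: return a list of problems (empty list means OK)."""
    problems = []
    # Compare only (major, minor) against the required minimum version.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    return problems
```

For example, `print(check_prerequisites() or "prerequisites OK")` prints either the list of problems or a confirmation.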
Set up the NVIDIA Maxine Studio Voice Python client
Download the Maxine Studio Voice Python client code by cloning the NVIDIA Maxine NIM Clients Repository:
git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/studio-voice
Install the dependencies for the NVIDIA Maxine Studio Voice Python client:
sudo apt-get install python3-pip
pip install -r requirements.txt
Run Python Client
Navigate to the scripts directory.
cd scripts
Send the gRPC requests
python studio_voice.py --use-ssl \
  --target grpc.nvcf.nvidia.com:443 \
  --function-id 7cf12edb-2181-4947-8b19-2b1c18270588 \
  --api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC \
  --input <input_file_path> \
  --output <output_file_path>
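If you drive the client from another Python program, you can assemble the same hosted invocation as an argument list and run it with `subprocess`. This is a sketch only. `build_studio_voice_command` and `run_studio_voice` are hypothetical helpers. The target and function ID are copied from the command above. It substitutes `sys.executable` for `python` so that the client runs under the current interpreter.

```python
import subprocess
import sys

# Values copied from the example command; the API key is supplied by the caller.
NVCF_TARGET = "grpc.nvcf.nvidia.com:443"
STUDIO_VOICE_FUNCTION_ID = "7cf12edb-2181-4947-8b19-2b1c18270588"


def build_studio_voice_command(input_path, output_path, api_key,
                               target=NVCF_TARGET,
                               function_id=STUDIO_VOICE_FUNCTION_ID):
    """Hypothetical helper: return the argv list for the hosted invocation."""
    return [
        sys.executable, "studio_voice.py",  # same interpreter as the caller
        "--use-ssl",
        "--target", target,
        "--function-id", function_id,
        "--api-key", api_key,
        "--input", input_path,
        "--output", output_path,
    ]


def run_studio_voice(cmd, scripts_dir="scripts"):
    """Run the client from the scripts directory; raise if it exits non-zero."""
    subprocess.run(cmd, cwd=scripts_dir, check=True)
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces. Because `cwd` is set to `scripts_dir`, relative input and output paths are resolved from that directory. An environment variable name such as `NGC_API_KEY` would be your own choice and is not defined by the client.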
Note the requirements for the input file:
- The supported format is a 16-bit, mono-channel WAV file.
- The input file must not exceed 32 MB.
- The input file must not exceed 6 minutes in duration.
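You can check these requirements locally before sending a request. The following is a minimal sketch that uses Python's standard `wave` module. `check_input_wav` is a hypothetical helper. The document does not say whether "32 MB" means decimal megabytes or mebibytes, so the sketch assumes 32 MiB; adjust `MAX_BYTES` if the service counts decimal megabytes.

```python
import os
import wave

# Assumption: "32 MB" is read as 32 MiB; use 32_000_000 for decimal megabytes.
MAX_BYTES = 32 * 1024 * 1024
MAX_SECONDS = 6 * 60  # 6 minutes


def check_input_wav(path):
    """Hypothetical helper: return requirement violations (empty list if OK)."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    # wave.open raises wave.Error for files it cannot parse as PCM WAV.
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()  # bytes per sample; 2 bytes == 16-bit
        if width != 2:
            problems.append(f"sample width is {8 * width}-bit, expected 16-bit")
        channels = wav.getnchannels()
        if channels != 1:
            problems.append(f"{channels} channels, expected mono")
        duration = wav.getnframes() / wav.getframerate()
        if duration > MAX_SECONDS:
            problems.append(f"duration is {duration:.1f} s, limit is {MAX_SECONDS} s")
    return problems
```

An empty list means the file meets all three requirements. Otherwise, each entry describes one violation.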
Command-line arguments:
--use-ssl
- Flag to enable SSL/TLS encryption.
--target <ip:port>
- URI of the NIM's gRPC service. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF. (Default: 127.0.0.1:8001)
--api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC
- NGC API key required for authentication. Used when calling the TRY API; ignored otherwise.
--function-id <function_id>
- Function ID for the feature.
--input <input_file_path>
- The path to the input audio file. (Default: ../assets/studio_voice_48k_input.wav)
--output <output_file_path>
- The path to the output audio file. (Default: ./studio_voice_48k_output.wav)
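The following sketch reproduces the documented flags and defaults with `argparse`, so you can see how a local invocation differs from a hosted one. It is not the actual `studio_voice.py` source. The `build_metadata` helper is also hypothetical. Its `function-id` and `authorization: Bearer <key>` metadata entries are an assumption based on the general NVCF gRPC conventions, so check them against the client code before relying on them.

```python
import argparse


def build_parser():
    """Hypothetical re-creation of the documented flags (not the real client)."""
    parser = argparse.ArgumentParser(description="Studio Voice NIM client (sketch)")
    parser.add_argument("--use-ssl", action="store_true",
                        help="Enable SSL/TLS for the gRPC channel.")
    parser.add_argument("--target", default="127.0.0.1:8001",
                        help="host:port of the NIM gRPC service.")
    parser.add_argument("--api-key", default=None,
                        help="NGC API key; only needed for the hosted endpoint.")
    parser.add_argument("--function-id", default=None,
                        help="NVCF function ID for the feature.")
    parser.add_argument("--input", default="../assets/studio_voice_48k_input.wav")
    parser.add_argument("--output", default="./studio_voice_48k_output.wav")
    return parser


def build_metadata(args):
    """Assumed NVCF call metadata: a function ID plus a Bearer token."""
    metadata = []
    if args.function_id:
        metadata.append(("function-id", args.function_id))
    if args.api_key:
        metadata.append(("authorization", "Bearer " + args.api_key))
    return metadata
```

With no flags, the sketch targets a local NIM at 127.0.0.1:8001 without TLS and sends no metadata. With the `grpcio` package, a client would typically open `grpc.secure_channel(target, grpc.ssl_channel_credentials())` when `--use-ssl` is set, or `grpc.insecure_channel(target)` otherwise, and pass the metadata list with each call.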
Refer to the Maxine Studio Voice NIM documentation for more information.