Enhance speech by correcting common audio degradations to create studio-quality speech output.
NVIDIA Maxine Studio Voice NIM uses gRPC APIs for inference requests. The following instructions demonstrate how to use the Maxine Studio Voice NIM model with the Python client.
You will need a system with git and Python 3.10+ installed.
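The prerequisites above can be checked before cloning anything. The following is a minimal sketch, not part of the NVIDIA client: the function name `check_prerequisites` and its injectable parameters are hypothetical and exist only to make the check easy to test.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum Python version stated in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all is well.

    Hypothetical helper: both parameters default to the running interpreter
    and the real PATH lookup, and can be overridden for testing.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    if which("git") is None:
        problems.append("git was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Running the script prints nothing when both prerequisites are met.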
Download the Maxine Studio Voice Python client code by cloning the NVIDIA Maxine NIM Clients Repository:
git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/studio-voice
Install the dependencies for the NVIDIA Maxine Studio Voice Python client:
sudo apt-get install python3-pip
pip install -r requirements.txt
Navigate to the scripts directory.
cd scripts
Send the gRPC request:
python studio_voice.py --preview-mode \
  --ssl-mode TLS \
  --target grpc.nvcf.nvidia.com:443 \
  --function-id 7cf12edb-2181-4947-8b19-2b1c18270588 \
  --api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC \
  --input <input_file_path> \
  --output <output_file_path>
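When the client is driven from another Python program, the same invocation can be assembled as an argument list and passed to `subprocess`. This is a sketch under assumptions: the helper names `build_preview_command` and `run_preview` are hypothetical, and so is the `NGC_API_KEY` environment variable used to hold the key. The flags, target, and function ID are copied from the command above.

```python
import os
import subprocess
import sys

# Values taken from the command shown above.
TARGET = "grpc.nvcf.nvidia.com:443"
FUNCTION_ID = "7cf12edb-2181-4947-8b19-2b1c18270588"


def build_preview_command(input_path, output_path, api_key):
    """Return the argument list for a preview-mode request (hypothetical helper)."""
    return [
        sys.executable, "studio_voice.py",
        "--preview-mode",
        "--ssl-mode", "TLS",
        "--target", TARGET,
        "--function-id", FUNCTION_ID,
        "--api-key", api_key,
        "--input", input_path,
        "--output", output_path,
    ]


def run_preview(input_path, output_path, scripts_dir="."):
    """Run the client from the scripts directory, raising if it exits non-zero."""
    # Assumption: the key is stored in an environment variable named NGC_API_KEY.
    api_key = os.environ.get("NGC_API_KEY")
    if not api_key:
        raise RuntimeError("Set NGC_API_KEY before sending a preview request")
    command = build_preview_command(input_path, output_path, api_key)
    subprocess.run(command, cwd=scripts_dir, check=True)
```

Passing the arguments as a list avoids shell quoting issues with file paths that contain spaces.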
Note the requirements for the input file:
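To compare a file against the requirements, its header properties can be read with Python's standard `wave` module. The sketch below only reports what the file contains and does not encode any particular requirement. The function name `describe_wav` is hypothetical, and the `wave` module reads PCM-encoded WAV files only.

```python
import wave


def describe_wav(path):
    """Return basic header properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav("../assets/studio_voice_48k_input.wav")` would report the properties of the default input file shipped with the client.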
Command line arguments:
--preview-mode - Flag to send the request to the preview NVCF server at https://build.nvidia.com/nvidia/studiovoice/api.
--ssl-mode - Set the SSL mode to TLS or MTLS. Defaults to no SSL. When running in preview mode, TLS must be used with the default root certificate.
--target <ip:port> - URI of the NIM's gRPC service. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF. (Default: 127.0.0.1:8001)
--api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC - NGC API key required for authentication. Used with the TRY API; ignored otherwise.
--function-id <function_id> - Function ID for the feature.
--input <input_file_path> - The path to the input audio file. (Default: ../assets/studio_voice_48k_input.wav)
--output <output_file_path> - The path to the output audio file. (Default: ./studio_voice_48k_output.wav)
--streaming - Flag to enable gRPC streaming mode.
Refer to the Maxine Studio Voice NIM documentation for more information.
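The documented arguments and defaults can be mirrored in an `argparse` parser, which is useful when writing a wrapper or checking an invocation before sending it. This is an illustrative sketch, not the parser in `studio_voice.py`. The function name `build_parser` is hypothetical, and representing "no SSL" as `None` is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Studio Voice client arguments")
    parser.add_argument("--preview-mode", action="store_true",
                        help="Send the request to the preview NVCF server")
    # Assumption: None stands in for the documented "no SSL" default.
    parser.add_argument("--ssl-mode", choices=["TLS", "MTLS"], default=None,
                        help="SSL mode; omit for no SSL")
    parser.add_argument("--target", default="127.0.0.1:8001",
                        help="URI of the NIM's gRPC service")
    parser.add_argument("--api-key", default=None,
                        help="NGC API key for authentication")
    parser.add_argument("--function-id", default=None,
                        help="Function ID for the feature")
    parser.add_argument("--input", default="../assets/studio_voice_48k_input.wav",
                        help="Path to the input audio file")
    parser.add_argument("--output", default="./studio_voice_48k_output.wav",
                        help="Path to the output audio file")
    parser.add_argument("--streaming", action="store_true",
                        help="Enable gRPC streaming mode")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Parsing an empty argument list yields the documented defaults, for example a target of `127.0.0.1:8001` with both flags off.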