
Enhance input speech recorded with low-quality microphones in noisy or reverberant environments, producing studio-quality speech.
NVIDIA Studio Voice NIM uses gRPC APIs for inference requests. The following instructions demonstrate how to use the Studio Voice NIM model with the Python client.
You will need a system with git and Python 3.10+ installed.
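The prerequisites above can be checked before cloning anything. The following is a minimal sketch, not part of the NVIDIA client; the helper name check_prerequisites is hypothetical, and it assumes that git only needs to be discoverable on PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum Python version stated in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means the system looks ready.

    Hypothetical helper: both arguments are injectable so the check can be
    exercised without changing the real environment.
    """
    problems = []
    major, minor = tuple(version_info[:2])
    if (major, minor) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {major}.{minor}"
        )
    if which("git") is None:
        problems.append("git was not found on PATH")
    return problems


if __name__ == "__main__":
    # Print one line per missing prerequisite; prints nothing when all is well.
    for problem in check_prerequisites():
        print(problem)
```

Running the file directly prints one line per missing prerequisite and nothing when both are present.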
Download the Studio Voice Python client code by cloning the NIM gRPC Client Repository:
git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/studio-voice
Install the dependencies for the NVIDIA Studio Voice Python client:
sudo apt-get install python3-pip
pip install -r requirements.txt
Navigate to the scripts directory.
cd scripts
Send the gRPC request:
python studio_voice.py --preview-mode \
--ssl-mode TLS \
--target grpc.nvcf.nvidia.com:443 \
--function-id 3f0aeba3-6d91-4465-b8cc-cc2aef355186 \
--api-key $NVIDIA_API_KEY \
--input <input_file_path> \
--output <output_file_path>
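If the request needs to be sent from another Python program, the same command can be assembled as an argument list and run with subprocess. This is a sketch under stated assumptions: the helper names build_command and enhance are hypothetical, the script is assumed to be run from the scripts directory, and the target and function ID are copied from the command above.

```python
import os
import subprocess
import sys

# Values copied from the command shown above.
NVCF_TARGET = "grpc.nvcf.nvidia.com:443"
FUNCTION_ID = "3f0aeba3-6d91-4465-b8cc-cc2aef355186"


def build_command(input_path, output_path, api_key,
                  target=NVCF_TARGET, function_id=FUNCTION_ID):
    """Return the argument list for the preview request (hypothetical helper)."""
    if not api_key:
        raise ValueError("NVIDIA_API_KEY is empty or unset")
    return [
        sys.executable, "studio_voice.py",
        "--preview-mode",
        "--ssl-mode", "TLS",
        "--target", target,
        "--function-id", function_id,
        "--api-key", api_key,
        "--input", input_path,
        "--output", output_path,
    ]


def enhance(input_path, output_path):
    """Run the client; raises CalledProcessError if it exits with an error."""
    api_key = os.environ.get("NVIDIA_API_KEY", "")
    command = build_command(input_path, output_path, api_key)
    # Passing a list (not a shell string) avoids quoting problems in file paths.
    subprocess.run(command, check=True)
```

Calling enhance("noisy.wav", "clean.wav") from the scripts directory would then issue the same request as the command line above; the two file names here are placeholders.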
Note the requirements for the input file:
Command line arguments:
--preview-mode - Flag to send the request to the preview NVCF server at https://build.nvidia.com/nvidia/studiovoice/api.
--ssl-mode - Set the SSL mode to TLS or MTLS. Defaults to no SSL. When running in preview mode, TLS must be used with the default root certificate.
--target <ip:port> - URI of the NIM's gRPC service. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF. (Default: 127.0.0.1:8001)
--api-key $NVIDIA_API_KEY - NGC API key required for authentication. Used with the TRY API; ignored otherwise.
--function-id <function_id> - Function ID for the feature.
--input <input_file_path> - The path to the input audio file. (Default: ../assets/studio_voice_48k_input.wav)
--output <output_file_path> - The path to the output audio file. (Default: ./studio_voice_48k_output.wav)
--streaming - Flag to enable gRPC streaming mode.
Refer to the Studio Voice NIM documentation for more information.
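To make the defaults and the preview-mode constraint concrete, the documented options can be mirrored in an argparse parser. This is an illustrative sketch only and is not the source of studio_voice.py; the function names build_parser and validate are hypothetical, and the defaults are the ones listed above.

```python
import argparse


def build_parser():
    """Parser mirroring the documented options (sketch, not the real client)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented Studio Voice client options"
    )
    parser.add_argument("--preview-mode", action="store_true",
                        help="send the request to the preview NVCF server")
    parser.add_argument("--ssl-mode", choices=["TLS", "MTLS"], default=None,
                        help="SSL mode; omitted means no SSL")
    parser.add_argument("--target", default="127.0.0.1:8001",
                        help="ip:port of the NIM's gRPC service")
    parser.add_argument("--api-key", default=None,
                        help="NGC API key for authentication")
    parser.add_argument("--function-id", default=None,
                        help="function ID for the feature")
    parser.add_argument("--input", default="../assets/studio_voice_48k_input.wav",
                        help="path to the input audio file")
    parser.add_argument("--output", default="./studio_voice_48k_output.wav",
                        help="path to the output audio file")
    parser.add_argument("--streaming", action="store_true",
                        help="enable gRPC streaming mode")
    return parser


def validate(args):
    """Enforce the documented rule that preview mode must use TLS."""
    if args.preview_mode and args.ssl_mode != "TLS":
        raise ValueError("--preview-mode requires --ssl-mode TLS")
    return args
```

With no arguments, the parsed values describe a local, non-SSL request to 127.0.0.1:8001 using the bundled sample file; adding --preview-mode without --ssl-mode TLS is rejected by validate.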