Enhance speech by correcting common audio degradations to create studio-quality speech output.
Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.
WSL2 is required for hosting any NIM on Windows. Refer to the official NVIDIA NIM on WSL2 documentation for setup instructions. To run this NIM, refer to the Studio Voice NIM on WSL2 documentation.

Install the WSL2 distro. Once installed, open the NVIDIA-Workbench WSL2 distro using the following command in a Windows terminal:
wsl -d NVIDIA-Workbench
Log in to the NGC container registry:

$ podman login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>
Pull and run the NIM with the command below.
podman run -it --rm --name=studio-voice \
    --device nvidia.com/gpu=all \
    --shm-size=8GB \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e NIM_MODEL_PROFILE=<nim_model_profile> \
    -e FILE_SIZE_LIMIT=36700160 \
    -e STREAMING=false \
    -p 8000:8000 \
    -p 8001:8001 \
    nvcr.io/nim/nvidia/maxine-studio-voice:latest
The above command runs the NIM in transactional mode. To run the NIM in streaming mode, use -e STREAMING=true.

Ensure you use the appropriate NIM_MODEL_PROFILE for your GPU. For more information about NIM_MODEL_PROFILE, refer to the NIM Model Profile Table.
Note that the flag --gpus all assigns all available GPUs to the container. This fails on multi-GPU systems unless all GPUs are identical. To assign specific GPUs to the container (when a machine has multiple, different GPUs), use --gpus '"device=0,1,2..."'.
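The run command above can also be assembled programmatically, which makes it easy to toggle streaming mode or pin a GPU. The following Python sketch mirrors the flags shown above; the helper name build_podman_cmd is illustrative and not part of the NIM tooling:

```python
def build_podman_cmd(streaming=False, gpu="all", profile="<nim_model_profile>"):
    """Return the podman argument list for launching the Studio Voice NIM.

    Note: "$NGC_API_KEY" is kept as a shell placeholder here; if you run
    this list via subprocess without a shell, substitute the value from
    os.environ yourself.
    """
    return [
        "podman", "run", "-it", "--rm", "--name=studio-voice",
        "--device", f"nvidia.com/gpu={gpu}",   # "all", or an index like "0"
        "--shm-size=8GB",
        "-e", "NGC_API_KEY=$NGC_API_KEY",
        "-e", f"NIM_MODEL_PROFILE={profile}",
        "-e", "FILE_SIZE_LIMIT=36700160",
        "-e", f"STREAMING={'true' if streaming else 'false'}",
        "-p", "8000:8000",
        "-p", "8001:8001",
        "nvcr.io/nim/nvidia/maxine-studio-voice:latest",
    ]

# Example: streaming mode, pinned to GPU 0.
print(" ".join(build_podman_cmd(streaming=True, gpu="0")))
```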
If the command runs successfully, the output will end with lines similar to the following:
I1126 09:22:21.048202 31 grpc_server.cc:2558] "Started GRPCInferenceService at 127.0.0.1:9001"
I1126 09:22:21.048377 31 http_server.cc:4704] "Started HTTPService at 127.0.0.1:9000"
I1126 09:22:21.089295 31 http_server.cc:362] "Started Metrics Service at 127.0.0.1:9002"
Maxine GRPC Service: Listening to 0.0.0.0:8001
By default, the Maxine Studio Voice gRPC service is hosted on port 8001. You must use this port for inference requests.
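Before sending requests, you can confirm that the gRPC port is reachable. A minimal check using only the Python standard library (this assumes the NIM is running locally on port 8001; it only verifies TCP connectivity, not that the service is healthy):

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if port_is_open("localhost", 8001):
    print("Maxine gRPC service is reachable on port 8001")
else:
    print("Port 8001 is not reachable -- is the container running?")
```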
We have provided a sample client script in our GitHub repo. The script can be used to send requests to the running container using the following instructions.
Download the Maxine Studio Voice Python client code by cloning the NVIDIA Maxine NIM Clients Repository:
git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/studio-voice
Install the dependencies for the NVIDIA Maxine Studio Voice Python client:
sudo apt-get install python3-pip
pip install -r requirements.txt
Go to the scripts directory:
cd scripts
Assuming the client is on the same machine as the NIM, run the following command to send a gRPC request to the NIM.

For the default transactional mode:
python studio_voice.py --target localhost:8001 --input <input_file_path> --output <output_file_path>
For streaming mode:
python studio_voice.py --target localhost:8001 --input <input_file_path> --output <output_file_path> --streaming --model-type 48k-hq
When using --streaming mode, ensure the selected --model-type (48k-hq, 48k-ll, or 16k-hq) aligns with the NIM_MODEL_PROFILE model type configuration to maintain compatibility.
For more advanced usage of the client, refer to this documentation.

To view details of the command-line arguments, run this command:
python studio_voice.py -h
You will get a response similar to the following:
usage: studio_voice.py [-h] [--ssl-mode {MTLS,TLS}] [--ssl-key SSL_KEY] [--ssl-cert SSL_CERT]
                       [--ssl-root-cert SSL_ROOT_CERT] [--target TARGET] [--input INPUT]
                       [--output OUTPUT] [--api-key API_KEY] [--function-id FUNCTION_ID]
                       [--streaming] [--model-type {48k-hq,48k-ll,16k-hq}]

Process wav audio files using gRPC and apply studio-voice.

options:
  -h, --help            show this help message and exit
  --preview-mode        Flag to send request to preview NVCF server on
                        https://build.nvidia.com/nvidia/studiovoice/api.
  --ssl-mode {MTLS,TLS}
                        Flag to set SSL mode, default is None
  --ssl-key SSL_KEY     The path to ssl private key.
  --ssl-cert SSL_CERT   The path to ssl certificate chain.
  --ssl-root-cert SSL_ROOT_CERT
                        The path to ssl root certificate.
  --target TARGET       IP:port of gRPC service, when hosted locally.
                        Use grpc.nvcf.nvidia.com:443 when hosted on NVCF.
  --input INPUT         The path to the input audio file.
  --output OUTPUT       The path for the output audio file.
  --api-key API_KEY     NGC API key required for authentication, utilized when
                        using TRY API ignored otherwise
  --function-id FUNCTION_ID
                        NVCF function ID for the service, utilized when using
                        TRY API ignored otherwise
  --streaming           Flag to enable grpc streaming mode.
  --model-type {48k-hq,48k-ll,16k-hq}
                        Studio Voice model type, default is 48k-hq.
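As the help text indicates, --api-key and --function-id are only consulted for the NVCF-hosted (Try API) endpoint, while local hosting needs only --target. A small sketch of that branching (the helper name hosting_flags is illustrative; the flag names and the grpc.nvcf.nvidia.com:443 target come from the help output above, and the key/function ID values are placeholders):

```python
def hosting_flags(local=True, api_key=None, function_id=None):
    """Return the target/auth flags for local vs NVCF-hosted inference."""
    if local:
        # Locally hosted NIM: only the gRPC target is needed.
        return ["--target", "localhost:8001"]
    # NVCF-hosted (Try API): authentication flags are required.
    return ["--target", "grpc.nvcf.nvidia.com:443",
            "--api-key", api_key,
            "--function-id", function_id]

print(" ".join(hosting_flags()))
```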
For more details on getting started with this NIM, including configuration parameters, visit the NVIDIA Maxine Studio Voice NIM Docs.