nvidia/studiovoice
Enhance speech by correcting common audio degradations to create studio-quality speech output.
By running the commands below, you accept the NVIDIA AI Enterprise Terms of Use and the NVIDIA Community Models License.
Pull and run nvidia/studiovoice using Docker (this will download the full model and run it in your local environment). First, log in to the NVIDIA container registry; the username is the literal string $oauthtoken and the password is your NGC API key:

$ docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>
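A successful login prints a confirmation similar to:

Login Succeeded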
NVIDIA Maxine Studio Voice NIM uses gRPC APIs for inference requests.
An NGC API key is required to download the appropriate models and resources when starting the NIM.
If you are not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:
export NGC_API_KEY=<PASTE_API_KEY_HERE>
Run one of the following commands to make the key available at startup:
# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc

# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
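For example, a minimal sketch of the file-based approach (the file path ~/.ngc_api_key and its permissions are illustrative choices, not a required convention):

# Store the key in a file readable only by your user (path is illustrative)
echo -n "<PASTE_API_KEY_HERE>" > ~/.ngc_api_key
chmod 600 ~/.ngc_api_key

# Retrieve it into the environment before launching the NIM
export NGC_API_KEY_FILE=~/.ngc_api_key
export NGC_API_KEY="$(cat $NGC_API_KEY_FILE)"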
The following command launches the Maxine Studio Voice NIM container with the gRPC service. A reference for the container's runtime parameters is available here.
docker run -it --rm --name=maxine-studio-voice \
  --net host \
  --runtime=nvidia \
  --gpus all \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<nim_model_profile> \
  -e FILE_SIZE_LIMIT=36700160 \
  nvcr.io/nim/nvidia/maxine-studio-voice:latest
Ensure you use the appropriate NIM_MODEL_PROFILE for your GPU. For more information about NIM_MODEL_PROFILE, refer to the NIM Model Profile Table.
Please note, the --gpus all flag assigns all available GPUs to the Docker container. On machines with multiple GPUs, this fails unless all GPUs are identical. To assign specific GPUs to the container (for example, when your machine has a mix of different GPUs), use --gpus '"device=0,1,2..."' instead, as shown in the example below.
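For instance, a sketch of the same launch command pinned to a single GPU (the device index 0 here is an arbitrary choice):

docker run -it --rm --name=maxine-studio-voice \
  --net host \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<nim_model_profile> \
  -e FILE_SIZE_LIMIT=36700160 \
  nvcr.io/nim/nvidia/maxine-studio-voice:latest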
If the launch command runs successfully, you get a response similar to the following.
+-------------------------------+---------+--------+
| Model                         | Version | Status |
+-------------------------------+---------+--------+
| maxine_nvcf_studiovoice       | 1       | READY  |
| studio_voice_high_quality-48k | 1       | READY  |
+-------------------------------+---------+--------+
I1126 09:22:21.040917 31 metrics.cc:877] "Collecting metrics for GPU 0: NVIDIA GeForce RTX 4090"
I1126 09:22:21.046137 31 metrics.cc:770] "Collecting CPU metrics"
I1126 09:22:21.046253 31 tritonserver.cc:2598]
+----------------------------------+------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                          |
+----------------------------------+------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                         |
| server_version                   | 2.50.0                                                                                         |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy m |
|                                  | odel_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters stati |
|                                  | stics trace logging                                                                            |
| model_repository_path[0]         | /opt/maxine/models                                                                             |
| model_control_mode               | MODE_EXPLICIT                                                                                  |
| startup_models_0                 | maxine_nvcf_studiovoice                                                                        |
| startup_models_1                 | studio_voice_high_quality-48k                                                                  |
| strict_model_config              | 0                                                                                              |
| model_config_name                |                                                                                                |
| rate_limit                       | OFF                                                                                            |
| pinned_memory_pool_byte_size     | 268435456                                                                                      |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                       |
| min_supported_compute_capability | 6.0                                                                                            |
| strict_readiness                 | 1                                                                                              |
| exit_timeout                     | 30                                                                                             |
| cache_enabled                    | 0                                                                                              |
+----------------------------------+------------------------------------------------------------------------------------------------+
I1126 09:22:21.048202 31 grpc_server.cc:2558] "Started GRPCInferenceService at 127.0.0.1:9001"
I1126 09:22:21.048377 31 http_server.cc:4704] "Started HTTPService at 127.0.0.1:9000"
I1126 09:22:21.089295 31 http_server.cc:362] "Started Metrics Service at 127.0.0.1:9002"
Maxine GRPC Service: Listening to 0.0.0.0:8001
By default, the Maxine Studio Voice gRPC service is hosted on port 8001. Use this port for inference requests. You can confirm the port is reachable with the quick check below.
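As a quick sanity check before sending requests, verify that the port is accepting connections (this assumes the netcat utility nc is installed; any equivalent port check works):

# Should report that the connection to 127.0.0.1:8001 succeeded
nc -zv 127.0.0.1 8001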
We have provided a sample client script file in our GitHub repo. The script can be used to send inference requests to the running Docker container by following the instructions below.
Download the Maxine Studio Voice Python client code by cloning the NVIDIA Maxine NIM Clients Repository:
git clone https://github.com/NVIDIA-Maxine/nim-clients.git
cd nim-clients/studio-voice
Install the dependencies for the NVIDIA Maxine Studio Voice Python client:
sudo apt-get install python3-pip
pip install -r requirements.txt
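If you prefer not to install packages into the system Python, a standard alternative is an isolated virtual environment (stock Python tooling, not specific to this client):

# Optional: create and activate a virtual environment, then install
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt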
Go to the scripts directory:
cd scripts
Run the following command to send a gRPC request:
python studio_voice.py --target <target_ip:port> --input <input_file_path> --output <output_file_path>
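For example, against a NIM running locally on the default port (the input and output file names below are placeholders, not files shipped with the repository):

# Example invocation; file paths are placeholders
python studio_voice.py --target 127.0.0.1:8001 --input studio_voice_input.wav --output studio_voice_output.wav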
To view details of the command-line arguments, run this command:
python studio_voice.py -h
You will get a response similar to the following.
usage: studio_voice.py [-h] [--target TARGET] [--input INPUT] [--output OUTPUT]

Process wav audio files using gRPC and apply Studio Voice.

options:
  -h, --help       show this help message and exit
  --target TARGET  The target gRPC server address.
  --input INPUT    The path to the input audio file.
  --output OUTPUT  The path for the output audio file.
For more details on getting started with this NIM, including configuration parameters, visit the NVIDIA Maxine Studio Voice NIM Docs.