State-of-the-art accuracy and speed for English transcriptions.
Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.
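Log in to the NGC container registry (nvcr.io). Enter the literal string $oauthtoken as the username and your NGC API key as the password:
$ docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>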
Refer to Supported Models for the full list of models.
export NGC_API_KEY=<PASTE_API_KEY_HERE>
docker run -it --rm \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  -e NIM_TAGS_SELECTOR=name=parakeet-0-6b-ctc-riva-en-us,mode=ofl,bs1 \
  nvcr.io/nim/nvidia/parakeet-0-6b-ctc-en-us:latest
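If you prefer to launch the container from a Python script, the following is a minimal sketch that assembles the same docker run arguments and hands them to subprocess. It assumes Docker and the NVIDIA Container Toolkit are installed and NGC_API_KEY is already exported. The helper names docker_run_args and launch are hypothetical, and the HTTP and gRPC ports are published on the same numbers inside and outside the container, as in the command above.

```python
import os
import subprocess
from typing import List

# Image and profile selector copied from the docker run command above.
IMAGE = "nvcr.io/nim/nvidia/parakeet-0-6b-ctc-en-us:latest"
TAGS_SELECTOR = "name=parakeet-0-6b-ctc-riva-en-us,mode=ofl,bs1"


def docker_run_args(gpu: int = 0, http_port: int = 9000, grpc_port: int = 50051) -> List[str]:
    """Hypothetical helper: build the argv for the docker run command above."""
    return [
        "docker", "run", "-it", "--rm",
        "--runtime=nvidia",
        # The shell form '"device=0"' passes the double quotes through to Docker.
        "--gpus", f'"device={gpu}"',
        "--shm-size=8GB",
        # With no "=value", Docker copies NGC_API_KEY from this process's environment.
        "-e", "NGC_API_KEY",
        "-e", f"NIM_HTTP_API_PORT={http_port}",
        "-e", f"NIM_GRPC_API_PORT={grpc_port}",
        # Publish each port on the same number the service listens on in the container.
        "-p", f"{http_port}:{http_port}",
        "-p", f"{grpc_port}:{grpc_port}",
        "-e", f"NIM_TAGS_SELECTOR={TAGS_SELECTOR}",
        IMAGE,
    ]


def launch(**kwargs) -> int:
    """Hypothetical helper: run the container in the foreground and return docker's exit code."""
    if not os.environ.get("NGC_API_KEY"):
        raise RuntimeError("Export NGC_API_KEY before launching the container.")
    # "-it" needs an interactive terminal, so run this from a shell, not a cron job.
    return subprocess.run(docker_run_args(**kwargs), check=False).returncode
```

If you change http_port or grpc_port here, use the same numbers in the readiness check and in the client's --server argument.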
Depending on your network speed, it may take up to 30 minutes from the time the Docker container starts until it is ready to accept requests.
Open a new terminal and run the following command to check whether the service is ready to handle inference requests:
curl -X 'GET' 'http://localhost:9000/v1/health/ready'
If the service is ready, you get a response similar to the following.
{"ready":true}
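To script this check, for example in CI, a minimal sketch can poll the same endpoint until it reports ready. It uses only the Python standard library and assumes the HTTP port 9000 from the docker run command; the default 30-minute timeout mirrors the start-up estimate above, and the function names is_ready and wait_until_ready are hypothetical.

```python
import json
import time
import urllib.request

# Assumes the container publishes NIM_HTTP_API_PORT=9000 on localhost.
READY_URL = "http://localhost:9000/v1/health/ready"


def is_ready(body: str) -> bool:
    """Return True only if the response body is JSON with "ready": true."""
    try:
        return json.loads(body).get("ready") is True
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def wait_until_ready(url: str = READY_URL, timeout_s: float = 1800.0, interval_s: float = 10.0) -> bool:
    """Poll url until it reports ready; return False once timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_ready(resp.read().decode("utf-8", errors="replace")):
                    return True
        except OSError:
            # Connection refused, HTTP error, or timeout: the service is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A caller can then gate later steps on the result, for example by exiting with an error when wait_until_ready() returns False.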
Install the Riva Python client package:
sudo apt-get install python3-pip
pip install nvidia-riva-client
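To confirm from Python that the client package is installed before running the sample scripts, a small sketch using the standard importlib.metadata module follows. It assumes Python 3.8 or later, and the helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "nvidia-riva-client") -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_version() with no argument checks nvidia-riva-client; a None result means the pip install step did not complete in the active Python environment.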
Download the Riva sample clients:
git clone https://github.com/nvidia-riva/python-clients.git
Run speech-to-text inference in offline mode with the sample client. Riva ASR supports mono, 16-bit audio in WAV, OPUS, and FLAC formats.
python3 python-clients/scripts/asr/transcribe_file_offline.py --server 0.0.0.0:50051 --input-file <path_to_speech_file> --language-code en-US
For more details on getting started with this NIM, visit the Riva ASR NIM Docs.