
Follow the steps below to download and run the NVIDIA NIM inference microservice for this model on your infrastructure of choice.
$ docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>
The following command launches the NIM container on any of the supported GPUs.
export NGC_API_KEY=<PASTE_API_KEY_HERE>
export CONTAINER_NAME=riva-translate-1_6b
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY=$NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
nvcr.io/nim/nvidia/riva-translate-1_6b:latest
Depending on your network speed, it may take up to 30 minutes from the time the container is started for the service to become ready and begin accepting requests.
Open a new terminal and run the following command to check whether the service is ready to handle inference requests.
curl -X 'GET' 'http://localhost:9000/v1/health/ready'
If the service is ready, you get a response similar to the following.
{"ready":true}
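Instead of re-running curl by hand, the readiness check above can be polled from Python with only the standard library. This is a minimal sketch: the URL and port match the docker run command above, while the function names, timeout, and polling interval are illustrative choices, not part of the NIM API.

```python
import json
import time
import urllib.request

def is_ready(body: bytes) -> bool:
    """Return True if the health endpoint's JSON body reports readiness,
    e.g. b'{"ready":true}' as shown above."""
    try:
        return bool(json.loads(body).get("ready", False))
    except (ValueError, AttributeError):
        # Not valid JSON, or not a JSON object: treat as not ready.
        return False

def wait_until_ready(url: str = "http://localhost:9000/v1/health/ready",
                     timeout_s: float = 1800.0,
                     interval_s: float = 10.0) -> bool:
    """Poll the readiness endpoint until it reports ready or the timeout
    expires (default 30 minutes, matching the startup estimate above)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_ready(resp.read()):
                    return True
        except OSError:
            pass  # container still starting; connection refused is expected
        time.sleep(interval_s)
    return False
```

Call `wait_until_ready()` after starting the container; it returns True once the service responds with {"ready":true}.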
Install the Riva Python client package
sudo apt-get install python3-pip
pip install -U nvidia-riva-client
Download Riva sample clients
git clone https://github.com/nvidia-riva/python-clients.git
Run text-to-text translation inference
python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 --text "This will become German words" --source-language-code en --target-language-code de
The above command translates the text from English to German; the output is shown below.
## Das werden deutsche Wörter
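The same request can be made from your own Python code with the nvidia-riva-client package installed above, rather than the sample script. This is a hedged sketch: it assumes `NeuralMachineTranslationClient` from that package, and the empty model name is an assumption (the sample command above also passes no model name, suggesting the NIM serves a default NMT model). The `translate` helper itself is illustrative, not part of the library.

```python
def translate(text: str,
              server: str = "localhost:50051",
              src: str = "en",
              tgt: str = "de") -> str:
    """Translate a single string via the NIM's gRPC endpoint (the same
    port published by the docker run command above)."""
    # Imported lazily so the helper can be defined without a running server;
    # requires: pip install -U nvidia-riva-client
    import riva.client

    auth = riva.client.Auth(uri=server)
    nmt = riva.client.NeuralMachineTranslationClient(auth)
    # Empty model name is an assumption: the NIM's default translation model.
    response = nmt.translate([text], "", src, tgt)
    return response.translations[0].text
```

With the container running, `translate("This will become German words")` should return the German translation shown above.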
For more details on getting started with this NIM, visit the Riva NIM Docs.