This experience showcases James, our interactive digital human, who has knowledge of NVIDIA’s products through direct access to our product knowledge base. James and the RAG-powered backend application use a collection of NVIDIA NIM inference microservices, NVIDIA ACE technologies, and ElevenLabs text-to-speech to provide natural, immersive responses. Using James as inspiration, users can download the Digital Human for Customer Service blueprint and customize it for their industry, ingesting their own documents through RAG and tailoring the avatar’s look and voice to their application.
The Digital Human for Customer Service NVIDIA NIM™ Agent Blueprint is powered by NVIDIA Tokkio, a workflow based on ACE technologies, to bring enterprise applications to life with a 3D animated digital human interface. With approachable, human-like interactions, customer service applications can provide a more engaging user experience than traditional customer service options.
This workflow is designed to integrate with your existing generative AI applications built using retrieval-augmented generation (RAG). Use this workflow to evolve your applications running in your data center, in the cloud, or at the edge to include a full digital human interface.
This blueprint uses the following NIM microservices, each described below: Audio2Face, Parakeet ASR, LLM, embedding, and reranking.
Hardware Requirements
Digital human pipeline
The digital human pipeline requires a minimum of 2 GPUs for 1 stream and 4 GPUs for 3 streams.
RAG pipeline
The RAG pipeline requires 2x A100 GPUs: one for the embedding and reranking NIMs, and one for the LLM NIM.
OS Requirements
Both the digital human pipeline and the RAG pipeline can be deployed on Ubuntu 22.04.
NVIDIA NIM™ Agent Blueprints are customizable AI workflow examples that equip enterprise developers with NIM microservices, reference code, documentation, and a Helm chart for deployment.
This blueprint provides a reference that shows how an LLM or RAG application can be connected to a digital human pipeline. The digital human and the RAG application are deployed separately: the RAG application generates the text content of the interaction, while the Tokkio customer service workflow enables live avatar interaction. The two components communicate over a REST API, so users can tune each one to their own requirements. This workflow includes steps to set up and connect both parts of the customer service pipeline. Each pipeline consists of the following components:
Digital Human Pipeline
ACE Agent (orchestration)
Parakeet ASR NIM
ElevenLabs text-to-speech
Audio2Face NIM
RAG Pipeline
Embedding NIM
Reranking NIM
LLM NIM
With this blueprint, users can connect an existing RAG application to a live, animated digital human interface and customize each component, from the documents ingested by RAG to the avatar’s look and voice, for their own use case.
Audio2Face NIM
Input
Input Type(s): Audio
Input Format: bytes
Input Parameters: Tuning Parameters, Audio
Other Properties Related to Input: Supported sampling rates: 16 kHz, 22.05 kHz, 44.1 kHz; all audio is resampled to 16 kHz. There is no maximum audio length.
Output
Output Type(s): Blendshape Coefficients
Output Format: Custom Protobuf Format
Output Parameters: Custom Protobuf Format
LLM NIM
Input
Input Format: Text
Input Parameters: Temperature, TopP
Output
Output Format: Text and code
Output Parameters: Max output tokens
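NIM LLM microservices expose an OpenAI-compatible completions API, so the tuning parameters above map directly onto a standard client call. A minimal sketch, assuming a locally deployed LLM NIM at http://localhost:8000; the host, port, and model name are illustrative:

```python
# Sketch: querying a locally deployed LLM NIM through its
# OpenAI-compatible chat completions endpoint. Host, port, and
# model name below are assumptions for a local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # assumed model name
    messages=[{"role": "user", "content": "How do I return a product?"}],
    temperature=0.2,   # tuning parameter listed above
    top_p=0.7,         # tuning parameter listed above
    max_tokens=512,    # caps the output length
)
print(response.choices[0].message.content)
```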
Parakeet ASR NIM
Input
Input Type(s): Audio in English
Input Format(s): Linear PCM, 16-bit, 1 channel
Output
Output Type(s): Text String in English with Timestamps
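Recordings in other formats must be converted before they reach the ASR. A minimal sketch, assuming the soundfile and scipy packages (not part of the blueprint), that converts an arbitrary audio file to mono 16-bit linear PCM at 16 kHz, the rate the rest of the pipeline resamples to:

```python
# Sketch: converting an arbitrary audio file to 16 kHz, mono,
# 16-bit linear PCM, matching the ASR input format described above.
# soundfile and scipy are assumptions, not part of the blueprint.
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

TARGET_RATE = 16_000

audio, rate = sf.read("question.wav", dtype="float32")
if audio.ndim > 1:               # downmix multi-channel audio to mono
    audio = audio.mean(axis=1)
if rate != TARGET_RATE:          # resample to 16 kHz
    audio = resample_poly(audio, TARGET_RATE, rate)
pcm16 = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
sf.write("question_16k.wav", pcm16, TARGET_RATE, subtype="PCM_16")
```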
Embedding NIM
Input
Input Type: Text
Input Format: List of strings with task-specific instructions
Output
Output Type: Floats
Output Format: List of float arrays, each containing the embedding for the corresponding input string
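At indexing time the document chunks are embedded, and at query time the user's question is embedded with the same model. A minimal sketch of an embedding request, assuming a locally deployed embedding NIM with an OpenAI-style /v1/embeddings route; the URL, port, model name, and the input_type field are illustrative assumptions:

```python
# Sketch: requesting embeddings from a locally deployed embedding NIM.
# URL, port, model name, and the input_type field are assumptions for
# a local deployment; check the NIM's own API reference.
import requests

payload = {
    "model": "nvidia/nv-embedqa-e5-v5",  # assumed model name
    "input": ["How do I reset my router?", "Contact support by phone."],
    "input_type": "passage",  # typically "query" at retrieval time, "passage" at index time
}
resp = requests.post("http://localhost:8001/v1/embeddings", json=payload, timeout=30)
resp.raise_for_status()
vectors = [item["embedding"] for item in resp.json()["data"]]
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```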
Reranking NIM
Input
Input Type: Pair of Texts
Input Format: List of text pairs
Other Properties Related to Input: The model's maximum context length is 512 tokens. Texts longer than the maximum length must be either chunked or truncated.
Output
Output Type: Floats
Output Format: List of float arrays
Other Properties Related to Output: Each output is a probability score (or a raw logit) for the corresponding input pair. The user can decide whether a sigmoid activation function is applied to the logits.
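When raw logits are returned, a sigmoid maps each one to a probability-style relevance score. A small sketch with illustrative logit values:

```python
# Sketch: converting raw reranker logits into probability scores with
# a sigmoid, as described above. The logit values are illustrative.
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([3.2, -1.4, 0.3])   # one logit per (query, passage) pair
scores = sigmoid(logits)              # approx. [0.961, 0.198, 0.574]
order = np.argsort(scores)[::-1]      # highest-scoring passages first
print(scores, order)
```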
Audio data captured from the user is sent to ACE Agent, which orchestrates the communication between the various NIMs. ACE Agent uses the Parakeet NIM to convert the audio to text, which is then sent to the RAG pipeline. The RAG pipeline uses the embedding, reranking, and LLM NIMs to answer the question with context from the documents fed to it. The text result is sent to TTS, and the voice output from TTS animates the digital human through the Audio2Face NIM.
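The turn-by-turn flow can be summarized in pseudocode. Every function below is a hypothetical stand-in for one pipeline stage; in the blueprint itself, ACE Agent performs this orchestration internally:

```python
# Illustrative sketch of one conversational turn as described above.
# All stage functions are hypothetical stand-ins, not blueprint APIs.

def parakeet_asr(audio: bytes) -> str:        # speech -> text (Parakeet NIM)
    raise NotImplementedError

def rag_pipeline(question: str) -> str:       # embedding + reranking + LLM NIMs
    raise NotImplementedError

def text_to_speech(answer: str) -> bytes:     # text -> voice (ElevenLabs TTS)
    raise NotImplementedError

def audio2face(speech: bytes) -> bytes:       # voice -> blendshape coefficients
    raise NotImplementedError

def render_avatar(speech: bytes, blendshapes: bytes) -> None:
    raise NotImplementedError                 # play audio and animate the avatar

def handle_user_turn(audio: bytes) -> None:
    text = parakeet_asr(audio)
    answer = rag_pipeline(text)
    speech = text_to_speech(answer)
    blendshapes = audio2face(speech)
    render_avatar(speech, blendshapes)
```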
API interfaces for NIM collections conform to OpenAPI standards and can be readily integrated with NVIDIA NIM containers deployed in any compatible compute cluster. Integrating or replacing API-compatible components allows you to adapt the workload to your specific use case where needed. See the individual NIM documentation for integration details.
By default, the digital human RAG plugin has support for an API that follows the OpenAPI specification. To customize the pipeline to connect to your own RAG system, follow the instructions here.
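As a rough picture of what such a connection involves, here is a minimal sketch of a custom RAG service exposed over HTTP. The /generate route and its request/response schema are illustrative assumptions, not the blueprint's actual plugin contract; consult the plugin's OpenAPI specification for the real interface:

```python
# Sketch: a custom RAG service behind an HTTP endpoint that the digital
# human plugin could be pointed at. The route and schema are assumptions;
# align them with the plugin's OpenAPI specification in practice.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

class Answer(BaseModel):
    answer: str

@app.post("/generate", response_model=Answer)
def generate(query: Query) -> Answer:
    # Replace this stub with retrieval, reranking, and LLM generation.
    return Answer(answer=f"You asked: {query.question}")
```

Run locally with, for example, `uvicorn app:app`.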
GOVERNING TERMS:
Your use of this trial service is governed by the NVIDIA API Trial Terms of Service.
ACE NIM and NGC Microservices - NVIDIA AI Product License
Generative AI Examples - Apache 2
ADDITIONAL TERMS:
Meta Llama 3 Community License Agreement at https://llama.meta.com/llama3/license/.