nvidia

Build an AI Virtual Assistant

Create intelligent virtual assistants for customer service across every industry

llama-3_1-70b-instruct•llama-3_2-nv-embedqa-1b-v2•llama-3_2-nv-rerankqa-1b-v2


Generative AI can enhance customer service, but organizations face challenges such as fragmented data sources and data risks. This NVIDIA AI Blueprint addresses those challenges using retrieval-augmented generation (RAG) and generative AI technologies, including NVIDIA NIM™ and NVIDIA NeMo™ Retriever.

It uses some of the latest AI agent-building methodologies, connecting disparate data sources to improve the operational efficiency of existing solutions or build new customer service-centric systems. It offers advanced AI tools, secure management of sensitive data wherever it resides, personalized multi-turn question answering, sentiment analysis, summary generation, and configurable session handling.

Architecture Diagram

What’s Included in the Blueprint

NVIDIA AI Blueprints provide customizable generative AI reference architectures designed to equip enterprise developers with essential assets such as NIM microservices, reference code, detailed documentation, and Helm charts for deployment. These blueprints serve as a foundation for building advanced AI virtual assistants, either as standalone applications or as enhancements to existing systems. They focus on enabling personalization, summarization, and sentiment analysis, particularly by applying generative AI to data that is otherwise hard to access.

The blueprint includes a reference UI and an AI assistant (developed using the LangGraph framework) that leverages sub-agents to handle queries from both structured and unstructured data sources.
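The dispatch between sub-agents described above can be illustrated with a minimal sketch. This is plain Python rather than LangGraph, and everything in it is an assumption made for illustration: the names `route_query`, `structured_agent`, and `unstructured_agent`, and the keyword heuristic, which stands in for whatever routing decision the blueprint's assistant actually makes.

```python
from typing import Callable, Dict

# Hypothetical keyword heuristic; a stand-in for an LLM-based routing decision.
STRUCTURED_HINTS = ("order", "purchase", "return", "invoice")


def structured_agent(query: str) -> str:
    """Placeholder for a sub-agent that answers from structured data."""
    return f"[structured] would query the database for: {query}"


def unstructured_agent(query: str) -> str:
    """Placeholder for a sub-agent that answers from unstructured data."""
    return f"[unstructured] would search manuals and FAQs for: {query}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "structured": structured_agent,
    "unstructured": unstructured_agent,
}


def route_query(query: str) -> str:
    """Pick a sub-agent name based on simple keyword matching."""
    lowered = query.lower()
    if any(hint in lowered for hint in STRUCTURED_HINTS):
        return "structured"
    return "unstructured"


def answer(query: str) -> str:
    """Route the query, then let the chosen sub-agent handle it."""
    return AGENTS[route_query(query)](query)
```

For example, `answer("Where is my order?")` is handled by the structured placeholder, while a question about resetting a device falls through to the unstructured one.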

Included NIM Microservices

The following NIM microservices are used in this blueprint:

NeMo Retriever Embed QA E5
NeMo Retriever Rerank Mistral 4B
Llama 3.1 70B Instruct

Sample Data

The blueprint comes with synthetic sample data representing a typical customer service function, including customer profiles, order histories (structured data), and technical product manuals (unstructured data). A notebook is provided to guide users on how to ingest both structured and unstructured data efficiently.

Structured Data: Customer profiles and order histories
Unstructured Data: Product manuals, product catalogs, and FAQs

AI Agent

This reference solution implements three sub-agents using the open-source LangGraph framework. These sub-agents address common customer service tasks for the included sample dataset. They rely on the Llama 3.1 models (70B and 8B Instruct) and NVIDIA NIM microservices for generating responses, converting natural language into SQL queries, and assessing the sentiment of the conversation.
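The natural-language-to-SQL step can be sketched as follows. This is an illustration under stated assumptions, not the blueprint's implementation: `generate_sql` is a hypothetical stub standing in for the LLM call, the `orders` schema is invented, and an in-memory SQLite database stands in for the real relational store.

```python
import sqlite3
from typing import List, Tuple


def generate_sql(question: str, customer_id: int) -> Tuple[str, tuple]:
    """Hypothetical stand-in for the LLM text-to-SQL step."""
    # A real system would prompt the model with the schema and the question.
    if "order" in question.lower():
        sql = ("SELECT product, status FROM orders "
               "WHERE customer_id = ? ORDER BY order_id")
        return sql, (customer_id,)
    raise ValueError("question not supported by this sketch")


def ensure_read_only(sql: str) -> None:
    """Reject anything other than a SELECT before it reaches the database."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise PermissionError("only SELECT statements are allowed")


def run_query(conn: sqlite3.Connection, question: str, customer_id: int) -> List[tuple]:
    """Generate SQL for the question, check it, and execute it."""
    sql, params = generate_sql(question, customer_id)
    ensure_read_only(sql)
    return conn.execute(sql, params).fetchall()


# Invented sample schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER, product TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, product, status) VALUES (?, ?, ?)",
    [(1, "GPU", "shipped"), (2, "Cable", "delivered"), (1, "Monitor", "processing")],
)
rows = run_query(conn, "What is the status of my orders?", customer_id=1)
```

Checking that generated SQL is read-only before executing it is a design choice of this sketch; model-generated queries are untrusted input, so some guard of this kind is usually worth having.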

Key Components

Structured Data Retriever: Works in tandem with a Postgres database and Vanna.AI to fetch relevant data based on user queries.
Unstructured Data Retriever: Processes unstructured data (e.g., PDFs, FAQs) by chunking it, creating embeddings using the NeMo Retriever embedding NIM, and storing it in Milvus for fast retrieval.
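The chunk, embed, store, and retrieve flow of the unstructured data retriever can be sketched in a few lines. All names here are assumptions for illustration: `embed` is a toy bag-of-words function standing in for the NeMo Retriever embedding NIM, and `InMemoryIndex` is a plain Python list standing in for Milvus.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, max_words: int = 11) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Invented sample text, for illustration only.
manual = ("To reset the router hold the power button for ten seconds. "
          "The warranty covers hardware defects for two years.")
index = InMemoryIndex()
for chunk in chunk_text(manual):
    index.add(chunk)
top = index.search("warranty hardware defects", k=1)
```

Fixed-size word chunking is the simplest possible strategy; the source does not say which chunking method the blueprint uses.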

Analytics and Admin Operations

To support operational requirements, the blueprint includes reference code for managing key administrative tasks:

Storing conversation histories
Generating conversation summaries
Conducting sentiment analysis on customer interactions

These features ensure that customer service teams can efficiently monitor and evaluate interactions for quality and performance.
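As a rough sketch of two of the administrative tasks listed above, the following stores conversation turns and derives a sentiment label from them. The table layout, function names, and word lists are all hypothetical; in particular, `toy_sentiment` is a keyword counter standing in for an LLM-based assessment, and summary generation is omitted because it requires a model call.

```python
import re
import sqlite3
from typing import List, Tuple

# Hypothetical word lists; a stand-in for model-based sentiment analysis.
POSITIVE = {"thanks", "great", "perfect"}
NEGATIVE = {"broken", "angry", "terrible"}


def open_store() -> sqlite3.Connection:
    """Create an in-memory store for conversation turns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE turns (id INTEGER PRIMARY KEY, "
        "session_id TEXT, role TEXT, content TEXT)"
    )
    return conn


def save_turn(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )


def load_history(conn: sqlite3.Connection, session_id: str) -> List[Tuple[str, str]]:
    """Return (role, content) pairs for one session in insertion order."""
    return conn.execute(
        "SELECT role, content FROM turns WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()


def toy_sentiment(history: List[Tuple[str, str]]) -> str:
    """Score user turns only: +1 per positive word, -1 per negative word."""
    score = 0
    for role, content in history:
        if role != "user":
            continue
        words = set(re.findall(r"[a-z]+", content.lower()))
        score += len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"


conn = open_store()
save_turn(conn, "s1", "user", "My headset arrived broken.")
save_turn(conn, "s1", "assistant", "Sorry to hear that. I can arrange a replacement.")
save_turn(conn, "s1", "user", "Great, thanks!")
history = load_history(conn, "s1")
label = toy_sentiment(history)
```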

Data Flywheel

The blueprint comes with pre-built APIs that support continuous model improvement. The feedback loop, or “data flywheel,” allows the LLMs to be fine-tuned over time to improve both accuracy and cost-effectiveness. Feedback is collected at multiple points in the process to further refine the models’ performance.
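One way to picture the feedback loop is as a stream of rated interactions that is filtered into fine-tuning candidates. The sketch below is an assumption-laden illustration, not the blueprint's API: the `FeedbackRecord` fields, the `stage` labels, the rating scale, and the prompt/completion export format are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class FeedbackRecord:
    session_id: str
    stage: str      # hypothetical label for where feedback was collected
    prompt: str
    response: str
    rating: int     # hypothetical scale: +1 helpful, -1 not helpful


def to_finetune_jsonl(records: List[FeedbackRecord], min_rating: int = 1) -> str:
    """Keep well-rated records and serialize them as one JSON object per line."""
    kept = [r for r in records if r.rating >= min_rating]
    return "\n".join(
        json.dumps({"prompt": r.prompt, "completion": r.response}) for r in kept
    )


# Invented sample records, for illustration only.
records = [
    FeedbackRecord("s1", "answer", "Where is my order?", "It shipped yesterday.", 1),
    FeedbackRecord("s2", "summary", "Summarize the chat.", "Unclear summary.", -1),
]
jsonl = to_finetune_jsonl(records)
```

Only the positively rated record survives the filter here; how feedback is actually weighted or curated before fine-tuning is not specified in the source.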

Summary

In summary, this NVIDIA AI Blueprint offers a comprehensive solution for building intelligent, generative AI-powered virtual assistants for customer service, leveraging structured and unstructured data to deliver personalized and efficient support. It includes all necessary tools and guidance to deploy, monitor, and continually improve the solution in real-world environments.

Minimum System Requirements

Hardware Requirements

The AI virtual assistant pipeline supports the following hardware:

H100
A100

OS Requirements

Ubuntu 22.04

Software Dependencies

NVIDIA NIM inference microservices

Embed QA for embeddings
Rerank Mistral 4B for reranking
Llama 3.1 70B Instruct for advanced reasoning, inferencing, and natural language mastery
Nemotron-4 340B for synthetic data generation (optional)

Example Walkthrough With Sample Input/Output

Explore example walkthroughs on the NVIDIA API catalog through the specific NIM microservices links below:

NeMo Retriever Embed QA
NeMo Retriever Rerank Mistral 4B
Llama 3.1 70B Instruct

A complete example of how to get started with this blueprint is available in the NVIDIA GitHub repository.

Security Considerations

The AI Virtual Assistant Customer Service Blueprint Application is shared as a reference architecture and is provided “as is”. Security in the production environment is the responsibility of the end users deploying it. When deploying in a production environment, have security experts review any potential risks and threats (including direct and indirect prompt injection), define the trust boundaries, secure the communication channels, integrate AuthN and AuthZ with appropriate access controls, keep the deployment (including the containers) up to date, and ensure the containers are secure and free of known vulnerabilities. Queries must be accepted only from authenticated users.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When the models are downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.

License

Use of the models in this AI virtual assistant for customer service blueprint is governed by the NVIDIA AI Foundation Models Community License.

Terms of Use

GOVERNING TERMS: The blueprint is governed by the NVIDIA Software License Agreement and the Product Specific Terms for AI Products.

Meta Llama 3.1 70B Instruct

GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and the Product Specific Terms for AI Products; use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement. ADDITIONAL INFORMATION: Llama 3.1 Community License Agreement, Built with Llama.

NeMo Text Retriever E5 Embedding Model

GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and the Product Specific Terms for AI Products; use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement. ADDITIONAL INFORMATION: MIT license.