Build an Enterprise RAG Pipeline Blueprint

The NVIDIA AI Blueprint for Retrieval-Augmented Generation (RAG) is a production-ready, modular reference architecture for building high-accuracy, high-performance RAG systems that power enterprise search, knowledge assistants, copilots, and agentic workflows at scale. Optimized for GPU acceleration and enterprise throughput, the blueprint provides a complete foundation for ingestion, retrieval, reasoning, and generation across multimodal enterprise data.

Built to support modern agent ecosystems, the blueprint includes shallow and deep document summarization, reasoning-budget configurability, query decomposition, and dynamic metadata filtering—enabling agents to efficiently narrow search space, select trusted sources, and reason over large corpora. Native Python libraries, OpenAI-compatible APIs, MCP server support, and a built-in data catalog make it easy for developers to integrate RAG capabilities into existing applications and multi-agent workflows.
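
Because the server exposes OpenAI-compatible APIs, it can be exercised with plain JSON over HTTP. The sketch below builds a chat-completions-style payload; the port, endpoint path, model identifier, and the `collection_name` routing field are illustrative assumptions, not the blueprint's documented schema—consult the blueprint's API reference for actual values.

```python
import json

# Hypothetical endpoint -- the real host, port, and path depend on your deployment.
RAG_SERVER_URL = "http://localhost:8081/v1/chat/completions"

def build_rag_request(question: str, collection: str) -> dict:
    """Build an OpenAI-style chat-completions payload for a RAG server.

    OpenAI-compatible servers commonly accept extra routing fields alongside
    the standard schema; `collection_name` here is an assumed example.
    """
    return {
        "model": "rag",  # placeholder model identifier
        "messages": [{"role": "user", "content": question}],
        "collection_name": collection,  # assumed extension field
        "stream": False,
    }

payload = build_rag_request("What is our PTO policy?", "hr_docs")
body = json.dumps(payload).encode("utf-8")
# An actual call would POST `body` with Content-Type: application/json,
# e.g. via urllib.request or the openai Python client pointed at RAG_SERVER_URL.
```

Because the wire format is the standard chat-completions shape, existing OpenAI client libraries can usually be pointed at the server by overriding the base URL.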

The blueprint supports advanced multimodal generation, including vision-language models (VLMs) for image understanding, captioning, and image-aware answer generation, along with optional reflection to further improve answer quality. A robust multimodal ingestion pipeline extracts text, tables, charts, images, infographics, and audio/video content, enriched with custom metadata to improve downstream retrieval and filtering.

Designed for flexibility and scale, the RAG Blueprint offers hybrid dense + sparse retrieval, multi-collection search, GPU-accelerated indexing and querying, reranking, and pluggable vector database support—including ElasticSearch and Milvus—with fine-grained database authorization and token support. Built-in observability, OpenTelemetry integration, and evaluation scripts (RAGAS) help teams measure accuracy, latency, and quality as they move from pilot to production, while optional programmable guardrails support enterprise safety requirements.
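
To make the hybrid dense + sparse idea concrete, one common way to merge the two result lists is reciprocal rank fusion (RRF). The blueprint's exact fusion strategy is not specified here, so this is a minimal stdlib-only sketch of the general technique, not the blueprint's implementation.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    `rankings` is a list of lists, each ordered best-first (e.g. one from a
    dense embedding search, one from a sparse keyword/BM25 search).  Each
    document contributes 1 / (k + rank) per list; k=60 is the conventional
    constant and damps the influence of any single list's top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["doc_a", "doc_b", "doc_c"]   # ranked by vector similarity
sparse = ["doc_b", "doc_d", "doc_a"]   # ranked by keyword match
fused = reciprocal_rank_fusion([dense, sparse])
```

A document that appears high in both lists (here `doc_b`) rises to the top, which is why hybrid retrieval often outperforms either method alone; a reranker can then refine the fused list further.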

Deployable via Docker or Kubernetes, with a user interface included and support for GPU sharing through the NIM Operator, the blueprint is fully decomposable and customizable to fit domain-specific needs. It can run standalone, integrate with other NVIDIA Blueprints, or serve as a core building block in agentic systems.

Importantly, the NVIDIA AI Blueprint for RAG serves as a foundational layer of the NVIDIA AI Data Platform, transforming raw, multimodal enterprise data into AI-ready knowledge that powers retrieval, reasoning, and generation across applications.

It is also foundational to the AI Agent for Enterprise Research, providing the trusted knowledge base, summarization, and retrieval capabilities required for advanced, reasoning-driven enterprise agents.

Get started with this reference architecture to ground AI-driven decisions and generation in trusted, relevant enterprise data—at production scale.

Architecture Diagram

Key Features

  • Agent Ecosystem Support

    • Summarization (Shallow and Deep)
    • MCP Server Support
    • Data Catalog Support
    • Reasoning Budget Configurability
    • Native Python library support
    • OpenAI-compatible APIs
  • Multimodal and Advanced Generation

    • Vision-language model (VLM) support in answer generation
    • Image captioning with VLMs
    • Optional reflection to further improve accuracy
  • Data Ingestion and Processing

    • Multimodal PDF data extraction with support for text, tables, charts, images, and infographics
    • Support for audio/video file ingestion
    • Custom metadata support
  • Vector Database and Retrieval

    • Multi-collection search
    • Hybrid search combining dense and sparse retrieval
    • Reranking to further improve accuracy
    • GPU-accelerated index creation and search
    • Pluggable vector database support
    • ElasticSearch support as a vector database
    • Milvus support as a vector database
    • Query decomposition
    • Dynamic metadata filter generation
    • Database authorization and auth-token support
  • Governance

    • Improve content safety with optional programmable guardrails
  • Observability and Telemetry

    • Evaluation Scripts included (RAGAS framework)
    • OpenTelemetry Support
  • Other

    • User interface included
    • NIM Operator support to allow GPU sharing
    • Decomposable and customizable
    • Multi-turn conversations
    • Multi-session support
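
The dynamic metadata filtering listed above can be illustrated with a small stdlib-only sketch: an agent derives a filter from the user's query, and candidate chunks are retained only if their metadata satisfies it, narrowing the search space before (or after) vector retrieval. The field names and filter shape here are hypothetical; the blueprint's real filter syntax is defined by its APIs and the underlying vector database.

```python
def matches(metadata: dict, filters: dict) -> bool:
    """Return True if a chunk's metadata satisfies every filter.

    A filter value may be a single value (equality test) or a list
    (membership test).  Fields like 'department' and 'year' are
    illustrative only.
    """
    for field, wanted in filters.items():
        value = metadata.get(field)
        if isinstance(wanted, list):
            if value not in wanted:
                return False
        elif value != wanted:
            return False
    return True

chunks = [
    {"text": "2023 travel policy...", "meta": {"department": "hr", "year": 2023}},
    {"text": "2024 travel policy...", "meta": {"department": "hr", "year": 2024}},
    {"text": "GPU cluster runbook...", "meta": {"department": "it", "year": 2024}},
]

# A filter an agent might derive from "what is the current HR travel policy?"
filters = {"department": "hr", "year": 2024}
hits = [c["text"] for c in chunks if matches(c["meta"], filters)]
```

In practice such predicates are pushed down to the vector database as a filter expression rather than applied in application code, so filtering and similarity search happen in one indexed query.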

Minimum System Requirements

Hardware Requirements

The blueprint offers two primary deployment modes. By default, it deploys the referenced NIM microservices locally. The minimum required hardware for each method is listed below; requirements increase if optional features are enabled.

  • Docker

    • 2 x RTX Pro 6000
    • 2 x H100
    • 2 x B200
    • 3 x A100
  • Kubernetes

    • 8 x H100-80GB
    • 8 x B200
    • 9 x A100-80GB SXM
    • 8 x RTX PRO 6000
    • 3 x H100 (with Multi-Instance GPU)
The blueprint can also use NVIDIA NGC-hosted models, in which case only one local GPU is required, to host the NVIDIA cuVS-accelerated vector database.

OS Requirements

  • Ubuntu 22.04

Deployment Options

  • Docker
  • Kubernetes with NIM Operator

Software used in this blueprint

NVIDIA Technology

  • Llama Nemotron Super 49B
  • NeMo Retriever Llama 3.2 embedding NIM
  • NeMo Retriever Llama 3.2 reranking NIM
  • NeMo Retriever page elements NIM
  • NeMo Retriever table structure NIM
  • NeMo Retriever graphic elements NIM
  • NeMo Retriever OCR NIM
  • Llama Nemotron Nano VL 12B (optional)
  • Nemotron Parse NIM (optional)
  • Llama 3.1 NemoGuard 8B Content Safety NIM (optional)
  • Llama 3.1 NemoGuard 8B Topic Control NIM (optional)
  • NVIDIA Riva ASR NIM (optional)
  • NeMo Retriever Llama 3.2 VLM embedding NIM (optional)

3rd Party Software

  • LangChain
  • Milvus database (accelerated with NVIDIA cuVS)
  • ElasticSearch Vector Database
  • Minio
  • Redis Cache

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns here.

License

Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.

Terms of Use

This blueprint is governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Software License Agreement and the NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product. The models are governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Community Model License, and the NVIDIA RAG dataset is governed by the NVIDIA Asset License Agreement. The following models, built with Llama, are governed by the Llama 3.2 Community License Agreement: nvidia/llama-3.2-nv-embedqa-1b-v2, nvidia/llama-3.2-nv-rerankqa-1b-v2, and llama-3.2-nemoretriever-1b-vlm-embed-v1.

ADDITIONAL INFORMATION:

The Llama 3.1 Community License Agreement applies to the llama-3.1-nemoguard-8b-content-safety and llama-3.1-nemoguard-8b-topic-control models. The Llama 3.2 Community License Agreement applies to the nvidia/llama-3.2-nv-embedqa-1b-v2, nvidia/llama-3.2-nv-rerankqa-1b-v2, and llama-3.2-nemoretriever-1b-vlm-embed-v1 models. The Llama 3.3 Community License Agreement applies to the llama-3.3-nemotron-super-49b-v1.5 model. Built with Llama. Apache 2.0 applies to NVIDIA Ingest and to the nemoretriever-page-elements-v3, nemoretriever-table-structure-v1, nemoretriever-graphic-elements-v1, and nemoretriever-ocr-v1 models.
