The NVIDIA AI Blueprint for Retrieval-Augmented Generation (RAG) is a production-ready, modular reference architecture for building high-accuracy, high-performance RAG systems that power enterprise search, knowledge assistants, copilots, and agentic workflows at scale. Optimized for GPU acceleration and enterprise throughput, the blueprint provides a complete foundation for ingestion, retrieval, reasoning, and generation across multimodal enterprise data.
Built to support modern agent ecosystems, the blueprint includes shallow and deep document summarization, reasoning-budget configurability, query decomposition, and dynamic metadata filtering—enabling agents to efficiently narrow search space, select trusted sources, and reason over large corpora. Native Python libraries, OpenAI-compatible APIs, MCP server support, and a built-in data catalog make it easy for developers to integrate RAG capabilities into existing applications and multi-agent workflows.
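To make the OpenAI-compatible integration concrete, the sketch below builds a chat-completions-style request payload for a locally deployed RAG server. The endpoint URL, model name, and the `collection_names` field are assumptions for illustration only; consult the blueprint's API reference for the actual paths and schema.

```python
import json

# Assumed local deployment address -- not an official endpoint.
RAG_SERVER_URL = "http://localhost:8081/v1/chat/completions"

def build_rag_request(question: str, collections: list[str]) -> dict:
    """Build an OpenAI-style chat-completions payload for a RAG server.

    The "collection_names" key is a hypothetical blueprint-specific
    extension used here to show how an agent might scope retrieval
    to selected, trusted collections.
    """
    return {
        "model": "nvidia/llama-3.3-nemotron-super-49b-v1.5",  # example model
        "messages": [{"role": "user", "content": question}],
        "collection_names": collections,
    }

payload = build_rag_request("What is our PTO policy?", ["hr-docs"])
print(json.dumps(payload, indent=2))
```

Because the request follows the familiar chat-completions shape, existing OpenAI-client code can often be repointed at the RAG server with only a base-URL change.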
The blueprint supports advanced multimodal generation, including vision-language models (VLMs) for image understanding, captioning, and image-aware answer generation, along with optional reflection to further improve answer quality. A robust multimodal ingestion pipeline extracts text, tables, charts, images, infographics, and audio/video content, enriched with custom metadata to improve downstream retrieval and filtering.
Designed for flexibility and scale, the RAG Blueprint offers hybrid dense + sparse retrieval, multi-collection search, GPU-accelerated indexing and querying, reranking, and pluggable vector database support—including Elasticsearch and Milvus—with fine-grained database authorization and token support. Built-in observability, OpenTelemetry integration, and evaluation scripts (RAGAS) help teams measure accuracy, latency, and quality as they move from pilot to production, while optional programmable guardrails support enterprise safety requirements.
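Hybrid retrieval merges the rankings from a dense (embedding-similarity) search and a sparse (keyword, e.g. BM25) search. The blueprint's exact fusion strategy is not specified here; reciprocal rank fusion (RRF) is one common, tuning-free way to combine the two lists, sketched below with hypothetical document IDs.

```python
from collections import defaultdict

def reciprocal_rank_fusion(dense: list[str], sparse: list[str], k: int = 60) -> list[str]:
    """Merge two ranked result lists with reciprocal rank fusion (RRF).

    Each document scores 1 / (k + rank) in every list it appears in;
    documents ranked highly by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in (dense, sparse):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. embedding-similarity order
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. BM25 keyword order
print(reciprocal_rank_fusion(dense_hits, sparse_hits))
# doc_b ranks first: it appears near the top of both lists.
```

In a full pipeline, the fused candidates would then be passed to a reranking model, as the blueprint describes, before generation.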
Deployable via Docker or Kubernetes, with a user interface included and support for GPU sharing through the NIM Operator, the blueprint is fully decomposable and customizable to fit domain-specific needs. It can run standalone, integrate with other NVIDIA Blueprints, or serve as a core building block in agentic systems.
Importantly, the NVIDIA AI Blueprint for RAG serves as a foundational layer of the NVIDIA AI Data Platform, transforming raw, multimodal enterprise data into AI-ready knowledge that powers retrieval, reasoning, and generation across applications.
It is also foundational to the AI Agent for Enterprise Research, providing the trusted knowledge base, summarization, and retrieval capabilities required for advanced, reasoning-driven enterprise agents.
Get started with this reference architecture to ground AI-driven decisions and generation in trusted, relevant enterprise data—at production scale.
The blueprint's features fall into the following areas:
- Agent Ecosystem Support
- Multimodal and Advanced Generation
- Data Ingestion and Processing
- Vector Database and Retrieval
- Governance
- Observability and Telemetry
- Other
Hardware Requirements
The blueprint offers two primary deployment methods. By default, each deploys the referenced NIM microservices locally. The minimum hardware required by each method is listed below; requirements increase if optional configuration settings are enabled.
Docker
Kubernetes
Alternatively, the blueprint can use NVIDIA NGC-hosted models, in which case only one GPU is required, to host the NVIDIA cuVS-accelerated vector database.
OS Requirements
Deployment Options
NVIDIA Technology
3rd Party Software
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns here.
Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.
This blueprint is governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Software License Agreement and the NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product. The models are governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Community Model License, and the NVIDIA RAG dataset is governed by the NVIDIA Asset License Agreement. The following models, which are built with Llama, are governed by the Llama 3.2 Community License Agreement: nvidia/llama-3.2-nv-embedqa-1b-v2, nvidia/llama-3.2-nv-rerankqa-1b-v2, and llama-3.2-nemoretriever-1b-vlm-embed-v1.
ADDITIONAL INFORMATION:
- Llama 3.1 Community License Agreement: llama-3.1-nemoguard-8b-content-safety and llama-3.1-nemoguard-8b-topic-control models.
- Llama 3.2 Community License Agreement: nvidia/llama-3.2-nv-embedqa-1b-v2, nvidia/llama-3.2-nv-rerankqa-1b-v2, and llama-3.2-nemoretriever-1b-vlm-embed-v1 models.
- Llama 3.3 Community License Agreement: llama-3.3-nemotron-super-49b-v1.5 model. Built with Llama.
- Apache 2.0: NVIDIA Ingest and the nemoretriever-page-elements-v3, nemoretriever-table-structure-v1, nemoretriever-graphic-elements-v1, and nemoretriever-ocr-v1 models.

Power fast, accurate semantic search across multimodal enterprise data with NVIDIA’s RAG Blueprint—built on NeMo Retriever and Nemotron models—to connect your agents to trusted, authoritative sources of knowledge.