The NVIDIA AI Blueprint for Retrieval-Augmented Generation (RAG) is a production-ready reference workflow that provides a complete foundation for building scalable, customizable pipelines for both retrieval and generation. Powered by NVIDIA NeMo Retriever models and NVIDIA Llama Nemotron models, the blueprint is optimized for high accuracy, strong reasoning, and enterprise-scale throughput.
It provides built-in support for multimodal data ingestion, advanced retrieval, reranking, and reflection techniques, and integrates seamlessly into LLM-powered workflows. By connecting language models to enterprise data across text, tables, charts, audio, and infographics from millions of documents, it enables truly context-aware, generative responses.
Beyond retrieval and generation, the blueprint includes governance, observability, and safety features to meet enterprise requirements, along with developer-friendly APIs, telemetry, and evaluation frameworks for streamlined experimentation and deployment. GPU acceleration ensures unmatched performance at scale, while flexible plug-ins and customizability let teams adapt the solution to their unique use cases.
Whether you’re building enterprise search, knowledge assistants, generative copilots, or vertical AI workflows, the NVIDIA AI Blueprint for RAG delivers everything needed to move from prototype to production with confidence. It can be used standalone, combined with other NVIDIA Blueprints, or integrated into an agentic workflow to support more advanced reasoning-driven applications. For example, this blueprint serves as a foundational building block in the AI Agent for Enterprise Research blueprint.
Get started with this reference architecture to ground AI-driven decisions and generation in relevant enterprise data.
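To make the workflow concrete, the minimal sketch below shows the retrieve-then-generate pattern the blueprint automates: fetch passages relevant to a question, then generate an answer grounded in them. This is not the blueprint's own API; the retrieve_passages helper is a hypothetical stand-in for the ingestion and retrieval pipeline, and the hosted endpoint and model identifier are assumptions to adjust for your deployment.

```python
# Minimal sketch of the retrieve-then-generate pattern this blueprint automates.
# Assumptions: an OpenAI-compatible NVIDIA endpoint is used for generation, and
# retrieve_passages() stands in for NeMo Retriever embedding, vector search, and reranking.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NVIDIA API Catalog endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

def retrieve_passages(question: str) -> list[str]:
    """Hypothetical stand-in for the blueprint's retrieval pipeline."""
    return ["<passage 1 from your enterprise corpus>", "<passage 2>"]

question = "What was our Q3 revenue by region?"
context = "\n\n".join(retrieve_passages(question))

completion = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1.5",  # assumed model identifier
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
    temperature=0.2,
)
print(completion.choices[0].message.content)
```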
Data Ingestion and Processing
Vector Database and Retrieval
Multimodal and Advanced Generation
Governance
Observability and Telemetry
Other
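The retrieval stages named above can also be exercised independently. As a hedged illustration of the reranking step, the sketch below scores candidate passages against a query using the NeMo Retriever reranking model listed later in this card; the hosted endpoint URL, request fields, and response shape are assumptions patterned on NVIDIA's hosted retrieval NIMs and should be checked against the blueprint's deployed services.

```python
# Hedged sketch of the reranking step: score candidate chunks against the query.
# The endpoint URL and payload shape are assumptions modeled on NVIDIA's hosted
# retrieval NIMs; a local NIM deployment would expose a similar API on your host.
import os
import requests

url = "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"

payload = {
    "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
    "query": {"text": "How do I rotate the service credentials?"},
    "passages": [
        {"text": "Credentials are rotated from the admin console under Security."},
        {"text": "The cafeteria menu changes every Monday."},
    ],
}
headers = {"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json())  # rankings of passages by relevance (field names may vary)
```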
Hardware Requirements
The blueprint offers two primary modes of deployment. By default, it deploys the referenced NIM microservices locally. The minimum required hardware for each deployment method is listed below; these requirements change if optional configuration settings are enabled.
Docker
Kubernetes
The blueprint also allows the use of NVIDIA NGC-hosted models, in which case only one GPU is required locally to host the NVIDIA cuVS-accelerated vector database.
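For that hosted-model path, a minimal sketch of calling a hosted model through an OpenAI-compatible client is shown below. The base URL and the embedding call's extra parameters are assumptions based on common usage of the NVIDIA API Catalog and the embedding model named in this card; verify them against the blueprint's configuration before use.

```python
# Hedged sketch: using an NVIDIA-hosted embedding model instead of a local NIM,
# so only the cuVS-accelerated vector database needs a local GPU.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NVIDIA API Catalog endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

# input_type is assumed to be "query" for search queries and "passage" for documents
# being indexed; confirm these parameters against the model's documentation.
response = client.embeddings.create(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    input=["Which plants qualify for the renewable energy credit?"],
    encoding_format="float",
    extra_body={"input_type": "query", "truncate": "NONE"},
)
print(len(response.data[0].embedding))  # embedding dimensionality
```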
OS Requirements
Deployment Options
NVIDIA Technology
3rd Party Software
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns here.
Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.
This blueprint is governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Software License Agreement and the NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product. The models are governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Community Model License, and the NVIDIA RAG dataset is governed by the NVIDIA Asset License Agreement. The following models built with Llama are governed by the Llama 3.2 Community License Agreement: nvidia/llama-3.2-nv-embedqa-1b-v2, nvidia/llama-3.2-nv-rerankqa-1b-v2, and llama-3.2-nemoretriever-1b-vlm-embed-v1.
ADDITIONAL INFORMATION:
The llama-3.1-nemotron-nano-vl-8b-v1, llama-3.1-nemoguard-8b-content-safety, and llama-3.1-nemoguard-8b-topic-control models are governed by the Llama 3.1 Community License Agreement. The nvidia/llama-3.2-nv-embedqa-1b-v2, nvidia/llama-3.2-nv-rerankqa-1b-v2, and llama-3.2-nemoretriever-1b-vlm-embed-v1 models are governed by the Llama 3.2 Community License Agreement. The llama-3.3-nemotron-super-49b-v1.5 model is governed by the Llama 3.3 Community License Agreement. Built with Llama. NVIDIA Ingest and the nemoretriever-page-elements-v2, nemoretriever-table-structure-v1, nemoretriever-graphic-elements-v1, paddleocr, and nemoretriever-ocr-v1 models are licensed under Apache 2.0.

Power fast, accurate semantic search across multimodal enterprise data with NVIDIA’s RAG Blueprint—built on NeMo Retriever and Nemotron models—to connect your agents to trusted, authoritative sources of knowledge.