Build an Enterprise RAG pipeline

Connect AI applications to multimodal enterprise data with a scalable retrieval augmented generation (RAG) pipeline built on highly performant, industry-leading NIM microservices, for faster PDF data extraction and more accurate information retrieval.

The NVIDIA AI Blueprint for RAG gives developers a foundational starting point for building scalable, customizable retrieval pipelines that deliver both high accuracy and throughput. Use this blueprint to build RAG applications that provide context-aware responses by connecting LLMs to extensive multimodal enterprise data—including text, tables, charts, and infographics from millions of PDFs. With 15x faster multimodal PDF data extraction and 50% fewer incorrect answers, enterprises can unlock actionable insights from data and drive productivity at scale.

This blueprint can be used as-is, combined with other NVIDIA Blueprints (such as the Digital Human blueprint or the AI Assistant for customer service blueprint), or integrated with an agent to support more advanced use cases. Get started with this reference architecture to ground AI-driven decisions in relevant enterprise data, wherever it resides.

Architecture Diagram

[Architecture diagram image]

Key Features

  • Multimodal PDF data extraction support with text, tables, charts, and infographics
  • Hybrid search combining dense and sparse retrieval
  • Opt-in image captioning with vision language models (VLMs)
  • Reranking to further improve accuracy
  • GPU-accelerated index creation and search
  • Multi-turn conversations
  • Multi-session support
  • Telemetry and observability
  • Opt-in reflection to improve accuracy
  • Opt-in guardrails for conversations
  • Sample user interface
  • OpenAI-compatible APIs (see the example after this list)
  • Decomposable and customizable
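
Because the pipeline exposes OpenAI-compatible APIs, any standard OpenAI client can talk to it once the services are running. The minimal sketch below uses the openai Python package; the base URL, API key handling, and model name are assumptions for illustration and depend on how the blueprint is deployed.

    # Minimal sketch: query the blueprint's OpenAI-compatible endpoint with the
    # standard openai client. The base URL, API key, and model name below are
    # assumptions for illustration; substitute the values from your deployment.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8081/v1",   # assumed address of the RAG server
        api_key="not-needed-for-local-use",    # assumed: local services need no key
    )

    response = client.chat.completions.create(
        model="nvidia/llama-3.3-nemotron-super-49b-v1",  # assumed model identifier
        messages=[
            {"role": "user", "content": "Summarize the key risks in our Q3 filings."}
        ],
        temperature=0.2,
    )
    print(response.choices[0].message.content)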

Minimum System Requirements

Hardware Requirements

The blueprint offers two primary deployment modes. By default, it deploys the referenced NIM microservices locally. Each deployment method lists its minimum required hardware below; these requirements change if optional configuration settings are enabled.

  • Docker
    • 2xH100 or 3xA100
  • Kubernetes
    • 8xH100-80GB or 9xA100-80GB
  • The blueprint also provides the alternative of using NGC-hosted models, in which case only one GPU is required to host the NVIDIA cuVS-accelerated vector database.
  • The blueprint can be modified to use additional NIM microservices hosted by NVIDIA.

OS Requirements

  • Ubuntu 22.04 OS

Deployment Options

  • Docker
  • Kubernetes

Software Used in This Blueprint

NVIDIA Technology

  • NeMo Retriever Llama 3.2 embedding NIM
  • NeMo Retriever Llama 3.2 reranking NIM
  • Llama 3.3 Nemotron Super 49B v1 NIM
  • NeMo Retriever page elements NIM
  • NeMo Retriever table structure NIM
  • NeMo Retriever graphic elements NIM
  • PaddleOCR NIM
  • NeMo Retriever parse NIM (optional)
  • Llama 3.1 NemoGuard 8B content safety NIM (optional)
  • Llama 3.1 NemoGuard 8B topic control NIM (optional)
  • Llama 3.2 11B vision instruct NIM (optional)
  • Mixtral 8x22B Instruct v0.1 (optional)
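
The NeMo Retriever embedding and reranking NIM microservices listed above expose standard endpoints that can be exercised directly, for example through the langchain-nvidia-ai-endpoints package. The sketch below assumes locally deployed NIMs; the ports are assumptions and should be replaced with the values from your own deployment.

    # Minimal sketch: embed a query and rerank candidate passages against locally
    # deployed NeMo Retriever NIM microservices. The endpoint ports are assumptions
    # for illustration; point them at your own deployment.
    from langchain_core.documents import Document
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank

    embedder = NVIDIAEmbeddings(
        model="nvidia/llama-3.2-nv-embedqa-1b-v2",
        base_url="http://localhost:8000/v1",   # assumed embedding NIM endpoint
    )
    reranker = NVIDIARerank(
        model="nvidia/llama-3.2-nv-rerankqa-1b-v2",
        base_url="http://localhost:8001/v1",   # assumed reranking NIM endpoint
    )

    query = "How did operating margin change in Q3?"
    query_vector = embedder.embed_query(query)   # dense vector used for retrieval

    candidates = [
        Document(page_content="Operating margin expanded 2 points in Q3."),
        Document(page_content="The cafeteria menu changed in October."),
    ]
    best_first = reranker.compress_documents(candidates, query)
    print(best_first[0].page_content)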

3rd Party Software

  • LangChain
  • Milvus database (accelerated with NVIDIA cuVS)
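
The sketch below shows the basic pattern these third-party components support: embed text chunks, store them in Milvus through LangChain, and run a similarity search. The Milvus URI, collection name, and embedding endpoint are assumptions; the blueprint's ingestion and retrieval services add multimodal extraction and hybrid dense and sparse search on top of this pattern.

    # Minimal sketch: index a few text chunks in Milvus via LangChain and run a
    # dense similarity search. The Milvus URI, collection name, and embedding
    # endpoint are assumptions for illustration.
    from langchain_core.documents import Document
    from langchain_milvus import Milvus
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

    embedder = NVIDIAEmbeddings(
        model="nvidia/llama-3.2-nv-embedqa-1b-v2",
        base_url="http://localhost:8000/v1",   # assumed embedding NIM endpoint
    )

    docs = [
        Document(page_content="Q3 revenue grew 12% year over year.",
                 metadata={"source": "q3_report.pdf"}),
        Document(page_content="Operating expenses declined 4% in Q3.",
                 metadata={"source": "q3_report.pdf"}),
    ]

    vectorstore = Milvus.from_documents(
        docs,
        embedding=embedder,
        collection_name="rag_demo",                          # assumed collection name
        connection_args={"uri": "http://localhost:19530"},   # default Milvus port
    )

    hits = vectorstore.similarity_search("How did revenue change in Q3?", k=2)
    for hit in hits:
        print(hit.metadata["source"], "->", hit.page_content)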

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns here.

License

Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.

Terms of Use

The software, NIM microservices and materials are governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products, except that models are governed by the NVIDIA Community Model License. The NVIDIA RAG Dataset is governed by the NVIDIA Asset License Agreement. If this Blueprint is deployed using NVIDIA API endpoints on build.nvidia.com, use of the service is governed by NVIDIA API Trial Terms of Service.

ADDITIONAL INFORMATION:

  • Llama 3.1 Community License Agreement for the llama-3.1-nemoguard-8b-content-safety and llama-3.1-nemoguard-8b-topic-control models.
  • Llama 3.2 Community License Agreement for the nvidia/llama-3.2-nv-embedqa-1b-v2, nvidia/llama-3.2-nv-rerankqa-1b-v2, and llama-3.2-11b-vision-instruct models.
  • Llama 3.3 Community License Agreement for the llama-3.3-nemotron-super-49b-v1 model. Built with Llama.
  • Apache 2.0 for NVIDIA Ingest and for the nemoretriever-page-elements-v2, nemoretriever-table-structure-v1, nemoretriever-graphic-elements-v1, paddleocr, and mixtral-8x22b-instruct-v0.1 models.
