Copyright © 2025 NVIDIA Corporation

RAG application in AI Workbench

30 MIN

Install and use AI Workbench to clone and run a reproducible RAG application

Basic idea

This walkthrough shows how to set up and run an agentic retrieval-augmented generation (RAG) project using NVIDIA AI Workbench. You'll use AI Workbench to clone and run a pre-built agentic RAG application that routes queries, evaluates responses for relevancy and hallucination, and iterates through evaluation and generation cycles. The project uses a Gradio web interface and works with either NVIDIA-hosted API endpoints or self-hosted models.
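
To make the route, evaluate, and iterate cycle concrete, here is a minimal Python sketch of that control flow. It is not the project's actual implementation: the `Components` fields, the source names `"vectorstore"` and `"websearch"`, and the keyword-overlap checks in `demo_components` are all hypothetical stand-ins for what would be LLM and retriever calls in a real application.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Components:
    """Pluggable steps of the loop; every field is a hypothetical stand-in."""
    route: Callable[[str], str]                     # query -> source name
    retrieve: Callable[[str, str], List[str]]       # (query, source) -> documents
    generate: Callable[[str, List[str]], str]       # (query, documents) -> answer
    is_grounded: Callable[[str, List[str]], bool]   # hallucination check
    is_relevant: Callable[[str, str], bool]         # does the answer address the query?


def agentic_rag(query: str, c: Components, max_cycles: int = 3) -> Tuple[str, int]:
    """Route the query, then generate and evaluate until an answer passes.

    Returns the last answer and the number of cycles that were run.
    """
    source = c.route(query)
    answer = ""
    for cycle in range(1, max_cycles + 1):
        docs = c.retrieve(query, source)
        answer = c.generate(query, docs)
        if c.is_grounded(answer, docs) and c.is_relevant(query, answer):
            return answer, cycle
        # Evaluation failed: try the other source on the next cycle.
        source = "websearch" if source == "vectorstore" else "vectorstore"
    return answer, max_cycles


def _keywords(text: str) -> set:
    """Lowercased words longer than three characters (a crude relevance proxy)."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) > 3}


def demo_components() -> Components:
    """Toy components backed by a two-document corpus, for illustration only."""
    corpus = {
        "vectorstore": ["AI Workbench manages containerized projects."],
        "websearch": ["DGX Spark is a desktop AI system."],
    }
    return Components(
        route=lambda q: "vectorstore" if "workbench" in q.lower() else "websearch",
        retrieve=lambda q, source: corpus[source],
        generate=lambda q, docs: docs[0],
        is_grounded=lambda answer, docs: answer in docs,
        is_relevant=lambda q, answer: bool(_keywords(q) & _keywords(answer)),
    )
```

Calling `agentic_rag("What is Workbench?", demo_components())` returns the answer together with the number of cycles it took. Capping the loop with `max_cycles` is a design choice in this sketch so that a query no source can answer still terminates.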

What you'll accomplish

You'll have a working agentic RAG application running in NVIDIA AI Workbench, with a web interface where you can submit queries and receive responses. The system demonstrates query routing, response evaluation, and iterative refinement, giving you hands-on experience with both the AI Workbench development environment and an agentic RAG architecture.

What to know before starting

  • Basic familiarity with retrieval-augmented generation (RAG) concepts
  • Understanding of API keys and how to generate them
  • Comfort working with web applications and browser interfaces
  • Basic understanding of containerized development environments

Prerequisites

  • DGX Spark system with NVIDIA AI Workbench installed or ready to install
  • Free NVIDIA API key: Generate at NGC API Keys
  • Free Tavily API key: Generate at Tavily
  • Internet connection for cloning repositories and accessing APIs
  • Web browser for accessing the Gradio interface

Verification commands

  • Verify that the NVIDIA AI Workbench application is installed on your DGX Spark system
  • Verify that your API keys are valid and have not expired
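
As a quick local sanity check before entering the keys, a short Python sketch like the following can confirm that both keys are present and look plausible. The environment variable names `NVIDIA_API_KEY` and `TAVILY_API_KEY` and the prefixes `nvapi-` and `tvly-` are assumptions made for illustration, not something this playbook specifies. The check does not contact either service, so it cannot confirm that a key is valid or unexpired.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names and key prefixes; adjust to match your own setup.
EXPECTED_PREFIXES = {
    "NVIDIA_API_KEY": "nvapi-",
    "TAVILY_API_KEY": "tvly-",
}


def check_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems found; an empty list means both keys look plausible."""
    env = os.environ if env is None else env
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} does not start with {prefix!r}")
    return problems


if __name__ == "__main__":
    issues = check_keys()
    print("Keys look plausible" if not issues else "\n".join(issues))
```

Passing a plain dictionary instead of relying on `os.environ` makes the function easy to try without exporting real keys.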

Time & risk

  • Estimated time: 30-45 minutes (including AI Workbench installation if needed)
  • Risk level: Low - Uses pre-built containers and established APIs
  • Rollback: Simply delete the cloned project from AI Workbench to remove all components. No system changes are made outside the AI Workbench environment.

Resources

  • DGX Spark Documentation
  • DGX Spark Forum