
Multi-modal Inference

1 HR

Set up multi-modal inference with TensorRT

DGX Spark
View on GitHub
Overview

Basic idea

Multi-modal inference combines different data types, such as text, images, and audio, within a single model pipeline to generate or interpret richer outputs.
Instead of processing one input type at a time, multi-modal systems use shared representations that enable tasks such as text-to-image generation, image captioning, and vision-language reasoning.

On GPUs, this enables parallel processing across modalities, yielding faster, higher-fidelity results on tasks that combine language and vision.

What you'll accomplish

You'll deploy GPU-accelerated multi-modal inference capabilities on NVIDIA Spark using TensorRT to run Flux.1 and SDXL diffusion models with optimized performance across multiple precision formats (FP16, FP8, FP4).
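Once the environment is set up (see the prerequisites and ancillary files below), a run looks roughly like the following. This is a hedged sketch: the script names come from the ancillary files list, the --hf-token flag follows the TensorRT diffusion demo's conventions, and the prompt is illustrative; precision-selection flags vary by release, so check each script's --help.

    # From the TensorRT diffusion demo directory, inside the GPU container:
    # Flux.1 text-to-image (defaults to FP16; quantized variants via script flags)
    python3 demo_txt2img_flux.py "a photo of a red sports car at sunset" --hf-token=$HF_TOKEN

    # SDXL text-to-image with the companion script
    python3 demo_txt2img_xl.py "a photo of a red sports car at sunset" --hf-token=$HF_TOKEN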

What to know before starting

  • Working with Docker containers and GPU passthrough
  • Using TensorRT for model optimization
  • Hugging Face model hub authentication and downloads
  • Command-line tools for GPU workloads
  • Basic understanding of diffusion models and image generation
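Much of the workflow runs inside an NVIDIA PyTorch container with GPU passthrough. As a refresher, a minimal sketch (the image tag is the one used in the prerequisites below; the volume mount is an assumption, so demo files survive container exits):

    # Start an interactive PyTorch container with all GPUs passed through
    docker run --rm -it --gpus all \
      -v "$PWD":/workspace \
      nvcr.io/nvidia/pytorch:25.11-py3 bash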

Prerequisites

  • NVIDIA Spark device with Blackwell GPU architecture
  • Docker installed and accessible to current user
  • NVIDIA Container Runtime configured
  • Hugging Face account with access to the Black Forest Labs models FLUX.1-dev and FLUX.1-dev-onnx
  • Hugging Face token configured with access to both FLUX.1 model repositories
  • At least 48GB VRAM available for FP16 Flux.1 Schnell operations
  • Verify GPU access: nvidia-smi
  • Check Docker GPU integration: docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi
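The last two checks can be run verbatim; a combined pre-flight sketch, with one common way to expose the Hugging Face token (exporting HF_TOKEN is an assumption — the demo scripts may also accept the token as a command-line argument):

    # 1. Confirm the GPU is visible on the host
    nvidia-smi

    # 2. Confirm Docker can reach the GPU via the NVIDIA Container Runtime
    docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi

    # 3. Make the Hugging Face token available to the demo scripts
    export HF_TOKEN=<your_token>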

Ancillary files

All necessary files can be found in the TensorRT repository on GitHub:

  • requirements.txt - Python dependencies for TensorRT demo environment
  • demo_txt2img_flux.py - Flux.1 model inference script
  • demo_txt2img_xl.py - SDXL model inference script
  • TensorRT repository - Contains diffusion demo code and optimization tools
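A minimal sketch for fetching these files, assuming the diffusion demo lives under demo/Diffusion in the TensorRT repository (the exact path is an assumption; follow the repository link above if the layout differs):

    # Clone the TensorRT repository and install the demo's Python dependencies
    git clone https://github.com/NVIDIA/TensorRT.git
    cd TensorRT/demo/Diffusion   # assumed demo location
    pip install -r requirements.txt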

Time & risk

  • Duration: 45-90 minutes depending on model downloads and optimization steps

  • Risks:

    • Large model downloads may time out
    • High VRAM requirements may cause OOM errors
    • Quantized models may show quality degradation
  • Rollback (see the sketch after this list):

    • Remove downloaded models from the Hugging Face cache
    • Then exit the container environment
  • Last Updated: 12/22/2025
    • Upgrade to the latest PyTorch container, nvcr.io/nvidia/pytorch:25.11-py3
    • Add Hugging Face token setup instructions for model access
    • Add Docker container permission setup instructions
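For the rollback, a minimal sketch assuming the default Hugging Face cache location and the standard hub cache layout (the models--... directory names are derived from the FLUX.1 repositories listed in the prerequisites):

    # Remove the downloaded FLUX.1 weights from the Hugging Face cache
    rm -rf ~/.cache/huggingface/hub/models--black-forest-labs--FLUX.1-dev
    rm -rf ~/.cache/huggingface/hub/models--black-forest-labs--FLUX.1-dev-onnx

    # Then leave the container environment
    exit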

Resources

  • DGX Spark Documentation
  • DGX Spark Forum
  • DGX Spark User Performance Guide