Live VLM WebUI
Real-time Vision Language Model interaction with webcam streaming
Basic idea
Live VLM WebUI is a universal web interface for real-time Vision Language Model (VLM) interaction and benchmarking. It lets you stream your webcam directly to any VLM backend (Ollama, vLLM, SGLang, or cloud APIs) and receive live AI-powered analysis. This tool is well suited to testing VLMs, benchmarking performance across different hardware configurations, and exploring vision AI capabilities.
The interface provides WebRTC-based video streaming, integrated GPU monitoring, customizable prompts, and support for multiple VLM backends. It works seamlessly with the Blackwell GPU in your DGX Spark, enabling real-time vision inference on-device.
What you'll accomplish
You'll set up a complete real-time vision AI testing environment on your DGX Spark that allows you to:
- Stream webcam video and get instant VLM analysis through a web browser
- Test and compare different vision language models (Gemma 3, Llama Vision, Qwen VL, etc.)
- Monitor GPU and system performance in real-time while models process video frames
- Customize prompts for various use cases (object detection, scene description, OCR, safety monitoring)
- Access the interface from any device on your network with a web browser
What to know before starting
- Basic familiarity with Linux command line and terminal operations
- Basic knowledge of Python package installation with pip
- Basic knowledge of REST APIs and how services communicate via HTTP
- Familiarity with web browsers and network access (IP addresses, ports)
- Optional: Knowledge of Vision Language Models and their capabilities (helpful but not required)
Prerequisites
Hardware Requirements:
- Webcam (laptop built-in camera, USB camera, or remote browser with camera)
- At least 10GB available storage space for Python packages and model downloads
Software Requirements:
- DGX Spark with DGX OS installed
- Python 3.10 or later (verify with `python3 --version`; see the quick check after this list)
- pip package manager (verify with `pip --version`)
- Network access to download Python packages from PyPI
- A VLM backend running locally (Ollama being easiest) or cloud API access
- Web browser access to `https://<SPARK_IP>:8090`
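If you prefer to verify the Python requirements from a terminal in one shot, here is a minimal sketch (assuming a stock DGX OS shell):

```bash
# Confirm Python >= 3.10 and that pip is available before installing.
python3 -c 'import sys; assert sys.version_info >= (3, 10), sys.version' \
  && echo "Python version OK"
pip --version && echo "pip OK"
```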
VLM Backend Options:
- Ollama (recommended for beginners) - Easy to install and use; see the quick-start sketch after this list
- vLLM - Higher performance for production workloads
- SGLang - Alternative high-performance backend
- NIM - NVIDIA Inference Microservices for optimized performance
- Cloud APIs - NVIDIA API Catalog, OpenAI, or other OpenAI-compatible APIs
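For the Ollama route, a quick-start sketch follows. The install script is Ollama's official installer; `gemma3:4b` is one example of a vision-capable model tag, and the model you choose may differ:

```bash
# Install Ollama using its official install script.
curl -fsSL https://ollama.com/install.sh | sh

# Pull a vision-capable model; gemma3:4b is one example tag.
ollama pull gemma3:4b

# Confirm the model is available locally.
# The Ollama API listens on http://localhost:11434 by default.
ollama list
```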
Ancillary files
All source code and documentation can be found at the Live VLM WebUI GitHub repository.
The package will be installed directly via pip, so no additional files are required for basic installation.
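Installation itself is a single pip command (the package name is taken from the rollback note under Time & risk). The launch command shown here is an assumption based on the package name, so check the repository README if it differs:

```bash
# Install into user space, keeping the system Python untouched.
pip install --user live-vlm-webui

# Launch the server (command name assumed to match the package name),
# then open https://<SPARK_IP>:8090 in a browser and accept the
# self-signed certificate warning.
live-vlm-webui
```

To confirm the endpoint is reachable from another machine on the network, you can use `curl -k https://<SPARK_IP>:8090/`, where `-k` skips verification of the self-signed certificate.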
Time & risk
- Estimated time: 20-30 minutes (including Ollama installation and model download)
- 5 minutes to install Live VLM WebUI via pip
- 10-15 minutes to install Ollama and download a model (varies by model size)
- 5 minutes to configure and test
- Risk level: Low
- Python packages installed in user space, isolated from system
- No system-level changes required
- Port 8090 must be accessible for web interface functionality
- Self-signed SSL certificate requires browser security exception
- Rollback: Uninstall the Python package with `pip uninstall live-vlm-webui`. Ollama can be uninstalled with standard package removal (see the sketch after this list). No persistent changes are made to the DGX Spark configuration.
- Last Updated: 01/02/2026
- First Publication
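For reference, a concrete rollback sketch. The Ollama removal steps follow Ollama's documented manual uninstall for the script-based install above; the paths are assumptions if you installed Ollama a different way:

```bash
# Remove the Live VLM WebUI package.
pip uninstall live-vlm-webui

# Remove Ollama (script-based install): stop and disable the service,
# then delete the service file and binary. Paths assume the default
# installer layout.
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service
sudo rm /usr/local/bin/ollama
```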