First, install Ollama to serve Vision Language Models. Ollama is one of the easiest ways to run and serve models locally on your DGX Spark.
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
Ollama will automatically start as a system service and detect your Blackwell GPU.
Now download a vision language model. We recommend starting with gemma3:4b for quick testing:
# Download a lightweight model (recommended for testing)
ollama pull gemma3:4b
# Alternative models you can try:
# ollama pull llama3.2-vision:11b # Sometimes better quality, but slower
# ollama pull qwen2.5-vl:7b
The model download may take 5-15 minutes depending on your network speed and model size.
Verify Ollama is working:
# Check if Ollama API is accessible
curl http://localhost:11434/v1/models
The output should be a JSON response listing your downloaded models.
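Since the endpoint is OpenAI-compatible, standard JSON tooling works on the response. A small sketch, assuming jq is installed (SAMPLE below is an illustrative response, not captured output):

```shell
# Illustrative /v1/models response in the OpenAI "list" shape
SAMPLE='{"object":"list","data":[{"id":"gemma3:4b","object":"model","owned_by":"library"}]}'
# Pull out just the model IDs
echo "$SAMPLE" | jq -r '.data[].id'
# Against the live server, the same filter applies:
# curl -s http://localhost:11434/v1/models | jq -r '.data[].id'
```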
Install Live VLM WebUI using pip:
pip install live-vlm-webui
The installation will download all required Python dependencies and install the live-vlm-webui command.
Now start the server:
# Launch the web server
live-vlm-webui
The server will start and display output like:
Starting Live VLM WebUI...
Generating SSL certificates...
GPU detected: NVIDIA GB10 Blackwell
Access the WebUI at:
Local URL: https://localhost:8090
Network URL: https://<YOUR_SPARK_IP>:8090
Press Ctrl+C to stop the server
Live VLM WebUI supports several command-line options for customization:
# Specify a different port
live-vlm-webui --port 8091
# Use custom SSL certificates
live-vlm-webui --ssl-cert /path/to/cert.pem --ssl-key /path/to/key.pem
# Change default API endpoint
live-vlm-webui --api-base http://localhost:8000/v1
# Run in background (optional)
nohup live-vlm-webui > live-vlm.log 2>&1 &
Open your web browser and navigate to:
https://<YOUR_SPARK_IP>:8090
Replace <YOUR_SPARK_IP> with your DGX Spark's IP address. You can find it with:
hostname -I | awk '{print $1}'
Important: You must use https:// (not http://) because modern browsers require secure connections for webcam access.
Since the application uses a self-signed SSL certificate, your browser will show a security warning. This is expected and safe.
In Chrome/Edge: click "Advanced", then "Proceed to <site> (unsafe)".
In Firefox: click "Advanced…", then "Accept the Risk and Continue".
When prompted, allow the website to access your camera. The webcam stream should appear in the interface.
TIP
Remote Access Recommended: For the best experience, access the web interface from a laptop or PC on the same network. This gives you better browser performance and built-in webcam access compared to working locally on the DGX Spark itself.
The interface auto-detects local VLM backends. Verify the configuration in the VLM API Configuration section on the left sidebar:
API Endpoint: Should show http://localhost:11434/v1 (Ollama)
Model Selection: Click the dropdown and select your downloaded model (e.g., gemma3:4b)
Optional Settings:
For the best performance on DGX Spark Blackwell GPU:
gemma3:4b runs at roughly 1-2 s per frame; llama3.2-vision:11b is noticeably slower.
NOTE
DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
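To see how much memory the page cache is holding before (or after) you flush it, a minimal Linux-only sketch reading straight from /proc/meminfo:

```shell
# /proc/meminfo reports the page cache in KiB on the "Cached:" line;
# convert to GiB for readability.
awk '/^Cached:/ {printf "page cache: %.1f GiB\n", $2 / 1048576}' /proc/meminfo
```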
Click the green "Start Camera and Start VLM Analysis" button.
With the Blackwell GPU in DGX Spark, you should see inference times of 1-2 seconds per frame for gemma3:4b; llama3.2-vision:11b will be somewhat slower.
The Prompt Editor at the bottom of the left sidebar allows you to customize what the VLM analyzes.
Quick Prompts - 8 presets ready to use:
Custom Prompts - Enter your own:
Try this for real-time CSV output (useful for downstream applications):
List all objects you can see in this image, separated by commas.
Do not include explanatory text. Output only the comma-separated list.
The VLM will immediately start using the new prompt for the next frame analysis. This enables real-time "prompt engineering" where you can iterate and refine prompts while watching live results.
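The comma-separated output from a prompt like this is easy to consume in a script. A minimal sketch, with REPLY standing in for a real model response:

```shell
# REPLY stands in for the model's comma-separated answer
REPLY="person, laptop, coffee mug, window"
# One object per line, leading spaces stripped -- ready for grep/sort/etc.
echo "$REPLY" | tr ',' '\n' | sed 's/^ *//'
```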
Want to compare models? Download another model and switch:
# Download another model
ollama pull llama3.2-vision:11b
# The model will appear in the Model dropdown in the web interface
In the web interface, switch models from the Model dropdown and compare inference speed and quality between them on your DGX Spark's Blackwell GPU.
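One rough way to compare models is to time requests yourself. A sketch of the timing scaffold, with sleep standing in for the actual API call:

```shell
# Average wall time over N requests; "sleep 0.1" is a stand-in for
# the real call to the model endpoint.
N=3
start=$(date +%s%N)
for _ in $(seq "$N"); do
  sleep 0.1   # replace with: curl -s http://localhost:11434/v1/chat/completions ...
done
end=$(date +%s%N)
echo "avg ms per request: $(( (end - start) / N / 1000000 ))"
```

Run it once per model (swapping in the real curl request) to get a comparable per-frame latency number.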
The bottom section shows real-time system metrics. Use them to monitor performance while you experiment with different models and prompts.
When you're done, stop the server with Ctrl+C in the terminal where it's running.
To completely remove Live VLM WebUI:
pip uninstall live-vlm-webui
Your Ollama installation and downloaded models remain available for future use.
To remove Ollama as well (optional):
# Uninstall Ollama
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /usr/local/bin/ollama
sudo rm -rf /usr/share/ollama
# Remove Ollama models (optional)
rm -rf ~/.ollama
Now that you have Live VLM WebUI running, explore these use cases:
Model Benchmarking:
Application Prototyping:
Cloud API Integration:
To use NVIDIA API Catalog or other cloud APIs:
In the VLM API Configuration section, change the API Base URL to:
NVIDIA API Catalog: https://integrate.api.nvidia.com/v1
OpenAI: https://api.openai.com/v1
Enter your API Key in the field that appears
Select your model from the dropdown (list is fetched from the API)
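These cloud endpoints speak the same OpenAI-style protocol as local Ollama, so a manual request differs only in the base URL and an Authorization header. A hedged sketch (the model ID and API key variable are placeholders, not values from this guide):

```shell
# Build the request body once; only BASE_URL and the auth header change
# between local Ollama and a cloud endpoint.
BASE_URL="https://integrate.api.nvidia.com/v1"
cat > request.json <<'EOF'
{
  "model": "REPLACE_WITH_MODEL_ID",
  "messages": [{"role": "user", "content": "Describe the scene."}]
}
EOF
# Send it (API_KEY is a placeholder for your real key):
# curl -s "$BASE_URL/chat/completions" \
#   -H "Authorization: Bearer $API_KEY" \
#   -H "Content-Type: application/json" -d @request.json
```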
Advanced Configuration:
For more advanced usage, see the full documentation on GitHub.
For latest known issues, please review the DGX Spark User Guide and the Live VLM WebUI Troubleshooting Guide.