Live VLM WebUI

20 MIN

Real-time Vision Language Model interaction with webcam streaming

Install Ollama as VLM Backend

First, install Ollama to serve Vision Language Models. Ollama is one of the easiest ways to run and serve models locally on your DGX Spark.

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

Ollama will automatically start as a system service and detect your Blackwell GPU.
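If you want to confirm the service came up and see what the server reported about your GPU, a quick optional check (the exact log wording varies between Ollama versions) is:

# Optional: verify the Ollama service is active
systemctl status ollama --no-pager

# Optional: review recent service logs for GPU detection messages
journalctl -u ollama -n 20 --no-pager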

Now download a vision language model. We recommend starting with gemma3:4b for quick testing:

# Download a lightweight model (recommended for testing)
ollama pull gemma3:4b

# Alternative models you can try:
# ollama pull llama3.2-vision:11b    # Sometimes better quality, but slower
# ollama pull qwen2.5-vl:7b          # Another capable vision model

The model download may take 5-15 minutes depending on your network speed and model size.
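Once the pull finishes, you can confirm the model is available locally:

# List downloaded models
ollama list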

Verify Ollama is working:

# Check if Ollama API is accessible
curl http://localhost:11434/v1/models

Expected output should show a JSON response listing your downloaded models.
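Beyond listing models, you can also send a quick text-only request to the same OpenAI-compatible endpoint. This is a minimal sketch, assuming you pulled gemma3:4b as above:

# Optional: text-only smoke test of the chat completions endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'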

Install Live VLM WebUI

Install Live VLM WebUI using pip:

pip install live-vlm-webui

The installation will download all required Python dependencies and install the live-vlm-webui command.
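If you prefer to keep the Python dependencies isolated from your system packages, installing into a virtual environment works the same way (optional; the directory name below is just an example):

# Optional: install inside a virtual environment
python3 -m venv ~/live-vlm-env
source ~/live-vlm-env/bin/activate
pip install live-vlm-webui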

Now start the server:

# Launch the web server
live-vlm-webui

The server will:

  • Auto-generate SSL certificates for HTTPS (required for webcam access)
  • Start the WebRTC server on port 8090
  • Detect your Blackwell GPU automatically

The server will start and display output like:

Starting Live VLM WebUI...
Generating SSL certificates...
GPU detected: NVIDIA GB10 Blackwell

Access the WebUI at:
  Local URL:   https://localhost:8090
  Network URL: https://<YOUR_SPARK_IP>:8090

Press Ctrl+C to stop the server

Command Line Options

Live VLM WebUI supports several command-line options for customization:

# Specify a different port
live-vlm-webui --port 8091

# Use custom SSL certificates
live-vlm-webui --ssl-cert /path/to/cert.pem --ssl-key /path/to/key.pem

# Change default API endpoint
live-vlm-webui --api-base http://localhost:8000/v1

# Run in background (optional)
nohup live-vlm-webui > live-vlm.log 2>&1 &

Access the Web Interface

Open your web browser and navigate to:

https://<YOUR_SPARK_IP>:8090

Replace <YOUR_SPARK_IP> with your DGX Spark's IP address. You can find it with:

hostname -I | awk '{print $1}'

Important: You must use https:// (not http://) because modern browsers require secure connections for webcam access.

Accept the SSL Certificate

Since the application uses a self-signed SSL certificate, your browser will show a security warning. This is expected and safe.

In Chrome/Edge:

  1. Click "Advanced" button
  2. Click "Proceed to <YOUR_SPARK_IP> (unsafe)"

In Firefox:

  1. Click "Advanced..."
  2. Click "Accept the Risk and Continue"

Grant Camera Permissions

When prompted, allow the website to access your camera. The webcam stream should appear in the interface.

TIP

Remote Access Recommended: For the best experience, access the web interface from a laptop or PC on the same network. This provides better browser performance and built-in webcam access compared to accessing locally on the DGX Spark.

Configure VLM Settings

The interface auto-detects local VLM backends. Verify the configuration in the VLM API Configuration section on the left sidebar:

API Endpoint: Should show http://localhost:11434/v1 (Ollama)

Model Selection: Click the dropdown and select your downloaded model (e.g., gemma3:4b)

Optional Settings:

  • Max Tokens: Controls response length (default: 512, reduce to 100-200 for faster responses)
  • Frame Processing Interval: How many frames to skip between analyses (default: 30 frames, increase for slower pace)

Performance Optimization Tips

For the best performance on the DGX Spark's Blackwell GPU:

  • Model Selection: gemma3:4b runs at roughly 1-2 s/frame; llama3.2-vision:11b is noticeably slower.
  • Frame Interval: Set to 60 frames (2 seconds at 30 fps) or higher for comfortable viewing
  • Max Tokens: Reduce to 100 for faster responses

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications are still being updated to take full advantage of UMA, you may encounter memory issues even when you are within the memory capacity of the DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
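To see how much memory the flush reclaimed, you can compare the buff/cache figure before and after running the command:

# Check system memory and buffer/cache usage
free -h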

Start Analyzing Video

Click the green "Start Camera and Start VLM Analysis" button.

The interface will:

  1. Start streaming your webcam via WebRTC
  2. Begin processing frames and sending them to the VLM
  3. Display AI analysis results in real-time
  4. Show GPU/CPU/RAM metrics at the bottom

You should see:

  • Live video feed on the right side (with mirror toggle)
  • VLM analysis results overlaid on video or in the info box
  • Performance metrics showing latency and frame count
  • GPU monitoring showing Blackwell GPU utilization and VRAM usage

With the Blackwell GPU in DGX Spark, you should see inference times of roughly 1-2 seconds per frame for gemma3:4b; larger models such as llama3.2-vision:11b will take longer per frame.

Customize Prompts

The Prompt Editor at the bottom of the left sidebar allows you to customize what the VLM analyzes.

Quick Prompts - 8 presets ready to use:

  • Scene Description - "Describe what you see in this image in one sentence."
  • Object Detection - "List all objects you can see in this image, separated by commas."
  • Activity Recognition - "Describe the person's activity and what they are doing."
  • Safety Monitoring - "Are there any safety hazards visible? Answer with 'ALERT: description' or 'SAFE'."
  • OCR / Text Recognition - "Read and transcribe any text visible in the image."
  • And more...

Custom Prompts - Enter your own:

Try this for real-time CSV output (useful for downstream applications):

List all objects you can see in this image, separated by commas.
Do not include explanatory text. Output only the comma-separated list.

The VLM will immediately start using the new prompt for the next frame analysis. This enables real-time "prompt engineering" where you can iterate and refine prompts while watching live results.
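As another starting point for structured output (handy if you plan to parse results in a downstream application), you could try a JSON-style prompt such as:

List the objects you can see in this image as a JSON array of strings.
Output only valid JSON, with no explanatory text.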

Test Different Models (Optional)

Want to compare models? Download another model and switch:

# Download another model
ollama pull llama3.2-vision:11b

# The model will appear in the Model dropdown in the web interface

In the web interface:

  1. Stop VLM analysis (if running)
  2. Select the new model from the Model dropdown
  3. Start VLM analysis again

Compare inference speed and quality between models on your DGX Spark's Blackwell GPU.

Monitor Performance

The bottom section shows real-time system metrics:

  • GPU Usage - Blackwell GPU utilization percentage
  • VRAM Usage - GPU memory consumption
  • CPU Usage - System CPU utilization
  • System RAM - Memory usage

Use these metrics to:

  • Benchmark different models on the same hardware
  • Identify performance bottlenecks
  • Optimize settings for your use case
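If you also want a terminal-side view on the DGX Spark itself while the WebUI is running, the usual NVIDIA tooling works alongside these metrics:

# Watch GPU utilization and memory from a terminal (refreshes every 2 seconds)
watch -n 2 nvidia-smi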

Cleanup

When you're done, stop the server with Ctrl+C in the terminal where it's running.
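If you started the server in the background with nohup as shown earlier, stop it by terminating the process instead:

# Stop a backgrounded live-vlm-webui process
pkill -f live-vlm-webui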

To completely remove Live VLM WebUI:

pip uninstall live-vlm-webui

Your Ollama installation and downloaded models remain available for future use.

To remove Ollama as well (optional):

# Uninstall Ollama
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service
sudo rm /usr/local/bin/ollama
sudo rm -rf /usr/share/ollama

# Remove Ollama models (optional)
rm -rf ~/.ollama

Next Steps

Now that you have Live VLM WebUI running, explore these use cases:

Model Benchmarking:

  • Test multiple models (Gemma 3, Llama Vision, Qwen VL) on your DGX Spark
  • Compare inference latency, accuracy, and GPU utilization
  • Evaluate structured output capabilities (JSON, CSV)

Application Prototyping:

  • Use the web interface as a reference for building your own VLM applications
  • Integrate with ROS 2 for robotics vision
  • Connect to RTSP IP cameras for security monitoring (Beta feature)

Cloud API Integration:

  • Switch from local Ollama to cloud APIs (NVIDIA API Catalog, OpenAI)
  • Compare edge vs. cloud inference performance and costs
  • Test hybrid deployments

To use NVIDIA API Catalog or other cloud APIs:

  1. In the VLM API Configuration section, change the API Base URL to:

    • NVIDIA API Catalog: https://integrate.api.nvidia.com/v1
    • OpenAI: https://api.openai.com/v1
    • Other: Your custom endpoint
  2. Enter your API Key in the field that appears (a quick way to verify the key is shown after this list)

  3. Select your model from the dropdown (list is fetched from the API)
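If you want to confirm a key works before switching the WebUI over, you can list models from the terminal the same way the dropdown does. This is a sketch assuming the NVIDIA API Catalog endpoint and an API key exported as NVIDIA_API_KEY:

# Optional: verify the API key by listing available models
curl https://integrate.api.nvidia.com/v1/models \
  -H "Authorization: Bearer $NVIDIA_API_KEY"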

Advanced Configuration:

  • Use vLLM, SGLang, or NIM backends for higher throughput (see the sketch after this list)
  • Set up NIM for optimized NVIDIA-specific performance
  • Customize the Python backend for your specific use case
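As a sketch of the vLLM route (assuming vLLM is installed and the chosen model fits in memory; the model name below is just an example):

# Serve a vision model with vLLM on port 8000
vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000

# Point Live VLM WebUI at the vLLM OpenAI-compatible endpoint
live-vlm-webui --api-base http://localhost:8000/v1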

For more advanced usage, see the full documentation on GitHub.

For latest known issues, please review the DGX Spark User Guide and the Live VLM WebUI Troubleshooting Guide.