nvidia
vila
Multimodal vision-language model that understands text, images, and video and generates informative responses.
VLM
Vision language model
image caption
image to text
This NIM has been deprecated