nvidia / vila
PREVIEW

Multi-modal vision-language model that understands text and images and generates informative responses
