Copyright © 2026 NVIDIA Corporation
Multi-modal vision-language model that understands text/images and generates informative responses