Cutting-edge open multimodal model excelling in high-quality reasoning from images.
Vision foundation model capable of performing diverse computer vision and vision-language tasks.
NvClip generates vector embeddings for the given image or text.
Cutting-edge open multimodal model excelling in high-quality reasoning from images.
Groundbreaking multimodal model designed to understand and reason about visual elements in images.
One-shot visual language understanding model that translates images of plots into tables.
Multimodal model for a wide range of tasks, including image understanding and language generation.