NVIDIA
Explore Models Blueprints GPUs Docs
Terms of Use

|

Privacy Policy

|

Manage My Privacy

|

Contact

Copyright © 2025 NVIDIA Corporation

microsoft

kosmos-2

PREVIEW

Groundbreaking multimodal model designed to understand and reason about visual elements in images.

image understandingmultimodalvisual question answeringcomputer visioncvimageimage-to-textvideovlm
Get API Key
API Reference
Accelerated by DGX Cloud

Input

GOVERNING TERMS: Your use of this API is governed by the NVIDIA API Trial Service Terms of Use; and the use of this model is governed by the NVIDIA AI Foundation Models Community License and MIT License.

Output

model
A young family is sitting in the grass with their dog.