microsoft/kosmos-2
PREVIEW
Groundbreaking multimodal model designed to understand and reason about visual elements in images.
image understanding
multimodal
visual question answering
computer vision
cv
image
image-to-text
video
vlm
Output
A young family is sitting in the grass with their dog.