microsoft/kosmos-2
PREVIEW
Groundbreaking multimodal model designed to understand and reason about visual elements in images.
image understanding
multimodal
visual question answering
computer vision
cv
image
image-to-text
video
vlm
Input
image.png (accepted formats: jpg, jpeg, png)
Output
A young family is sitting in the grass with their dog.
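
The input step above accepts jpg, jpeg, and png files. A minimal sketch of preparing such an image before sending it to a vision-language model might look like the following. The helper name image_to_data_url, the MIME-type table, and the choice of a base64 data URL are all assumptions for illustration; this page does not show how the hosted API expects the image to be encoded.

```python
import base64
from pathlib import Path

# Extensions listed by the upload step on this page.
ACCEPTED_EXTENSIONS = {"jpg", "jpeg", "png"}

# Assumption: standard MIME types for these extensions (not stated on the page).
MIME_TYPES = {"jpg": "image/jpeg", "jpeg": "image/jpeg", "png": "image/png"}


def image_to_data_url(filename: str, data: bytes) -> str:
    """Hypothetical helper: check the extension, then return a base64 data URL."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported image format: {ext or '(none)'}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{MIME_TYPES[ext]};base64,{encoded}"
```

Calling image_to_data_url("image.png", raw_bytes) returns a string that can be embedded in a request body; a file with any other extension raises ValueError before anything is sent.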