microsoft/kosmos-2

PREVIEW

Multimodal model designed to understand and reason about visual elements in images.

Input

Output

A young family is sitting in the grass with their dog.