microsoft/florence-2
PREVIEW
A vision foundation model capable of performing diverse computer vision and vision-language tasks.
language generation
multimodal
vision assistant
visual question answering
computer vision
cv
image
image classification
image-to-text
object detection
text-to-image
vlm