microsoft/florence-2 (PREVIEW)
A vision foundation model capable of performing diverse computer vision and vision-language tasks.
Language Generation
Multimodal
Vision Assistant
Visual Question Answering
Computer Vision
Accepted image upload formats: jpg, jpeg, png