NVIDIA

Copyright © 2026 NVIDIA Corporation

4 results for

  • Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them to pixel-space bounding boxes via a VLM. Use when the user wants to ground captions to bboxes, generate phrase-grounded annotations, auto-label
    Skill
    Developer
    404
    10d
    Meta
    Downloadable
    Free Endpoint

    llama-3.2-11b-vision-instruct

    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    1.67M
    1y
    Meta
    Downloadable
    Free Endpoint

    llama-3.2-90b-vision-instruct

    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    2.69M
    1y

    Four-step image referring-expression pipeline: turns images plus KITTI bounding-box labels into region descriptions, scene captions, grounded referring expressions, and (optionally) verified expressions via VLM distillation. Use when the user wants to gen
    Skill
    Developer
    404
    10d