NVIDIA

Copyright © 2026 NVIDIA Corporation

5 results for

Filters

  • API Endpoint (3)
  • Download Available (2)
  • Image-to-Text (3)
  • Google (2)
  • Meta (2)
  • Microsoft (1)

    Meta

    llama-3.2-11b-vision-instruct

    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    711K
    9mo
    Meta

    llama-3.2-90b-vision-instruct

    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    582K
    9mo
    Google

    gemma-3n-e2b-it

    An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.
    Model
    language generation
    632K
    7mo
    Google

    gemma-3n-e4b-it

    An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.
    Model
    language generation
    695K
    7mo
    Microsoft

    phi-4-multimodal-instruct

    Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
    Model
    Speech Recognition
    462K
    9mo