NVIDIA

Copyright © 2026 NVIDIA Corporation

20 results for

Filters

  • API Endpoint (12)
  • Download Available (6)
  • Launchable (3)
  • Enterprise (2)
  • Image-to-Text (9)
  • Retrieval Augmented Generation (2)
  • Text-to-Embedding (2)
  • Image Generation (1)
  • Text-to-Image (1)
  • NVIDIA (5)
  • Mistral AI (4)
  • Meta (3)
  • Microsoft (2)
  • Cyborg (1)
  • NVIDIA AI (2)
    Microsoft

    phi-4-multimodal-instruct

    Cutting-edge open multimodal model excelling at high-quality reasoning over image and audio inputs.
    Model · Speech Recognition · 385K · 9mo
    Minimaxai

    minimax-m2.1

    MiniMax M2.1 excels at multi-language coding, app and web development, office AI, and agent integration.
    Model · Agentic · 8.1M · 1mo
    Moonshotai

    kimi-k2.5

    1T-parameter multimodal MoE model for high-capacity video and image understanding with efficient inference.
    Model · Multimodal · 19.52M · 1mo
    Mistral AI

    mistral-medium-3-instruct

    Powerful multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.
    Model · language generation · 3.69M · 7mo
    Mistral AI

    mistral-small-3.1-24b-instruct-2503

    Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.
    Model · language generation · 1.2M · 9mo
    Meta

    llama-guard-4-12b

    Multimodal model that classifies the safety of both input prompts and output responses.
    Model · LLM Multimodal Safety · 369K · 8mo
    Mistral AI

    ministral-14b-instruct-2512

    A general-purpose VLM (vision-language model) ideal for chat and instruction-based use cases.
    Model · language generation · 3.6M · 3mo
    Mistral AI

    mistral-large-3-675b-instruct-2512

    A state-of-the-art, general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
    Model · language generation · 4.89M · 3mo
    NVIDIA

    nvclip

    NV-CLIP is a multimodal embedding model for images and text.
    Model · Computer vision · 37.87K · 8mo
    Meta

    llama-4-scout-17b-16e-instruct

    A multimodal, multilingual MoE model with 16 experts and 17B active parameters.
    Model · language generation · 295K · 7mo
    Cyborg
    Launchable

    Cyborg Enterprise RAG

    Securely extract, embed, and index multimodal data with encryption in use for fast, accurate semantic search.
    Blueprint · NIM · 2w
    Black-forest-labs

    FLUX.1-Kontext-dev

    FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.
    Model · Image Generation · 3.97K · 6mo
    Google

    gemma-3-27b-it

    Cutting-edge open multimodal model excelling at high-quality reasoning over images.
    Model · Vision Assistant · 5.3M · 9mo
    NVIDIA

    llama-3.2-nemoretriever-1b-vlm-embed-v1

    Multimodal question-answering retrieval model that represents user queries as text and documents as images.
    Model · nemo retriever · 276K · 8mo
    Meta

    llama-4-maverick-17b-128e-instruct

    A general-purpose multimodal, multilingual MoE model with 128 experts and 17B active parameters.
    Model · language generation · 2.59M · 7mo
    NVIDIA

    llama-nemotron-embed-vl-1b-v2

    Multimodal question-answering retrieval model that represents user queries as text and documents as images.
    Model · nemo retriever · 591K · 3w
    Microsoft

    phi-3.5-vision-instruct

    Cutting-edge open multimodal model excelling at high-quality reasoning over images.
    Model · Vision Assistant · 451K · 1y
    Qwen

    qwen3.5-122b-a10b

    122B-parameter MoE LLM (10B active) for coding, reasoning, and multimodal chat; agent-ready.
    Model · tool calling · Today
    NVIDIA
    Launchable · Enterprise

    Build an AI Agent for Enterprise Research

    Build a custom enterprise research assistant powered by state-of-the-art models that process and synthesize multimodal data, enabling reasoning, planning, and refinement to generate comprehensive reports.
    Blueprint · NIM · 2w
    NVIDIA
    Launchable · Enterprise

    Build an Enterprise RAG Pipeline Blueprint

    Power fast, accurate semantic search across multimodal enterprise data with NVIDIA’s RAG Blueprint, built on NeMo Retriever and Nemotron models, to connect your agents to trusted, authoritative sources of knowledge.
    Blueprint · NIM · 2w