
Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA | Launch from Hugging Face (Beta)

Filters

  • Free Endpoint (27)
  • Partner Endpoint (27)
  • Download Available (29)
  • Code Generation (11)
  • Image-to-Text (4)
  • Retrieval Augmented Generation (3)
  • Text-to-Embedding (2)
  • Object Detection (1)
  • Fireworks AI (21)
  • Deep Infra (20)
  • Together AI (17)
  • GMI Cloud (8)
  • Bitdeer AI (7)
  • NVIDIA (17)
  • Mistral AI (6)
  • Qwen (5)
  • Meta (4)
  • Microsoft (4)
  • Enterprise (0)
  • NVIDIA BioNemo (0)

56 models

    NVIDIA
    Downloadable

    llama-nemotron-rerank-vl-1b-v2

    GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.
    nemo retriever
    243
    4d
    Qwen
    Downloadable

    qwen3.5-122b-a10b

    122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
    chat
    7.81M
    4w
    Qwen
    Downloadable

    qwen3.5-397b-a17b

    Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
    chat
    13.54M
    1mo
    Z.ai
    Downloadable

    glm-5

    GLM-5 744B MoE enables efficient reasoning for complex systems and long-horizon agentic tasks.
    MoE
    36.94M
    1mo
    NVIDIA
    Downloadable

    llama-nemotron-embed-vl-1b-v2

    Multimodal question-answer retrieval representing user queries as text and documents as images.
    nemo retriever
    6.02M
    1mo
    Z.ai
    Free Endpoint

    glm-4.7

    GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.
    Tool Calling
    14.46M
    2mo
    DeepSeek AI
    Free Endpoint

    deepseek-v3.2

    State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.
    chat
    15.8M
    3mo
    Mistral AI
    Free Endpoint

    mistral-large-3-675b-instruct-2512

    A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
    chat
    6.07M
    4mo
    Mistral AI
    Downloadable

    ministral-14b-instruct-2512

    A general-purpose VLM ideal for chat and instruction-based use cases.
    chat
    2.05M
    4mo
    NVIDIA
    Downloadable

    nemotron-nano-12b-v2-vl

    Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
    chat
    784K
    5mo
    NVIDIA
    Free Endpoint

    llama-3.1-nemotron-safety-guard-8b-v3

    Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs
    content moderation
    248K
    5mo
    DeepSeek AI
    Free Endpoint

    deepseek-v3.1-terminus

    DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.
    chat
    13.17M
    5mo
    ByteDance
    Free Endpoint

    seed-oss-36b-instruct

    ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.
    chat
    1.99M
    7mo
    NVIDIA
    Downloadable

    nvidia-nemotron-nano-9b-v2

    High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.
    chat
    386K
    7mo
    OpenAI
    Downloadable

    gpt-oss-20b

    Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math
    reasoning
    8.3M
    8mo
    OpenAI
    Downloadable

    gpt-oss-120b

    Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within an 80GB GPU.
    reasoning
    45.29M
    8mo
    Opengpt-x
    Downloadable

    teuken-7b-instruct-commercial-v0.4

    Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.
    sovereign ai
    165K
    8mo
    Meta
    Free Endpoint

    llama-guard-4-12b

    Multi-modal model to classify safety for input prompts as well as output responses.
    LLM Multimodal Safety
    167K
    9mo
    NVIDIA
    Downloadable

    llama-3.2-nemoretriever-1b-vlm-embed-v1

    Multimodal question-answer retrieval representing user queries as text and documents as images.
    nemo retriever
    202K
    9mo
    NVIDIA
    Downloadable

    llama-3.1-nemotron-nano-vl-8b-v1

    Multi-modal vision-language model that understands text and images and creates informative responses.
    chat
    9.84M
    9mo
    Gotocompany
    Downloadable

    gemma-2-9b-cpt-sahabatai-instruct

    SOTA LLM pre-trained for instruction following and proficiency in Indonesian language and its dialects.
    chat
    159K
    9mo
    Google
    Downloadable

    gemma-3-1b-it

    A lightweight, multilingual, advanced SLM text model for edge computing and resource-constrained applications.
    chat
    3.61K366K
    10mo
    Microsoft
    Downloadable

    phi-4-mini-instruct

    Lightweight multilingual LLM powering AI applications in latency-bound, memory/compute-constrained environments.
    chat
    925K
    10mo
    NVIDIA
    Downloadable

    llama-3.1-nemoguard-8b-topic-control

    Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.
    nemo guardrails
    181K
    1y