NVIDIA
Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA | Launch from Hugging Face (Beta)

Filters

  • Free Endpoint: 9
  • Partner Endpoint: 17
  • Download Available: 18
  • Code Generation: 5
  • Image-to-Text: 4
  • Synthetic Data Generation: 1
  • Digital Twin: 1
  • Retrieval Augmented Generation: 0
  • Deep Infra: 14
  • Together AI: 12
  • Bitdeer AI: 6
  • GMI Cloud: 5
  • CoreWeave: 4
  • NVIDIA: 10
  • Mistral AI: 5
  • Qwen: 3
  • Meta: 2
  • OpenAI: 2
  • A100 SXM4 80GB: 0
  • B200: 0
  • GB200: 0
  • GH200 144G HBM3e: 0
  • H100 80GB HBM3: 0

27 models

    NVIDIA
    Downloadable

    nemotron-3-nano-omni-30b-a3b-reasoning

    Nemotron 3 Nano Omni is an omni-modal reasoning model that understands images, video, speech, and text.
    Image-to-Text
    2.37M
    1w
    Z.ai
    Downloadable

    glm-5.1

    GLM-5.1 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.
    Agentic AI
    8.91M
    2w
    Z.ai
    Deprecation in 10d | Free Endpoint

    glm-4.7

    GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.
    Tool Calling
    8.95M
    2w
    NVIDIA
    Free Endpoint

    nemotron-3-content-safety

    Multilingual, multimodal model for detecting unsafe and toxic content.
    llm safety
    45.87K
    2w
    NVIDIA
    Downloadable

    ising-calibration-1-35b-a3b

    Open VLM for quantum computer calibration chart understanding across a range of qubit modalities.
    Quantum
    182K
    3w
    Qwen
    Downloadable

    qwen3.5-122b-a10b

    122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
    tool calling
    8.78M
    2mo
    Qwen
    Downloadable

    qwen3.5-397b-a17b

    Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
    MoE
    9.14M
    2mo
    Mistral AI
    Free Endpoint

    mistral-large-3-675b-instruct-2512

    A state-of-the-art general purpose MoE VLM ideal for chat, agentic and instruction based use cases.
    language generation
    3.98M
    5mo
    Mistral AI
    Downloadable

    ministral-14b-instruct-2512

    A general purpose VLM ideal for chat and instruction based use cases
    language generation
    2.01M
    5mo
    NVIDIA
    Free Endpoint

    llama-3.1-nemotron-safety-guard-8b-v3

    Leading multilingual content safety model for enhancing the safety and moderation capabilities of LLMs
    content moderation
    113K
    6mo
    ByteDance
    Free Endpoint

    seed-oss-36b-instruct

    ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.
    thinking budget
    1.13M
    8mo
    NVIDIA
    Downloadable

    nvidia-nemotron-nano-9b-v2

    High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.
    thinking budget
    436K
    8mo
    OpenAI
    Downloadable

    gpt-oss-20b

    Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math
    reasoning
    11.37M
    9mo
    OpenAI
    Downloadable

    gpt-oss-120b

    Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.
    reasoning
    28.23M
    9mo
    Meta
    Free Endpoint

    llama-guard-4-12b

    Multi-modal model to classify safety for input prompts as well as output responses.
    LLM Multimodal Safety
    191K
    10mo
    Microsoft
    Downloadable

    phi-4-mini-instruct

    Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments
    Chat
    581K
    11mo
    NVIDIA
    Downloadable

    llama-3.1-nemoguard-8b-topic-control

    Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.
    nemo guardrails
    128K
    1y
    NVIDIA
    Downloadable

    nemoguard-jailbreak-detect

    Industry leading jailbreak classification model for protection from adversarial attempts
    nemo guardrails
    35.78K
    10mo
    NVIDIA
    Downloadable

    llama-3.1-nemoguard-8b-content-safety

    Leading content safety model for enhancing the safety and moderation capabilities of LLMs
    nemo guardrails
    132K
    1y
    Qwen
    Deprecation in 8d | Downloadable

    qwen2.5-coder-32b-instruct

    Advanced LLM for code generation, reasoning, and fixing across popular programming languages.
    code completion
    2.74M
    10mo
    NVIDIA
    Free Endpoint

    usdcode

    State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code.
    Digital Twin
    10mo
    Meta
    Downloadable

    llama-3.3-70b-instruct

    Advanced LLM for reasoning, math, general knowledge, and function calling
    Instruction following
    9.51M
    10mo
    NVIDIA
    Free Endpoint

    nemotron-mini-4b-instruct

    Optimized SLM for on-device inference and fine-tuned for roleplay, RAG and function calling
    Chat
    558K
    1y
    Mistral AI
    Downloadable

    mistral-7b-instruct-v0.3

    This LLM follows instructions, completes requests, and generates creative text.
    Chat
    494K
    10mo