NVIDIA

Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Optimized by NVIDIA | Launch from Hugging Face (Beta)

Filters (1)

  • API Endpoint (23)
  • Download Available (21)
  • Code Generation (11)
  • Image-to-Text (3)
  • Synthetic Data Generation (1)
  • Digital Twin (1)
  • Retrieval Augmented Generation (0)
  • NVIDIA (7)
  • Mistral AI (6)
  • Qwen (5)
  • Microsoft (4)
  • Meta (3)
Active filter: chat
44 models

    Qwen

    qwen3.5-122b-a10b

    122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
    tool calling
    32.38K
    1d
    Qwen

    qwen3.5-397b-a17b

    Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
    MoE
    5.42M
    2w
    Z.ai

    glm5

    GLM-5 744B MoE enables efficient reasoning for complex systems and long-horizon agentic tasks.
    MoE
    6.37M
    3w
    Z.ai

    glm4.7

    GLM-4.7 is a multilingual agentic coding partner with stronger reasoning, tool use, and UI skills.
    Tool Calling
    17.69M
    1mo
    DeepSeek AI

    deepseek-v3.2

    State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.
    long context
    14.82M
    2mo
    Mistral AI

    mistral-large-3-675b-instruct-2512

    A state-of-the-art general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.
    language generation
    5.24M
    3mo
    Mistral AI

    ministral-14b-instruct-2512

    A general-purpose VLM ideal for chat and instruction-based use cases.
    language generation
    3.82M
    3mo
    NVIDIA

    nemotron-nano-12b-v2-vl

    Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
    language generation
    1.59M
    4mo
    DeepSeek AI

    deepseek-v3.1-terminus

    DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.
    tool calling
    12.1M
    5mo
    ByteDance

    seed-oss-36b-instruct

    ByteDance open-source LLM with long-context, reasoning, and agentic intelligence.
    thinking budget
    2.8M
    6mo
    NVIDIA

    nvidia-nemotron-nano-9b-v2

    High‑efficiency LLM with hybrid Transformer‑Mamba design, excelling in reasoning and agentic tasks.
    thinking budget
    650K
    6mo
    OpenAI

    gpt-oss-20b

    Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.
    text-to-text
    7.06M
    7mo
    OpenAI

    gpt-oss-120b

    Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within an 80GB GPU.
    text-to-text
    34.11M
    7mo
    Opengpt-x

    teuken-7b-instruct-commercial-v0.4

    Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.
    sovereign ai
    426K
    7mo
    NVIDIA

    llama-3.1-nemotron-nano-vl-8b-v1

    Multimodal vision-language model that understands text and images and generates informative responses.
    doc intelligence
    6.69M
    8mo
    Gotocompany

    gemma-2-9b-cpt-sahabatai-instruct

    SOTA LLM pre-trained for instruction following and proficiency in Indonesian language and its dialects.
    Sovereign AI
    426K
    8mo
    Google

    gemma-3-1b-it

    A lightweight, multilingual, advanced SLM text model for edge computing and resource-constrained applications.
    Translation
    4.72K, 463K
    9mo
    Microsoft

    phi-4-mini-instruct

    Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.
    chat
    2.12M
    9mo
    Igenius

    colosseum_355b_instruct_16k

    Multilingual LLM trained on NVIDIA DGX Cloud, designed for mission-critical use cases in regulated industries, including financial services, government, and heavy industry.
    Heavy industry
    85.81K
    9mo
    Tiiuae

    falcon3-7b-instruct

    Instruction-tuned LLM achieving SoTA performance in reasoning, math, and general knowledge.
    Coding
    489K
    9mo
    Igenius

    italia_10b_instruct_16k

    Multilingual LLM with an emphasis on European languages, supporting regulated use cases including financial services, government, and heavy industry.
    Heavy industry
    423K
    9mo
    Qwen

    qwen2.5-7b-instruct

    Chinese and English LLM targeting language, coding, mathematics, reasoning, and more.
    Chinese Language Generation
    861K
    9mo
    Qwen

    qwen2.5-coder-32b-instruct

    Advanced LLM for code generation, reasoning, and fixing across popular programming languages.
    code completion
    4.75M
    8mo
    NVIDIA

    usdcode

    State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code.
    OpenUSD
    326K
    8mo