Copyright © 2026 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices.

Optimized by NVIDIA | Launch from Hugging Face (Beta)

Filters (2)

Free Endpoint

3

Partner Endpoint

3

Download Available

3

Use Case

Image-to-Text

0

Speech-to-Text

0

Image Generation

0

Text-to-Embedding

0

Text Translation

0

Inference Providers

Deepinfra

3

GMI Cloud

3

Together AI

2

Bitdeer

2

Publisher

DeepSeek AI

2

Qwen

1

NVIDIA

0

Meta

0

NIM Container GPUs

B200

1

H100 80GB HBM3

1

H200

1

Labels (2)

Agentic

MoE

3 models

Downloadable, Free Endpoint

deepseek-v4-flash

DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.

15.16M

1mo

Downloadable, Free Endpoint

deepseek-v4-pro

DeepSeek V4 scales to 1M-token context windows with efficient MoE architecture for coding tasks.

7.5M

1mo

Downloadable, Free Endpoint

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

13.15M

3mo