Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices.

Label: agentic (9 models)

NVIDIA

Free Endpoint

nemotron-3-embed-1b

1B embedding model for semantic search, retrieval, and RAG applications.

Nemotron Retriever

Agentic Retrieval
Code Retrieval
Text-to-Embedding
Retrieval Augmented Generation

Last updated on July 16, 2026

Poolside

Free Endpoint

laguna-xs-2.1

Efficient 33B MoE model for local, long-horizon agentic coding and terminal tasks.

Agentic AI

Coding
Reasoning
Tool Use

Last updated on July 15, 2026

Z.ai

Downloadable, Free Endpoint

glm-5.2

GLM-5.2 is a flagship LLM for agentic workflows, coding, and long-horizon reasoning tasks.

Agentic AI

Coding
Reasoning
Tool Use

8M API calls in the last 30 days

Last updated on July 3, 2026

Mistral AI

Downloadable, Free Endpoint

mistral-medium-3.5-128b

A high-performing model for text generation, coding, and agentic use cases.

coding

reasoning
text
agentic

5M API calls in the last 30 days

Last updated on April 29, 2026

DeepSeek AI

Downloadable, Free Endpoint

deepseek-v4-flash

DeepSeek V4 Flash is a 284B MoE model with 1M-token context optimized for fast coding and agents.

coding

MoE
fast
agentic

17M API calls in the last 30 days

Last updated on April 24, 2026

DeepSeek AI

Downloadable, Free Endpoint

deepseek-v4-pro

DeepSeek V4 scales to 1M-token context windows with an efficient MoE architecture for coding tasks.

reasoning
coding
agentic

7M API calls in the last 30 days

Last updated on April 24, 2026

Google

Downloadable, Free Endpoint

gemma-4-31b-it

Dense 31B model delivering frontier reasoning for coding, agentic workflows, and fine-tuning.

reasoning

coding
text-to-text
agentic

6M API calls in the last 30 days

Last updated on April 2, 2026

Stepfun-ai

Deprecation in 2d, Free Endpoint

step-3.5-flash

200B open-source reasoning engine with a sparse MoE architecture, powering frontier agentic AI.

Agentic

Coding
Reasoning

10M API calls in the last 30 days

Last updated on February 2, 2026

Qwen

Downloadable, Free Endpoint

qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

text-generation

agentic

25M API calls in the last 30 days

Last updated on September 22, 2025