Try NVIDIA NIM APIs

15 results for

Image & Video Generation with ComfyUI

Generate images and videos with FLUX, Wan 2.1, HunyuanVideo, and Cosmos on DGX Station

Playbook

Image Generation

12d

DGX Spark

1 HR

FLUX.1 Dreambooth LoRA Fine-tuning

Fine-tune FLUX.1-dev 12B model using Dreambooth LoRA for custom image generation

Playbook

Image Generation

8mo

llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Model

Image-Text Retrieval

2.8M

Qwen

Downloadable

Free Endpoint

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

Model

MoE

12.46M

3mo

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Model

Image-Text Retrieval

1.71M

NVIDIA

Free Endpoint

cosmos3-nano

Generates physics-aware videos from text prompts or an image prompt for physical AI development.

Model

autonomous vehicles

1.58K

DGX Spark

1 HR

Vision-Language Model Fine-tuning

Fine-tune Vision-Language Models for image and video understanding tasks using Qwen2.5-VL and InternVL3

Playbook

DGX

8mo

Black-forest-labs

Downloadable

flux.2-klein-4b

FLUX.2-klein-4B is a distilled image generation and editing model, producing outputs at lightning speed.

Model

image editing

270K

2mo

Microsoft

Downloadable

TRELLIS

Microsoft TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

Model

text-to-3d

3.99K

9mo

NVIDIA

Downloadable

Free Endpoint

llama-3.1-nemotron-nano-vl-8b-v1

Multimodal vision-language model that understands text and images and produces informative responses.

Model

doc intelligence

9.93M

11mo

Mistral AI

Downloadable

Free Endpoint

mistral-small-4-119b-2603

Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context

Model

code generation

16.19M

2mo

Google

Free Endpoint

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

Model

image

10.29K

NVIDIA

Downloadable

vista-3d

VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomy.

Model

Interactive Annotation

803

Qwen

Downloadable

Free Endpoint

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

Model

B200

10.34M

3mo

Robotics

Enterprise

Synthetic Manipulation Motion Generation for Robotics

Generate exponentially large amounts of synthetic motion trajectories for robot manipulation from just a few human demonstrations.

Blueprint

synthetic data

3mo