Try NVIDIA NIM APIs


20 results for


NVIDIA

ai-generated-image-detection

Robust image classification model for detecting and managing AI-generated content.

Model

image classification

9.76K

11mo

University at Buffalo

cached

Context-aware chart extraction model that detects 18 classes of basic chart elements, excluding plot elements.

Model

nemo retriever

738

NVIDIA

cosmos-nemotron-34b

Multimodal vision-language model that understands text, images, and video and generates informative responses.

Model

VLM

DGX Spark

1 HR

FLUX.1 Dreambooth LoRA Fine-tuning

Fine-tune the 12B FLUX.1-dev model with DreamBooth LoRA for custom image generation.

Playbook

Image Generation

5mo

NVIDIA

llama-3.1-nemotron-nano-vl-8b-v1

Multimodal vision-language model that understands text and images and generates informative responses.

Model

doc intelligence

6.69M

8mo

llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling at high-quality reasoning over images.

Model

Image-Text Retrieval

617K

9mo

llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling at high-quality reasoning over images.

Model

Image-Text Retrieval

568K

9mo

NVIDIA

maisi

MAISI is a pre-trained volumetric (3D) CT Latent Diffusion Generative Model.

Model

Image Generation

773

11mo

Mistral AI

mistral-small-3.1-24b-instruct-2503

Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.

Model

language generation

1.27M

9mo

NVIDIA

nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Model

Computer vision

33.75K

8mo

NVIDIA

ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.

Model

Optical Character Recognition

798

Google

paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

Model

image

327K

Qwen

qwen3.5-122b-a10b

122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.

Model

tool calling

32.38K

Qwen

qwen3.5-397b-a17b

Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.

Model

MoE

5.42M

NVIDIA

retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Model

Object Detection

794

NVIDIA

Enterprise

Synthetic Manipulation Motion Generation for Robotics

Generate exponentially large amounts of synthetic motion trajectories for robot manipulation from just a few human demonstrations.

Blueprint

NVIDIA Omniverse

Microsoft

TRELLIS

MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

Model

text-to-3d

5.43K

6mo

DGX Spark

1 HR

Vision-Language Model Fine-tuning

Fine-tune Vision-Language Models for image and video understanding tasks using Qwen2.5-VL and InternVL3

Playbook

DGX

5mo

NVIDIA

vista-3d

VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomy.

Model

Interactive Annotation

778

10mo

NVIDIA

visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.

Model

image

615
