Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
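NIM microservices expose an OpenAI-compatible REST API, so any of the models below can be queried with standard client libraries once deployed. The snippet that follows is a minimal sketch, not taken from this page: it assumes a self-hosted NIM listening on http://localhost:8000/v1 (a common default) and uses one of the catalog's model ids; the hosted endpoint at https://integrate.api.nvidia.com/v1 with an NVIDIA API key would work the same way.

```python
# Minimal sketch of querying a deployed NIM through its OpenAI-compatible API.
# Endpoint URL, API key handling, and model id are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed self-hosted NIM endpoint
    api_key="not-needed-for-local-nim",   # placeholder; the hosted API requires an NVIDIA API key
)

completion = client.chat.completions.create(
    model="meta/llama-4-scout-17b-16e-instruct",  # one of the catalog entries below
    messages=[{"role": "user", "content": "Give a one-sentence summary of NVIDIA NIM."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```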

nvidia/nemotron-parse

Cutting-edge vision-language model excelling at retrieving text and metadata from images.

nvidia/nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

nvidia/cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

microsoft/phi-4-mini-flash-reasoning

Lightweight reasoning model for applications in latency-bound, memory- and compute-constrained environments.

nvidia/llama-3.1-nemotron-nano-vl-8b-v1

Multi-modal vision-language model that understands text and images and generates informative responses.

meta/llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual mixture-of-experts (MoE) model with 128 experts and 17B active parameters.

meta/llama-4-scout-17b-16e-instruct

A multimodal, multilingual mixture-of-experts (MoE) model with 16 experts and 17B active parameters.

google/gemma-3-27b-it

Cutting-edge open multimodal model excelling at high-quality reasoning from images.

nvidia/nemoretriever-parse

Cutting-edge vision-language model excelling at retrieving text and metadata from images.

microsoft/phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

nvidia/cosmos-nemotron-34b

Multi-modal vision-language model that understands text, images, and video and generates informative responses.

hive/deepfake-image-detection

Advanced AI model that detects faces and identifies deepfake images.

meta/llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling at high-quality reasoning from images.

meta/llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling at high-quality reasoning from images.

nvidia/vila

Multi-modal vision-language model that understands text, images, and video and generates informative responses.

hive/ai-generated-image-detection

Robust image classification model for detecting and managing AI-generated content.

microsoft/phi-3.5-vision-instruct

Cutting-edge open multimodal model excelling at high-quality reasoning from images.

microsoft/phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

nvidia/nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

nvidia/nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

nvidia/nvclip

NV-CLIP is a multimodal embeddings model for image and text.

nvidia/ocdrnet

OCDNet and OCRNet are pre-trained models for optical character detection and recognition, respectively.

nvidia/visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.

nvidia/retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

google/paligemma

Vision-language model adept at comprehending text and visual inputs to produce informative responses.