NVIDIA
Explore
Models
Blueprints
GPUs
Docs
Terms of Use
Privacy Policy
Your Privacy Choices
Contact

Copyright © 2025 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
Publisher
Use Case
NIM Type
Sorting by Most Recent

nvidiaparakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin English transcriptions.

nvidiasparsedrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

nvidianv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

nvidiallama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nvidiallama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nvidianv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

nvidianv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

nvidianv-grounding-dino

Grounding dino is an open vocabulary zero-shot object detection model.

nv-mistralaimistral-nemo-12b-instruct

Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU.

nvidianv-rerankqa-mistral-4b-v3

Multilingual text reranking model.

nvidianv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

nvidianv-embedqa-mistral-7b-v2

Multilingual text question-answering retrieval, transforming textual information into dense vector representations.

nvidiamaisi

MAISI is a pre-trained volumetric (3D) CT Latent Diffusion Generative Model.

nvidianvclip

NV-CLIP is a multimodal embeddings model for image and text.

nvidiaocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.

nvidianv-embed-v1

Generates high-quality numerical embeddings from text inputs.

nvidiavisual-changenet

Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask

nvidiaretail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

googlepaligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses