NVIDIA
Explore
Models
Blueprints
GPUs
Docs
Terms of Use
Privacy Policy
Your Privacy Choices
Contact

Copyright © 2025 NVIDIA Corporation

Models

Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
Publisher
Use Case
NIM Type
Sorting by Most Recent

microsoftTRELLIS

MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

black-forest-labsFLUX.1-Kontext-dev

FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.

black-forest-labsFLUX.1-schnell

FLUX.1-schnell is a distilled image generation model, producing high quality images at fast speeds

black-forest-labsFLUX.1-dev

FLUX.1 is a state-of-the-art suite of image generation models

deepseek-aideepseek-r1-distill-llama-8b

Distilled version of Llama 3.1 8B using reasoning data generated by DeepSeek R1 for enhanced performance.

nvidianemoretriever-table-structure-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

nvidianemoretriever-graphic-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

nvidianemoretriever-page-elements-v2

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

nvidiallama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nvidiallama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nvidianv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

baidupaddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

nvidiastudiovoice

Enhance speech by correcting common audio degradations to create studio quality speech output.

nvidiaparakeet-ctc-0.6b-asr

State-of-the-art accuracy and speed for English transcriptions.

metallama-3.1-8b-instruct

Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.

nv-mistralaimistral-nemo-12b-instruct

Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU.

nvidianv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

nvidianvclip

NV-CLIP is a multimodal embeddings model for image and text.