Try NVIDIA NIM APIs

Search Results

Searching for: av stack

Sorting by Last Updated

nvidia cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding Synthetic Data Generation autonomous vehicles industrial Physical AI vision language model reasoning robotics smart cities

deepseek-ai deepseek-v3.2

State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.

long context text-to-text chat reasoning

nvidia streampetr

StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.

autonomous vehicles bev AV Stack automotive

deepseek-ai deepseek-v3.1-terminus

DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.

tool calling chat advanced reasoning agentic

stockmark stockmark-2-100b-instruct

Japanese-specialized large language model that helps enterprises read and understand complex business documents.

sovereign ai japanese stockmark chat large language model

qwen qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

chat text-generation agentic

speakleash bielik-11b-v2.6-instruct

State-of-the-art model for Polish language processing tasks such as text generation, Q&A, and chatbots.

Polish Sovereign AI chat Chatbots Summarization

qwen qwen3-next-80b-a3b-thinking

80B-parameter AI model with hybrid reasoning, an MoE architecture, and support for 119 languages.

Reasoning chat Text-to-Text

microsoft TRELLIS

MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

text-to-3d Run-on-RTX image-to-3d

deepseek-ai deepseek-v3.1

DeepSeek V3.1 Instruct is a hybrid AI model with fast reasoning, 128K context, and strong tool use.

Reasoning chat Text-to-Text

nvidia cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

video understanding Synthetic Data Generation autonomous vehicles industrial Physical AI vision language model reasoning robotics smart cities

openai gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.

text-to-text chat reasoning math

opengpt-x teuken-7b-instruct-commercial-v0.4

Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.

sovereign ai text-to-text chat european Multilingual

nvidia nv-embed-v1

Generates high-quality numerical embeddings from text inputs.

Non-Commercial Use Only Retrieval Augmented Generation Text-to-Embedding

nvidia llama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nemo retriever Retrieval Augmented Generation reranking

nvidia nv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

Embedding run-on-rtx Retrieval Augmented Generation Nemo retriever Text-to-Embedding

qwen qwen3-235b-a22b

Advanced reasoning MoE model excelling at reasoning, multilingual tasks, and instruction following.

chat complex math advanced reasoning instruction following

nvidia llama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retriever embedding Retrieval Augmented Generation Text-to-Embedding

nvidia sparsedrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

autonomous vehicles bev av stack automotive

mistralai mixtral-8x22b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

Advanced Reasoning chat Code Generation Text-to-Text Large Language Models

mistralai mixtral-8x7b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

Advanced Reasoning chat Code Generation Text-to-Text Large Language Models

deepseek-ai deepseek-r1

State-of-the-art, high-efficiency LLM excelling in reasoning, math, and coding.

chat Math advanced reasoning

google gemma-3n-e2b-it

An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.

language generation speech recognition Visual QA chat

google gemma-3n-e4b-it

An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.

language generation speech recognition Visual QA chat

nvidia nv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection Data ingestion Chart Detection nemo retriever Table Detection run-on-rtx extraction

baidu paddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

Optical Character Recognition Table Extraction Optical Character Detection nemo retriever data ingestion run-on-rtx extraction

deepseek-ai deepseek-r1-distill-llama-8b

Distilled version of Llama 3.1 8B using reasoning data generated by DeepSeek R1 for enhanced performance.

Distillation coding chat reasoning run-on-rtx math

meta llama-guard-4-12b

Multimodal model that classifies the safety of both input prompts and output responses.

LLM Multimodal Safety Content Safety Guardrail Content Moderator

nvidia cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Synthetic Data Generation Autonomous Vehicles Physical AI robotics video-to-world

nvidia llama-3.1-nemotron-nano-8b-v1

Model with leading reasoning and agentic AI accuracy for PC and edge.

chat math advanced reasoning instruction following function calling

nvidia llama-3.2-nemoretriever-1b-vlm-embed-v1

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retriever embedding Retrieval Augmented Generation Text-to-Embedding

gotocompany gemma-2-9b-cpt-sahabatai-instruct

SOTA LLM pre-trained for instruction following and proficiency in the Indonesian language and its dialects.

Sovereign AI chat Indonesian Text-to-Text Regional Language Generation

utter-project eurollm-9b-instruct

State-of-the-art, multilingual model tailored to all 24 official European Union languages.

Sovereign AI chat Text-to-Text Multilingual European Regional Language Generation

nvidia nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Computer vision multimodal embeddings text and image Run-on-rtx

black-forest-labs FLUX.1-schnell

FLUX.1-schnell is a distilled image generation model that produces high-quality images at fast speeds.

Image Generation Text-to-Image Run-on-RTX

deepseek-ai deepseek-r1-0528

Updated version of DeepSeek-R1 with enhanced reasoning, coding, math, and reduced hallucination.

coding chat math advanced reasoning

nvidia nv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

nemo retriever Embedding Retrieval Augmented Generation

microsoft phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

chat Code Generation Text-to-Text Language Generation

deepseek-ai deepseek-r1-distill-qwen-14b

Distilled version of Qwen 2.5 14B using reasoning data generated by DeepSeek R1 for enhanced performance.

coding distillation chat reasoning math

deepseek-ai deepseek-r1-distill-qwen-32b

Distilled version of Qwen 2.5 32B using reasoning data generated by DeepSeek R1 for enhanced performance.

coding distillation chat reasoning math

google gemma-2-2b-it

Advanced small generative language model for edge applications.

chat Code Generation Text-to-Text Language Generation

deepseek-ai deepseek-r1-distill-qwen-7b

Distilled version of Qwen 2.5 7B using reasoning data generated by DeepSeek R1 for enhanced performance.

coding distillation chat math

yentinglin llama-3-taiwan-70b-instruct

Sovereign AI model fine-tuned on Traditional Mandarin and English data using the Llama-3 architecture.

regional language generation chat Code Generation Large Language Models

ai21labs jamba-1.5-mini-instruct

Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.

chat Language Generation Text-to-text

institute-of-science-tokyo llama-3.1-swallow-70b-instruct-v0.1

Sovereign AI model trained on Japanese language that understands regional nuances.

Sovereign AI Large Language Model chat Regional Language Generation

tokyotech-llm llama-3-swallow-70b-instruct-v0.1

Sovereign AI model trained on Japanese language that understands regional nuances.

Large Language Model chat Regional Language Generation

institute-of-science-tokyo llama-3.1-swallow-8b-instruct-v0.1

Sovereign AI model trained on Japanese language that understands regional nuances.

Sovereign AI Large Language Model chat Regional Language Generation

mistralai mistral-small-3.1-24b-instruct-2503

Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.

language generation chat multimodal image understanding

nvidia nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

Object Detection computer vision deepstream NVIDIA NIM

nvidia nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

Image-to-Embedding computer vision deepstream NVIDIA NIM object Classification

nvidia llama-3.1-nemoguard-8b-topic-control

Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.

nemo guardrails LLM safety Safety and moderation dialogue safety nemotron

nvidia corrdiff

Generative downscaling model for generating high-resolution, regional-scale weather fields.

AI Weather prediction Weather Simulation Earth-2

nvidia fourcastnet

FourCastNet predicts global atmospheric dynamics for a range of weather and climate variables.

Weather Simulation AI Weather Prediction Climate science Earth-2

hive deepfake-image-detection

Advanced AI model that detects faces and identifies deepfake images.

computer vision AI safety deep fake detection Content moderation

hive ai-generated-image-detection

Robust image classification model for detecting and managing AI-generated content.

image classification computer vision AI safety Content moderation

nvidia cosmos-predict1-5b

Generates future frames of a physics-aware world state from just an image or a short video prompt, for physical AI development.

Synthetic Data Generation Physical AI policy evaluation robotics video-to-world

microsoft phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

chat Code Generation Text-to-Text Language Generation Large Language Models

university-at-buffalo cached

Context-aware chart extraction model that can detect 18 classes of basic chart elements, excluding plot elements.

nemo retriever Chart Element Detection Image-To-Text

nvidia usdsearch

AI-powered search for OpenUSD data, 3D models, images, and assets using text or image-based inputs.

OpenUSD Synthetic Data Generation Digital Twin USD Text-to-3D

nvidia visual-changenet

Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask.

image Image Generation cv Image Segmentation vlm computer vision TAO Toolkit video NVIDIA NIM

nvidia retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Object Detection image cv vlm computer vision TAO Toolkit video NVIDIA NIM

nvidia ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.

Optical Character Recognition image Optical Character Detection cv vlm computer vision TAO Toolkit video

google paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

image cv Vision Assistant vlm Visual Question Answering computer vision Language Generation Image-to-Text video