NVIDIA

Copyright © 2026 NVIDIA Corporation

Search Results

Searching for: av stack
Sorting by Last Updated

nvidia/cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding, Synthetic Data Generation, autonomous vehicles, industrial, Physical AI, vision language model, reasoning, robotics, smart cities

deepseek-ai/deepseek-v3.2

State-of-the-art 685B reasoning LLM with sparse attention, long context, and integrated agentic tools.

long context, text-to-text, chat, reasoning

nvidia/streampetr

StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.

autonomous vehicles, bev, AV Stack, automotive

deepseek-ai/deepseek-v3.1-terminus

DeepSeek-V3.1: hybrid inference LLM with Think/Non-Think modes, stronger agents, 128K context, strict function calling.

tool calling, chat, advanced reasoning, agentic

stockmark/stockmark-2-100b-instruct

Japanese-specialized large language model for enterprises to read and understand complex business documents.

sovereign ai, japanese, stockmark, chat, large language model

qwen/qwen3-next-80b-a3b-instruct

Qwen3-Next Instruct blends hybrid attention, sparse MoE, and stability boosts for ultra-long context AI.

chat, text-generation, agentic

speakleash/bielik-11b-v2.6-instruct

State-of-the-art model for Polish language processing tasks such as text generation, Q&A, and chatbots.

Polish, Sovereign AI, chat, Chatbots, Summarization

qwen/qwen3-next-80b-a3b-thinking

80B-parameter AI model with hybrid reasoning, MoE architecture, and support for 119 languages.

Reasoning, chat, Text-to-Text

microsoft/TRELLIS

MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

text-to-3d, Run-on-RTX, image-to-3d

deepseek-ai/deepseek-v3.1

DeepSeek V3.1 Instruct is a hybrid AI model with fast reasoning, 128K context, and strong tool use.

Reasoning, chat, Text-to-Text

nvidia/cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

video understanding, Synthetic Data Generation, autonomous vehicles, industrial, Physical AI, vision language model, reasoning, robotics, smart cities

openai/gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.

text-to-text, chat, reasoning, math

opengpt-x/teuken-7b-instruct-commercial-v0.4

Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.

sovereign ai, text-to-text, chat, european, Multilingual

nvidia/nv-embed-v1

Generates high-quality numerical embeddings from text inputs.

Non-Commercial Use Only, Retrieval Augmented Generation, Text-to-Embedding

nvidia/llama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nemo retriever, Retrieval Augmented Generation, reranking

nvidia/nv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

Embedding, run-on-rtx, Retrieval Augmented Generation, Nemo retriever, Text-to-Embedding

qwen/qwen3-235b-a22b

Advanced reasoning MoE model excelling at reasoning, multilingual tasks, and instruction following.

chat, complex math, advanced reasoning, instruction following

nvidia/llama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retriever, embedding, Retrieval Augmented Generation, Text-to-Embedding

nvidia/sparsedrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

autonomous vehicles, bev, av stack, automotive

mistralai/mixtral-8x22b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

Advanced Reasoning, chat, Code Generation, Text-to-Text, Large Language Models

mistralai/mixtral-8x7b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

Advanced Reasoning, chat, Code Generation, Text-to-Text, Large Language Models

deepseek-ai/deepseek-r1

State-of-the-art, high-efficiency LLM excelling in reasoning, math, and coding.

chat, Math, advanced reasoning

google/gemma-3n-e2b-it

An edge computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.

language generation, speech recognition, Visual QA, chat

google/gemma-3n-e4b-it

An edge computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.

language generation, speech recognition, Visual QA, chat

nvidia/nv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection, Data ingestion, Chart Detection, nemo retriever, Table Detection, run-on-rtx, extraction

baidu/paddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

Optical Character Recognition, Table Extraction, Optical Character Detection, nemo retriever, data ingestion, run-on-rtx, extraction

deepseek-ai/deepseek-r1-distill-llama-8b

Distilled version of Llama 3.1 8B using reasoning data generated by DeepSeek R1 for enhanced performance.

Distillation, coding, chat, reasoning, run-on-rtx, math

meta/llama-guard-4-12b

Multimodal model that classifies the safety of input prompts as well as output responses.

LLM Multimodal Safety, Content Safety, Guardrail, Content Moderator

nvidia/cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Synthetic Data Generation, Autonomous Vehicles, Physical AI, robotics, video-to-world

nvidia/llama-3.1-nemotron-nano-8b-v1

Leading reasoning and agentic AI accuracy model for PC and edge.

chat, math, advanced reasoning, instruction following, function calling

nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retriever, embedding, Retrieval Augmented Generation, Text-to-Embedding

gotocompany/gemma-2-9b-cpt-sahabatai-instruct

SOTA LLM pre-trained for instruction following and proficiency in the Indonesian language and its dialects.

Sovereign AI, chat, Indonesian, Text-to-Text, Regional Language Generation

utter-project/eurollm-9b-instruct

State-of-the-art, multilingual model tailored to all 24 official European Union languages.

Sovereign AI, chat, Text-to-Text, Multilingual, European, Regional Language Generation

nvidia/nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Computer vision, multimodal embeddings, text and image, Run-on-rtx

black-forest-labs/FLUX.1-schnell

FLUX.1-schnell is a distilled image generation model that produces high-quality images at fast speeds.

Image Generation, Text-to-Image, Run-on-RTX

deepseek-ai/deepseek-r1-0528

Updated version of DeepSeek-R1 with enhanced reasoning, coding, math, and reduced hallucination.

coding, chat, math, advanced reasoning

nvidia/nv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

nemo retriever, Embedding, Retrieval Augmented Generation

microsoft/phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

chat, Code Generation, Text-to-Text, Language Generation

deepseek-ai/deepseek-r1-distill-qwen-14b

Distilled version of Qwen 2.5 14B using reasoning data generated by DeepSeek R1 for enhanced performance.

coding, distillation, chat, reasoning, math

deepseek-ai/deepseek-r1-distill-qwen-32b

Distilled version of Qwen 2.5 32B using reasoning data generated by DeepSeek R1 for enhanced performance.

coding, distillation, chat, reasoning, math

google/gemma-2-2b-it

Advanced small language generative AI model for edge applications.

chat, Code Generation, Text-to-Text, Language Generation

deepseek-ai/deepseek-r1-distill-qwen-7b

Distilled version of Qwen 2.5 7B using reasoning data generated by DeepSeek R1 for enhanced performance.

coding, distillation, chat, math

yentinglin/llama-3-taiwan-70b-instruct

Sovereign AI model finetuned on Traditional Mandarin and English data using the Llama-3 architecture.

regional language generation, chat, Code Generation, Large Language Models

ai21labs/jamba-1.5-mini-instruct

Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.

chat, Language Generation, Text-to-text

institute-of-science-tokyo/llama-3.1-swallow-70b-instruct-v0.1

Sovereign AI model trained on Japanese language that understands regional nuances.

Sovereign AI, Large Language Model, chat, Regional Language Generation

tokyotech-llm/llama-3-swallow-70b-instruct-v0.1

Sovereign AI model trained on Japanese language that understands regional nuances.

Large Language Model, chat, Regional Language Generation

institute-of-science-tokyo/llama-3.1-swallow-8b-instruct-v0.1

Sovereign AI model trained on Japanese language that understands regional nuances.

Sovereign AI, Large Language Model, chat, Regional Language Generation

mistralai/mistral-small-3.1-24b-instruct-2503

Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.

language generation, chat, multimodal, image understanding

nvidia/nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

Object Detection, computer vision, deepstream, NVIDIA NIM

nvidia/nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

Image-to-Embedding, computer vision, deepstream, NVIDIA NIM, object Classification

nvidia/llama-3.1-nemoguard-8b-topic-control

Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.

nemo guardrails, LLM safety, Safety and moderation, dialogue safety, nemotron

nvidia/corrdiff

Generative downscaling model for generating high-resolution, regional-scale weather fields.

AI Weather prediction, Weather Simulation, Earth-2

nvidia/fourcastnet

FourCastNet predicts global atmospheric dynamics of various weather and climate variables.

Weather Simulation, AI Weather Prediction, Climate science, Earth-2

hive/deepfake-image-detection

Advanced AI model that detects faces and identifies deepfake images.

computer vision, AI safety, deep fake detection, Content moderation

hive/ai-generated-image-detection

Robust image classification model for detecting and managing AI-generated content.

image classification, computer vision, AI safety, Content moderation

nvidia/cosmos-predict1-5b

Generates future frames of a physics-aware world state from just an image or short video prompt, for physical AI development.

Synthetic Data Generation, Physical AI, policy evaluation, robotics, video-to-world

microsoft/phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

university-at-buffalo/cached

Context-aware chart extraction that can detect 18 classes of basic chart elements, excluding plot elements.

nemo retriever, Chart Element Detection, Image-To-Text

nvidia/usdsearch

AI-powered search for OpenUSD data, 3D models, images, and assets using text or image-based inputs.

OpenUSD, Synthetic Data Generation, Digital Twin, USD, Text-to-3D

nvidia/visual-changenet

Visual ChangeNet detects pixel-level change maps between two images and outputs a semantic change segmentation mask.

image, Image Generation, cv, Image Segmentation, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

nvidia/retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Object Detection, image, cv, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

nvidia/ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.

Optical Character Recognition, image, Optical Character Detection, cv, vlm, computer vision, TAO Toolkit, video

google/paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

image, cv, Vision Assistant, vlm, Visual Question Answering, computer vision, Language Generation, Image-to-Text, video