NVIDIA
Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: av stack
Sorting by Most Recent

mistralai / mistral-small-3.1-24b-instruct-2503

Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.

language generation, multimodal, image understanding, mistralai

nvidia / 3D Guided Generative AI

Create high-quality images using Flux.1 in ComfyUI, guided by 3D.

blueprint, run on rtx, nvidia ai, nvidia

nvidia / Build an AI Agent for Research and Reporting

Create AI agents that reason, plan, reflect and refine to produce high-quality reports based on source materials of your choice.

nim, llama nemotron, reasoning, blueprint, retrieval-augmented generation, nvidia ai, nemo retriever, nvidia

nvidia / AI Weather Analytics with Earth-2

Develop AI-powered weather analysis and forecasting applications that visualize multi-layered geospatial data.

climate science, blueprint, weather simulation, ai weather prediction, nvidia ai, earth-2, nvidia

nvidia / Synthetic Manipulation Motion Generation for Robotics

Generate exponentially large amounts of synthetic motion trajectories for robot manipulation from just a few human demonstrations.

nvidia omniverse, blueprint, synthetic data, robotics, physical ai, robot learning, humanoids, nvidia isaac gr00t, text-to-world, image-to-world, teleop, nvidia

nvidia / cosmos-predict1-7b

Generates physics-aware video world states from text and image prompts for physical AI development.

synthetic data generation, physical ai, robotics, text-to-world, image-to-world, nvidia

nvidia / cosmos-predict1-5b

Generates future frames of a physics-aware world state from just an image or short video prompt for physical AI development.

synthetic data generation, physical ai, policy evaluation, robotics, video-to-world, nvidia

nvidia / Test Multi-Robot Fleets for Industrial Automation

Simulate, test, and optimize physical AI and robotic fleets at scale in industrial digital twins before real-world deployment.

industrial, nvidia omniverse, blueprint, simulation, omniverse blueprint, nvidia

nvidia / sparsedrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

autonomous vehicles, bev, av stack, automotive, nvidia

nvidia / llama-3.1-nemotron-nano-8b-v1

Leading reasoning and agentic AI accuracy model for PC and edge.

math, advanced reasoning, instruction following, function calling, nvidia

nvidia / nv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

nemo retriever, embedding, retrieval augmented generation, nvidia

deepseek-ai / deepseek-r1-distill-llama-8b

Distilled version of Llama 3.1 8B using reasoning data generated by DeepSeek R1 for enhanced performance.

distillation, coding, run on rtx, reasoning, math, deepseek-ai

nvidia / LLM Router

Route LLM requests to the best model for the task at hand.

launchable, blueprint, llm router, nvidia ai, nvidia

deepseek-ai / deepseek-r1-distill-qwen-32b

Distilled version of Qwen 2.5 32B using reasoning data generated by DeepSeek R1 for enhanced performance.

coding, distillation, reasoning, math, deepseek-ai

deepseek-ai / deepseek-r1-distill-qwen-14b

Distilled version of Qwen 2.5 14B using reasoning data generated by DeepSeek R1 for enhanced performance.

coding, distillation, reasoning, math, deepseek-ai

deepseek-ai / deepseek-r1-distill-qwen-7b

Distilled version of Qwen 2.5 7B using reasoning data generated by DeepSeek R1 for enhanced performance.

coding, distillation, reasoning, math, deepseek-ai

microsoft / phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- or compute-constrained environments.

code generation, chat, text-to-text, language generation, microsoft

nvidia / Evo 2 Protein Design

This workflow shows how generative AI can generate DNA sequences that can be translated into proteins for bioengineering.

blueprint, nim, bionemo, biology, drug discovery, protein generation, nvidia

deepseek-ai / deepseek-r1

State-of-the-art, high-efficiency LLM excelling in reasoning, math, and coding.

chat, math, advanced reasoning, deepseek-ai

nvidia / Build an Enterprise RAG pipeline

Connect AI applications to multimodal enterprise data with a scalable retrieval augmented generation (RAG) pipeline built on highly performant, industry-leading NIM microservices, for faster PDF data extraction and more accurate information retrieval.

nemo retriever, nim, launchable, blueprint, enterprise, retrieval-augmented generation, nvidia ai, nvidia

nvidia / llama-3.1-nemoguard-8b-topic-control

Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.

dialogue safety, llm safety, guard model, content safety, nvidia

nvidia / Build A Generative Protein Binder Design Pipeline

This blueprint shows how generative AI and accelerated NIM microservices can design protein binders smarter and faster.

nvidia bionemo, blueprint, bionemo, biology, drug discovery, protein generation, nvidia

nvidia / PDF to Podcast

Transform PDFs into AI podcasts for engaging on-the-go audio content.

blueprint, multi-modal, launchable, text-to-speech, conversational ai, pdf-to-podcast, nvidia ai, ai agent, text-to-speech, nvidia

wandb / Traceability for Agentic AI

Trace and evaluate AI Agents with Weights & Biases.

traceability, launchable, ai agents, wandb, blueprint, partner, nvidia ai, wandb

pipecat / Voice Agent Framework for Conversational AI

Automate voice AI agents with NVIDIA NIM microservices and Pipecat.

pipecat, launchable, ai agents, blueprint, conversational ai, partner, nvidia ai, pipecat

llamaindex / Document Research Assistant for Blog Creation

Automate research and generate blogs with AI Agents using LlamaIndex and Llama3.3-70B NIM LLM.

blog creation, launchable, ai agents, blueprint, partner, llamaindex, nvidia ai, llamaindex

langchain / Structured Report Generation

Generate detailed, structured reports on any topic using LangGraph and Llama3.3 70B NIM.

langgraph, report generation, launchable, ai agents, blueprint, partner, nvidia ai, langchain

crewai / Code Documentation for Software Development

Document your GitHub repositories with AI Agents using CrewAI and Llama3.3 70B NIM.

code documentation, crewai, launchable, ai agents, blueprint, partner, nvidia ai, crewai

nvidia / llama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retriever, run on rtx, embedding, retrieval augmented generation, text-to-embedding, nvidia

nvidia / llama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nemo retriever, retrieval augmented generation, reranking, nvidia

university-at-buffalo / cached

Context-aware chart extraction that can detect 18 classes for chart basic elements, excluding plot elements.

nemo retriever, chart element detection, image-to-text, university-at-buffalo

nvidia / nv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

object detection, data ingestion, chart detection, nemo retriever, table detection, run on rtx, extraction, nvidia

baidu / paddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

optical character recognition, table extraction, optical character detection, nemo retriever, run on rtx, data ingestion, extraction, baidu

nvidia / corrdiff

Generative downscaling model for generating high resolution regional scale weather fields.

ai weather prediction, weather simulation, earth-2, nvidia

nvidia / fourcastnet

FourCastNet predicts global atmospheric dynamics of various weather / climate variables.

weather simulation, ai weather prediction, climate science, earth-2, nvidia

hive / deepfake-image-detection

Advanced AI model that detects faces and identifies deepfake images.

computer vision, ai safety, deep fake detection, content moderation, hive

nvidia / Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A.

vision, video-to-text, generative ai, blueprint, chat, nvidia ai, nvidia

nvidia / 3D Conditioning for Precise Visual Generative AI

Enhance and modify high-quality compositions using real-time rendering and generative AI output without affecting a hero product asset.

visual design, nvidia omniverse, blueprint, simulation, nvidia

nvidia / Build an AI Virtual Assistant

Create intelligent virtual assistants for customer service across every industry.

customer service, launchable, blueprint, retrieval-augmented generation, llm, contact center, nvidia ai, nvidia

ibm / granite-3.0-8b-instruct

Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI.

small language model, chat, text-to-text, ibm

nvidia / Vulnerability Analysis for Container Security

Rapidly identify and mitigate container security vulnerabilities with generative AI.

generative ai, launchable, nv-embedqa-e5-v5, blueprint, llama-3_1-70b-instruct, cybersecurity, nvidia ai, nvidia

institute-of-science-tokyo / llama-3.1-swallow-70b-instruct-v0.1

Sovereign AI model trained on Japanese language that understands regional nuances.

sovereign ai, large language model, chat, regional language generation, institute-of-science-tokyo

institute-of-science-tokyo / llama-3.1-swallow-8b-instruct-v0.1

Sovereign AI model trained on Japanese language that understands regional nuances.

sovereign ai, large language model, chat, regional language generation, institute-of-science-tokyo

nvidia / llama-3.1-nemotron-51b-instruct

Unique language model that delivers unmatched accuracy-efficiency performance.

language generation, chat, text-to-text, nvidia

hive / ai-generated-image-detection

Robust image classification model for detecting and managing AI-generated content.

image classification, computer vision, ai safety, content moderation, hive

nvidia / Build a Digital Human

Create intelligent, interactive avatars for customer service across industries.

digital humans, speech-to-text, nvidia omniverse, blueprint, chat, audio-to-face, nvidia ai, nvidia

yentinglin / llama-3-taiwan-70b-instruct

Sovereign AI model finetuned on Traditional Mandarin and English data using the Llama-3 architecture.

regional language generation, chat, code generation, large language models, yentinglin

tokyotech-llm / llama-3-swallow-70b-instruct-v0.1

Sovereign AI model trained on Japanese language that understands regional nuances.

large language model, chat, regional language generation, tokyotech-llm

nvidia / Build A Generative Virtual Screening Pipeline

This blueprint shows how generative AI and accelerated NIM microservices can design optimized small molecules smarter and faster.

chemistry, nim, nvidia bionemo, blueprint, bionemo, docking, drug discovery, nvidia

ai21labs / jamba-1.5-mini-instruct

Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.

chat, language generation, text-to-text, ai21labs

ai21labs / jamba-1.5-large-instruct

Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.

chat, language generation, text-to-text, ai21labs

microsoft / phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- or compute-constrained environments.

code generation, chat, text-to-text, language generation, large language models, microsoft

nvidia / nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

image-to-embedding, computer vision, deepstream, nvidia nim, object classification, nvidia

nvidia / nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

object detection, computer vision, deepstream, nvidia nim, nvidia

briaai / BRIA-2.3

An enterprise-grade text-to-image model, trained on a compliant dataset, that produces high-quality images.

image generation, text-to-image, briaai

microsoft / florence-2

Vision foundation model capable of performing diverse computer vision and vision language tasks.

image classification, image, object detection, cv, multimodal, vision assistant, vlm, visual question answering, computer vision, language generation, image-to-text, text-to-image, microsoft

google / gemma-2-2b-it

Advanced small language generative AI model for edge applications.

code generation, chat, text-to-text, language generation, google

nvidia / usdsearch

AI-powered search for OpenUSD data, 3D models, images, and assets using text or image-based inputs.

openusd, synthetic data generation, digital twin, usd, text-to-3d, nvidia nim, nvidia

nv-mistralai / mistral-nemo-12b-instruct

Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU.

run on rtx, code generation, chat, language generation, text-to-text, nv-mistralai

nvidia / nv-rerankqa-mistral-4b-v3

Multilingual text reranking model.

nemo retriever, reranking, retrieval augmented generation, nvidia

nvidia / nv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

embedding, retrieval augmented generation, nemo retriever, text-to-embedding, nvidia

nvidia / nv-embedqa-mistral-7b-v2

Multilingual text question-answering retrieval, transforming textual information into dense vector representations.

nemo retriever, embedding, retrieval augmented generation, nvidia

01-ai / yi-large

Powerful model trained on English and Chinese for diverse tasks including chatbot and creative writing.

code generation, chat, text-to-text, multilingual, 01-ai

nvidia / nvclip

NV-CLIP is a multimodal embeddings model for image and text.

computer vision, multimodal embeddings, text and image, run on rtx, nvidia nim, nvidia

nvidia / ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.

optical character recognition, image, optical character detection, cv, vlm, computer vision, tao toolkit, video, nvidia

nvidia / nv-embed-v1

Generates high-quality numerical embeddings from text inputs.

non-commercial use only, retrieval augmented generation, text-to-embedding, nvidia

nvidia / visual-changenet

Visual ChangeNet detects pixel-level change maps between two images and outputs a semantic change segmentation mask.

image, image generation, cv, image segmentation, vlm, computer vision, tao toolkit, video, nvidia nim, nvidia

nvidia / retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

object detection, image, cv, vlm, computer vision, tao toolkit, video, nvidia nim, nvidia

microsoft / phi-3-vision-128k-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

image, cv, vision assistant, vlm, visual question answering, computer vision, language generation, image-to-text, video, microsoft

google / paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

image, cv, vision assistant, vlm, visual question answering, computer vision, language generation, image-to-text, video, google

mistralai / mixtral-8x22b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

advanced reasoning, code generation, chat, text-to-text, large language models, mistralai

microsoft / kosmos-2

Groundbreaking multimodal model designed to understand and reason about visual elements in images.

image, cv, multimodal, vlm, visual question answering, computer vision, image understanding, image-to-text, video, microsoft

nvidia / neva-22b

Multi-modal vision-language model that understands text/images and generates informative responses.

image, cv, vision assistant, non-commercial use only, vlm, visual question answering, computer vision, image-to-text, video, nvidia

adept / fuyu-8b

Multi-modal model for a wide range of tasks, including image understanding and language generation.

image, cv, multimodal, vlm, computer vision, image understanding, language generation, image-to-text, video, adept

stabilityai / stable-video-diffusion

Stable Video Diffusion (SVD) is a generative diffusion model that leverages a single image as a conditioning frame to synthesize video sequences.

image generation, text-to-image, stabilityai

mistralai / mixtral-8x7b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

advanced reasoning, code generation, chat, text-to-text, large language models, mistralai