Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: Computer vision
Sorting by Most Recent

nvidia/nemotron-parse

Cutting-edge vision-language model excelling in retrieving text and metadata from images.

text and table extraction, document parsing, supported language - english

nvidia/nemotron-nano-12b-v2-vl

Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

language generation, chat, Image-to-Text, vision assistant, visual question answering

nvidia/cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

video understanding, Synthetic Data Generation, autonomous vehicles, industrial, Physical AI, vision language model, reasoning, robotics, smart cities

microsoft/phi-4-mini-flash-reasoning

Lightweight reasoning model for applications in latency-bound, memory- and compute-constrained environments.

edge, chat, reasoning, text-generation, math

nvidia/llama-3.1-nemotron-nano-vl-8b-v1

Multimodal vision-language model that understands text and images and creates informative responses.

doc intelligence, chat, multiple image understanding, OCR

meta/llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual mixture-of-experts (MoE) model with 128 experts and 17B active parameters.

language generation, chat, Image-to-Text, vision assistant, visual question answering

meta/llama-4-scout-17b-16e-instruct

A multimodal, multilingual mixture-of-experts (MoE) model with 16 experts and 17B active parameters.

language generation, chat, Image-to-Text, vision assistant, visual question answering

siemens/simcenter-star-ccm+

Run computational fluid dynamics (CFD) simulations.

aerodynamics, cae, fluid-dynamics, simulation, heat-transfer, computer-aided engineering

cadence/fidelity

Run computational fluid dynamics (CFD) simulations.

aerodynamics, cae, fluid-dynamics, simulation, heat-transfer, computer-aided engineering

ansys/fluent

Run computational fluid dynamics (CFD) simulations.

aerodynamics, cae, fluid-dynamics, simulation, heat-transfer, computer-aided engineering

google/gemma-3-27b-it

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Vision Assistant, chat, Visual Question Answering, Language Generation, Image-to-Text

nvidia/nemoretriever-parse

Cutting-edge vision-language model excelling in retrieving text and metadata from images.

optical character recognition, nemo retriever, data ingestion, table extraction, supported language - english

microsoft/phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

chat, Code Generation, Text-to-Text, Language Generation

nvidia/cosmos-nemotron-34b

Multimodal vision-language model that understands text, images, and video and creates informative responses.

VLM, Vision language model, image caption, image to text

nvidia / Build a Digital Twin for Interactive Fluid Simulation

This NVIDIA Omniverse™ Blueprint demonstrates how commercial software vendors can create interactive digital twins.

NVIDIA Omniverse, Blueprint, CAE, simulation, External Aerodynamics, Enterprise, Computer-aided-engineering

hive/deepfake-image-detection

Advanced AI model that detects faces and identifies deepfake images.

computer vision, AI safety, deep fake detection, Content moderation

nvidia / Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A.

vision, video-to-text, generative AI, Launchable, Blueprint, chat, Enterprise, NVIDIA AI

meta/llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Image-Text Retrieval, Visual QA, chat, Image-to-Text, Image Captioning, Visual Grounding

meta/llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

Image-Text Retrieval, Visual QA, image captioning, chat, Image-to-Text, Visual Grounding

nvidia/vila

Multimodal vision-language model that understands text, images, and video and creates informative responses.

VLM, Vision language model, image caption, image to text

hive/ai-generated-image-detection

Robust image classification model for detecting and managing AI-generated content.

image classification, computer vision, AI safety, Content moderation

microsoft/phi-3.5-vision-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Vision Assistant, Visual Question Answering, Language Generation, Image-to-Text

microsoft/phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

nvidia/nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

Image-to-Embedding, computer vision, deepstream, NVIDIA NIM, object Classification

nvidia/nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

Object Detection, computer vision, deepstream, NVIDIA NIM

nvidia/nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Computer vision, multimodal embeddings, text and image, Run-on-rtx

nvidia/ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.

Optical Character Recognition, image, Optical Character Detection, cv, vlm, computer vision, TAO Toolkit, video

nvidia/visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.

image, Image Generation, cv, Image Segmentation, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

nvidia/retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Object Detection, image, cv, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

google/paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

image, cv, Vision Assistant, vlm, Visual Question Answering, computer vision, Language Generation, Image-to-Text, video