NVIDIA
Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: Computer vision
Sorting by Most Recent

nvidia / cosmos-reason1-7b

Reasoning vision language model (VLM) for physical AI and robotics.

video understanding, synthetic data generation, autonomous vehicles, industrial, physical ai, vision language model, reasoning, robotics, smart cities, nvidia

microsoft / phi-4-mini-flash-reasoning

Lightweight reasoning model for applications in latency-bound, memory- and compute-constrained environments

edge, chat, reasoning, text-generation, math, microsoft

nvidia / llama-3.1-nemotron-nano-vl-8b-v1

Multimodal vision-language model that understands text and images and creates informative responses

doc intelligence, multiple image understanding, ocr, nvidia

meta / llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual 128-expert MoE model with 17B active parameters.

language generation, image-to-text, vision assistant, visual question answering, meta

meta / llama-4-scout-17b-16e-instruct

A multimodal, multilingual 16-expert MoE model with 17B active parameters.

language generation, image-to-text, vision assistant, visual question answering, meta

siemens / simcenter-star-ccm+

Run computational fluid dynamics (CFD) simulations

aerodynamics, cae, fluid-dynamics, simulation, heat-transfer, computer-aided engineering, siemens

cadence / fidelity

Run computational fluid dynamics (CFD) simulations

aerodynamics, cae, fluid-dynamics, simulation, heat-transfer, computer-aided engineering, cadence

ansys / fluent

Run computational fluid dynamics (CFD) simulations

aerodynamics, cae, fluid-dynamics, simulation, heat-transfer, computer-aided engineering, ansys

google / gemma-3-27b-it

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

vision assistant, visual question answering, language generation, image-to-text, google

nvidia / nemoretriever-parse

Cutting-edge vision-language model excelling in retrieving text and metadata from images.

optical character recognition, nemo retriever, data ingestion, table extraction, supported language - english, nvidia

microsoft / phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments

code generation, chat, text-to-text, language generation, microsoft

nvidia / cosmos-nemotron-34b

Multimodal vision-language model that understands text, images, and video and creates informative responses

vlm, vision language model, image caption, image to text, nvidia

nvidia / Build a Digital Twin for Interactive Fluid Simulation

This NVIDIA Omniverse™ Blueprint demonstrates how commercial software vendors can create interactive digital twins.

nvidia omniverse, blueprint, cae, simulation, external aerodynamics, enterprise, computer-aided-engineering, nvidia

hive / deepfake-image-detection

Advanced AI model that detects faces and identifies deepfake images.

computer vision, ai safety, deep fake detection, content moderation, hive

nvidia / Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A

vision, video-to-text, generative ai, launchable, blueprint, chat, enterprise, nvidia ai, nvidia

meta / llama-3.2-11b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

image-text retrieval, visual qa, image-to-text, image captioning, visual grounding, meta

meta / llama-3.2-90b-vision-instruct

Cutting-edge vision-language model excelling in high-quality reasoning from images.

image-text retrieval, visual qa, image captioning, image-to-text, visual grounding, meta

nvidia / vila

Multimodal vision-language model that understands text, images, and video and creates informative responses

vlm, vision language model, image caption, image to text, nvidia

hive / ai-generated-image-detection

Robust image classification model for detecting and managing AI-generated content.

image classification, computer vision, ai safety, content moderation, hive

microsoft / phi-3.5-vision-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

vision assistant, visual question answering, language generation, image-to-text, microsoft

microsoft / phi-3.5-moe-instruct

Advanced LLM based on a Mixture of Experts architecture that delivers compute-efficient content generation

moe, chat, code generation, text-to-text, language generation, microsoft

microsoft / phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments

code generation, chat, text-to-text, language generation, large language models, microsoft

nvidia / nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

image-to-embedding, computer vision, deepstream, nvidia nim, object classification, nvidia

nvidia / nv-grounding-dino

Grounding DINO is an open-vocabulary, zero-shot object detection model.

object detection, computer vision, deepstream, nvidia nim, nvidia

microsoft / florence-2

Vision foundation model capable of performing diverse computer vision and vision language tasks.

image classification, image, object detection, cv, multimodal, vision assistant, vlm, visual question answering, computer vision, language generation, image-to-text, text-to-image, microsoft

nvidia / nvclip

NV-CLIP is a multimodal embeddings model for image and text.

computer vision, multimodal embeddings, text and image, nvidia nim, run-on-rtx, nvidia

nvidia / ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition, respectively.

optical character recognition, image, optical character detection, cv, vlm, computer vision, tao toolkit, video, nvidia

nvidia / visual-changenet

Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask

image, image generation, cv, image segmentation, vlm, computer vision, tao toolkit, video, nvidia nim, nvidia

nvidia / retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

object detection, image, cv, vlm, computer vision, tao toolkit, video, nvidia nim, nvidia

google / paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

image, cv, vision assistant, vlm, visual question answering, computer vision, language generation, image-to-text, video, google

microsoft / kosmos-2

Groundbreaking multimodal model designed to understand and reason about visual elements in images.

image, cv, multimodal, vlm, visual question answering, computer vision, image understanding, image-to-text, video, microsoft

nvidia / neva-22b

Multimodal vision-language model that understands text and images and generates informative responses

image, cv, vision assistant, non-commercial use only, vlm, visual question answering, computer vision, image-to-text, video, nvidia

adept / fuyu-8b

Multimodal model for a wide range of tasks, including image understanding and language generation.

image, cv, multimodal, vlm, computer vision, image understanding, language generation, image-to-text, video, adept