NVIDIA
Explore Models Blueprints GPUs Docs
Terms of Use

|

Privacy Policy

|

Manage My Privacy

|

Contact

Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: Visual Question Answering
Sorting by Most Recent

googlegemma-3n-e4b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

language generationspeech recognitionvisual qachatgoogle

googlegemma-3n-e2b-it

An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments

language generationspeech recognitionvisual qachatgoogle

nvidiallama-3.2-nemoretriever-500m-rerank-v2

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

nemo retrieverretrieval augmented generationrerankingnvidia

nvidiallama-3.2-nemoretriever-1b-vlm-embed-v1

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retrieverembeddingretrieval augmented generationtext-to-embeddingnvidia

mistralaimistral-medium-3-instruct

Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.

language generationimage-to-textmultimodalvisual question answeringmistralai

metallama-4-maverick-17b-128e-instruct

A general purpose multimodal, multilingual 128 MoE model with 17B parameters.

language generationimage-to-textvision assistantvisual question answeringmeta

metallama-4-scout-17b-16e-instruct

A multimodal, multilingual 16 MoE model with 17B parameters.

language generationimage-to-textvision assistantvisual question answeringmeta

nvidiaAI Weather Analytics with Earth-2

Develop AI powered weather analysis and forecasting application visualizing multi-layered geospatial data.

blueprintclimate scienceenterpriseweather simulationai weather predictionnvidia aiearth-2nvidia

googlegemma-3-27b-it

Cutting-edge open multimodal model exceling in high-quality reasoning from images.

vision assistantvisual question answeringlanguage generationimage-to-textgoogle

microsoftphi-4-multimodal-instruct

Cutting-edge open multimodal model exceling in high-quality reasoning from image and audio inputs.

speech recognitionvisual qalanguage generationimage-to-textchart and table understandingmicrosoft

nvidiallama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retrieverrun-on-rtxembeddingretrieval augmented generationtext-to-embeddingnvidia

nvidiallama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nemo retrieverretrieval augmented generationrerankingnvidia

nvidia3D Conditioning for Precise Visual Generative AI

Enhance and modify high-quality compositions using real-time rendering and generative AI output without affecting a hero product asset.

visual designnvidia omniverseblueprintsimulationenterprisenvidia

metallama-3.2-11b-vision-instruct

Cutting-edge vision-language model exceling in high-quality reasoning from images.

image-text retrievalvisual qaimage-to-textimage captioningvisual groundingmeta

metallama-3.2-90b-vision-instruct

Cutting-edge vision-Language model exceling in high-quality reasoning from images.

image-text retrievalvisual qaimage captioningimage-to-textvisual groundingmeta

microsoftphi-3.5-vision-instruct

Cutting-edge open multimodal model exceling in high-quality reasoning from images.

vision assistantvisual question answeringlanguage generationimage-to-textmicrosoft

nvidianv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

image-to-embeddingcomputer visiondeepstreamnvidia nimobject classificationnvidia

microsoftflorence-2

Vision foundation model capable of performing diverse computer vision and vision language tasks.

image classificationimageobject detectioncvmultimodalvision assistantvlmvisual question answeringcomputer visionlanguage generationimage-to-texttext-to-imagemicrosoft

nvidiausdvalidate

Verify compatibility of OpenUSD assets with instant RTX render and rule-based validation.

validationopenusdsynthetic data generationdigital twinusdvisualization 3dnvidia

nvidianv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

embeddingretrieval augmented generationnemo retrievertext-to-embeddingnvidia

nvidianv-embedqa-mistral-7b-v2

Multilingual text question-answering retrieval, transforming textual information into dense vector representations.

nemo retrieverembeddingretrieval augmented generationnvidia

nvidiavisual-changenet

Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask

imageimage generationcvimage segmentationvlmcomputer visiontao toolkitvideonvidia nimnvidia

googlepaligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

imagecvvision assistantvlmvisual question answeringcomputer visionlanguage generationimage-to-textvideogoogle

nvidiarerank-qa-mistral-4b

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

rankingretrieval augmented generationnvidia

microsoftkosmos-2

Groundbreaking multimodal model designed to understand and reason about visual elements in images.

imagecvmultimodalvlmvisual question answeringcomputer visionimage understandingimage-to-textvideomicrosoft

googledeplot

Translate images of plots into tables with one-shot visual language understanding.

nemo retrievermultimodaldata ingestionimage-to-textextractiongoogle

nvidianeva-22b

Multi-modal vision-language model that understands text/images and generates informative responses

imagecvvision assistantnon-commercial use onlyvlmvisual question answeringcomputer visionimage-to-textvideonvidia