NVIDIA
Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: multimodal embeddings
Sorting by Most Recent

nvidia / llama-3.2-nemoretriever-1b-vlm-embed-v1

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retriever, embedding, retrieval augmented generation, text-to-embedding, nvidia

mistralai / mistral-small-3.1-24b-instruct-2503

Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.

language generation, multimodal, image understanding, mistralai

mistralai / mistral-medium-3-instruct

Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.

language generation, image-to-text, multimodal, visual question answering, mistralai

meta / llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual mixture-of-experts (MoE) model with 128 experts and 17B active parameters.

language generation, image-to-text, vision assistant, visual question answering, meta

meta / llama-4-scout-17b-16e-instruct

A multimodal, multilingual mixture-of-experts (MoE) model with 16 experts and 17B active parameters.

language generation, image-to-text, vision assistant, visual question answering, meta

nvidia / Build an AI Agent for Enterprise Research

Build artificial general agents (AGA) powered by AGI models that continuously process and synthesize multimodal enterprise data, enabling reasoning, planning, and refinement to generate comprehensive reports.

nim, launchable, llama nemotron, reasoning, blueprint, enterprise, retrieval-augmented generation, nvidia ai, nemo retriever, nvidia

nvidia / nv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

nemo retriever, embedding, retrieval augmented generation, nvidia

google / gemma-3-27b-it

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

vision assistant, visual question answering, language generation, image-to-text, google

microsoft / phi-4-multimodal-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

speech recognition, visual qa, language generation, image-to-text, chart and table understanding, microsoft

nvidia / Build an Enterprise RAG pipeline

Connect AI applications to multimodal enterprise data with a scalable retrieval augmented generation (RAG) pipeline built on highly performant, industry-leading NIM microservices, for faster PDF data extraction and more accurate information retrieval.

nemo retriever, nim, launchable, blueprint, enterprise, retrieval-augmented generation, nvidia ai, nvidia

nvidia / llama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retriever, run on rtx, embedding, retrieval augmented generation, text-to-embedding, nvidia

meta / esm2-650m

Generates embeddings of proteins from their amino acid sequences.

nim, protein embedding, bionemo, biology, drug discovery, meta

microsoft / phi-3.5-vision-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

vision assistant, visual question answering, language generation, image-to-text, microsoft

nvidia / nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

image-to-embedding, computer vision, deepstream, nvidia nim, object classification, nvidia

microsoft / florence-2

Vision foundation model capable of performing diverse computer vision and vision language tasks.

image classification, image, object detection, cv, multimodal, vision assistant, vlm, visual question answering, computer vision, language generation, image-to-text, text-to-image, microsoft

nvidia / nv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

embedding, retrieval augmented generation, nemo retriever, text-to-embedding, nvidia

nvidia / nv-embedqa-mistral-7b-v2

Multilingual text question-answering retrieval, transforming textual information into dense vector representations.

nemo retriever, embedding, retrieval augmented generation, nvidia

nvidia / nvclip

NV-CLIP is a multimodal embeddings model for image and text.

computer vision, multimodal embeddings, text and image, nvidia nim, run-on-rtx, nvidia

nvidia / nv-embed-v1

Generates high-quality numerical embeddings from text inputs.

non-commercial use only, retrieval augmented generation, text-to-embedding, nvidia

baai / bge-m3

Embedding model for text retrieval tasks, excelling in dense, multi-vector, and sparse retrieval.

embeddings, retrieval augmented generation, text-to-embedding, baai

microsoft / phi-3-vision-128k-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

image, cv, vision assistant, vlm, visual question answering, computer vision, language generation, image-to-text, video, microsoft

snowflake / arctic-embed-l

Optimized community model for text embedding.

nemo retriever, embedding, retrieval augmented generation, text-to-embedding, snowflake

nvidia / embed-qa-4

GPU-accelerated generation of text embeddings used for question-answering retrieval.

embeddings, retrieval augmented generation, text-to-embedding, nvidia

microsoft / kosmos-2

Groundbreaking multimodal model designed to understand and reason about visual elements in images.

image, cv, multimodal, vlm, visual question answering, computer vision, image understanding, image-to-text, video, microsoft

google / deplot

Translate images of plots into tables with one-shot visual language understanding.

nemo retriever, multimodal, data ingestion, image-to-text, google

adept / fuyu-8b

Multi-modal model for a wide range of tasks, including image understanding and language generation.

image, cv, multimodal, vlm, computer vision, image understanding, language generation, image-to-text, video, adept