NVIDIA

Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: multimodal embeddings
Sorting by Most Recent

mistralai / mistral-large-3-675b-instruct-2512

A state-of-the-art, general-purpose MoE VLM ideal for chat, agentic, and instruction-based use cases.

language generation, chat, Image-to-Text, multimodal, agentic

mistralai / ministral-14b-instruct-2512

A general-purpose VLM ideal for chat and instruction-based use cases.

language generation, SLM, chat, Image-to-Text, multimodal

cyborg / Cyborg Enterprise RAG

Securely extract, embed, and index multimodal data with encryption in-use for fast, accurate semantic search.

NIM, Launchable, Blueprint, Retrieval-Augmented Generation, NeMo Retriever

nvidia / llama-3_2-nemoretriever-300m-embed-v2

Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.

Retrieval Augmented Generation, Text-to-Embedding, NeMo Retriever

black-forest-labs / FLUX.1-Kontext-dev

FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.

Image Generation, Text-to-Image, Run-on-RTX

nvidia / llama-3_2-nemoretriever-300m-embed-v1

Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.

Retrieval Augmented Generation, Text-to-Embedding, NeMo Retriever

meta / llama-guard-4-12b

Multimodal model that classifies the safety of input prompts as well as output responses.

LLM Multimodal Safety, Content Safety, Guardrail, Content Moderator

nvidia / llama-3.2-nemoretriever-1b-vlm-embed-v1

Multimodal question-answer retrieval representing user queries as text and documents as images.

nemo retriever, embedding, Retrieval Augmented Generation, Text-to-Embedding

mistralai / mistral-small-3.1-24b-instruct-2503

Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.

language generation, chat, multimodal, image understanding

mistralai / mistral-medium-3-instruct

Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.

language generation, chat, Image-to-Text, multimodal, visual question answering

meta / llama-4-maverick-17b-128e-instruct

A general-purpose multimodal, multilingual 128-expert MoE model with 17B parameters.

language generation, chat, Image-to-Text, vision assistant, visual question answering

meta / llama-4-scout-17b-16e-instruct

A multimodal, multilingual 16-expert MoE model with 17B parameters.

language generation, chat, Image-to-Text, vision assistant, visual question answering

nvidia / Build an AI Agent for Enterprise Research

Build a custom enterprise research assistant powered by state-of-the-art models that process and synthesize multimodal data, enabling reasoning, planning, and refinement to generate comprehensive reports.

NIM, Launchable, Llama Nemotron, Reasoning, Blueprint, Enterprise, Retrieval-Augmented Generation, NVIDIA AI, NeMo Retriever

nvidia / nv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

nemo retriever, Embedding, Retrieval Augmented Generation

google / gemma-3-27b-it

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Vision Assistant, chat, Visual Question Answering, Language Generation, Image-to-Text

microsoft / phi-4-multimodal-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.

Speech Recognition, Visual QA, chat, Language Generation, Image-to-Text, Chart and Table Understanding

nvidia / Build an Enterprise RAG Pipeline Blueprint

Power fast, accurate semantic search across multimodal enterprise data with NVIDIA’s RAG Blueprint, built on NeMo Retriever and Nemotron models, to connect your agents to trusted, authoritative sources of knowledge.

NIM, Launchable, Nemotron, Blueprint, Enterprise, Retrieval-Augmented Generation, NVIDIA AI, NeMo Retriever

nvidia / llama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retriever, embedding, Retrieval Augmented Generation, Text-to-Embedding

meta / esm2-650m

Generates embeddings of proteins from their amino acid sequences.

nim, Protein Embedding, BioNemo, Biology, Drug Discovery

microsoft / phi-3.5-vision-instruct

Cutting-edge open multimodal model excelling in high-quality reasoning from images.

Vision Assistant, Visual Question Answering, Language Generation, Image-to-Text

nvidia / nv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

Image-to-Embedding, computer vision, deepstream, NVIDIA NIM, object Classification

nvidia / nv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

Embedding, run-on-rtx, Retrieval Augmented Generation, Nemo retriever, Text-to-Embedding

nvidia / nvclip

NV-CLIP is a multimodal embeddings model for image and text.

Computer vision, multimodal embeddings, text and image, Run-on-rtx

nvidia / nv-embed-v1

Generates high-quality numerical embeddings from text inputs.

Non-Commercial Use Only, Retrieval Augmented Generation, Text-to-Embedding

baai / bge-m3

Embedding model for text retrieval tasks, excelling in dense, multi-vector, and sparse retrieval.

Embeddings, Retrieval Augmented Generation, Text-to-Embedding