NVIDIA
Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: Multi-modal
Sorting by Most Recent

nvidia / AI Weather Analytics with Earth-2

Develop an AI-powered weather analysis and forecasting application that visualizes multi-layered geospatial data.

blueprint, climate science, enterprise, weather simulation, ai weather prediction, nvidia ai, earth-2, nvidia

nvidia / Test Multi-Robot Fleets for Industrial Automation

Simulate, test, and optimize physical AI and robotic fleets at scale in industrial digital twins before real-world deployment.

industrial, nvidia omniverse, blueprint, simulation, enterprise, omniverse blueprint, nvidia

nvidia / bevformer

Advanced transformer for multi-frame bird's-eye-view 3D perception in autonomous driving.

autonomous vehicles, bev, automotive, perception, nvidia

nvidia / canary-1b-asr

Multi-lingual model supporting speech-to-text recognition and translation.

asr, ast, streaming, speech-to-text, batch, spanish, multilingual, nvidia nim, nvidia riva, nvidia

nvidia / canary-0.6b-turbo-asr

Multi-lingual model supporting speech-to-text recognition and translation.

asr, ast, fast, speech-to-text, batch, multilingual, nvidia nim, nvidia riva, nvidia

nvidia / PDF to Podcast

Transform PDFs into AI podcasts for engaging on-the-go audio content.

blueprint, multi-modal, launchable, text-to-speech, conversational ai, pdf-to-podcast, nvidia ai, ai agent, nvidia

nvidia / cosmos-nemotron-34b

Multi-modal vision-language model that understands text, images, and video and generates informative responses.

vlm, vision language model, image caption, image to text, nvidia

abacusai / dracarys-llama-3.1-70b-instruct

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

chat, code generation, text-to-text, abacusai

nvidia / vila

Multi-modal vision-language model that understands text, images, and video and generates informative responses.

vlm, vision language model, image caption, image to text, nvidia

baai / bge-m3

Embedding model for text retrieval tasks, excelling in dense, multi-vector, and sparse retrieval.

embeddings, retrieval augmented generation, text-to-embedding, baai

nvidia / neva-22b

Multi-modal vision-language model that understands text and images and generates informative responses.

image, cv, vision assistant, non-commercial use only, vlm, visual question answering, computer vision, image-to-text, video, nvidia

adept / fuyu-8b

Multi-modal model for a wide range of tasks, including image understanding and language generation.

image, cv, multimodal, vlm, computer vision, image understanding, language generation, image-to-text, video, adept