
Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.

Multi-modal model that classifies the safety of input prompts as well as output responses.

Multi-modal vision-language model that understands text and images and creates informative responses.

Multi-lingual model supporting speech-to-text recognition and translation.

Multi-modal vision-language model that understands text, images, and video and creates informative responses.

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.