Stable Diffusion 3.5 is a popular text-to-image generation model.
FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
Multilingual, cross-lingual embedding model for long-document QA retrieval, supporting 26 languages.
An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.
Multimodal question-answer retrieval representing user queries as text and documents as images.
Build advanced AI agents within the biomedical domain using the AI-Q Blueprint and the BioNeMo Virtual Screening Blueprint
Multimodal vision-language model that understands text and images and generates informative responses.
FLUX.1-schnell is a distilled image generation model that produces high-quality images at high speed.
Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses.
FLUX.1 is a state-of-the-art suite of image generation models.
Build a custom deep researcher powered by state-of-the-art models that continuously process and synthesize multimodal enterprise data, enabling reasoning, planning, and refinement to generate comprehensive reports.
Generate large volumes of synthetic motion trajectories for robot manipulation from just a few human demonstrations.
Generalist model that generates future world states as video from text and image prompts, creating synthetic training data for robots and autonomous vehicles.
Generates future frames of a physics-aware world state from just an image or short video prompt, for physical AI development.
The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.
Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
Continuously extract, embed, and index multimodal data for fast, accurate semantic search. Built on world-class NeMo Retriever models, the RAG blueprint connects AI applications to multimodal enterprise data wherever it resides.
Multimodal vision-language model that understands text, images, and video and generates informative responses.
Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.
Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.
Context-aware chart extraction that detects 18 classes of basic chart elements, excluding plot elements.
Advanced AI model that detects faces and identifies deepfake images.
Create intelligent virtual assistants for customer service across every industry
Cutting-edge vision-language model excelling in high-quality reasoning from images.
Robust image classification model for detecting and managing AI-generated content.
Vision foundation model capable of performing diverse computer vision and vision language tasks.
English text embedding model for question-answering retrieval.
Multilingual text question-answering retrieval, transforming textual information into dense vector representations.
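To make the dense-retrieval pattern these embedding models serve concrete, here is a minimal, self-contained sketch: text is mapped to a normalized dense vector, and candidates are ranked by cosine similarity. The `embed` function below is a hash-based placeholder standing in for a real model call, not any model's actual API.

```python
import hashlib
import math

def embed(text: str, dim: int = 32) -> list[float]:
    # Placeholder for a real embedding model call: hashes character
    # trigrams into a fixed-size dense vector (illustration only).
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is cosine.
    return sum(x * y for x, y in zip(a, b))

docs = ["GPU memory allocation", "retrieval with dense vectors",
        "baking sourdough bread"]
query_vec = embed("dense vector retrieval")
ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
```

In a real deployment the placeholder would be replaced by a call to the embedding model, and the document vectors would be precomputed and stored in a vector index rather than embedded per query.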
Advanced text-to-image model for generating high-quality images.
Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.
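The pixel-level change mask described above can be pictured with this toy NumPy sketch, which thresholds an absolute per-pixel difference into a binary mask. The actual model learns semantic change between image pairs rather than raw pixel deltas; this is only an illustration of the output format.

```python
import numpy as np

# Two toy 4x4 grayscale "images"; the model would take real image pairs.
before = np.zeros((4, 4), dtype=np.float32)
after = before.copy()
after[1:3, 1:3] = 1.0  # a new object appears in the center

# Naive change map: per-pixel absolute difference, thresholded to a
# binary mask. A learned model would instead predict semantic change,
# ignoring nuisance differences such as lighting.
change_mask = (np.abs(after - before) > 0.5).astype(np.uint8)
changed_pixels = int(change_mask.sum())  # the 2x2 center region: 4 pixels
```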
EfficientDet-based object detection network to detect 100 specific retail objects from an input video.
GPU-accelerated model that scores the probability that a given passage contains the information needed to answer a question.
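To illustrate the reranking pattern such a model implements: each (question, passage) pair is scored jointly, and candidates are re-ordered by descending relevance probability. The `score` function below is a hypothetical word-overlap stand-in for the model, not its real API.

```python
def score(question: str, passage: str) -> float:
    # Hypothetical stand-in for the reranking model: a probability-like
    # relevance score from simple word overlap (illustration only).
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def rerank(question: str, passages: list[str]) -> list[str]:
    # A reranker scores every (question, passage) pair jointly, then
    # sorts the candidate passages by descending score.
    return sorted(passages, key=lambda p: score(question, p), reverse=True)

passages = [
    "The Louvre is a museum in Paris.",
    "Paris is the capital of France.",
    "Bread is made from flour and water.",
]
top = rerank("What is the capital of France?", passages)[0]
```

In practice this joint scoring is applied only to a short candidate list produced by a cheaper first-stage retriever, since scoring every pair is too expensive at corpus scale.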