Stable Diffusion 3.5 is a popular text-to-image generation model
FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.
Reasoning vision language model (VLM) for physical AI and robotics.
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
An edge computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments
An edge computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments
Multimodal question-answer retrieval representing user queries as text and documents as images.
Multimodal vision-language model that understands text and images and creates informative responses
FLUX.1-schnell is a distilled image generation model, producing high-quality images at fast speeds
Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses
Create high-quality images using FLUX.1 in ComfyUI, guided by 3D.
FLUX.1 is a state-of-the-art suite of image generation models
Generate exponentially large amounts of synthetic motion trajectories for robot manipulation from just a few human demonstrations.
Generalist model to generate future world state as videos from text and image prompts to create synthetic training data for robots and autonomous vehicles.
Generates future frames of a physics-aware world state from just an image or short video prompt, for physical AI development.
Cutting-edge open multimodal model excelling in high-quality reasoning from images.
Cutting-edge vision-language model excelling in retrieving text and metadata from images.
Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
Multimodal vision-language model that understands text, images, and video and creates informative responses
Context-aware chart extraction that can detect 18 classes of basic chart elements, excluding plot elements.
Advanced AI model that detects faces and identifies deepfake images.
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
Cutting-edge vision-language model excelling in high-quality reasoning from images.
Cutting-edge vision-language model excelling in high-quality reasoning from images.
Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.
Robust image classification model for detecting and managing AI-generated content.
Cutting-edge open multimodal model excelling in high-quality reasoning from images.
Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.
Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.
Vision foundation model capable of performing diverse computer vision and vision language tasks.
Powers complex conversations with superior contextual understanding, reasoning and text generation.
Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.
Cutting-edge text generation model for text understanding, transformation, and code generation.
Cutting-edge text generation model for text understanding, transformation, and code generation.
Advanced text-to-image model for generating high-quality images
Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask
EfficientDet-based object detection network to detect 100 specific retail objects from an input video.
A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG.
Powers complex conversations with superior contextual understanding, reasoning and text generation.
Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.