Multimodal vision-language model that understands text, images, and video and generates informative responses.
Generates physics-aware video world states from text and image prompts for physical AI development.
Generates future frames of a physics-aware world state from just an image or short video prompt, for physical AI development.
Ingests large volumes of live or archived video and extracts insights for summarization and interactive Q&A.
Estimates the gaze angles of a person in a video and redirects the gaze to make it frontal.
Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask.
EfficientDet-based object detection network that detects 100 specific retail objects in an input video.
Cutting-edge open multimodal model that excels at high-quality reasoning over images.
Stable Video Diffusion (SVD) is a generative diffusion model that leverages a single image as a conditioning frame to synthesize video sequences.