Multi-modal vision-language model that understands text and images and generates informative responses.
Generates future frames of a physics-aware world state from just an image or short video prompt, for physical AI development.
Ingests massive volumes of live or archived video and extracts insights for summarization and interactive Q&A.