Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
Multi-modal vision-language model that understands text and image inputs and generates informative responses.
Generates future frames of a physics-aware world state from just an image or short video prompt for physical AI development.
Ingests massive volumes of live or archived video and extracts insights for summarization and interactive Q&A.