
Powerful OCR model for fast, accurate text extraction, layout analysis, and structure analysis of real-world images.

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Multimodal vision-language model that understands text and images and generates informative responses.

Generates future frames of a physics-aware world state from just an image or a short video prompt, for physical AI development.