
A vision language model that excels at understanding the physical world through structured reasoning over videos and images.

StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries across time.

A reasoning vision language model (VLM) for physical AI and robotics.

An end-to-end autonomous driving stack that integrates perception, prediction, and planning, using sparse scene representations for efficiency and safety.

Generates physics-aware video world states for physical AI development from text prompts and multiple spatial control inputs derived from real-world data or simulation.