
Vision language model that excels at understanding the physical world through structured reasoning over videos or images.

StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries across frames.

Accelerate post-training of end-to-end autonomous vehicle stacks with vector search and retrieval for large video datasets.

Reasoning vision language model (VLM) for physical AI and robotics.

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.