Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Vision language model that excels at understanding the physical world through structured reasoning over videos and images.

Reasoning vision language model (VLM) for physical AI and robotics.

Generates future frames of a physics-aware world state from just an image or short video prompt for physical AI development.
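Once one of these models is deployed behind a NIM microservice, it is typically queried over NIM's OpenAI-compatible HTTP API. The sketch below assembles such a chat request for a vision language model, pairing a text prompt with a base64-encoded image; the endpoint URL and model identifier are placeholders for illustration, not values taken from this page.

```python
import json

# Placeholder endpoint and model name -- substitute the values from your
# own deployment. NIM microservices expose an OpenAI-compatible
# /v1/chat/completions route.
NIM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "example/reasoning-vlm"  # hypothetical model identifier


def build_vlm_request(prompt: str, image_b64: str, model: str = MODEL) -> dict:
    """Assemble an OpenAI-style chat request that pairs a text prompt
    with a base64-encoded PNG image for a vision language model."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{image_b64}"
                        },
                    },
                ],
            }
        ],
        "max_tokens": 256,
    }


# Build a request body; in practice this JSON would be POSTed to NIM_URL.
body = build_vlm_request(
    "Is the robot's gripper aligned with the cup?", "iVBORw0KGgo="
)
print(json.dumps(body, indent=2))
```

The request body follows the OpenAI chat-completions schema, so standard OpenAI client libraries can be pointed at the NIM base URL instead of hand-building the JSON.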