Generalist model to generate future world state as videos from text and image prompts to create synthetic training data for robots and autonomous vehicles.