Now that LLMs and foundation models have become everyday concepts, the next big idea in AI is world models.
Language models like ChatGPT or Claude predict the next word, or “token,” and are strong at writing, coding, and conversational tasks. But these systems operate on a probability distribution over text: they recognize language *about* cause and effect without actually modeling it.
World models, by contrast, predict the next state. Instead of asking “what word or concept comes next?” they ask “what happens next?” Through direct observational data and simulation, they build an internal model of a system’s dynamics. And unlike language models, which capture statistical patterns in text, world models aim to capture cause and effect: given a current state and an action, the model can simulate forward and predict the next state.
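The state-action-next-state loop can be illustrated with a toy transition function. This is a minimal sketch of the idea, not any real world-model architecture; the dynamics (a falling object under gravity, with an optional upward thrust as the action) and all names are illustrative assumptions:

```python
# Toy illustration of the world-model interface: (state, action) -> next state.
# The dynamics here are hand-written physics; a learned world model would
# approximate this mapping from observational data instead.
from dataclasses import dataclass

GRAVITY = -9.8  # m/s^2, vertical acceleration
DT = 0.1        # simulation timestep, seconds

@dataclass
class State:
    height: float    # meters above the floor
    velocity: float  # vertical velocity, m/s

def step(state: State, thrust: float) -> State:
    """Predict the next state given the current state and an action (thrust)."""
    accel = GRAVITY + thrust
    velocity = state.velocity + accel * DT
    height = max(0.0, state.height + velocity * DT)  # floor at height 0
    return State(height, velocity)

# Roll the model forward: a dropped object with no action applied.
s = State(height=1.0, velocity=0.0)
for _ in range(5):
    s = step(s, thrust=0.0)
print(round(s.height, 3))  # the object has reached the floor
```

The point is the interface, not the physics: a world model exposes exactly this kind of forward simulator, which an agent can query to ask “if I take this action, what happens next?”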
Think of one of these models as a toddler learning about gravity. A toddler can’t do it by reading a book, and certainly can’t derive F = mg from Newton’s laws; rather, they pick up an intuition for the concept by watching a bottle of milk or a toy fall off a table.
Robotics and autonomous systems are the obvious examples. A robot needs to understand how objects move, how forces interact, and how environments change. Language alone isn’t enough. But the implications are broader.
We’re already seeing capital flow into this space. World model startups are raising significant rounds, and the applications span a surprisingly wide spectrum. On one end, you have creative and generative use cases: think controllable scene and world generation, where you can edit what appears in the frame and navigate through it. On the other end, you have companies modeling complex industrial system behaviors with true cause and effect.
The next major leap in AI won’t just be better at talking about the world. It will be better at understanding how the world works. Expect to hear much more about world models in the months to come.

