What are World Models?
World Models are AI systems that learn an internal model of how an environment works over time, then use that model to predict what happens next and to support planning. Instead of only recognizing objects or producing text, a World Model tries to capture dynamics, meaning how scenes change and how actions lead to outcomes. In research terms, a World Model learns a compressed representation of observations, plus a transition function that forecasts future states, so an agent can simulate possible futures before choosing what to do.
• A World Model builds a “state” from raw inputs such as video, sensor streams, or game logs
• A World Model predicts what the next state could be under different actions
• A World Model supports planning by comparing multiple simulated futures
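The three bullets above can be written compactly. The notation is an assumption for illustration, not something the companies discussed here have published: an encoder e with parameters φ turns observation o_t into state z_t, a transition model p with parameters θ predicts the next state given an action a_t, and a planner picks the action sequence over horizon H with the best expected score r.

```latex
\begin{aligned}
z_t &= e_\phi(o_t) \\
\hat{z}_{t+1} &\sim p_\theta(\,\cdot \mid z_t, a_t) \\
a^{*}_{t:t+H-1} &= \arg\max_{a_{t:t+H-1}} \;
  \mathbb{E}\Big[\sum_{k=1}^{H} r(\hat{z}_{t+k})\Big]
\end{aligned}
```

Beyond the first step, each prediction is conditioned on the previous predicted state rather than on a fresh observation, which is why small model errors can accumulate over long rollouts.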
What makes a World Model different from a regular generative model
Many generative models can create images or video that look plausible. A World Model has an extra requirement: consistency under interaction. The output needs to remain coherent when a user or an agent changes something, moves through the scene, or takes a sequence of actions. This is why the idea often shows up in model-based reinforcement learning, where a learned simulator can be used to train or guide decision making with fewer expensive real-world trials.
• The goal is not only “looks real” but “behaves predictably under actions”
• The value comes from running many hypothetical futures quickly
• The output should carry uncertainty, not only a single confident story
The core building blocks in plain language
A practical World Model stack usually has four pieces, even if different companies name them differently.
Perception and compression
The system ingests raw observations such as camera frames, lidar sweeps, or game engine state, then compresses them into a smaller internal representation that keeps what matters.
Dynamics prediction
Given the current internal representation plus a candidate action, the system predicts what could happen next. This prediction can be rolled forward many steps.
Rendering or reconstruction
The model can translate internal representations back into human-readable outputs, such as images, 3D assets, or sensor streams.
Planning or control
An agent uses predicted futures to select actions that improve outcomes, such as safety, speed, reward, or task completion.
• Compression makes simulation cheaper than operating on raw data
• Prediction enables “try many options before committing to one”
• Rendering makes results inspectable for humans and test harnesses
• Planning converts predictions into decisions
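The four pieces can be sketched as four small functions. This is a toy under stated assumptions, not any company's architecture: the "world" is a cart on a line, the function names are hypothetical, and `predict` is a hand-written rule standing in for a learned transition function.

```python
from itertools import product
from typing import Sequence, Tuple

State = Tuple[float, float]  # (position, velocity): the compressed representation


def encode(frames: Sequence[Sequence[float]]) -> State:
    """Perception and compression: reduce the last two raw sensor frames
    (several readings each) to a position and a velocity."""
    prev = sum(frames[-2]) / len(frames[-2])
    curr = sum(frames[-1]) / len(frames[-1])
    return (curr, curr - prev)


def predict(state: State, action: float) -> State:
    """Dynamics prediction: assumed stand-in for a learned transition model.
    The action is an acceleration applied for one time step."""
    pos, vel = state
    vel = vel + action
    return (pos + vel, vel)


def render(state: State, width: int = 11) -> str:
    """Rendering: turn the internal state back into something inspectable."""
    cell = min(width - 1, max(0, round(state[0])))
    return "." * cell + "X" + "." * (width - 1 - cell)


def plan(state: State, goal: float, horizon: int = 3,
         actions: Sequence[float] = (-1.0, 0.0, 1.0)) -> Tuple[float, ...]:
    """Planning: simulate every action sequence and keep the one that ends
    closest to the goal position while coming to rest."""
    def final_cost(sequence: Tuple[float, ...]) -> float:
        s = state
        for a in sequence:
            s = predict(s, a)  # rolled forward on predicted states only
        return abs(s[0] - goal) + abs(s[1])
    return min(product(actions, repeat=horizon), key=final_cost)


frames = [[2.0, 2.0], [3.0, 3.0]]   # two frames, two sensor readings each
state = encode(frames)              # (3.0, 1.0)
best_actions = plan(state, goal=6.0)
picture = render(state)
```

Exhaustive search over 27 sequences only works because the toy is tiny. Real systems sample or optimize action sequences instead, but the division of labor between the four pieces is the same.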
World Labs and spatial World Models
World Labs describes itself as a spatial intelligence company building models that can perceive, generate, reason, and interact with the 3D world. That positioning fits the World Model idea because reliable 3D interaction requires consistent geometry, stable object identity, and controllable changes across viewpoint and time.
World Labs also published “Marble: A Multimodal World Model,” describing world models as systems that can reconstruct, generate, and simulate 3D worlds, and support interaction by humans and agents. Marble itself is presented as a product that lets users create, edit, and share persistent 3D worlds from prompts.
Why this matters for World Models
A tool that produces editable 3D environments is making a strong claim. Editing implies the model is not only generating a snapshot but maintaining structure that survives changes. That is where “world model” becomes more than marketing language, because the system must support operations like repositioning objects, adjusting layouts, and preserving spatial consistency.
• World Labs frames the target as interaction with the 3D world, not only depiction
• Marble explicitly describes reconstruction, generation, and simulation in 3D
• Persistence and editability suggest a structured internal scene representation
Roblox and World Models for interactive creation
Roblox is pushing a World Model direction from a different angle: user-generated games. Roblox recently announced “4D generation” powered by its Cube Foundation Model, describing 4D as adding interactivity so generated objects behave as players expect. The feature, launched in beta as “4D creation,” generates functional in-game models from natural language prompts rather than only static 3D shapes, for example a vehicle that works as a vehicle.
This is an important World Model use case because the output is not just geometry. The output includes behavior. A functional object must follow rules of physics, player interaction, and the game engine’s logic. In effect, Roblox is trying to let creators specify intent and have the system generate both form and function.
Why this counts as a World Model style problem
When a system generates interactive objects, the system is implicitly modeling how the virtual world responds to actions. Even if the engine enforces physics, the AI still has to generate structures and scripts that work inside that world. That requires an internal understanding of causal relationships, such as “door opens after input” or “vehicle accelerates when driven.”
• Roblox positions 4D as interactivity layered on generated 3D objects
• Functional objects created from natural language prompts
• Functionality implies the model is learning patterns of action and response, not only shape
Waymo and World Models for autonomous driving simulation
Autonomous driving needs World Models because real roads contain rare but critical situations. Simulation lets teams test safety behavior without waiting for those situations to happen in the wild. Waymo introduced “The Waymo World Model” as a frontier generative model aimed at large scale, hyper realistic autonomous driving simulation.
Waymo’s simulator can generate realistic 3D environments for testing robotaxis in a wide range of conditions, including unusual edge cases, with controls through language prompts, scene layouts, and driving actions. The key theme is controllable simulation at scale, which is exactly where World Models become operationally valuable.
What makes a driving World Model especially demanding
Driving simulation needs more than visuals. It needs consistent multi-sensor outputs, plausible agent behavior for other vehicles and pedestrians, and long-horizon stability so scenarios remain coherent over time. Even small inaccuracies can mislead evaluation if the simulator drifts away from real-world statistics.
• Waymo frames the World Model as a step change for large scale simulation
• Controllable scenario generation for hard edge cases
• The quality bar includes behavior realism and sensor realism, not only graphics
Why World Models are so important to robotics
Robotics is where World Models stop being a nice idea and become a practical requirement. A robot must act in the physical world, where mistakes cost time, money, safety, and hardware. The robot also faces constant change: lighting shifts, objects move, surfaces vary, people behave unpredictably, and sensors drop frames. A World Model helps a robot handle this by giving the robot a way to predict consequences, plan safely, and learn faster from fewer real-world trials.
Robots have three big pain points that World Models directly address.
Data is expensive and slow
A robot cannot collect training data as cheaply as a software system. Every attempt takes time, supervision, wear, and risk. A World Model reduces how many real attempts are needed by letting the robot “practice” in imagination and by using simulation to test many options quickly.
The world is partially observed
A robot never sees everything. Cameras have blind spots. Hands block views. Lidar misses thin objects. A World Model can maintain memory of what was seen earlier and infer what is likely true even when the robot cannot currently observe it, such as where an object probably is after a brief occlusion.
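A minimal sketch of the occlusion case, assuming a constant-velocity rule in place of learned dynamics. The class name and fields are hypothetical. When the object is visible, the tracker updates its belief. When the observation is missing, it keeps predicting and counts how long it has been guessing.

```python
from typing import List, Optional


class OcclusionTracker:
    """Keeps a belief about a 1D object position across missing observations."""

    def __init__(self) -> None:
        self.position: Optional[float] = None
        self.velocity: float = 0.0
        self.steps_unseen: int = 0  # crude proxy for growing uncertainty

    def step(self, observation: Optional[float]) -> Optional[float]:
        if observation is not None:
            if self.position is not None:
                # Belief was advanced every step, so this is a one-step change.
                self.velocity = observation - self.position
            self.position = observation
            self.steps_unseen = 0
        elif self.position is not None:
            # Occluded: roll the internal model forward instead of freezing.
            self.position += self.velocity
            self.steps_unseen += 1
        return self.position


tracker = OcclusionTracker()
observations: List[Optional[float]] = [0.0, 1.0, 2.0, None, None, 5.0]
beliefs = [tracker.step(obs) for obs in observations]
```

Here the object is hidden for two steps and the belief keeps moving at the last known speed, so it lines up with the object when it reappears. A learned World Model would replace the constant-velocity rule and would widen its uncertainty the longer `steps_unseen` grows.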
Planning needs predictions, not reactions
Reactive policies can work in stable settings, yet many robotic tasks require multi-step plans. Picking up a mug includes reaching, avoiding collision, grasping, lifting, and placing, with branches if the mug slips. World Models allow rollouts that compare different action sequences, so the robot chooses a safer and more reliable path.
This becomes even more important for manipulation, where tiny errors compound.
A small pose error leads to a missed grasp.
A missed grasp leads to reattempts.
Reattempts cause timeouts, damage, or unsafe interactions.
A World Model helps detect the drift early and select a corrective plan.
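The compounding effect is easy to put numbers on. The sketch below assumes each stage of a task succeeds independently, which real manipulation rarely satisfies exactly, and the per-step probabilities are made up for illustration rather than measured.

```python
from math import prod
from typing import Sequence


def plan_success_probability(step_success: Sequence[float]) -> float:
    """Chance that every step succeeds, assuming steps fail independently."""
    return prod(step_success)


# Hypothetical estimates a World Model's rollouts might produce.
careful_plan = [0.95] * 5            # reach, avoid collision, grasp, lift, place
shortcut_plan = [0.99, 0.70, 0.99]   # fewer steps, but one risky grasp

p_careful = plan_success_probability(careful_plan)    # about 0.774
p_shortcut = plan_success_probability(shortcut_plan)  # about 0.686
best_plan = max([careful_plan, shortcut_plan], key=plan_success_probability)
```

Five steps that each look reliable at 95% still fail almost one time in four end to end, and a shorter plan with one weak step does worse. Comparing whole sequences, rather than judging steps one at a time, is what rollouts make possible.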
World Models also support sim-to-real transfer. A robot can train in simulation, then adapt to reality by updating the World Model using real sensor streams. Better world modeling narrows the gap between simulated behavior and real-world behavior, which is one of the hardest problems in robotics.
• Robots need prediction because physical actions have irreversible costs
• Simulation plus world modeling reduces real world trial count and hardware wear
• Hidden state helps handle occlusion, partial views, and sensor dropouts
• Long horizon rollouts help with multi step tasks like manipulation and navigation
• Better world modeling can improve sim to real transfer by aligning dynamics
One concept, three very different products
World Labs, Roblox, and Waymo all use the World Model idea, but each emphasizes a different requirement.
World Labs: 3D spatial consistency and editability
The focus is on generating and modifying environments with persistent structure.
Roblox: interactive object generation for creators
The focus is on function, playability, and fast creation pipelines.
Waymo: safety focused, controllable simulation for autonomous driving
The focus is on coverage of rare events and accurate testing at enormous scale.
• Same underlying ambition: learn dynamics and support planning
• Different success metrics: creative workflow, player expectations, or safety assurance
• Different risk profiles: creative glitches versus safety-critical mistakes
How World Models are evaluated
Evaluation is harder than scoring a text model. A World Model can look correct and still be wrong in ways that matter. Good evaluation therefore mixes several tests.
Prediction accuracy in distribution
How well does the model forecast what happens next in typical situations drawn from the training distribution?
Counterfactual consistency
If a small change is applied, does the result change in the right way while other parts remain stable?
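One half of this test, that unrelated parts stay stable, can be checked mechanically. The sketch below is a hypothetical harness: the scene format, the toy simulators, and the function names are all assumptions. It runs a simulator on a baseline scene and an edited scene, then verifies that only the objects allowed to change actually differ.

```python
from typing import Callable, Dict, Set

Scene = Dict[str, float]  # object name -> height above the floor


def toy_simulate(scene: Scene) -> Scene:
    """Hypothetical simulator: each object drops one unit, stopping at the floor."""
    return {name: max(0.0, h - 1.0) for name, h in scene.items()}


def leaky_simulate(scene: Scene) -> Scene:
    """Deliberately flawed simulator: every object depends on the whole scene."""
    total = sum(scene.values())
    return {name: h - 0.1 * total for name, h in scene.items()}


def changed_objects(a: Scene, b: Scene) -> Set[str]:
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


def counterfactual_consistent(simulate: Callable[[Scene], Scene], scene: Scene,
                              edit: Scene, may_change: Set[str]) -> bool:
    """True if applying `edit` alters only objects listed in `may_change`."""
    baseline = simulate(scene)
    edited = simulate({**scene, **edit})
    return changed_objects(baseline, edited) <= may_change


scene = {"cup": 3.0, "book": 2.0}
ok = counterfactual_consistent(toy_simulate, scene, {"cup": 5.0}, {"cup"})
bad = counterfactual_consistent(leaky_simulate, scene, {"cup": 5.0}, {"cup"})
```

The other half, whether the edited object changes in the right way, needs ground truth or a reference simulator and is not covered by this check.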
Long horizon stability
When rolled forward many steps, does the simulation remain coherent, or does it drift into nonsense?
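Drift can be measured by rolling the model forward on its own predictions and comparing against ground truth at each step. The sketch below uses a hypothetical one-dimensional system in which the model has a 10% gain error. Both step functions are invented for illustration.

```python
from typing import Callable, List, Sequence

Step = Callable[[float, float], float]  # (state, action) -> next state


def rollout(step: Step, state: float, actions: Sequence[float]) -> List[float]:
    """Apply `step` repeatedly, feeding each output back in as the next input."""
    trajectory = []
    for action in actions:
        state = step(state, action)
        trajectory.append(state)
    return trajectory


def error_by_horizon(true_step: Step, model_step: Step, state: float,
                     actions: Sequence[float]) -> List[float]:
    """Absolute gap between the real and the imagined trajectory at each step."""
    truth = rollout(true_step, state, actions)
    imagined = rollout(model_step, state, actions)
    return [abs(t - m) for t, m in zip(truth, imagined)]


def true_step(x: float, a: float) -> float:
    return x + a


def model_step(x: float, a: float) -> float:
    return 1.1 * x + a  # slightly wrong gain, an assumed modeling error


errors = error_by_horizon(true_step, model_step, 1.0, [0.0] * 10)
```

The one-step error is about 0.1, which looks acceptable in a next-frame benchmark, yet after ten steps the gap exceeds 1.5. Reporting error as a function of horizon, not only at one step, exposes this.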
Control responsiveness
When given a prompt or action constraint, does the model obey it reliably?
Uncertainty calibration
When the model is unsure, does it express that uncertainty rather than committing to a single crisp narrative?
• “Looks real” is not enough because plausible errors can still mislead decisions
• The best systems separate measured facts from inferred elements, especially for safety
• Controls and constraints matter as much as raw generative quality
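For the uncertainty calibration test, one common check is interval coverage: if a model reports a mean and a standard deviation, count how often the real outcome lands inside the stated band and compare with what a Gaussian forecast would imply. The sketch assumes Gaussian forecasts, and the sample data is invented for illustration.

```python
import math
from typing import Sequence, Tuple


def nominal_coverage(z: float) -> float:
    """Share of outcomes a Gaussian forecast expects within mean +/- z * std."""
    return math.erf(z / math.sqrt(2.0))


def empirical_coverage(forecasts: Sequence[Tuple[float, float]],
                       outcomes: Sequence[float], z: float = 1.0) -> float:
    """Fraction of outcomes that fall within mean +/- z * std of their forecast."""
    hits = sum(1 for (mean, std), y in zip(forecasts, outcomes)
               if abs(y - mean) <= z * std)
    return hits / len(outcomes)


forecasts = [(0.0, 1.0)] * 4          # (mean, std) for four predictions
outcomes = [0.5, -0.9, 2.0, 0.1]      # what actually happened
observed = empirical_coverage(forecasts, outcomes)   # 3 of 4 inside the band
expected = nominal_coverage(1.0)                     # roughly 0.683
```

Observed coverage far below the nominal value suggests an overconfident model, and coverage far above it suggests error bars that are too wide to be useful. Four samples are far too few for a real verdict. The point is the shape of the check.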
The biggest open problems
Even with major progress, a few problems remain central across all three domains.
Causal correctness versus pattern matching
A model can learn correlations that look right, yet still misunderstand causes. In driving this can show up in rare interactions. In games this can show up in emergent exploits. In 3D creation this can show up in broken geometry after edits.
Data bias and missing edge cases
World Models learn from what they see. If certain weather, regions, or behaviors are rare in data, the model can underperform exactly where stakes are high.
Security and misuse
Interactive generation can be turned into exploit generation. A tool that generates working scripts or behaviors must be hardened against malicious intent.
Tooling and verification
Even strong models need validation tooling. Developers need ways to inspect assumptions, stress test scenarios, and reproduce results.
• Correlation can mimic understanding, especially in high dimensional video and sensor data
• Rare events are expensive to collect but essential to simulate well
• Safety and security must be designed into both training and product interfaces
Why World Models matter right now
Across robotics, driving, and virtual creation, the economic advantage comes from faster iteration. A good World Model compresses experimentation time by letting teams test many variants quickly. For Roblox the win is more creators shipping more experiences. For Waymo the win is broader safety coverage without risky road testing. For World Labs the win is faster creation of consistent 3D environments that remain editable and persistent.
• Faster iteration is the practical benefit across domains
• Simulation plus control turns generation into a decision tool, not only a content tool
• The next step is tighter integration with agents that plan, act, and learn from rollouts
Just Three Things
According to Scoble and Cronin, the top three relevant and recent happenings
OpenAI Unveils GPT 5.3 Codex, a Coding Model That Helped Build Itself
OpenAI released a new coding model called GPT 5.3 Codex and says early versions helped speed up Codex’s own development by assisting with debugging training, deployment work, and evaluation. OpenAI also says the model runs faster than the prior version while using fewer resources, and has been classified as “high capability” for cybersecurity related tasks, which raises both excitement about faster AI progress and concern about safety as the feedback loop tightens between AI tools and the next generation of AI systems. NBC News
Goldman Sachs Uses Claude Agents to Automate Accounting and Compliance
Goldman Sachs has spent about six months working with embedded Anthropic engineers to build Claude based AI agents for high volume back office work, starting with trade and transaction accounting plus client vetting and onboarding. The bank expects the agents to speed up reconciliation and onboarding and help limit future headcount growth rather than trigger immediate layoffs, and executives say Claude’s ability to handle complex, rules driven tasks beyond coding was a key surprise. CNBC
SportsLine AI Sets Super Bowl 60 Picks for Seahawks vs. Patriots
SportsLine’s self learning AI published betting style picks and a score prediction for Super Bowl 60 on February 8, 2026, featuring the Seattle Seahawks versus the New England Patriots at Levi’s Stadium. The article highlights current betting lines with Seattle favored by about 4.5 points and a total near 45.5, then argues both sides can cover by pointing to Seattle’s elite scoring defense and balanced offense led by Sam Darnold, plus New England’s strong defense and high scoring offense led by Drake Maye. CBS Sports