Aligning Agentic World Models via Knowledgeable Experience Learning
Abstract
WorldMind addresses the modal disconnect in LLMs by autonomously building a symbolic world knowledge repository that enhances physical feasibility and task optimality through experience-based learning.
Current Large Language Models (LLMs) exhibit a critical modal disconnect: they possess vast semantic knowledge but lack the procedural grounding to respect the immutable laws of the physical world. Consequently, while these agents implicitly function as world models, their simulations often suffer from physical hallucinations: plans that are logically sound but physically infeasible to execute. Existing alignment strategies predominantly rely on resource-intensive training or fine-tuning, attempting to compress dynamic environmental rules into static model parameters. However, such parametric encapsulation is inherently rigid and struggles to adapt to the open-ended variability of physical dynamics without continuous, costly retraining. To bridge this gap, we introduce WorldMind, a framework that autonomously constructs a symbolic World Knowledge Repository by synthesizing environmental feedback. Specifically, it unifies Process Experience, which enforces physical feasibility via prediction errors, with Goal Experience, which guides task optimality through successful trajectories. Experiments on EB-ALFRED and EB-Habitat demonstrate that WorldMind outperforms baseline approaches and exhibits remarkable cross-model and cross-environment transferability.
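To make the two experience types concrete, here is a minimal sketch of how a symbolic World Knowledge Repository might accumulate Process Experience (rules distilled from prediction errors) and Goal Experience (successful trajectories) and surface them for planning. This is not the authors' implementation; every class and function name (WorldKnowledgeRepository, add_prediction_error, add_successful_trajectory, retrieve) is an illustrative assumption.

```python
# Hypothetical sketch of the experience-learning loop described in the abstract.
# The paper does not specify this API; names and structure are assumptions.

from dataclasses import dataclass, field


@dataclass
class ProcessExperience:
    """A physical-feasibility rule distilled from a prediction error."""
    action: str                 # e.g. "put the apple in the microwave"
    predicted_outcome: str      # what the LLM world model expected
    observed_outcome: str       # what the environment actually returned
    rule: str                   # symbolic constraint synthesized from the mismatch


@dataclass
class GoalExperience:
    """A successful trajectory stored as guidance toward task optimality."""
    task: str                   # e.g. "heat the apple"
    trajectory: list            # sequence of actions that succeeded


@dataclass
class WorldKnowledgeRepository:
    process: list = field(default_factory=list)
    goal: list = field(default_factory=list)

    def add_prediction_error(self, action, predicted, observed, rule):
        # Only mismatches between prediction and observation carry new physical knowledge.
        if predicted != observed:
            self.process.append(ProcessExperience(action, predicted, observed, rule))

    def add_successful_trajectory(self, task, trajectory):
        self.goal.append(GoalExperience(task, trajectory))

    def retrieve(self, task):
        """Return knowledge relevant to a new task, e.g. for injection into the agent's prompt."""
        rules = [p.rule for p in self.process]
        exemplars = [g.trajectory for g in self.goal if g.task == task]
        return rules, exemplars


# Example of how an agent loop might populate and query the repository:
repo = WorldKnowledgeRepository()
repo.add_prediction_error(
    action="put the apple in the closed microwave",
    predicted="apple is inside the microwave",
    observed="action failed: microwave is closed",
    rule="A container must be opened before an object can be placed inside it.",
)
repo.add_successful_trajectory(
    task="heat the apple",
    trajectory=["pick up apple", "open microwave", "put apple in microwave",
                "close microwave", "turn on microwave"],
)
print(repo.retrieve("heat the apple"))
```

Under this reading, Process Experience acts as a growing set of hard feasibility constraints, while Goal Experience supplies task-level exemplars, and neither requires updating model parameters.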
Community
WorldMind helps language models stop making physically impossible plans by learning real-world rules from feedback and successful experiences, rather than retraining the model itself.
The following related papers were recommended by the Semantic Scholar API:
- From Word to World: Can Large Language Models be Implicit Text-based World Models? (2025)
- VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents (2025)
- Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts (2026)
- Can We Predict Before Executing Machine Learning Agents? (2026)
- Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning (2025)
- Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models (2026)
- ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning (2025)