Learning Physical Principles from Interaction: Self-Evolving Planning via Test-Time Memory

📅 2026-02-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited ability of existing vision-language models to predict the physical behavior of objects in specific environments, such as rolling trajectories or support stability, because these models cannot adapt to local physical conditions. The authors propose PhysMem, a framework that enables robots to learn physical regularities through environmental interaction at test time, without updating model parameters. PhysMem employs a "verify-then-apply" mechanism: the system records experiences, formulates hypotheses, and validates them through targeted interactions before applying them to decision-making, avoiding rigid reuse of past experience. Integrating a vision-language planner, test-time memory, hypothesis verification, and abstracted experience retrieval, PhysMem achieves a 76% success rate on a real-world brick-insertion task (versus 23% for direct experience retrieval) and demonstrates consistent, robust performance gains across multiple tasks and simulation benchmarks.

📝 Abstract
Reliable object manipulation requires understanding physical properties that vary across objects and environments. Vision-language model (VLM) planners can reason about friction and stability in general terms; however, they often cannot predict how a specific ball will roll on a particular surface or which stone will provide a stable foundation without direct experience. We present PhysMem, a memory framework that enables VLM robot planners to learn physical principles from interaction at test time, without updating model parameters. The system records experiences, generates candidate hypotheses, and verifies them through targeted interaction before promoting validated knowledge to guide future decisions. A central design choice is verification before application: the system tests hypotheses against new observations rather than applying retrieved experience directly, reducing rigid reliance on prior experience when physical conditions change. We evaluate PhysMem on three real-world manipulation tasks and simulation benchmarks across four VLM backbones. On a controlled brick insertion task, principled abstraction achieves 76% success compared to 23% for direct experience retrieval, and real-world experiments show consistent improvement over 30-minute deployment sessions.
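The abstract's record, hypothesize, verify, then apply loop can be sketched as a small memory structure. This is an illustrative sketch only; all class and method names below are hypothetical and not taken from the paper, and the real system operates over VLM-generated plans rather than plain strings.

```python
# Hypothetical sketch of a "verify-then-apply" test-time memory,
# per the loop described in the abstract: raw experiences are recorded,
# candidate hypotheses are proposed, and only hypotheses confirmed by a
# targeted probe interaction are promoted to guide future decisions.
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    statement: str          # e.g. "the ball rolls ~0.8 m per push on this surface"
    verified: bool = False


@dataclass
class TestTimeMemory:
    experiences: list = field(default_factory=list)   # raw interaction records
    principles: list = field(default_factory=list)    # verified hypotheses only

    def record(self, observation: dict) -> None:
        """Store a raw interaction outcome without interpreting it yet."""
        self.experiences.append(observation)

    def propose(self, statement: str) -> Hypothesis:
        """Formulate a candidate physical regularity from recorded experience."""
        return Hypothesis(statement)

    def verify(self, hyp: Hypothesis, probe_confirmed: bool) -> bool:
        """Promote a hypothesis only after a targeted probe confirms it."""
        if probe_confirmed:
            hyp.verified = True
            self.principles.append(hyp)
        return hyp.verified

    def retrieve(self) -> list:
        """Planning consults verified principles, never raw experience directly."""
        return [h.statement for h in self.principles]


mem = TestTimeMemory()
mem.record({"action": "push ball", "result": "rolled 0.8 m"})
hyp = mem.propose("the ball rolls ~0.8 m per push on this surface")
mem.verify(hyp, probe_confirmed=True)
print(mem.retrieve())
```

The key design point this mirrors is that retrieval returns only validated principles, so a stale experience (an unverified hypothesis) never feeds the planner directly.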
Problem

Research questions and friction points this paper is trying to address.

physical reasoning
robot manipulation
test-time learning
vision-language models
interactive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time memory
physical reasoning
hypothesis verification
vision-language models
robotic manipulation