Emergent Neural Automaton Policies: Learning Symbolic Structure from Visuomotor Trajectories

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited long-horizon reasoning of robot policies that lack structural priors. The authors propose ENAP, a framework in which a two-level neuro-symbolic policy emerges adaptively from raw visuomotor trajectories, without handcrafted symbolic rules or task labels. At the high level, an interpretable Mealy machine is inferred through adaptive clustering and an extension of the L* algorithm and serves as the task planner; at the low level, a residual network learns continuous control via behavior cloning, guided by this discrete structure. ENAP thus learns both the symbolic structure and the continuous policy from demonstrations alone. Evaluated on complex manipulation and long-horizon tasks in low-data regimes, it outperforms state-of-the-art end-to-end vision-language-action (VLA) policies by up to 27%, while offering sample efficiency, interpretability, and a structured representation of robotic intent.
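For readers unfamiliar with the term, a Mealy machine is a finite-state machine whose output depends on both the current state and the current input symbol. The sketch below is only an illustration of that data structure, not the ENAP inference procedure: the machine is written by hand rather than learned, and the state names (`reach`, `grasp`, ...), input symbols (`far`, `near`, ...) and outputs are hypothetical pick-and-place labels chosen for readability.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MealyMachine:
    """Finite-state transducer: output is a function of (state, input)."""

    initial: str
    # (state, input symbol) -> (next state, output symbol)
    delta: Dict[Tuple[str, str], Tuple[str, str]] = field(default_factory=dict)

    def add(self, state: str, symbol: str, next_state: str, output: str) -> None:
        """Register one transition together with the output it emits."""
        self.delta[(state, symbol)] = (next_state, output)

    def run(self, symbols: List[str]) -> Tuple[str, List[str]]:
        """Consume a symbol sequence; return the final state and emitted outputs."""
        state, outputs = self.initial, []
        for symbol in symbols:
            state, out = self.delta[(state, symbol)]
            outputs.append(out)
        return state, outputs


# Hypothetical task modes, hand-written here (ENAP infers them from data).
machine = MealyMachine(initial="reach")
machine.add("reach", "far", "reach", "move_to_object")
machine.add("reach", "near", "grasp", "close_gripper")
machine.add("grasp", "holding", "transport", "move_to_goal")
machine.add("transport", "holding", "transport", "move_to_goal")
machine.add("transport", "at_goal", "release", "open_gripper")

final_state, outputs = machine.run(
    ["far", "far", "near", "holding", "holding", "at_goal"]
)
```

In this toy setting the input symbols play the role that clustered observations would play in a learned machine, and the outputs stand in for high-level intents that a low-level controller would refine.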
📝 Abstract
Scaling robot learning to long-horizon tasks remains a formidable challenge. While end-to-end policies often lack the structural priors needed for effective long-term reasoning, traditional neuro-symbolic methods rely heavily on hand-crafted symbolic priors. To address this issue, we introduce ENAP (Emergent Neural Automaton Policy), a framework that allows a bi-level neuro-symbolic policy to emerge adaptively from visuomotor demonstrations. Specifically, we first employ adaptive clustering and an extension of the L* algorithm to infer a Mealy state machine from visuomotor data, which serves as an interpretable high-level planner capturing latent task modes. Then, this discrete structure guides a low-level reactive residual network to learn precise continuous control via behavior cloning (BC). By explicitly modeling the task structure with discrete transitions and continuous residuals, ENAP achieves high sample efficiency and interpretability without requiring task-specific labels. Extensive experiments on complex manipulation and long-horizon tasks demonstrate that ENAP outperforms state-of-the-art (SoTA) end-to-end VLA policies by up to 27% in low-data regimes, while offering a structured representation of robotic intent (Fig. 1).
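One way to picture how "discrete transitions and continuous residuals" could combine at a single control step is sketched below. This is a rough illustration under stated assumptions, not the authors' implementation: it assumes that observations are discretized by nearest-centroid lookup, that each mode carries a nominal action, and that the final action is the nominal action plus a correction. The helper names (`nearest_symbol`, `bilevel_action`), the centroids, the transition table and the hand-written `residual` function (standing in for a trained network) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def nearest_symbol(feature: Vector, centroids: Dict[str, Vector]) -> str:
    """Discretize a feature vector by picking the closest cluster centroid."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda name: sqdist(feature, centroids[name]))


def bilevel_action(
    feature: Vector,
    state: str,
    centroids: Dict[str, Vector],
    transitions: Dict[Tuple[str, str], str],
    nominal: Dict[str, Vector],
    residual: Callable[[Vector, str], Vector],
) -> Tuple[str, Vector]:
    """One control step: discrete mode update, then nominal action + residual."""
    symbol = nearest_symbol(feature, centroids)
    # Assumption: stay in the current mode when no transition is defined.
    next_state = transitions.get((state, symbol), state)
    base = nominal[next_state]
    correction = residual(feature, next_state)
    action = [b + c for b, c in zip(base, correction)]
    return next_state, action


# Hypothetical toy values for illustration only.
centroids = {"far": [1.0, 0.0], "near": [0.0, 0.0]}
transitions = {("reach", "near"): "grasp"}
nominal = {"reach": [0.5, 0.0], "grasp": [0.0, -1.0]}
residual = lambda feature, mode: [0.1 * feature[0], 0.0]  # stand-in for a network

mode, action = bilevel_action(
    [0.2, 0.1], "reach", centroids, transitions, nominal, residual
)
```

Here the feature is closest to the `near` centroid, so the mode switches from `reach` to `grasp` and the returned action is the `grasp` nominal action shifted by a small correction. In a learned system the residual would be trained with behavior cloning against demonstrated actions.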
Problem

Research questions and friction points this paper is trying to address.

long-horizon tasks
neuro-symbolic methods
symbolic structure
visuomotor trajectories
robot learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic learning
Mealy automaton
visuomotor trajectory
behavior cloning
structured policy