🤖 AI Summary
Reinforcement learning (RL) for wireless indoor navigation suffers from heavy reliance on handcrafted physical priors, low sample efficiency, and poor generalization. To address these issues, this paper proposes physics-informed program-guided RL (PiPRL), a physics-guided RL framework. PiPRL introduces a hierarchical neuro-symbolic architecture that explicitly encodes physical inductive biases via a domain-specific language (DSL), enabling interpretable, human-readable physical modeling and policy guidance. It tightly integrates a neural perception module with a symbolic program that synthesizes robust navigation policies under partial observability and physical constraints. Experimental results demonstrate that PiPRL reduces training time by over 26% compared to purely neural or purely symbolic baselines. Moreover, it achieves significant improvements in zero-shot cross-scenario navigation, enhancing both generalization capability and policy stability, while substantially reducing dependence on expert domain knowledge.
📝 Abstract
When using reinforcement learning (RL) to tackle physical control tasks, inductive biases that encode physics priors can improve sample efficiency during training and enhance generalization at test time. However, the current practice of incorporating these physics-informed inductive biases demands significant manual labor and domain expertise, making it prohibitive for general users. This work explores a symbolic approach to distill physics-informed inductive biases into RL agents, where the physics priors are expressed in a domain-specific language (DSL) that is human-readable and naturally explainable. Yet, the DSL priors do not translate directly into an implementable policy due to partial and noisy observations and additional physical constraints in navigation tasks. To address this gap, we develop a physics-informed program-guided RL (PiPRL) framework with applications to indoor navigation. PiPRL adopts a hierarchical and modularized neuro-symbolic integration, where a meta symbolic program receives semantically meaningful features from a neural perception module; these features form the basis for symbolic programming that encodes physics priors and guides the RL process of a low-level neural controller. Extensive experiments demonstrate that PiPRL consistently outperforms purely symbolic or neural policies and reduces training time by over 26% with the help of the program-based inductive biases.
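To make the hierarchy concrete, here is a minimal, hypothetical sketch of the three-layer flow the abstract describes: a neural perception module turns raw observations into semantic features, a meta symbolic program applies human-readable physics-prior rules to pick a subgoal, and a low-level controller turns the subgoal into an action. All function names, the rules, and the observation format are illustrative assumptions; the paper's actual DSL, perception network, and controller are not reproduced here.

```python
# Hypothetical sketch of PiPRL's hierarchical neuro-symbolic loop.
# Names, rules, and data formats are illustrative assumptions,
# not the paper's implementation.

def perceive(raw_obs):
    """Stand-in for the neural perception module: map noisy range
    readings (meters) to symbolic facts (True = direction is free)."""
    return {d: dist > 1.0 for d, dist in raw_obs.items()}

def meta_program(features, goal_dir):
    """Stand-in for the meta symbolic program: encode a simple physics
    prior (the agent cannot pass through obstacles) as readable rules,
    emitting a subgoal direction."""
    if features.get(goal_dir, False):
        return goal_dir                      # path toward goal is free
    for d, free in features.items():
        if free:
            return d                         # detour through free space
    return "stay"                            # fully blocked

def low_level_controller(subgoal):
    """Stand-in for the low-level neural controller: turn the symbolic
    subgoal into a primitive motion action."""
    action_map = {"north": (0, 1), "south": (0, -1),
                  "east": (1, 0), "west": (-1, 0), "stay": (0, 0)}
    return action_map[subgoal]

# One step of the hierarchy: perception -> symbolic program -> controller.
obs = {"north": 0.4, "east": 2.5, "south": 3.0, "west": 0.2}
features = perceive(obs)
action = low_level_controller(meta_program(features, "north"))
# The goal direction (north) is blocked, so the program detours east.
```

In PiPRL the perception module and controller are learned networks and the symbolic layer is written in the DSL; the point of the sketch is only the interface shape, where symbolic rules mediate between learned perception and learned control.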