🤖 AI Summary
This work addresses how a mobile agent can efficiently select observations under strict time constraints in order to localize sources and infer hidden parameters of a physical field at the same time. Exact Bayesian inference is computationally expensive, while lightweight learned belief models invite reward hacking, where the policy exploits approximation errors instead of actually reducing uncertainty. To overcome this, the authors propose Distill-Belief, a teacher-student framework that brings belief distillation to closed-loop inverse source problems. A teacher module maintains the posterior with a particle filter and provides a dense information-gain signal; a student module distills that posterior into belief statistics for control and an uncertainty certificate for deciding when to stop. Only the student runs at deployment, so the per-step cost is constant. Evaluated on seven physical field modalities and two stress tests, the method consistently reduces sensing cost while improving task success rate, estimation accuracy, and posterior contraction relative to the baselines, and it mitigates reward hacking.
📝 Abstract
Closed-loop inverse source localization and characterization (ISLC) requires a mobile agent to select measurements that localize sources and infer latent field parameters under strict time constraints. The core challenge lies in the belief-space objective: valid uncertainty estimation requires expensive Bayesian inference, whereas using a fast learned belief model leads to reward hacking, in which the policy exploits approximation errors rather than actually reducing uncertainty. We propose Distill-Belief, a teacher-student framework that decouples correctness from efficiency. A Bayes-correct particle-filter teacher maintains the posterior and supplies a dense information-gain signal, while a compact student distills the posterior into belief statistics for control and an uncertainty certificate for stopping. At deployment, only the student is used, yielding constant per-step cost. Experiments on seven field modalities and two stress tests show that Distill-Belief consistently reduces sensing cost and improves success, posterior contraction, and estimation accuracy over baselines, while mitigating reward hacking.
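
To make the teacher's role concrete, the following is a minimal sketch of one particle-filter Bayes update that also returns an information-gain value. It is not the authors' implementation. The abstract does not specify the field model, the noise model, or how information gain is defined, so everything here is an assumption chosen for illustration: `field_model` is a hypothetical isotropic source with particles laid out as `(x, y, strength)`, the measurement noise is Gaussian with standard deviation `sigma`, and information gain is taken to be the KL divergence from the prior particle weights to the posterior weights.

```python
import numpy as np


def field_model(particles, sensor_xy):
    """Hypothetical field: strength / (1 + squared distance to the sensor).

    particles has shape (N, 3) with columns (x, y, strength). This layout and
    the decay law are illustrative assumptions, not taken from the paper.
    """
    d2 = np.sum((particles[:, :2] - sensor_xy) ** 2, axis=1)
    return particles[:, 2] / (1.0 + d2)


def teacher_step(particles, weights, sensor_xy, z, sigma=0.1):
    """One Bayes reweighting step plus a realized information-gain value.

    Assumes Gaussian measurement noise (an assumption). Returns the
    normalized posterior weights and KL(posterior || prior) over particles.
    """
    predicted = field_model(particles, sensor_xy)
    log_lik = -0.5 * ((z - predicted) / sigma) ** 2

    # Work in log space and subtract the max for numerical stability.
    log_prior = np.log(np.maximum(weights, 1e-300))
    log_post = log_prior + log_lik
    log_post -= log_post.max()
    posterior = np.exp(log_post)
    posterior /= posterior.sum()

    # KL divergence from prior to posterior weights, which is non-negative.
    mask = posterior > 0
    info_gain = float(
        np.sum(posterior[mask] * (np.log(posterior[mask]) - log_prior[mask]))
    )
    return posterior, info_gain
```

In a training loop of the kind the abstract describes, a value like `info_gain` would serve as the dense signal the teacher supplies, while the student would be trained to predict summary statistics of the weighted particle set. A full particle filter would also resample when the effective sample size drops, which this sketch omits.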