Repeated Deceptive Path Planning against Learnable Observer

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional deceptive path planning struggles to stay deceptive over the long term against observers that can learn. To address this limitation, this work introduces the Repeated Deceptive Path Planning (RDPP) formulation, which explicitly models learnable observers for the first time, and proposes Deceptive Meta Planning (DeMP), a bilevel optimization method: an inner loop adapts the policy within each episode, while an outer loop uses meta-learning for cross-episode adaptation, generating deceptive policies informed by the observer's past predictions. DeMP mitigates the lag in adapting to an evolving observer model and significantly outperforms existing approaches across diverse environments, sustaining deception against learning-based observers while maintaining competitive path cost.
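To make the failure mode concrete, here is a minimal toy sketch of why a fixed deceptive strategy stops working once the observer learns. It is not taken from the paper: the `CountingObserver` class, the two goals `"A"` and `"B"`, the alternating goal schedule, and the "feint toward the other goal" rule are all hypothetical simplifications chosen for illustration.

```python
from collections import defaultdict


class CountingObserver:
    """Hypothetical learnable observer: maps an observed feint to the goal it most often preceded."""

    def __init__(self, goals):
        self.goals = list(goals)
        # counts[feint][true_goal] = number of past episodes with that pairing
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, feint):
        seen = self.counts[feint]
        if not seen:
            # No history yet: naively assume the agent heads toward its real goal.
            return feint
        return max(self.goals, key=lambda g: seen[g])

    def update(self, feint, true_goal):
        # Learning step: after the episode, the true destination is revealed.
        self.counts[feint][true_goal] += 1


def run_static_agent(episodes=6):
    """Static deceptive agent: always feints toward the goal it is NOT going to."""
    goals = ["A", "B"]
    observer = CountingObserver(goals)
    deceived = []
    for t in range(episodes):
        true_goal = goals[t % 2]        # assumed schedule: goals alternate
        feint = goals[(t + 1) % 2]      # fixed rule: feint toward the other goal
        guess = observer.predict(feint)
        deceived.append(guess != true_goal)
        observer.update(feint, true_goal)
    return deceived


if __name__ == "__main__":
    print(run_static_agent())
```

In this toy, the observer is fooled only until it has seen each feint once; after that it has learned the agent's fixed mapping and predicts the true goal every time. This is the kind of repeated-interaction effect that a single-episode formulation cannot capture.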

📝 Abstract

We study the problem of deceptive path planning (DPP), where an agent aims to conceal its true destination from external observers. While existing work assumes static, non-learning observers, real-world adversaries, such as those in critical goods transportation or military operations, can adapt by learning from historical trajectories. To address this gap, we introduce Repeated Deceptive Path Planning (RDPP), a new formulation that explicitly models learnable observers. We show that existing DPP methods fail in this setting, as they cannot adapt to evolving adversarial predictions. While incorporating the observer's previous predictions into updates enables some adaptation, such incremental updates cause a cumulative lag that degrades deception. To this end, we propose Deceptive Meta Planning (DeMP), a two-level optimization framework that combines episode-level adaptation, which enables short-term policy adjustment to counter the updated observer, with meta-level updates, which leverage cross-episode feedback to capture how observers update their models and accelerate adaptation in future episodes. In this way, DeMP mitigates the accumulation of adaptation lag, enabling sustained deception against a learning observer. Experiments across multiple environments demonstrate that DeMP significantly outperforms existing approaches in RDPP while maintaining competitive path cost. Our results highlight the importance of modeling repeated interactions with learnable adversaries, providing new insights into deception and privacy in multi-agent systems.
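The two-level idea can be illustrated with a scalar toy model. The sketch below is not the DeMP algorithm and makes strong assumptions that do not come from the paper: the observer's model is a single number `theta` that shifts by a constant `drift` each episode, the agent's policy is a single number `phi`, deception quality is the gap `|theta - phi|`, and the observer's current model becomes visible to the agent only after the episode ends. All names (`simulate`, `drift_hat`, `inner_lr`, `meta_lr`) are hypothetical.

```python
def simulate(episodes=30, drift=1.0, inner_lr=0.5, meta_lr=0.3, use_meta=True):
    """Return the per-episode lag between the agent's policy and the observer's model."""
    theta = 0.0      # observer's model (assumed scalar)
    phi = 0.0        # agent's policy parameter (assumed scalar)
    drift_hat = 0.0  # meta-level estimate of how far the observer moves per episode
    lags = []
    for _ in range(episodes):
        theta_prev = theta          # the agent only sees the observer's previous model
        theta = theta_prev + drift  # the observer learns from the last trajectory
        gap = theta_prev - phi

        if use_meta:
            # Meta-level: shift the policy by the anticipated observer update.
            phi += drift_hat
        # Episode-level: incremental step toward countering the last seen model.
        phi += inner_lr * gap

        lags.append(abs(theta - phi))

        if use_meta:
            # Cross-episode feedback: refine the estimate of the observer's update.
            drift_hat += meta_lr * ((theta - theta_prev) - drift_hat)
    return lags


if __name__ == "__main__":
    reactive = simulate(use_meta=False)
    two_level = simulate(use_meta=True)
    print(f"final lag, episode-level only: {reactive[-1]:.4f}")
    print(f"final lag, with meta-level:    {two_level[-1]:.4f}")
```

With only the episode-level step, the lag in this toy obeys `lag_t = drift + (1 - inner_lr) * lag_{t-1}` and settles at `drift / inner_lr` rather than vanishing, which mirrors the cumulative lag described above. Adding the estimate of the observer's update lets the lag decay toward zero, because the agent anticipates the observer's next model instead of chasing its last one.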

Problem

Research questions and friction points this paper is trying to address.

Deceptive Path Planning

Learnable Observer

Repeated Interaction

Adversarial Learning

Privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deceptive Path Planning

Learnable Observer

Meta Planning