Repeated Deceptive Path Planning against Learnable Observer

📅 2026-05-07

🤖 AI Summary
Traditional deceptive path planning struggles to stay deceptive over the long term against observers that can learn. To address this limitation, this work introduces Repeated Deceptive Path Planning (RDPP), a formulation that explicitly models learnable observers, and proposes Deceptive Meta Planning (DeMP), a two-level optimization method: episode-level adaptation adjusts the policy in the short term to counter the updated observer, while meta-level updates use cross-episode feedback to capture how the observer updates its model and to speed up adaptation in later episodes. DeMP mitigates the lag in adapting to evolving observer models and significantly outperforms existing approaches across diverse environments, sustaining deception against learning-based observers while maintaining competitive path costs.
📝 Abstract
We study the problem of deceptive path planning (DPP), where an agent aims to conceal its true destination from external observers. While existing work assumes static, non-learning observers, real-world adversaries, such as those in critical goods transportation or military operations, can adapt by learning from historical trajectories. To address this gap, we introduce Repeated Deceptive Path Planning (RDPP), a new formulation that explicitly models learnable observers. We show that existing DPP methods fail in this setting because they cannot adapt to evolving adversarial predictions. Incorporating the observer's previous predictions into updates enables some adaptation, but such incremental updates cause accumulated lag that degrades deception. To overcome this, we propose Deceptive Meta Planning (DeMP), a two-level optimization framework that combines episode-level adaptation, which enables short-term policy adjustment to counter the updated observer, with meta-level updates, which leverage cross-episode feedback to capture how observers update their models and accelerate adaptation in future episodes. In this way, DeMP mitigates the accumulation of adaptation lag, enabling sustained deception against a learning observer. Experiments across environments demonstrate that DeMP significantly outperforms existing approaches in RDPP while maintaining competitive path cost. Our results highlight the importance of modeling repeated interactions with learnable adversaries, providing new insights into deception and privacy in multi-agent systems.
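The claim that static DPP methods fail against a learnable observer can be illustrated with a toy sketch. This is not the paper's environment, observer model, or DeMP; every name below (`CountingObserver`, `static_deceptive_prefix`, `run_repeated`) is hypothetical, and the observer is assumed to be a simple frequency counter over observed path prefixes. The agent always opens by heading toward a decoy goal, and the observer learns across episodes which true goal follows that opening.

```python
from collections import Counter


class CountingObserver:
    """Hypothetical learnable observer (an assumption, not the paper's model).

    It remembers which true goal followed each observed path prefix and
    predicts the most frequent one. With no history for a prefix, it falls
    back to the naive guess: the goal the prefix is currently heading toward.
    """

    def __init__(self):
        self.counts = {}  # prefix (tuple of headings) -> Counter of true goals

    def predict(self, prefix):
        seen = self.counts.get(prefix)
        if not seen:
            return prefix[-1]  # naive inference from the latest heading
        return seen.most_common(1)[0][0]

    def update(self, prefix, true_goal):
        # Learn from the completed episode once the true goal is revealed.
        self.counts.setdefault(prefix, Counter())[true_goal] += 1


def static_deceptive_prefix(true_goal, goals):
    """Fixed deception rule: always start by heading toward a decoy goal."""
    decoy = next(g for g in goals if g != true_goal)
    return (decoy,)


def run_repeated(episodes, true_goal="A", goals=("A", "B")):
    """Replay the same static deceptive strategy against a learning observer.

    Returns one boolean per episode: True if the observer guessed wrong.
    """
    observer = CountingObserver()
    deceived = []
    for _ in range(episodes):
        prefix = static_deceptive_prefix(true_goal, goals)
        guess = observer.predict(prefix)
        deceived.append(guess != true_goal)
        observer.update(prefix, true_goal)
    return deceived


if __name__ == "__main__":
    print(run_repeated(4))
```

In this toy setting the deception works only in the first episode; from the second episode on, the observer has associated the decoy opening with the true goal. Sustained deception therefore requires the agent to adapt across episodes, which is the gap the RDPP formulation targets.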
Problem

Research questions and friction points this paper is trying to address.

Deceptive Path Planning
Learnable Observer
Repeated Interaction
Adversarial Learning
Privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deceptive Path Planning
Learnable Observer
Meta Planning
Repeated Interaction
Adversarial Adaptation
Shiyue Cao
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; National Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Pei Xu
National Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Likun Yang
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; National Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Lei Cui
Deakin University, School of IT
Shizhao Yu
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; National Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Shiyu Zhang
Tianjin University
Computer vision, multimodal large models
Yongjian Ren
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; National Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Xiaotang Chen
National Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Kaiqi Huang
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; National Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China