🤖 AI Summary
In generative planning, large language models often produce hallucinated goals: semantically plausible yet physically unreachable states that lead to delusional decision-making and safety risks. To address this, we propose a planning framework augmented with a learnable goal evaluator. Its core is a differentiable goal reachability discriminator that integrates rule-guided neural architecture design with two novel delusion-aware hindsight relabeling strategies. This is the first approach that robustly identifies and actively rejects hallucinated goals without requiring real-world reward signals. The method effectively suppresses planning delusions, improving task success rates by 12.7–34.2% across multiple simulated domains while reducing the frequency of unsafe actions by more than 50%. By enabling reliable goal validation within the planning loop, our framework establishes a new paradigm for safe and trustworthy generative agent planning.
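The "goal validation within the planning loop" described above can be sketched as a filter between the generative proposer and the planner. This is a minimal illustration only: the names `propose_goals`, `reachability_score`, and `filter_goals`, the toy unit-ball feasibility criterion, and the 0.5 threshold are all assumptions for demonstration, not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_goals(n=8, dim=2):
    """Toy stand-in for the generative model: proposes candidate goal
    states, some of which may be infeasible."""
    return rng.normal(scale=2.0, size=(n, dim))

def reachability_score(goal, radius=1.0):
    """Toy stand-in for the learned goal evaluator: a sigmoid that assigns
    low reachability to states far outside a unit ball around the origin
    (a hypothetical feasible region)."""
    return 1.0 / (1.0 + np.exp(np.linalg.norm(goal) - radius))

def filter_goals(goals, threshold=0.5):
    """Reject candidate goals the evaluator deems unreachable, so only
    feasible targets are passed on to the planner."""
    return [g for g in goals if reachability_score(g) >= threshold]

goals = propose_goals()
feasible = filter_goals(goals)
print(f"kept {len(feasible)} of {len(goals)} proposed goals")
```

In the proposed framework, the hand-written `reachability_score` would be replaced by the trained differentiable discriminator; the surrounding reject-or-accept loop stays the same.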
📝 Abstract
Generative models can be used in planning to propose targets corresponding to states or observations that agents deem either likely or advantageous to experience. However, agents can struggle with hallucinated, infeasible targets proposed by these models, leading to delusional planning behaviors and raising safety concerns. Drawing inspiration from the human brain, we propose to reject these hallucinated targets with an add-on target evaluator. Without proper training, however, the evaluator can itself produce delusional estimates, rendering it futile. We address this via a combination of a learning rule, a network architecture, and two novel hindsight relabeling strategies, which together lead to correct evaluations of infeasible targets. Our experiments confirm that our approach significantly reduces delusional behaviors and enhances the performance of planning agents.
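Since the abstract does not spell out the two relabeling strategies, the sketch below shows one plausible reading of delusion-aware hindsight relabeling: states the agent actually visited become positive (reachable) training examples for the evaluator, while proposed targets that were never reached are relabeled as negative (hallucinated) examples. The `Episode` container and `relabel` function are hypothetical, and this is only an interpretation of the idea, not the paper's algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    visited: list          # states the agent actually reached this episode
    proposed_goal: tuple   # target suggested by the generative model

def relabel(episodes):
    """Build evaluator training data without real-world reward signals:
    achieved states are labeled reachable, and proposed goals that were
    never achieved are labeled as delusions (negatives)."""
    positives, negatives = [], []
    for ep in episodes:
        positives.extend(ep.visited)              # hindsight: these were reachable
        if ep.proposed_goal not in ep.visited:    # never reached: treat as hallucinated
            negatives.append(ep.proposed_goal)
    return positives, negatives

eps = [Episode(visited=[(0, 0), (1, 0)], proposed_goal=(5, 5)),
       Episode(visited=[(0, 0), (0, 1)], proposed_goal=(0, 1))]
pos, neg = relabel(eps)
print(len(pos), len(neg))  # → 4 1
```

Labels produced this way supervise the reachability discriminator directly from the agent's own experience, which is consistent with the abstract's claim that no external reward signal is required.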