Preemptive Detection and Correction of Misaligned Actions in LLM Agents

📅 2024-07-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language model (LLM)-based agents often execute high-risk actions, such as inadvertently clicking "Buy Now," due to misalignment between their behavior and user intent in real-world scenarios. To address this, the authors propose InferAct, a Theory-of-Mind-inspired *proactive* belief-reasoning framework. InferAct models the alignment between user intent and agent beliefs *prior to action execution*, combining belief inference, intent verification, and human-in-the-loop confirmation to detect misaligned behaviors and intervene before they take effect. Experiments on three widely used tasks demonstrate up to a 20% improvement in Macro-F1 for misaligned-action detection over baselines, enhancing decision reliability and user controllability. InferAct addresses a gap in LLM-agent research by enabling *preemptive* intent alignment, moving beyond reactive correction toward anticipatory, human-centered control.

📝 Abstract
Deploying LLM-based agents in real-life applications often faces a critical challenge: the misalignment between agents' behavior and user intent. Such misalignment may lead agents to unintentionally execute critical actions that carry negative outcomes (e.g., accidentally triggering a "buy-now" in web shopping), resulting in undesirable or even irreversible consequences. Although addressing these issues is crucial, the preemptive detection and correction of misaligned actions remains relatively underexplored. To fill this gap, we introduce InferAct, a novel approach that leverages the belief reasoning ability of LLMs, grounded in Theory-of-Mind, to detect misaligned actions before execution. Once the misalignment is detected, InferAct alerts users for timely correction, preventing adverse outcomes and enhancing the reliability of LLM agents' decision-making processes. Experiments on three widely used tasks demonstrate that InferAct achieves up to 20% improvements on Macro-F1 against baselines in misaligned action detection. An in-depth evaluation of misalignment correction further highlights InferAct's effectiveness in improving agent alignment.
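The abstract describes a pre-execution guard: infer what the agent believes it is doing, compare that belief against the user's intent, and defer to the user before any critical action if a mismatch is detected. A minimal sketch of that control flow is below; all names (`infer_belief`, `guard_action`, the keyword heuristic) are illustrative assumptions, not the paper's actual implementation, which uses LLM-based Theory-of-Mind reasoning for the belief-inference step.

```python
# Hypothetical sketch of an InferAct-style preemptive misalignment check.
# The belief-inference step stands in for an LLM call in the real framework.

def infer_belief(action_trace):
    """Infer the goal the agent appears to be pursuing from its actions.
    (A trivial keyword heuristic replaces the paper's LLM-based reasoning.)"""
    if any("buy" in step.lower() for step in action_trace):
        return "purchase item"
    return "browse items"

def check_alignment(user_intent, action_trace):
    """Return True if the inferred agent belief matches the user's intent."""
    return infer_belief(action_trace) == user_intent

def guard_action(user_intent, action_trace, critical_action, confirm):
    """Before executing a critical action, detect misalignment and, if found,
    defer to the user (human-in-the-loop) instead of executing directly."""
    if check_alignment(user_intent, action_trace):
        return critical_action          # aligned: proceed
    return critical_action if confirm() else "abort"  # misaligned: ask user

# Usage: the user only wants to browse, but the agent is about to buy.
result = guard_action(
    user_intent="browse items",
    action_trace=["search shoes", "open product page", "click Buy Now"],
    critical_action="execute purchase",
    confirm=lambda: False,  # user declines, so the purchase is blocked
)
```

The point of the sketch is the ordering: detection happens before the irreversible action, so the user's veto can still prevent it.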
Problem

Research questions and friction points this paper is trying to address.

Inconsistent Behavior
Unpredictability
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

InferAct
Theory-of-Mind Reasoning
Behavior Correction
🔎 Similar Papers
No similar papers found.