🤖 AI Summary
Existing preference alignment methods such as DPO underperform in dynamic collaborative tasks because explicit signals of interlocutor belief misalignment are sparse and skewed, which leads models to respond indiscriminately. This paper proposes the Frictional Agent Alignment Framework (FAAF), whose two-player objective decouples two roles: a *frictive-state policy* explicitly identifies belief misalignment, while an *intervention policy* generates responses the collaborator prefers. The authors derive an analytical, closed-form solution to this objective, so a single policy can be trained with a simple supervised loss, avoiding the complexity of RL. The method jointly models belief alignment and context-aware friction generation, yielding controllable “friction” that prompts collaborators to deliberate and re-examine existing evidence. On three benchmarks, FAAF outperforms competing methods in the conciseness and interpretability of the friction it generates and in out-of-distribution generalization. This moves LLMs from passive responders toward adaptive “thought partners.”
📝 Abstract
AI support of collaborative interactions entails mediating potential misalignment between interlocutor beliefs. Common preference alignment methods like DPO excel in static settings, but struggle in dynamic collaborative tasks where the explicit signals of interlocutor beliefs are sparse and skewed. We propose the Frictional Agent Alignment Framework (FAAF) to generate precise, context-aware "friction" that prompts for deliberation and re-examination of existing evidence. FAAF's two-player objective decouples from data skew: a frictive-state policy identifies belief misalignments, while an intervention policy crafts collaborator-preferred responses. We derive an analytical solution to this objective, enabling the training of a single policy via a simple supervised loss. Experiments on three benchmarks show FAAF outperforms competitors in producing concise, interpretable friction and in OOD generalization. By aligning LLMs to act as adaptive "thought partners", not passive responders, FAAF advances scalable, dynamic human-AI collaboration. Our code and data can be found at https://github.com/csu-signal/FAAF_ACL.