🤖 AI Summary
This work addresses the degradation of user experience in large language model (LLM) agents caused by suboptimal interaction patterns (excessive confirmation requests, opaque reasoning, misaligned pacing) and the lack of evaluation frameworks that account for interaction quality and user preference alignment. To bridge this gap, we propose the Interaction-as-a-Tool (IaaT) paradigm, which formalizes interactive behaviors as structured tool calls, and introduce Prefix, a configurable environment that jointly optimizes task performance and interaction experience. We define 31 user preferences across 14 attributes and, for the first time, treat user experience as a core evaluation metric alongside task accuracy. Using a composite LLM-as-a-Judge mechanism across seven dimensions, our experiments demonstrate that preference-aware agents improve user experience by 7.6% and preference alignment by 18.5%, with the evaluation framework exhibiting high inter-rater reliability (ICC > 0.79), internal consistency (α = 0.943), and strong correlation with human judgments (ρ = 0.52–0.78).
📝 Abstract
LLM-based agents can complete tasks correctly yet still frustrate users through poor interaction patterns, such as excessive confirmations, opaque reasoning, or misaligned pacing. Current benchmarks evaluate task accuracy but overlook how agents interact: whether they infer preferences from implicit cues, adapt dynamically, or maintain fine-grained interaction quality. We introduce Prefix, a configurable environment that evaluates both what agents accomplish and how they interact. Central to Prefix is the Interaction-as-a-Tool (IaaT) paradigm, which treats interaction behaviors as structured tool calls, unifying them with existing evaluation frameworks. We define 31 preference settings across 14 attributes and formalize user experience (UX) as a core metric alongside task accuracy. A composite LLM-as-a-Judge mechanism across seven UX dimensions achieves strong aggregate reliability (ICC > 0.79), high internal consistency (α = 0.943), and human correlation (ρ = 0.52–0.78). Preference-aware agents show a 7.6% average UX improvement and an 18.5% gain in preference alignment. Our work is openly accessible.
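To make the IaaT idea concrete, here is a minimal sketch of what "interaction behaviors as structured tool calls" could look like. The tool names (`ask_confirmation`, `explain_reasoning`, `report_progress`) and their schemas are illustrative assumptions for this sketch, not the paper's actual interface:

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    """A structured tool call: an interaction behavior with typed arguments."""
    name: str
    arguments: dict

# Hypothetical registry of interaction behaviors exposed as tools,
# in the spirit of the IaaT paradigm (names and schemas assumed).
INTERACTION_TOOLS = {
    "ask_confirmation": {"question": str},
    "explain_reasoning": {"summary": str},
    "report_progress": {"step": int, "total": int},
}

def make_interaction_call(name: str, **kwargs) -> ToolCall:
    """Validate an interaction behavior against its schema and package it
    as a structured tool call, so it can be logged and evaluated like any
    other tool use."""
    schema = INTERACTION_TOOLS.get(name)
    if schema is None:
        raise ValueError(f"unknown interaction tool: {name}")
    for arg, typ in schema.items():
        if arg not in kwargs or not isinstance(kwargs[arg], typ):
            raise TypeError(f"{name} expects {arg}: {typ.__name__}")
    return ToolCall(name=name, arguments=kwargs)

call = make_interaction_call("ask_confirmation",
                             question="Delete all 3 drafts?")
print(json.dumps({"tool": call.name, **call.arguments}))
# {"tool": "ask_confirmation", "question": "Delete all 3 drafts?"}
```

Because each interaction is a first-class tool call rather than free-form text, an evaluator (such as the paper's LLM-as-a-Judge) can count, inspect, and score these behaviors with the same machinery used for task-level tool use.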