🤖 AI Summary
Existing large language model (LLM) alignment methods for multi-turn dialogue rely almost entirely on rewards derived from textual content, neglecting interaction dynamics, a critical yet underexploited signal.
Method: We propose TRACE (Trajectory-based Reward for Agent Collaboration Estimation), the first framework to introduce "conversational geometry": it models geometric properties of dialogue embedding trajectories, such as curvature and divergence, as structured reward signals, enabling explicit modeling of interaction patterns. Crucially, the geometric reward requires no access to raw dialogue text, which supports privacy-preserving agent alignment and enables diagnostic analysis of collaborative behaviors. TRACE also integrates geometric rewards with textual rewards in a hybrid reward model optimized via reinforcement learning.
Contribution/Results: Using geometry-only rewards, TRACE achieves 68.20% pairwise accuracy, comparable to the full-text LLM baseline (70.04%). The hybrid model improves performance further to 80.17%, providing empirical evidence that interaction structure is as important as content for effective dialogue alignment.
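The summary does not spell out how pairwise accuracy is computed. A minimal sketch, assuming the usual reward-model definition (the fraction of preference pairs in which the preferred dialogue receives the strictly higher score), is given below. The function name and the convention that ties count as errors are illustrative assumptions, not details from the paper.

```python
def pairwise_accuracy(pairs):
    """Fraction of (preferred_score, rejected_score) pairs ranked correctly.

    Assumption: a pair counts as correct only when the preferred dialogue
    scores strictly higher; ties are treated as errors.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one scored pair")
    correct = sum(1 for preferred, rejected in pairs if preferred > rejected)
    return correct / len(pairs)


# Hypothetical reward scores for four preference pairs.
example_pairs = [(0.9, 0.1), (0.4, 0.6), (0.7, 0.2), (0.5, 0.5)]
accuracy = pairwise_accuracy(example_pairs)
```

Under this definition, 68.20% would mean the geometry-only reward ranks the preferred dialogue higher in roughly two out of three pairs.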
📝 Abstract
The alignment of Large Language Models (LLMs) for multi-turn conversations typically relies on reward signals derived from the content of the text. This approach, however, overlooks a rich, complementary source of signal: the dynamics of the interaction itself. This paper introduces TRACE (Trajectory-based Reward for Agent Collaboration Estimation), a novel reward signal derived from the geometric properties of a dialogue's embedding trajectory, a concept we term 'conversational geometry'. Our central finding is that a reward model trained only on these structural signals achieves a pairwise accuracy (68.20%) comparable to a powerful LLM baseline that analyzes the full transcript (70.04%). Furthermore, a hybrid model combining interaction dynamics with textual analysis achieves the highest performance (80.17%), demonstrating their complementary nature. This work provides strong evidence that for interactive settings, how an agent communicates is as powerful a predictor of success as what it says, offering a new, privacy-preserving framework that not only aligns agents but also serves as a diagnostic tool for understanding the distinct interaction patterns that drive successful collaboration.
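To make the idea of an embedding trajectory concrete, the sketch below treats a dialogue as a sequence of per-turn embedding vectors and computes two simple geometric descriptors: the mean turning angle between consecutive steps (a discrete stand-in for curvature) and a straightness ratio (net displacement divided by path length). These particular features and all function names are assumptions chosen for illustration; the abstract does not specify which geometric properties TRACE uses or how they are computed.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def turning_angles(points):
    """Angles (radians) between consecutive step vectors of a trajectory."""
    steps = [_sub(q, p) for p, q in zip(points, points[1:])]
    angles = []
    for u, v in zip(steps, steps[1:]):
        nu, nv = _norm(u), _norm(v)
        if nu == 0.0 or nv == 0.0:
            continue  # repeated point: the angle is undefined, so skip it
        cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
        # Clamp to [-1, 1] to guard against floating-point drift.
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return angles


def trajectory_features(points):
    """Illustrative geometric summary of a sequence of turn embeddings."""
    if len(points) < 2:
        return {"mean_turning_angle": 0.0, "straightness": 0.0}
    steps = [_sub(q, p) for p, q in zip(points, points[1:])]
    path_length = sum(_norm(s) for s in steps)
    net_displacement = _norm(_sub(points[-1], points[0]))
    angles = turning_angles(points)
    return {
        "mean_turning_angle": sum(angles) / len(angles) if angles else 0.0,
        "straightness": net_displacement / path_length if path_length > 0 else 0.0,
    }


# Toy 2-D "embeddings": one dialogue moving in a straight line,
# one that makes a sharp right-angle turn.
straight = trajectory_features([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
turned = trajectory_features([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
```

Features of this kind depend only on the embedding vectors, not on the transcript, which is the property the abstract points to when it describes the framework as privacy-preserving.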