🤖 AI Summary
This study addresses the challenge of sustaining natural, coherent, and emotionally engaging long-term human–agent interactions, which existing virtual agents struggle to support because they lack cross-temporal affective modeling. To bridge this gap, the authors propose the Cross-Temporal Emotion Modeling (CTEM) framework, which establishes a continuous closed loop among behavioral memory, dynamic emotional states, and anticipated future interactions, enabling emotional consistency, reflection, and anticipation. CTEM builds on foundation models to implement mechanisms for tracking and updating emotional states, and it integrates a memory system with user feedback to drive affectively grounded dialogue generation. In a 21-day in-the-wild study, the CTEM-equipped virtual agent Auri significantly enhanced users’ perceived naturalness, coherence, and emotional harmony in interaction.
📝 Abstract
Recent advances in foundation models have enabled conversational agents that aim for sustained companionship rather than mere task completion. Yet most remain unable to support natural, long-term, companion-like interactions, resulting in experiences that feel episodic and inauthentic. We argue that current agents overlook cross-temporal modeling of agents’ social behaviors and internal emotions: generated behaviors rarely influence an agent’s emotional state, and emotional states seldom shape subsequent behaviors. We present Cross-Temporal Emotion Modeling (CTEM), a framework that links long-term behavioral history to moment-to-moment emotional expression. CTEM establishes a closed loop in which past experiences update an evolving emotional state, this state conditions immediate interactions, and user feedback continually revises both memory and emotional state, enabling reflection and anticipation. We instantiate CTEM as Auri, a companion agent on an instant-messaging platform, and report a 21-day in-the-wild study showing improvements in perceived naturalness, coherence, and emotional harmony.
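The closed loop described in the abstract can be illustrated with a toy Python sketch. This is not the authors' implementation: the abstract does not specify how emotion is represented or updated, so the two-dimensional `EmotionState`, the smoothing factor `rate`, the numeric feedback score, and the template reply (standing in for a foundation-model call) are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


def _clamp(x: float) -> float:
    """Keep a value inside [-1, 1]."""
    return max(-1.0, min(1.0, x))


@dataclass
class EmotionState:
    # Hypothetical representation; the source does not define one.
    valence: float = 0.0  # negative .. positive
    arousal: float = 0.0  # calm .. excited


@dataclass
class CompanionAgent:
    state: EmotionState = field(default_factory=EmotionState)
    memory: list = field(default_factory=list)  # behavioral history
    rate: float = 0.5  # assumed smoothing factor for state updates

    def respond(self, user_message: str) -> str:
        # The current emotional state conditions the immediate behavior.
        tone = "warm" if self.state.valence >= 0 else "subdued"
        # Placeholder for a foundation-model generation step.
        reply = f"[{tone}] reply to: {user_message}"
        # The generated behavior is written back into memory.
        self.memory.append(
            {"user": user_message, "agent": reply, "feedback": None}
        )
        return reply

    def observe_feedback(self, score: float) -> None:
        # User feedback (assumed to be a score in [-1, 1]) revises both
        # the memory record and the evolving emotional state.
        if not self.memory:
            raise ValueError("no interaction to attach feedback to")
        self.memory[-1]["feedback"] = score
        r = self.rate
        self.state.valence = _clamp((1 - r) * self.state.valence + r * score)
        self.state.arousal = _clamp(
            (1 - r) * self.state.arousal + r * abs(score)
        )
```

In this sketch, calling `respond`, then `observe_feedback` with a negative score, then `respond` again changes the tone of the second reply, which is the behavior-to-emotion-to-behavior coupling the abstract argues current agents lack. A real system would replace the template reply and the exponential smoothing with model-driven components.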