🤖 AI Summary
Existing evaluation paradigms for intelligent agents overemphasize single-turn task completion, neglecting the inherently iterative nature of real-world problems and the human–agent collaboration they require, in which user goals are often ambiguous and evolve over time. Empirical evidence shows that state-of-the-art agents underperform in multi-turn collaborative settings, largely because they fail to sustain user engagement and scaffold user understanding through adaptive assistance.
Method: We propose collaborative effort scaling, an evaluation framework that models how an agent's utility grows with increasing user involvement, enabling systematic assessment of an agent's capacity to foster shared understanding and scaffold the collaborative process over sustained interaction.
Contribution/Results: Through case studies and simulated evaluations, collaborative effort scaling surfaces the capability gaps that state-of-the-art agents exhibit in realistic collaborative scenarios, and it offers both a lens for diagnosing agent behavior and practical guidance for designing more effective collaborative agents.
📝 Abstract
Current evaluations of agents remain centered around one-shot task completion, failing to account for the inherently iterative and collaborative nature of many real-world problems, where human goals are often underspecified and evolve. We argue for a shift from building and assessing task completion agents to developing collaborative agents, assessed not only by the quality of their final outputs but by how well they engage with and enhance human effort throughout the problem-solving process. To support this shift, we introduce collaborative effort scaling, a framework that captures how an agent's utility grows with increasing user involvement. Through case studies and simulated evaluations, we show that state-of-the-art agents often underperform in multi-turn, real-world scenarios, revealing a missing ingredient in agent design: the ability to sustain engagement and scaffold user understanding. Collaborative effort scaling offers a lens for diagnosing agent behavior and guiding development toward more effective interactions.
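As a rough illustration of the "utility grows with user involvement" framing, here is a minimal, hypothetical sketch rather than the paper's actual formulation: it proxies user effort by the number of interaction turns, records a toy utility score after each turn, and fits a slope, so an agent whose output plateaus after the first turn scores lower than one that keeps converting additional user input into gains. The `scaling_slope` helper, the turn counts, and all utility numbers are invented for illustration.

```python
import numpy as np

def scaling_slope(turns, utility):
    """Least-squares slope of utility vs. user effort: how much additional
    utility each extra turn of user involvement buys (illustrative metric)."""
    return float(np.polyfit(turns, utility, 1)[0])

# User effort proxied by the number of clarification/feedback turns (assumption).
turns = np.array([1, 2, 3, 4, 5])

# Toy utility trajectories (e.g., rubric scores of the working output, 0-1);
# these numbers are fabricated purely to show the contrast.
one_shot_agent = np.array([0.55, 0.57, 0.58, 0.58, 0.58])       # plateaus early
collaborative_agent = np.array([0.50, 0.62, 0.71, 0.79, 0.85])  # keeps improving

print(f"one-shot agent slope:      {scaling_slope(turns, one_shot_agent):.3f}")
print(f"collaborative agent slope: {scaling_slope(turns, collaborative_agent):.3f}")
```

Under this toy metric, the flat trajectory yields a near-zero slope while the improving one yields a clearly positive slope, which is the kind of contrast the framework is meant to expose.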