🤖 AI Summary
Problem: Conversational information access (CONIAC) systems currently lack a unified, human-centered evaluation framework. Method: This paper introduces CAFE (Conversational Agents Framework for Evaluation), described as the first consensus-driven, multidimensional evaluation framework for CONIAC. It integrates six core components: stakeholder goals, user tasks, user characteristics, evaluation criteria, evaluation methodology, and measures for the chosen quantitative criteria. Moving beyond traditional technology-centric, unidimensional paradigms, CAFE builds on a world-model abstraction of CONIAC and on interdisciplinary consensus workshops, drawing on human-computer interaction, information retrieval, and evaluation science, to align system objectives with human factors. Contribution/Results: CAFE has been published in the Dagstuhl Perspectives Workshop series. It provides both theoretical foundations and practical guidelines for the design, evaluation, and standardization of CONIAC systems.
📝 Abstract
During the workshop, we discussed in depth what CONversational Information ACcess (CONIAC) is and what its unique features are, proposed a world model that abstracts it, and defined the Conversational Agents Framework for Evaluation (CAFE) for the evaluation of CONIAC systems. CAFE consists of six major components: 1) goals of the system's stakeholders, 2) user tasks to be studied in the evaluation, 3) aspects of the users carrying out the tasks, 4) evaluation criteria to be considered, 5) evaluation methodology to be applied, and 6) measures for the quantitative criteria chosen.