🤖 AI Summary
Existing AI evaluations predominantly rely on static benchmarking, failing to detect emergent harms—such as emotional dependency, social manipulation, and cognitive overload—that arise during prolonged human-AI interaction. To address this gap, we propose a novel dynamic evaluation paradigm centered on *interactional ethics*, moving beyond conventional single-turn output assessment. We introduce the first integrated framework for interactional ethical evaluation, combining controlled human-AI interaction experiments, NLP-based behavioral analysis, validated social science psychometric scales, and computational modeling of human impact. Our framework explicitly incorporates the evolution of human-AI relationships, downstream societal effects, and cognitive consequences as core evaluation dimensions. The project yields actionable principles for identifying interactional harms and scenario-specific evaluation protocols, thereby bridging a critical gap in safety assessment for generative AI deployed in authentic, long-term interaction settings. This work establishes a methodological foundation for context-aware, usage-oriented AI governance.
📝 Abstract
Current AI evaluation paradigms that rely on static, model-only tests fail to capture harms that emerge through sustained human-AI interaction. As interactive AI systems, such as AI companions, proliferate in daily life, this mismatch between evaluation methods and real-world use becomes increasingly consequential. We argue for a paradigm shift toward evaluation centered on *interactional ethics*, which addresses risks like inappropriate human-AI relationships, social manipulation, and cognitive overreliance that develop through repeated interaction rather than single outputs. Drawing on human-computer interaction, natural language processing, and the social sciences, we propose principles for evaluating generative models through interaction scenarios and human impact metrics. We conclude by examining implementation challenges and open research questions for researchers, practitioners, and regulators integrating these approaches into AI governance frameworks.