🤖 AI Summary
This paper addresses the critical problem of inconsistent, nonstandardized, and incomplete evaluation metrics for misinformation warning interventions, particularly regarding their cognitive, affective, attitudinal, and trust-related impacts. To resolve this, we propose the first four-dimensional assessment framework covering behavioral impact, trust and credulity, usability, and cognitive and psychological effects. Through a systematic literature review and cross-study comparison of evaluation metrics, we identify key deficiencies in existing work: inconsistent measurement of cognitive and attitudinal effects, the absence of standardized metrics for affective impact, substantial heterogeneity in trust assessment, and insufficient inclusivity in warning design. Our contribution is a comprehensive, dimensionally explicit taxonomy of evaluation metrics that clarifies critical research gaps and establishes both a theoretical foundation and practical guidance for developing reproducible, comparable, and human-centered evaluations of misinformation warnings.
📝 Abstract
Misinformation has become a widespread issue in the 21st century, affecting numerous areas of society and underscoring the need for effective intervention strategies. Among these strategies, user-centered interventions such as warning systems have shown promise in reducing the spread of misinformation. Many studies have used various metrics to evaluate the effectiveness of these warning interventions; however, no systematic review has thoroughly examined these metrics across studies. This paper provides a comprehensive review of existing metrics for assessing the effectiveness of misinformation warnings, categorizing them into four main groups: behavioral impact, trust and credulity, usability, and cognitive and psychological effects. Through this review, we identify critical challenges in measuring the effectiveness of misinformation warnings, including inconsistent use of cognitive and attitudinal metrics, the lack of standardized metrics for affective and emotional impact, variation in how user trust is assessed, and the need for more inclusive warning designs. We present an overview of these metrics and propose directions for future research.