🤖 AI Summary
This work addresses the challenge of accurately assessing the popularity of online comments, which depends on multiple factors, including linguistic quality, originality, emotional resonance, and platform-specific style preferences. To this end, the authors introduce HotComment, a multimodal benchmark that unifies comment popularity modeling across three dimensions: content quality assessment, popularity prediction, and user behavior simulation. They further propose StyleCmt, a model with a style alignment mechanism that simulates social ripple effects and improves consistency between how a comment is expressed and the stylistic norms of its community. By integrating video and text modalities, the framework combines interpretable quality evaluation, interaction-grounded trend forecasting, and agent-based simulation of user engagement, and the authors report that it improves the accuracy of both popularity prediction and user interaction modeling across diverse platforms.
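The summary does not say how the agent-based engagement simulation is computed. As a minimal sketch under assumed details, the Python below samples simulated users from a hypothetical mix of platform user groups, each with a preferred style vector, and lets each agent "like" a comment with a probability that decays with the distance between the comment's style and that agent's preference. The engagement score is the fraction of agents that liked the comment. `UserGroup`, `like_probability`, `simulate_engagement`, the three style dimensions, and the `sharpness` parameter are all illustrative assumptions, not part of HotComment or StyleCmt.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class UserGroup:
    """Hypothetical platform user segment with a preferred comment style."""
    weight: float                 # share of the platform's users in this group
    preferred_style: list[float]  # assumed dims: [humor, emotional intensity, formality], each in [0, 1]


def like_probability(comment_style: list[float], preferred: list[float],
                     sharpness: float = 5.0) -> float:
    """Map Euclidean style distance to a like probability: closer styles are liked more often."""
    distance = math.sqrt(sum((c - p) ** 2 for c, p in zip(comment_style, preferred)))
    return math.exp(-sharpness * distance)


def simulate_engagement(comment_style: list[float], groups: list[UserGroup],
                        n_agents: int = 2000, seed: int = 0) -> float:
    """Sample agents from the group mix and return the fraction that like the comment."""
    rng = random.Random(seed)  # seeded so repeated runs give the same score
    weights = [g.weight for g in groups]
    likes = 0
    for _ in range(n_agents):
        # Each agent belongs to one group, drawn in proportion to group weight.
        group = rng.choices(groups, weights=weights, k=1)[0]
        if rng.random() < like_probability(comment_style, group.preferred_style):
            likes += 1
    return likes / n_agents


# Usage: a platform dominated by users who prefer humorous, emotional, informal comments.
platform = [UserGroup(0.8, [0.9, 0.7, 0.1]), UserGroup(0.2, [0.1, 0.2, 0.9])]
print(simulate_engagement([0.9, 0.7, 0.1], platform))  # close to the majority style
print(simulate_engagement([0.1, 0.2, 0.9], platform))  # close to the minority style
```

Because the like probability depends on style distance, the same comment can score very differently on two platforms with different group mixes, which mirrors the cross-community variation the summary describes.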
📝 Abstract
Online comments play a crucial role in shaping public sentiment and opinion dynamics on social media. However, evaluating their popularity remains challenging, not only because it depends on linguistic quality, originality, and emotional resonance, but also because stylistic preferences vary widely across platforms and user groups, so the same comment can resonate differently in different communities. In this work, we present HotComment, a multimodal benchmark integrating video and text modalities that comprehensively quantifies popularity from three complementary aspects: (1) Content Quality, which measures semantic similarity to ground-truth human comments and extends quality assessment with four interpretable dimensions; (2) Popularity Prediction, which estimates popularity trends using models trained on real-world interaction data; and (3) User Behavior Simulation, which models the distribution of platform users and approximates engagement scores through an agent-based framework. Furthermore, we propose StyleCmt, a model inspired by social ripple effects, in which alignment across multiple stylistic dimensions amplifies socially resonant expressions and suppresses incongruent ones.
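The abstract does not name the similarity measure used in the Content Quality aspect. A common choice is cosine similarity between text representations; the sketch below assumes that choice and scores a generated comment by its best match among the ground-truth human comments. To stay self-contained, it uses a toy bag-of-words vector in place of a learned sentence encoder. `bag_of_words`, `cosine`, and `similarity_to_references` are hypothetical helpers, not the benchmark's actual metric.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase whitespace-token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_to_references(candidate: str, references: list[str]) -> float:
    """Best cosine match between a generated comment and any ground-truth human comment."""
    cand_vec = bag_of_words(candidate)
    return max((cosine(cand_vec, bag_of_words(r)) for r in references), default=0.0)


# Usage: three of four tokens overlap, so the score is 3 / (2 * 2) = 0.75.
print(similarity_to_references("the cat is cute", ["the dog is cute", "boring video"]))
```

In practice the bag-of-words function would be replaced by a sentence-embedding model, so that paraphrases with little word overlap still count as similar. Taking the maximum over references rewards matching any one plausible human comment rather than their average.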