🤖 AI Summary
Problem: Existing social bot detection methods inadequately model neighborhood preference heterogeneity and lack theoretical foundations for motif-based discrimination.
Method: This paper proposes a Naïve Bayes–based theoretical framework built on heterogeneous motifs. It (i) refines motif structures with node labels to explicitly capture heterogeneous neighbor preferences; (ii) theoretically quantifies, for the first time, the maximum discriminative capacity of each heterogeneous motif; and (iii) uses this capacity to drive a Top-K motif selection mechanism that sharply reduces computational overhead with minimal loss in performance. The approach integrates heterogeneous network motif mining, probabilistic modeling, and pairwise node contribution analysis.
Results: Evaluated on four public datasets, the method outperforms state-of-the-art approaches across five metrics. Notably, using only ~10% of high-capacity motifs achieves over 98% of the detection performance attained by the full motif set.
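As a rough illustration of step (i), the sketch below splits one homogeneous motif into heterogeneous variants by node label. It assumes an undirected graph and uses the triangle as the only motif; the function name `heterogeneous_triangle_counts`, the toy graph, and the labels are all hypothetical, and the paper's actual motif set is not specified in this summary.

```python
from collections import Counter
from itertools import combinations

def heterogeneous_triangle_counts(adj, labels, node):
    """Count triangles through `node`, keyed by the labels of the other two nodes.

    Simplifying assumptions (not from the paper): undirected edges, and the
    triangle as the only motif considered.
    """
    counts = Counter()
    for u, v in combinations(sorted(adj[node]), 2):
        if v in adj[u]:  # u and v are also connected, so (node, u, v) is a triangle
            key = tuple(sorted((labels[u], labels[v])))
            counts[key] += 1
    return counts

# Toy graph: adjacency sets and known labels for the neighbors of "a".
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"a", "c"},
    "e": {"a", "c"},
}
labels = {"a": "human", "b": "bot", "c": "human", "d": "bot", "e": "human"}

counts = heterogeneous_triangle_counts(adj, labels, "a")
homogeneous_total = sum(counts.values())  # what a label-free count would report
```

A label-free count only says that node `a` sits in three triangles. The labeled count separates triangles shared with a bot and a human from triangles shared with two humans, which is the kind of neighborhood preference the refinement is meant to expose.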
📝 Abstract
Identifying social bots has become a critical challenge due to their significant influence on social media ecosystems. Despite advancements in detection methods, most topology-based approaches insufficiently account for the heterogeneity of neighborhood preferences and lack a systematic theoretical foundation, relying instead on intuition and experience. Here, we propose a theoretical framework for detecting social bots utilizing heterogeneous motifs based on the Naïve Bayes model. Specifically, we refine homogeneous motifs into heterogeneous ones by incorporating node-label information, effectively capturing the heterogeneity of neighborhood preferences. Additionally, we systematically evaluate the contribution of different node pairs within heterogeneous motifs to the likelihood of a node being identified as a social bot. Furthermore, we mathematically quantify the maximum capability of each heterogeneous motif, enabling the estimation of its potential benefits. Comprehensive evaluations on four large, publicly available benchmarks confirm that our method surpasses state-of-the-art techniques, achieving superior performance across five evaluation metrics. Moreover, our results reveal that selecting motifs with the highest capability achieves detection performance comparable to using all heterogeneous motifs. Overall, our framework offers an effective and theoretically grounded solution for social bot detection, significantly enhancing cybersecurity measures in social networks.
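The scoring and selection ideas in the abstract can be sketched as follows. This is a minimal illustration, not the paper's formulation: it assumes each heterogeneous motif is reduced to a binary "present around this node" feature, uses Laplace smoothing, and ranks motifs by the absolute log-likelihood ratio as a stand-in for the paper's capability measure, whose definition is not given here. All function names and the toy data are hypothetical.

```python
import math

def fit_motif_naive_bayes(features, is_bot, alpha=1.0):
    """Estimate P(motif present | class) for each motif with Laplace smoothing.

    features: list of sets, each holding the motif ids present around one node.
    is_bot:   list of booleans, True when the corresponding node is a bot.
    """
    motifs = sorted(set().union(*features))
    n_bot = sum(is_bot)
    n_human = len(is_bot) - n_bot
    model = {}
    for m in motifs:
        bot_hits = sum(1 for f, y in zip(features, is_bot) if y and m in f)
        human_hits = sum(1 for f, y in zip(features, is_bot) if not y and m in f)
        p_bot = (bot_hits + alpha) / (n_bot + 2 * alpha)
        p_human = (human_hits + alpha) / (n_human + 2 * alpha)
        model[m] = (p_bot, p_human)
    prior_log_odds = math.log((n_bot + alpha) / (n_human + alpha))
    return model, prior_log_odds

def top_k_motifs(model, k):
    """Keep the k motifs with the largest |log-likelihood ratio|.

    This ranking is an assumed proxy, not the capability measure from the paper.
    """
    def proxy(m):
        p_bot, p_human = model[m]
        return abs(math.log(p_bot / p_human))
    return sorted(model, key=proxy, reverse=True)[:k]

def bot_log_odds(present, model, prior_log_odds, selected):
    """Naive Bayes log-odds that a node is a bot, using only the selected motifs."""
    score = prior_log_odds
    for m in selected:
        p_bot, p_human = model[m]
        if m in present:
            score += math.log(p_bot / p_human)
        else:
            score += math.log((1 - p_bot) / (1 - p_human))
    return score

# Toy training data: two bots, two humans, three motif types.
features = [{"m1", "m2"}, {"m1"}, {"m2", "m3"}, {"m3"}]
is_bot = [True, True, False, False]

model, prior_log_odds = fit_motif_naive_bayes(features, is_bot)
selected = top_k_motifs(model, k=2)
score_bot_like = bot_log_odds({"m1"}, model, prior_log_odds, selected)
score_human_like = bot_log_odds({"m3"}, model, prior_log_odds, selected)
```

In this toy data, motif `m2` appears equally often around bots and humans, so it carries no evidence and is dropped by the selection step, while a positive log-odds marks a node as more bot-like. The conditional-independence assumption across motifs is what makes the per-motif contributions additive.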