🤖 AI Summary
Large language models (LLMs) rely on static training data and are prone to hallucination, which leaves them with outdated knowledge and limits their usefulness in dynamic social media environments. To address this, we propose a retrieval-augmented generation (RAG) framework tailored to community response prediction. Grounded in social computing principles, our method retrieves both historical community interactions (e.g., posts on the X platform) and heterogeneous external knowledge (e.g., news, policy documents), then jointly models ideological, affective, and semantic features to predict, at a fine-grained level, how a community will respond to real and hypothetical posts. Unlike conventional RAG approaches, our framework overcomes key adaptability bottlenecks in social contexts and introduces the first dynamic reasoning paradigm supporting counterfactual social analysis. Evaluated across six representative X-platform scenarios, our method improves key metrics by more than 10% on average, enhancing response diversity, ideological sensitivity, and factual accuracy.
📝 Abstract
This paper introduces SCRAG, a prediction framework inspired by social computing, designed to forecast community responses to real or hypothetical social media posts. SCRAG can be used by public relations specialists (e.g., to craft messaging in ways that avoid unintended misinterpretations) or by public figures and influencers (e.g., to anticipate social responses), among other applications related to public sentiment prediction, crisis management, and social what-if analysis. While large language models (LLMs) have achieved remarkable success in generating coherent and contextually rich text, their reliance on static training data and susceptibility to hallucinations limit their effectiveness at response forecasting in dynamic social media environments. SCRAG overcomes these challenges by integrating LLMs with a Retrieval-Augmented Generation (RAG) technique rooted in social computing. Specifically, our framework retrieves (i) historical responses from the target community to capture its ideological, semantic, and emotional makeup, and (ii) external knowledge from sources such as news articles to inject time-sensitive context. This information is then jointly used to forecast the responses of the target community to new posts or narratives. Extensive experiments across six scenarios on the X platform (formerly Twitter), tested with various embedding models and LLMs, demonstrate average improvements of over 10% in key evaluation metrics. A concrete example further shows its effectiveness in capturing diverse ideologies and nuances. Our work provides a social computing tool for applications where accurate and concrete insights into community responses are crucial.
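To make the two-source retrieval step concrete, the following is a minimal sketch and not the SCRAG implementation. It assumes a toy bag-of-words similarity in place of the embedding models the paper evaluates, and it stops at assembling a prompt rather than calling an LLM. All function names (`embed`, `cosine`, `retrieve`, `build_prompt`), the prompt wording, and the sample data are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the k corpus entries most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(post, community_history, external_knowledge, k=2):
    """Combine both retrieval results into one prompt (hypothetical format)."""
    past = retrieve(post, community_history, k)   # (i) historical community responses
    news = retrieve(post, external_knowledge, k)  # (ii) time-sensitive external context
    lines = ["Past responses from the target community:"]
    lines += [f"- {r}" for r in past]
    lines.append("Relevant external context:")
    lines += [f"- {n}" for n in news]
    lines.append(f"New post: {post}")
    lines.append("Predict how this community would respond.")
    return "\n".join(lines)


# Illustrative data only; not taken from the paper's experiments.
history = [
    "The new transit tax is unfair to drivers",
    "Love the weekend farmers market",
    "Transit funding should come from developers, not a tax",
]
knowledge = [
    "City council votes on transit tax next week",
    "Local team wins championship",
]
post = "Council proposes a transit tax increase"

prompt = build_prompt(post, history, knowledge, k=2)
print(prompt)
```

In this sketch the assembled prompt would then be passed to an LLM of the user's choice. Keeping the two retrieval calls separate mirrors the abstract's distinction between community history, which carries ideological and emotional signal, and external knowledge, which carries recency.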