🤖 AI Summary
Addressing the challenges of data sparsity and dynamic heterogeneity in popularity prediction of information cascades in social networks, this paper proposes the first large language model (LLM) transfer framework that models cascade diffusion as an autoregressive sequence. Methodologically, it introduces (1) a structure-aware cascade graph tokenization mechanism that explicitly encodes both topological and temporal diffusion context, and (2) a cascade-specific prompt learning paradigm enabling effective LLM adaptation under few-shot settings without full-parameter fine-tuning. Evaluated on multiple real-world datasets, the approach significantly outperforms state-of-the-art methods in cascade popularity prediction, generalizes well across diverse platforms and domains, and inherits the favorable scaling behavior of LLMs. The framework thus bridges structural modeling of diffusion processes with the expressive power of foundation models, offering a principled and practical solution for dynamic cascade forecasting.
📝 Abstract
Popularity prediction in information cascades plays a crucial role in social computing, with broad applications in viral marketing, misinformation control, and content recommendation. However, information propagation mechanisms, user behavior, and temporal activity patterns exhibit significant diversity, necessitating a foundational model capable of adapting to such variations. At the same time, the amount of available cascade data remains relatively limited compared to the vast datasets used for training large language models (LLMs). Recent studies have demonstrated the feasibility of leveraging LLMs for time-series prediction by exploiting commonalities across different time-series domains. Building on this insight, we introduce the Autoregressive Information Cascade Predictor (AutoCas), an LLM-enhanced model designed specifically for cascade popularity prediction. Unlike natural language sequences, cascade data is characterized by complex local topologies, diffusion contexts, and evolving dynamics, requiring specialized adaptations for effective LLM integration. To address these challenges, we first tokenize cascade data to align it with sequence modeling principles. Next, we reformulate cascade diffusion as an autoregressive modeling task to fully harness the architectural strengths of LLMs. Beyond conventional approaches, we further introduce prompt learning to enhance the synergy between LLMs and cascade prediction. Extensive experiments demonstrate that AutoCas significantly outperforms baseline models in cascade popularity prediction while exhibiting scaling behavior inherited from LLMs. Code is available at this repository: https://anonymous.4open.science/r/AutoCas-85C6
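To make the tokenization idea concrete, the following is a minimal sketch of how a cascade (a sequence of adoption events with parents and timestamps) might be mapped to discrete tokens that capture local topology and diffusion timing. This is an illustrative assumption, not the paper's actual tokenizer: the `CascadeEvent` structure, the use of parent out-degree as the structural signal, and the time-bin boundaries are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CascadeEvent:
    node: int     # adopting user
    parent: int   # user who influenced the adoption; -1 for the root
    time: float   # seconds since the cascade started

def tokenize_cascade(events: List[CascadeEvent],
                     time_bins=(60, 3600, 86400)) -> List[Tuple[int, int]]:
    """Map each diffusion event to a discrete token combining a
    structural feature (the parent's out-degree at adoption time)
    with a coarse temporal bucket. A stand-in for structure-aware
    cascade tokenization; the exact features are assumptions."""
    out_degree = {}
    tokens = []
    for ev in events:
        deg = out_degree.get(ev.parent, 0)
        out_degree[ev.parent] = deg + 1
        # temporal bucket = number of bin boundaries the event time has passed
        bucket = sum(1 for b in time_bins if ev.time >= b)
        tokens.append((min(deg, 9), bucket))  # cap degree for a finite vocabulary
    return tokens

# A toy cascade: one root, two direct children, one grandchild much later.
events = [
    CascadeEvent(0, -1, 0.0),
    CascadeEvent(1, 0, 30.0),
    CascadeEvent(2, 0, 120.0),
    CascadeEvent(3, 1, 4000.0),
]
print(tokenize_cascade(events))
# → [(0, 0), (0, 0), (1, 1), (0, 2)]
```

The resulting token sequence can then be fed to an LLM-style autoregressive model that predicts the next adoption token, which is the sequence-modeling reformulation the abstract describes.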