Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction

📅 2025-10-29
🤖 AI Summary
This paper addresses three bottlenecks in information cascade popularity prediction on social networks: temporal leakage in evaluation, feature-poor datasets that lack authentic conversion signals, and computational inefficiency. The authors propose three remedies: (1) a leak-free evaluation protocol based on time-ordered splits, so that models never see future information; (2) Taoke, a large-scale e-commerce cascade dataset with rich promoter/product attributes and ground-truth purchase conversions; and (3) CasTemp, a lightweight model that combines Jaccard-based neighbor selection, temporal walks, GRU-based encoding, and time-aware attention to capture inter-cascade dependencies and temporal dynamics. Under leak-free evaluation, CasTemp achieves state-of-the-art performance on four datasets with an orders-of-magnitude speedup, and it is particularly strong at predicting second-stage conversions such as purchases.

📝 Abstract
Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, existing work suffers from three critical limitations: (1) temporal leakage in current evaluation, where random cascade-based splits allow models to access future information and yield unrealistic results; (2) feature-poor datasets that lack downstream conversion signals (e.g., likes, comments, or purchases), which limits practical applications; and (3) computational inefficiency of complex graph-based methods that require days of training for marginal gains. We systematically address these challenges from three perspectives: task setup, dataset construction, and model design. First, we propose a time-ordered splitting strategy that chronologically partitions data into consecutive windows, ensuring models are evaluated on genuine forecasting tasks without future information leakage. Second, we introduce Taoke, a large-scale e-commerce cascade dataset featuring rich promoter/product attributes and ground-truth purchase conversions, capturing the complete diffusion lifecycle from promotion to monetization. Third, we develop CasTemp, a lightweight framework that efficiently models cascade dynamics through temporal walks, Jaccard-based neighbor selection for inter-cascade dependencies, and GRU-based encoding with time-aware attention. Under leak-free evaluation, CasTemp achieves state-of-the-art performance across four datasets with orders-of-magnitude speedup. Notably, it excels at predicting second-stage popularity conversions, a practical task critical for real-world applications.
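The time-ordered splitting idea can be illustrated with a minimal sketch. This is not the authors' protocol: the record layout (a dict with a `start_time` field), the function name `time_ordered_split`, and the use of two explicit boundary timestamps are all assumptions made for illustration. The sketch only assigns each cascade to a window by its start time; how the paper treats cascades whose observation period crosses a boundary is not stated in this summary.

```python
from typing import Dict, List, Tuple

# Hypothetical cascade record: {"id": str, "start_time": float}
Cascade = Dict[str, object]


def time_ordered_split(
    cascades: List[Cascade],
    train_end: float,
    val_end: float,
) -> Tuple[List[Cascade], List[Cascade], List[Cascade]]:
    """Partition cascades into three consecutive chronological windows.

    Cascades starting before `train_end` go to training, those starting in
    [train_end, val_end) go to validation, and the rest go to test, so no
    evaluation cascade starts earlier than any training cascade.
    """
    if train_end >= val_end:
        raise ValueError("train_end must be strictly earlier than val_end")

    # Sort once so each window is also internally in chronological order.
    ordered = sorted(cascades, key=lambda c: c["start_time"])

    train = [c for c in ordered if c["start_time"] < train_end]
    val = [c for c in ordered if train_end <= c["start_time"] < val_end]
    test = [c for c in ordered if c["start_time"] >= val_end]
    return train, val, test
```

By contrast, a random split over the same records would routinely place a cascade that starts late in the timeline into the training set, which is the leakage the abstract describes.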
Problem

Research questions and friction points this paper is trying to address.

Eliminating temporal leakage from cascade prediction evaluation
Overcoming feature-poor datasets that lack conversion signals such as purchases
Reducing the training cost of complex graph-based methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Time-ordered splitting prevents future information leakage
Large-scale e-commerce dataset includes purchase conversion signals
Lightweight framework models cascade dynamics with temporal walks
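As a rough illustration of Jaccard-based neighbor selection for inter-cascade dependencies, the sketch below ranks other cascades by the Jaccard similarity of their participant sets and keeps the top k. It is an assumption that similarity is computed over participating users; the summary does not say which sets CasTemp compares, and the names `jaccard` and `top_k_neighbors` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_neighbors(
    target_id: str,
    participants: Dict[str, Set[str]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return up to k cascades most similar to `target_id`.

    `participants` maps a cascade id to the set of user ids seen in it
    (an assumed representation). Cascades with zero overlap are dropped.
    """
    target = participants[target_id]
    scored = [
        (cid, jaccard(target, users))
        for cid, users in participants.items()
        if cid != target_id
    ]
    scored = [(cid, score) for cid, score in scored if score > 0.0]
    # Highest similarity first; ties broken by cascade id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

Set-overlap scoring of this kind needs only set intersections and unions, which is one plausible reason a lightweight model would prefer it to learned graph-level similarity.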
Jie Peng
Renmin University of China, Beijing, China
Rui Wang
Alibaba, Beijing, China
Qiang Wang
Alibaba, Beijing, China
Zhewei Wei
Renmin University of China
Graph Algorithms, Streaming Algorithms, AI4Science, AI4DB
Bin Tong
Alibaba, Beijing, China
Guan Wang
Alibaba, Beijing, China