๐ค AI Summary
Traditional topic modeling approaches (e.g., LDA) incur high computational overhead and latency in real-time exploratory analysis of massive social media text streams, hindering interactive use.
Method: This paper proposes an interactive topic querying framework based on model reuse, featuring three key innovations: (1) materialization and dynamic fusion of pre-trained topic models; (2) hierarchical query planning; and (3) batch query re-ranking. It enables low-latency (interactive-scale) topic retrieval and analysis atop pre-built modelsโwithout retraining.
Contribution/Results: The framework significantly reduces computational cost while preserving topic quality. Integrated into a visual analytics prototype, it achieves 10โ100ร lower query latency than full LDA, with comparable accuracy. It effectively bridges the gap between scalable topic modeling and real-time interactive analysis.
๐ Abstract
With massive texts on social media, users and analysts often rely on topic modeling techniques to quickly extract key themes and gain insights. Traditional topic modeling techniques, such as Latent Dirichlet Allocation (LDA), provide valuable insights but are computationally expensive, making them impractical for real-time data analysis. Although recent advances in distributed training and fast sampling methods have improved efficiency, real-time topic exploration remains a significant challenge. In this paper, we present MLego, an interactive query framework designed to support real-time topic modeling analysis by leveraging model materialization and reuse. Instead of retraining models from scratch, MLego efficiently merges materialized topic models to construct approximate results at interactive speeds. To further enhance efficiency, we introduce a hierarchical plan search strategy for single queries and an optimized query reordering technique for batch queries. We integrate MLego into a visual analytics prototype system, enabling users to explore large-scale textual datasets through interactive queries. Extensive experiments demonstrate that MLego significantly reduces computation costs while maintaining high-quality topic modeling results. MLego enhances existing visual analytics approaches, which primarily focus on user-driven topic modeling, by enabling real-time, query-driven exploration. This complements traditional methods and bridges the gap between scalable topic modeling and interactive data analysis.