MLego: Interactive and Scalable Topic Exploration Through Model Reuse

2025-08-11
AI Summary
Problem: Traditional topic modeling approaches (e.g., LDA) incur high computational cost and latency when analysts explore massive social media text streams in real time, which hinders interactive use. Method: This paper proposes an interactive topic querying framework based on model reuse, with three key components: (1) materialization and dynamic fusion of pre-trained topic models; (2) hierarchical query planning; and (3) batch query reordering. By merging pre-built models instead of retraining, it answers topic queries at interactive latency. Contribution/Results: The framework substantially reduces computational cost while preserving topic quality. Integrated into a visual analytics prototype, it achieves 10–100× lower query latency than full LDA, with comparable accuracy. It bridges the gap between scalable topic modeling and real-time interactive analysis.
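To make the model-fusion idea concrete, here is a minimal sketch, not MLego's method. It assumes, for illustration only, that each materialized model stores LDA topic-word counts for one data partition (say, one day of posts) and that all partitions share the same topics and vocabulary. Under that assumption a query spanning several partitions can be answered by adding their counts and re-normalizing with the standard smoothed LDA estimate. The functions `merge_counts`, `topic_word_dist`, and `top_words` are hypothetical names.

```python
import numpy as np


def merge_counts(partitions):
    """Sum topic-word count matrices n[k, w] from several materialized models.

    Assumes every partition shares the same K topics and vocabulary,
    so the counts are directly additive.
    """
    return np.sum(np.stack(partitions), axis=0)


def topic_word_dist(counts, beta=0.01):
    """Smoothed estimate phi[k, w] = (n[k, w] + beta) / (sum_w' n[k, w'] + V * beta)."""
    vocab_size = counts.shape[1]
    return (counts + beta) / (counts.sum(axis=1, keepdims=True) + vocab_size * beta)


def top_words(phi, vocab, n=3):
    """Return the n most probable words of each topic."""
    return [[vocab[i] for i in np.argsort(-row)[:n]] for row in phi]


if __name__ == "__main__":
    vocab = ["game", "score", "vote", "poll", "rain"]
    # Hypothetical per-day materialized counts: 2 topics x 5 words.
    day1 = np.array([[5, 3, 0, 0, 1], [0, 0, 4, 2, 0]])
    day2 = np.array([[2, 5, 0, 0, 0], [0, 1, 3, 6, 0]])
    merged = merge_counts([day1, day2])  # answers a two-day query without retraining
    print(top_words(topic_word_dist(merged), vocab, n=2))  # [['score', 'game'], ['poll', 'vote']]
```

In practice, topics learned independently on different partitions are not automatically aligned, so a real fusion step would need to match or reconcile topics first; the shared-topic assumption above sidesteps that problem.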

๐Ÿ“ Abstract
Given the massive volume of text on social media, users and analysts often rely on topic modeling techniques to quickly extract key themes and gain insights. Traditional topic modeling techniques, such as Latent Dirichlet Allocation (LDA), are informative but computationally expensive, making them impractical for real-time data analysis. Although recent advances in distributed training and fast sampling methods have improved efficiency, real-time topic exploration remains a significant challenge. In this paper, we present MLego, an interactive query framework designed to support real-time topic modeling analysis by leveraging model materialization and reuse. Instead of retraining models from scratch, MLego efficiently merges materialized topic models to construct approximate results at interactive speeds. To further enhance efficiency, we introduce a hierarchical plan search strategy for single queries and an optimized query reordering technique for batch queries. We integrate MLego into a visual analytics prototype system, enabling users to explore large-scale textual datasets through interactive queries. Extensive experiments demonstrate that MLego significantly reduces computation costs while maintaining high-quality topic modeling results. MLego enhances existing visual analytics approaches, which primarily focus on user-driven topic modeling, by enabling real-time, query-driven exploration. This complements traditional methods and bridges the gap between scalable topic modeling and interactive data analysis.
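The abstract's hierarchical plan search for single queries can be illustrated with a simplified sketch. It assumes, for illustration only, that a model is materialized for every node of a binary hierarchy over time buckets (single days, pairs of days, and so on) and that a query over days [lo, hi) should reuse as few models as possible. The recursive `plan` function is a hypothetical, segment-tree style decomposition swapped in as a stand-in; MLego's actual search may weigh other costs.

```python
def plan(lo, hi, node_lo=0, node_hi=8):
    """Return the materialized intervals [a, b) whose union is exactly [lo, hi).

    Hypothetical setup: one model is materialized per node of a binary
    hierarchy over days [node_lo, node_hi); the span must be a power of two.
    """
    if hi <= node_lo or node_hi <= lo:
        return []  # node lies entirely outside the query
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]  # node lies inside the query: reuse its model
    mid = (node_lo + node_hi) // 2  # partial overlap: descend one level
    return plan(lo, hi, node_lo, mid) + plan(lo, hi, mid, node_hi)


if __name__ == "__main__":
    # Query days 1..6 of an 8-day hierarchy.
    print(plan(1, 7))  # [(1, 2), (2, 4), (4, 6), (6, 7)]
```

Each returned interval corresponds to one materialized model, so answering the query takes `len(plan(lo, hi)) - 1` merges; the six-day query above reuses four models instead of six single-day ones.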
Problem

Enables real-time topic modeling through model reuse
Reduces computation costs while maintaining quality results
Bridges scalable topic modeling with interactive analysis
Innovation

Leverages model materialization and reuse
Hierarchical plan search for queries
Optimized query reordering technique
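Batch query reordering can be illustrated with a hedged sketch. The cost model here is an assumption made for illustration: only the materialized models used by the previous query stay in memory, so running queries with overlapping plans back to back avoids reloads. The greedy nearest-neighbour heuristic and the names `reorder` and `loads` are hypothetical stand-ins, not MLego's optimized reordering technique.

```python
def loads(order, plans):
    """Count model loads, assuming only the previous query's models stay in memory."""
    total, resident = 0, set()
    for q in order:
        needed = set(plans[q])
        total += len(needed - resident)  # models not already in memory
        resident = needed
    return total


def reorder(plans):
    """Greedy nearest-neighbour order: next query shares the most models with the current one."""
    if not plans:
        return []
    remaining = list(range(len(plans)))
    order = [remaining.pop(0)]  # keep the first query first
    while remaining:
        current = set(plans[order[-1]])
        # max() keeps the earliest query on ties, so the result is deterministic.
        best = max(remaining, key=lambda q: len(current & set(plans[q])))
        remaining.remove(best)
        order.append(best)
    return order


if __name__ == "__main__":
    # Each plan lists the materialized intervals a query needs.
    plans = [[(0, 4)], [(4, 8)], [(0, 4), (4, 6)], [(4, 6), (6, 7)]]
    print(loads([0, 1, 2, 3], plans))  # 5
    order = reorder(plans)
    print(order, loads(order, plans))  # [0, 2, 3, 1] 4
```

The greedy heuristic is not guaranteed to find the best order (minimizing total transition cost resembles a travelling-salesman ordering), but it needs only O(n²) plan comparisons for n queries.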
Fei Ye
School of Computer Science, Fudan University, Shanghai 200438, China
Jiapan Liu
School of Computer Science, Fudan University, Shanghai 200438, China
Yinan Jing
School of Computer Science, Fudan University, Shanghai 200438, China
Zhenying He
School of Computer Science, Fudan University, Shanghai 200438, China
Weirao Wang
School of Computer Science, Fudan University, Shanghai 200438, China
X. Sean Wang
School of Computer Science, Fudan University
Research interests: Database Systems, Information Security and Privacy, Wireless Sensor Networks, Streaming Data Processing, Time Series Queries, Dat…