Can User Feedback Help Issue Detection? An Empirical Study on a One-billion-user Online Service System

📅 2025-08-01
🤖 AI Summary
Problem: Identifying severe system issues from massive volumes of unstructured user feedback in billion-user online services remains challenging, because much of the feedback is noise and the signals of high-severity problems are hard to separate from it. Method: Using 50,378,766 real-world feedback items from six services, this empirical study examines what users write in their feedback, whether features of feedback items are good indicators of severe issues, and whether machine learning is a reasonable way to analyze feedback. Contribution/Results: (1) A large proportion of feedback is irrelevant to system issues, so issue-irrelevant content must be filtered out during processing; (2) some severe issues cannot be easily detected from feedback characteristics alone; (3) feedback topic distributions are similar across time intervals, which supports machine learning-based analysis. The findings offer an empirical foundation for designing and implementing feedback-based issue detection in large-scale service systems.

📝 Abstract
Background: It has long been suggested that user feedback, typically written in natural language by end-users, can help issue detection. However, for large-scale online service systems that receive a tremendous amount of feedback, it remains a challenging task to identify severe issues from user feedback. Aims: To develop a better feedback-based issue detection approach, it is crucial first to gain a comprehensive understanding of the characteristics of user feedback in real production systems. Method: In this paper, we conduct an empirical study on 50,378,766 user feedback items from six real-world services in a one-billion-user online service system. We first study what users provide in their feedback. We then examine whether certain features of feedback items can be good indicators of severe issues. Finally, we investigate whether adopting machine learning techniques to analyze user feedback is reasonable. Results: Our results show that a large proportion of user feedback provides irrelevant information about system issues. As a result, it is crucial to filter out issue-irrelevant information when processing user feedback. Moreover, we find severe issues that cannot be easily detected based solely on user feedback characteristics. Finally, we find that the distributions of the feedback topics in different time intervals are similar. This confirms that designing machine learning-based approaches is a viable direction for better analyzing user feedback. Conclusions: We consider that our findings can serve as an empirical foundation for feedback-based issue detection in large-scale service systems, which sheds light on the design and implementation of practical issue detection approaches.
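The finding that feedback topic distributions are similar across time intervals can be illustrated with a small sketch. The abstract does not say how similarity was measured, so the Jensen-Shannon divergence used here is an assumption, and the topic names, counts, and function names are hypothetical, made-up values for illustration only.

```python
import math
from collections import Counter


def topic_distribution(labels, topics):
    """Return the share of feedback items per topic, in the order of `topics`."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return [counts[t] / total for t in topics]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2), bounded between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


# Hypothetical topic labels for feedback in two time intervals (made-up data).
topics = ["login", "payment", "message", "other"]
week_1 = ["login"] * 30 + ["payment"] * 20 + ["message"] * 40 + ["other"] * 10
week_2 = ["login"] * 28 + ["payment"] * 22 + ["message"] * 41 + ["other"] * 9

p = topic_distribution(week_1, topics)
q = topic_distribution(week_2, topics)
divergence = js_divergence(p, q)
print(f"JS divergence between intervals: {divergence:.4f}")
```

A divergence near 0 means the two intervals have nearly the same topic mix, which is the kind of stability that would let a model trained on past feedback stay useful on later feedback. A value near 1 would indicate the topic mix has shifted substantially.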
Problem

Research questions and friction points this paper is trying to address.

Identifying severe issues from massive user feedback in large-scale systems
Evaluating user feedback features as indicators of system issues
Assessing machine learning feasibility for feedback-based issue detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

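Empirically analyze 50,378,766 user feedback items from six real-world services
Show that issue-irrelevant feedback must be filtered out when processing user feedback
Show that feedback topic distributions are similar across time intervals, supporting machine learning-based analysis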
Authors

Shuyao Jiang
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Jiazhen Gu
Postdoctoral fellow, CUHK
Software Engineering, Reliability, Cloud Computing
Wujie Zheng
Tencent Inc., Shenzhen, China
Yangfan Zhou
College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Michael R. Lyu
Professor of Computer Science & Engineering, The Chinese University of Hong Kong
Software engineering, software reliability, fault tolerance, machine learning, distributed systems