AI Summary
Identifying critical system issues from massive volumes of unstructured user feedback in billion-scale online services remains challenging due to noise, semantic ambiguity, and the subtle signals of high-severity problems.
Method: Leveraging 50 million real-world user feedback instances, this study employs an empirical approach integrating qualitative content analysis, statistical modeling, and machine learning to systematically investigate associations between feedback features and issue severity.
Contribution/Results: (1) Feedback contains substantial irrelevant noise, necessitating robust pre-filtering; (2) Single-text features are insufficient for detecting certain high-risk issues; novel temporal features (e.g., topic stability over time) significantly improve detection; (3) Feedback topic distributions exhibit strong temporal stability, enabling reliable longitudinal modeling. The work validates the feasibility of ML-driven feedback analysis, uncovers distributional patterns and detection limits of actionable signals, and establishes a reusable feature taxonomy, methodological framework, and empirical foundation for feedback-driven problem detection.
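The temporal stability of topic distributions mentioned above can be quantified by comparing the topic mix of adjacent time windows. Below is a minimal sketch of one such measure, assuming topic labels are already assigned to each feedback item; the Jensen-Shannon divergence is one common choice of distribution distance, not necessarily the metric used in the study, and the topic labels here are hypothetical examples.

```python
import math
from collections import Counter

def topic_distribution(labels):
    """Normalize topic label counts into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two topic distributions; lies in [0, 1]."""
    topics = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in topics}
    def kl(a, b):
        return sum(a[t] * math.log2(a[t] / b[t]) for t in topics if a.get(t, 0.0) > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical topic labels for feedback in two adjacent weekly windows.
week1 = ["login", "login", "payment", "ui", "login", "payment"]
week2 = ["login", "payment", "login", "ui", "ui", "login"]

# Stability close to 1 means the topic mix barely changed between windows.
stability = 1.0 - js_divergence(topic_distribution(week1), topic_distribution(week2))
```

A sustained drop in such a stability score for a topic could serve as one of the temporal features signaling an emerging severe issue.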
Abstract
Background: It has long been suggested that user feedback, typically written in natural language by end users, can help with issue detection. However, for large-scale online service systems that receive a tremendous amount of feedback, identifying severe issues from user feedback remains a challenging task.
Aims: To develop a better feedback-based issue detection approach, it is crucial to first gain a comprehensive understanding of the characteristics of user feedback in real production systems.
Method: In this paper, we conduct an empirical study on 50,378,766 user feedback items from six real-world services in a one-billion-user online service system. We first study what users provide in their feedback. We then examine whether certain features of feedback items can be good indicators of severe issues. Finally, we investigate whether adopting machine learning techniques to analyze user feedback is reasonable.
Results: Our results show that a large proportion of user feedback provides information irrelevant to system issues; it is therefore crucial to filter out issue-irrelevant information when processing user feedback. Moreover, we find that some severe issues cannot be easily detected based solely on the characteristics of user feedback. Finally, we find that the distributions of feedback topics in different time intervals are similar, which confirms that machine learning-based approaches are a viable direction for better analyzing user feedback.
Conclusions: We believe our findings can serve as an empirical foundation for feedback-based issue detection in large-scale service systems and can shed light on the design and implementation of practical issue detection approaches.
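The pre-filtering step that both the summary and the abstract call for can be illustrated with a minimal sketch. The keyword heuristic, the `ISSUE_KEYWORDS` list, and the sample feedback strings below are all assumptions for illustration; a production system would more likely use a trained classifier over the filtered corpus.

```python
# Hypothetical keywords that indicate issue-relevant feedback (illustrative only).
ISSUE_KEYWORDS = {"crash", "error", "fail", "freeze", "bug", "cannot", "broken"}

def is_issue_relevant(feedback: str) -> bool:
    """Flag feedback that mentions any issue-indicating keyword (substring match)."""
    text = feedback.lower()
    return any(keyword in text for keyword in ISSUE_KEYWORDS)

# Hypothetical feedback stream mixing issue reports with irrelevant messages.
feedback_stream = [
    "The app keeps crashing on startup",
    "Love the new design, great job!",
    "Payment fails with an error code",
]

# Keep only issue-relevant items before any downstream severity analysis.
issue_reports = [f for f in feedback_stream if is_issue_relevant(f)]
```

Filtering first keeps the downstream severity models from being dominated by the large proportion of issue-irrelevant feedback the study reports.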