🤖 AI Summary
Online review quality assessment faces strong domain dependency, continually evolving content patterns, and the difficulty of achieving interpretability and cross-domain adaptability at the same time. To address these challenges, this paper proposes AutoQual, a framework of large language model (LLM)-based agents that emulate the human research process. AutoQual autonomously discovers explicit, computable, and interpretable features from the tacit knowledge embedded in data through iterative hypothesis generation, self-directed tool invocation, and a persistent memory mechanism, eliminating manual feature engineering. The framework transfers across domains and adapts continuously to shifting data distributions. Deployed in an A/B test on an e-commerce platform with a billion-scale user base, AutoQual measurably improved user experience: average reviews viewed per user increased by 0.79%, and the conversion rate of review readers rose by 0.27%. The framework advances automated, interpretable, and scalable review quality modeling for real-world applications.
📝 Abstract
Ranking online reviews by their intrinsic quality is a critical task for e-commerce platforms and information services, one that directly impacts user experience and business outcomes. However, quality is a domain-dependent and dynamic concept, making its assessment a formidable challenge. Traditional methods relying on hand-crafted features do not scale across domains and fail to adapt to evolving content patterns, while modern deep learning approaches often produce black-box models that lack interpretability and may prioritize semantics over quality. To address these challenges, we propose AutoQual, an LLM-based agent framework that automates the discovery of interpretable features. While demonstrated on review quality assessment, AutoQual is designed as a general framework for transforming tacit knowledge embedded in data into explicit, computable features. It mimics a human research process: iteratively generating feature hypotheses through reflection, operationalizing them via autonomous tool implementation, and accumulating experience in a persistent memory. We deploy our method on a large-scale online platform with a user base in the billions. Large-scale A/B testing confirms its effectiveness, increasing average reviews viewed per user by 0.79% and the conversion rate of review readers by 0.27%.
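The discovery loop described above (hypothesize via reflection, operationalize as a computable feature, record the outcome in memory) can be sketched as follows. This is only an illustrative Python skeleton, not the paper's implementation: the function and feature names are hypothetical, and the stubs stand in for the LLM-driven reflection, autonomous tool implementation, and feature evaluation that AutoQual performs.

```python
from typing import Callable, Dict, List

def generate_hypotheses(memory: List[str]) -> List[str]:
    """Stand-in for LLM reflection: propose candidate quality features,
    skipping hypotheses already recorded in persistent memory."""
    candidates = ["length", "specificity", "sentiment_balance"]
    return [h for h in candidates if h not in memory]

def implement_feature(name: str) -> Callable[[str], float]:
    """Stand-in for autonomous tool implementation: turn a hypothesis
    into an explicit, computable feature function."""
    if name == "length":
        return lambda review: float(len(review.split()))
    if name == "specificity":
        # Crude proxy: fraction of tokens that contain a digit.
        return lambda review: (
            sum(any(c.isdigit() for c in tok) for tok in review.split())
            / max(len(review.split()), 1)
        )
    return lambda review: 0.0  # unimplemented hypothesis

def autoqual_loop(reviews: List[str], rounds: int = 2) -> Dict[str, Callable[[str], float]]:
    """Iterate: hypothesize, implement, evaluate, and persist experience."""
    memory: List[str] = []
    features: Dict[str, Callable[[str], float]] = {}
    for _ in range(rounds):
        for hypothesis in generate_hypotheses(memory):
            feature_fn = implement_feature(hypothesis)
            values = [feature_fn(r) for r in reviews]
            # Stand-in usefulness check: keep features that actually
            # discriminate between reviews.
            if max(values) != min(values):
                features[hypothesis] = feature_fn
            memory.append(hypothesis)  # persist the experience either way
    return features
```

Because memory persists across rounds, later iterations only explore new hypotheses; in the full framework this is where accumulated experience steers the agent toward unexplored, domain-relevant features.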