🤖 AI Summary
E-commerce query classification faces three key challenges: (1) sparse prior information, because queries are short and lack context; (2) labels treated in isolation, so the semantic and hierarchical relationships among them go unused; and (3) overreliance on users' posterior click feedback, which reinforces the Matthew effect. To address these, we propose a unified, scalable semi-supervised framework (SSUF) built from three pluggable enhancement modules: knowledge enhancement (enriching query representations with world knowledge), label enhancement (using label semantics and semi-supervised signals to reduce dependence on noisy posterior labels), and structure enhancement (improving label representations using the relations among labels). The framework unifies subtasks such as intent identification and category prediction, and input features can be added or removed per subtask. Offline evaluations surpass state-of-the-art methods, and online A/B tests show significant improvements in core business metrics, including CTR and GMV.
📝 Abstract
Query classification, which comprises multiple subtasks such as intent and category prediction, is vital to e-commerce applications. E-commerce queries are usually short and lack context, and the relationships among labels go unused, so models have insufficient prior information. Most existing industrial query classification methods build training samples from users' posterior click behavior, which creates a vicious cycle driven by the Matthew effect: results that are already popular attract more clicks and are reinforced further. Furthermore, the subtasks of query classification lack a unified framework, which makes algorithm optimization inefficient.
In this paper, we propose a novel Semi-supervised Scalable Unified Framework (SSUF), which contains multiple enhancement modules and unifies the query classification tasks. The knowledge-enhanced module uses world knowledge to enrich query representations, addressing the lack of information in the query itself. The label-enhanced module uses label semantics and semi-supervised signals to reduce the dependence on posterior labels. The structure-enhanced module enhances the label representations based on the complex relations among labels. Each module is highly pluggable, and input features can be added or removed as needed for each subtask. We conduct extensive offline experiments and online A/B tests, and the results show that SSUF significantly outperforms state-of-the-art models.
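The pluggable design described above can be illustrated with a minimal Python sketch. It is not the SSUF implementation: the class `PluggableEncoder`, the functions `text_features` and `knowledge_features`, the `TOY_KNOWLEDGE` table, and the `SUBTASK_MODULES` mapping are all hypothetical names introduced here, and the hand-written features stand in for learned representations. The sketch only shows how feature modules might be registered once and then switched on or off per subtask.

```python
from typing import Callable, Dict, List

# A feature module maps a raw query string to a list of numeric features.
FeatureModule = Callable[[str], List[float]]


def text_features(query: str) -> List[float]:
    """Toy query-only features: token count and character count."""
    return [float(len(query.split())), float(len(query))]


# Hypothetical stand-in for world knowledge: term -> attribute vector.
TOY_KNOWLEDGE: Dict[str, List[float]] = {
    "iphone": [1.0, 0.0],
    "sneakers": [0.0, 1.0],
}


def knowledge_features(query: str) -> List[float]:
    """Toy knowledge enhancement: sum the vectors of known terms in the query."""
    total = [0.0, 0.0]
    for token in query.lower().split():
        vec = TOY_KNOWLEDGE.get(token)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


class PluggableEncoder:
    """Holds named feature modules and concatenates the enabled ones."""

    def __init__(self) -> None:
        self._modules: Dict[str, FeatureModule] = {}

    def register(self, name: str, module: FeatureModule) -> None:
        self._modules[name] = module

    def encode(self, query: str, enabled: List[str]) -> List[float]:
        features: List[float] = []
        for name in enabled:
            features.extend(self._modules[name](query))
        return features


encoder = PluggableEncoder()
encoder.register("text", text_features)
encoder.register("knowledge", knowledge_features)

# Assumed per-subtask configuration: each subtask picks the modules it needs.
SUBTASK_MODULES = {"intent": ["text"], "category": ["text", "knowledge"]}

query = "red iphone case"
intent_input = encoder.encode(query, SUBTASK_MODULES["intent"])
category_input = encoder.encode(query, SUBTASK_MODULES["category"])
print(intent_input)    # [3.0, 15.0]
print(category_input)  # [3.0, 15.0, 1.0, 0.0]
```

In this sketch, adding a new enhancement means registering one more function and listing its name under the subtasks that should use it; the downstream classifier for each subtask sees only the concatenated feature vector.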