🤖 AI Summary
This work addresses fine-grained hierarchical multilabel classification of mobile applications from their textual information (app names and descriptions). It presents two components. HMCN (Hierarchical Multilabel Classification Network) classifies from two perspectives: multilabel classification without hierarchical constraints, and sequential level-by-level prediction that respects those constraints. HMCL (Hierarchical Multilabel Contrastive Learning) is a contrastive learning scheme that learns more distinguishable app representations to improve the performance of HMCN. On the Tencent App Store dataset and two public datasets, the approach performs well compared with state-of-the-art methods. It has been deployed at Tencent for over one year, where its multilabel classification outputs helped a downstream credit risk management task improve by 10.70% on the Kolmogorov-Smirnov (KS) metric.
📝 Abstract
A hierarchical labeling system for mobile applications (apps) benefits a wide range of downstream businesses that integrate the labels with their proprietary user data to improve user modeling. Such a label hierarchy can define more granular labels that capture detailed app features beyond the limitations of traditional broad app categories. In this paper, we address the problem of hierarchical multilabel classification for apps by using their textual information, such as names and descriptions. We present: 1) HMCN (Hierarchical Multilabel Classification Network), which handles the classification from two perspectives: the first performs multilabel classification without hierarchical constraints, while the second predicts labels sequentially at each hierarchical level while respecting those constraints; 2) HMCL (Hierarchical Multilabel Contrastive Learning), a scheme that learns more distinguishable app representations to enhance the performance of HMCN. Empirical results on our Tencent App Store dataset and two public datasets demonstrate that our approach performs well compared with state-of-the-art methods. The approach has been deployed at Tencent for over one year, and its multilabel classification outputs for apps have helped a downstream task, credit risk management of users, improve its performance by 10.70% with regard to the Kolmogorov-Smirnov metric.
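To make the two perspectives concrete, the following is a minimal sketch of how label decisions could differ with and without hierarchical constraints. It is an illustration only, not the HMCN architecture: the abstract does not describe the network, so the toy hierarchy `PARENT`, the label names, the scores, the 0.5 threshold, and both helper functions are assumptions introduced here.

```python
# Hypothetical toy hierarchy: child label -> parent label (None marks a top-level label).
PARENT = {
    "game": None,
    "finance": None,
    "game/puzzle": "game",
    "game/racing": "game",
    "finance/loan": "finance",
}


def level(label):
    """Depth of a label in the toy hierarchy (0 for top-level labels)."""
    depth = 0
    while PARENT[label] is not None:
        label = PARENT[label]
        depth += 1
    return depth


def predict_flat(scores, threshold=0.5):
    """Perspective 1 (assumed form): keep every label whose score passes the
    threshold, ignoring the hierarchy."""
    return {label for label, score in scores.items() if score >= threshold}


def predict_hierarchical(scores, threshold=0.5):
    """Perspective 2 (assumed form): visit labels level by level and keep a
    label only if it passes the threshold and its parent was kept."""
    kept = set()
    for label in sorted(scores, key=level):
        parent = PARENT[label]
        if scores[label] >= threshold and (parent is None or parent in kept):
            kept.add(label)
    return kept


# Made-up per-label scores for a single app.
scores = {
    "game": 0.9,
    "finance": 0.2,
    "game/puzzle": 0.7,
    "game/racing": 0.1,
    "finance/loan": 0.8,
}

flat = predict_flat(scores)          # may contain a child without its parent
hier = predict_hierarchical(scores)  # every kept child has its parent kept
```

With these made-up scores, the unconstrained view keeps `finance/loan` even though `finance` was rejected, while the constrained view drops it. That inconsistency is the kind of case hierarchy-aware prediction is meant to rule out.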