Enhancing Text-Based Hierarchical Multilabel Classification for Mobile Applications via Contrastive Learning

📅 2025-07-06
🤖 AI Summary
This work addresses fine-grained hierarchical multilabel classification of mobile applications from their text (app names and descriptions). It proposes two components. HMCN (Hierarchical Multilabel Classification Network) classifies from two perspectives: multilabel classification without hierarchical constraints, and level-by-level prediction that respects the label hierarchy. HMCL (Hierarchical Multilabel Contrastive Learning) is a contrastive learning scheme that learns more distinguishable app representations to improve HMCN's performance. On a Tencent App Store dataset and two public datasets, the approach performs well compared with state-of-the-art methods. It has been deployed at Tencent for over one year, where its multilabel outputs improved a downstream credit risk management task by 10.70% on the Kolmogorov-Smirnov (KS) metric.
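The difference between the two classification perspectives (unconstrained versus hierarchy-constrained) can be illustrated at decoding time. The following is a minimal sketch, not the paper's network: the label names, the `PARENT` map, the scores, and the 0.5 threshold are all hypothetical, and the constraint is assumed to be "a label is kept only if its parent was kept".

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: label -> parent label (None for top-level labels).
PARENT: Dict[str, Optional[str]] = {
    "Finance": None,
    "Games": None,
    "Finance/Loans": "Finance",
    "Finance/Banking": "Finance",
    "Games/Puzzle": "Games",
}


def depth(label: str) -> int:
    """Number of ancestors of a label in the toy hierarchy."""
    d = 0
    while PARENT[label] is not None:
        label = PARENT[label]
        d += 1
    return d


def predict_flat(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Perspective 1: plain multilabel thresholding; the hierarchy is ignored."""
    return sorted(label for label, s in scores.items() if s >= threshold)


def predict_constrained(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Perspective 2: decide level by level; keep a label only if its parent was kept."""
    kept = set()
    for label in sorted(scores, key=depth):  # shallow levels are decided first
        parent = PARENT[label]
        if scores[label] >= threshold and (parent is None or parent in kept):
            kept.add(label)
    return sorted(kept)


# Hypothetical per-label scores for one app, e.g. from a text encoder.
scores = {
    "Finance": 0.9,
    "Games": 0.2,
    "Finance/Loans": 0.8,
    "Finance/Banking": 0.3,
    "Games/Puzzle": 0.7,
}

print(predict_flat(scores))         # ['Finance', 'Finance/Loans', 'Games/Puzzle']
print(predict_constrained(scores))  # ['Finance', 'Finance/Loans']
```

In this toy case the flat decoder emits `Games/Puzzle` even though its parent `Games` was rejected, while the constrained decoder drops it, which is the kind of hierarchical inconsistency the second perspective is meant to avoid.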

📝 Abstract
A hierarchical labeling system for mobile applications (apps) benefits a wide range of downstream businesses that integrate the labels with their proprietary user data to improve user modeling. Such a label hierarchy can define more granular labels that capture detailed app features beyond the limits of traditional broad app categories. In this paper, we address the problem of hierarchical multilabel classification for apps using their textual information, such as names and descriptions. We present: 1) HMCN (Hierarchical Multilabel Classification Network), which handles the classification from two perspectives: the first performs multilabel classification without hierarchical constraints, while the second predicts labels sequentially at each hierarchical level while respecting those constraints; 2) HMCL (Hierarchical Multilabel Contrastive Learning), a scheme that learns more distinguishable app representations to enhance the performance of HMCN. Empirical results on our Tencent App Store dataset and two public datasets demonstrate that our approach performs well compared with state-of-the-art methods. The approach has been deployed at Tencent for over one year, and its multilabel classification outputs for apps have helped a downstream task, credit risk management of users, improve its performance by 10.70% with regard to the Kolmogorov-Smirnov metric.
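For readers unfamiliar with the Kolmogorov-Smirnov (KS) metric in credit scoring, the sketch below shows the standard definition: the largest gap between the cumulative score distributions of the two classes. It is a generic illustration with made-up scores and labels; the function name `ks_statistic` is hypothetical, and nothing here reflects how the downstream task at Tencent computes or reports its 10.70% improvement.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the empirical score CDFs of label-1 and label-0 samples."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    pairs = sorted(zip(scores, labels))
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples tied at this score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
        i = j
    return best


# Made-up risk scores; label 1 marks a defaulting user, label 0 a non-defaulting one.
print(ks_statistic([0.1, 0.2, 0.3, 0.8, 0.9], [0, 0, 0, 1, 1]))  # 1.0 (perfect separation)
print(ks_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))         # 0.5
```

A KS value of 0 means the score does not separate the two classes at all, and 1 means it separates them perfectly, so a higher KS indicates a more discriminative risk score.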
Problem

Research questions and friction points this paper is trying to address.

Improving hierarchical multilabel classification for mobile apps
Enhancing app representation learning via contrastive learning
Boosting downstream tasks like credit risk management
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Multilabel Classification Network (HMCN)
Hierarchical Multilabel Contrastive Learning (HMCL)
Text-based app classification with hierarchical constraints
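To give a sense of how contrastive learning can make multilabel representations more distinguishable, here is a minimal sketch of a generic supervised contrastive loss. It is not the HMCL objective from the paper: the rule "two apps are positives if they share at least one label", the temperature `tau`, the function names, and the toy embeddings are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supcon_loss(embs: List[Sequence[float]], label_sets: List[Set[str]], tau: float = 0.1) -> float:
    """Supervised contrastive loss; positives are samples sharing a label (assumption)."""
    n = len(embs)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and label_sets[i] & label_sets[j]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        logits = {j: cosine(embs[i], embs[j]) / tau for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of picking each positive among all others.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical label sets for four apps.
labels = [{"Finance"}, {"Finance", "Loans"}, {"Games"}, {"Games"}]

# Embeddings where apps sharing labels sit close together...
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# ...versus embeddings where they are far apart.
scattered = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

print(supcon_loss(clustered, labels) < supcon_loss(scattered, labels))  # True
```

Minimizing a loss of this shape pulls apps with overlapping labels together and pushes the rest apart. A hierarchy-aware variant would presumably also account for how deep in the hierarchy two apps' labels agree, but that detail is not specified in this summary.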
Authors
Jiawei Guo
BUPT & M-A-P
Yang Xiao
Xidian University, Xi’an, Shaanxi, China
Weipeng Huang
Shenzhen University of Information Technology
Guangyuan Piao
Independent Researcher, Dublin, County Dublin, Ireland