AI Summary
To address low classification accuracy, difficulty in modeling hierarchical industry-category relationships, and cold-start challenges in online job recruitment, this paper proposes a semantically enhanced representation learning framework that jointly leverages hierarchical taxonomies (the Standard Occupational Classification system and Carotene, an in-house taxonomy) and a job similarity graph structure. The framework is described as the first approach to simultaneously encode the hierarchical constraints of occupational classification systems and graph-structured relational semantics, embedding both jobs and categories into a shared latent space. It introduces a hierarchical classification loss and integrates graph neural networks for end-to-end optimization. Evaluated on a large-scale real-world job dataset, the framework significantly outperforms state-of-the-art baselines in classification accuracy. Empirical results indicate that it captures both hierarchical semantics and structural dependencies. The framework offers a scalable approach to dynamic job matching, personalized job recommendation, and labor market analytics.
Abstract
In the dynamic realm of online recruitment, accurate job classification is paramount for optimizing job recommendation systems, search rankings, and labor market analyses. As job markets evolve, the increasing complexity of job titles and descriptions necessitates sophisticated models that can effectively leverage intricate relationships within job data. Traditional text classification methods often fall short, particularly due to their inability to fully utilize the hierarchical nature of industry categories. To address these limitations, we propose a novel representation learning and classification model that embeds jobs and hierarchical industry categories into a latent embedding space. Our model integrates the Standard Occupational Classification (SOC) system and an in-house hierarchical taxonomy, Carotene, to capture both graph and hierarchical relationships, thereby improving classification accuracy. By embedding hierarchical industry categories into a shared latent space, we tackle cold start issues and enhance the dynamic matching of candidates to job opportunities. Extensive experimentation on a large-scale dataset of job postings demonstrates the model's superior ability to leverage hierarchical structures and rich semantic features, significantly outperforming existing methods. This research provides a robust framework for improving job classification accuracy, supporting more informed decision-making in the recruitment industry.
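The idea of scoring jobs against hierarchical categories in one shared embedding space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-level toy taxonomy, the hand-picked 2-d vectors, the dot-product scoring, the names `leaf_probs`, `parent_probs`, and `hierarchical_loss`, and the `parent_weight` parameter. It shows one common way to build a hierarchy-aware loss (leaf-level cross-entropy plus a parent-level term), which may differ from the loss the authors actually use.

```python
import math

# Hypothetical two-level taxonomy: leaf category -> parent category.
PARENT_OF = {"software_dev": "tech", "data_science": "tech", "nursing": "health"}

# Hypothetical 2-d category embeddings in the shared space (toy values, not learned).
CATEGORY_VECS = {
    "software_dev": [1.0, 0.2],
    "data_science": [0.8, 0.5],
    "nursing": [-0.9, 0.7],
}


def softmax(scores):
    """Numerically stable softmax over a dict of label -> score."""
    peak = max(scores.values())
    exps = {k: math.exp(v - peak) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def leaf_probs(job_vec):
    """Score each leaf category by its dot product with the job embedding."""
    scores = {
        leaf: sum(j * c for j, c in zip(job_vec, vec))
        for leaf, vec in CATEGORY_VECS.items()
    }
    return softmax(scores)


def parent_probs(leaf_p):
    """A parent's probability is the sum of its children's probabilities."""
    out = {}
    for leaf, p in leaf_p.items():
        parent = PARENT_OF[leaf]
        out[parent] = out.get(parent, 0.0) + p
    return out


def hierarchical_loss(job_vec, true_leaf, parent_weight=0.5):
    """Leaf cross-entropy plus a weighted parent-level cross-entropy."""
    leaf_p = leaf_probs(job_vec)
    par_p = parent_probs(leaf_p)
    leaf_term = -math.log(leaf_p[true_leaf])
    parent_term = -math.log(par_p[PARENT_OF[true_leaf]])
    return leaf_term + parent_weight * parent_term
```

With a toy job embedding such as `[0.9, 0.3]`, a label in the wrong leaf but the right parent (`data_science` instead of `software_dev`) incurs a smaller loss than a label under a different parent (`nursing`). A hierarchy-aware objective of this kind is designed to produce that graded penalty.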