🤖 AI Summary
Problem: Skill extraction from job postings relies heavily on large language models (LLMs), which brings high computational overhead and slow inference and hinders real-time, scalable deployment.
Method: We propose a lightweight, efficient framework comprising (i) ConTeXT-match, a contrastive learning scheme with token-level attention for a lightweight bi-encoder, suited to the extreme multi-label classification task of skill classification; (ii) Skill-XL, a new benchmark with exhaustive, sentence-level skill annotations that explicitly address redundancy in the large label space; and (iii) JobBERT V2, an improved job title normalization model that uses extracted skills to produce high-quality job title representations.
Contribution/Results: Our approach achieves state-of-the-art performance on skill extraction while accelerating inference by multiple orders of magnitude. Skill-XL enables robust, fine-grained evaluation. JobBERT V2 balances accuracy, latency, and scalability, which makes it well suited to industrial deployment in labor market analytics.
📝 Abstract
Labor market analysis relies on extracting insights from job advertisements, which provide valuable yet unstructured information on job titles and corresponding skill requirements. While state-of-the-art methods for skill extraction achieve strong performance, they depend on large language models (LLMs), which are computationally expensive and slow. In this paper, we propose ConTeXT-match, a novel contrastive learning approach with token-level attention that is well-suited for the extreme multi-label classification task of skill classification. ConTeXT-match significantly improves skill extraction efficiency and performance, achieving state-of-the-art results with a lightweight bi-encoder model. To support robust evaluation, we introduce Skill-XL, a new benchmark with exhaustive, sentence-level skill annotations that explicitly address the redundancy in the large label space. Finally, we present JobBERT V2, an improved job title normalization model that leverages extracted skills to produce high-quality job title representations. Experiments demonstrate that our models are efficient, accurate, and scalable, making them ideal for large-scale, real-time labor market analysis.
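To make the idea of contrastive learning with token-level attention in a bi-encoder more concrete, here is a minimal sketch in plain Python. It is not the paper's formulation, which the abstract does not spell out. It assumes that each skill embedding acts as a query over a sentence's token embeddings (softmax over dot products), that the attention-pooled sentence vector is scored against the skill by a dot product, and that training uses an in-batch InfoNCE-style loss. The function names `match_score` and `contrastive_loss` and the toy 2-D embeddings are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def match_score(token_embs, skill_emb):
    """Assumed scoring rule: the skill attends over the sentence tokens,
    and the attention-pooled sentence vector is scored against the skill."""
    weights = softmax([dot(t, skill_emb) for t in token_embs])
    pooled = [
        sum(w * t[d] for w, t in zip(weights, token_embs))
        for d in range(len(skill_emb))
    ]
    return dot(pooled, skill_emb)


def contrastive_loss(sentences, skills):
    """Assumed in-batch InfoNCE loss: sentence i is paired with skill i,
    and every other skill in the batch serves as a negative."""
    losses = []
    for i, tokens in enumerate(sentences):
        probs = softmax([match_score(tokens, s) for s in skills])
        losses.append(-math.log(probs[i]))
    return sum(losses) / len(losses)


# Toy batch: two sentences (lists of token vectors) and their paired skills.
sentences = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.0, 1.0], [0.1, 0.0]],
]
skills = [[1.0, 0.0], [0.0, 1.0]]

loss = contrastive_loss(sentences, skills)
```

In this toy batch each sentence contains one token aligned with its paired skill, so the matching pair scores higher than the mismatched one and the loss falls below log 2, the value it would take if both skills scored equally. Because skill embeddings do not depend on the input sentence in a bi-encoder, they could be precomputed once for the full label space, which is one plausible source of the efficiency the abstract claims.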