Efficient Text Encoders for Labor Market Analysis

📅 2025-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Skill extraction from job postings relies heavily on large language models (LLMs), whose computational cost and slow inference hinder real-time, scalable deployment. Method: We propose a lightweight framework comprising (i) ConTeXT-match, a contrastive learning scheme with token-level attention suited to the extreme multi-label classification task of skill classification; (ii) Skill-XL, a new benchmark with exhaustive, sentence-level skill annotations that address redundancy in the large label space; and (iii) JobBERT V2, an improved job title normalization model that uses extracted skills to build job title representations. Contribution/Results: ConTeXT-match achieves state-of-the-art skill extraction results with a lightweight bi-encoder rather than an LLM. Skill-XL supports robust, fine-grained evaluation. JobBERT V2 produces high-quality job title representations. Together, the models are efficient, accurate, and scalable enough for large-scale, real-time labor market analysis.

📝 Abstract

Labor market analysis relies on extracting insights from job advertisements, which provide valuable yet unstructured information on job titles and corresponding skill requirements. While state-of-the-art methods for skill extraction achieve strong performance, they depend on large language models (LLMs), which are computationally expensive and slow. In this paper, we propose ConTeXT-match, a novel contrastive learning approach with token-level attention that is well-suited for the extreme multi-label classification task of skill classification. ConTeXT-match significantly improves skill extraction efficiency and performance, achieving state-of-the-art results with a lightweight bi-encoder model. To support robust evaluation, we introduce Skill-XL, a new benchmark with exhaustive, sentence-level skill annotations that explicitly address the redundancy in the large label space. Finally, we present JobBERT V2, an improved job title normalization model that leverages extracted skills to produce high-quality job title representations. Experiments demonstrate that our models are efficient, accurate, and scalable, making them ideal for large-scale, real-time labor market analysis.
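The abstract does not spell out how ConTeXT-match scores a sentence against a skill, so the following is only a generic sketch of the two ingredients it names: token-level attention and a contrastive objective. It assumes a softmax over token-skill dot products for the attention weights and a standard in-batch InfoNCE loss; the function names (`attention_score`, `in_batch_contrastive_loss`), the `temperature` parameter, and the toy 2-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_score(token_vecs, skill_vec):
    """Score one sentence (a list of token vectors) against one skill vector.

    Assumption: each token is weighted by a softmax over its dot product
    with the skill, and the score is the dot product between the
    attention-pooled sentence vector and the skill vector.
    """
    weights = softmax([dot(t, skill_vec) for t in token_vecs])
    pooled = [
        sum(w * t[d] for w, t in zip(weights, token_vecs))
        for d in range(len(skill_vec))
    ]
    return dot(pooled, skill_vec), weights


def in_batch_contrastive_loss(sentences, skills, temperature=1.0):
    """In-batch InfoNCE: sentences[i] is paired with skills[i];
    every other skill in the batch acts as a negative."""
    losses = []
    for i, tokens in enumerate(sentences):
        logits = [attention_score(tokens, s)[0] / temperature for s in skills]
        losses.append(-math.log(softmax(logits)[i]))
    return sum(losses) / len(losses)


# Toy batch: two sentences of two tokens each, and two skill vectors.
sentences = [
    [[1.0, 0.0], [0.0, 0.1]],  # first token points toward skill 0
    [[0.0, 1.0], [0.1, 0.0]],  # first token points toward skill 1
]
skills = [[1.0, 0.0], [0.0, 1.0]]

score, weights = attention_score(sentences[0], skills[0])
aligned_loss = in_batch_contrastive_loss(sentences, skills)
swapped_loss = in_batch_contrastive_loss(sentences, skills[::-1])
print(weights, aligned_loss, swapped_loss)
```

In this toy batch the attention weights concentrate on the token that matches the skill, and the loss is lower when each sentence is paired with its matching skill than when the pairs are swapped. A real model would learn the token and skill vectors with a trained text encoder rather than fixing them by hand.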

Problem

Research questions and friction points this paper is trying to address.

Efficient skill extraction from job ads using lightweight models

Addressing redundancy in large skill label spaces

Improving job title normalization with skill-enhanced representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning with token-level attention

Lightweight bi-encoder model for efficiency

Improved job title normalization using skills
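To illustrate why a bi-encoder is cheap at inference time, here is a minimal sketch under stated assumptions: label embeddings are computed and normalized once, and each query then needs only one encoding plus a dot product per label. The helper names (`build_index`, `top_k`), the skill labels, and the hand-written 3-dimensional vectors are hypothetical stand-ins for the output of a trained encoder; none of them come from the paper.

```python
import heapq
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def build_index(label_vecs):
    """Normalize every label embedding once, ahead of query time."""
    return {name: normalize(vec) for name, vec in label_vecs.items()}


def top_k(query_vec, index, k=2):
    """Return the k labels with the highest cosine similarity to the query."""
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, vec)), name)
        for name, vec in index.items()
    )
    return [name for _, name in heapq.nlargest(k, scored)]


# Hypothetical embeddings; in practice these would come from the encoder.
index = build_index({
    "python programming": [1.0, 0.0, 0.0],
    "data analysis": [0.8, 0.6, 0.0],
    "forklift operation": [0.0, 0.0, 1.0],
})

# Hypothetical embedding of a job-ad sentence or job title.
query = [0.9, 0.1, 0.0]
ranked = top_k(query, index, k=2)
print(ranked)
```

Because the label side is precomputed, the per-query cost grows with the number of labels only through dot products, which is what lets this family of models scale to very large label spaces; production systems would typically replace the linear scan with a vectorized or approximate nearest-neighbor search.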
