Multilingual JobBERT for Cross-Lingual Job Title Matching

📅 2025-07-29
🤖 AI Summary
This work addresses unsupervised cross-lingual job title matching across English, German, Spanish, and Chinese. We propose a contrastive learning-based multilingual semantic alignment method that enhances a monolingual pre-trained model (JobBERT-V2) via synthetic translation augmentation and balanced multilingual dataset expansion, enabling robust multilingual encoding without task-specific annotations. Our approach integrates large-scale multilingual text representation learning with contrastive loss optimization. Key contributions include: (i) the first effective adaptation of a lightweight monolingual architecture to multilingual job title matching; and (ii) novel synthetic translation and data balancing strategies to mitigate language bias. Evaluated on the TalentCLEF 2025 benchmark, our method consistently outperforms state-of-the-art multilingual baselines in both monolingual and cross-lingual settings, demonstrating strong performance stability. The code and models are publicly released.
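The summary mentions "contrastive loss optimization" without giving the formula. A minimal NumPy sketch of the InfoNCE-style multiple-negatives ranking loss commonly used for this kind of embedding alignment, where each (job title, translated title) pair treats the other in-batch pairs as negatives. The function name and `scale` parameter are illustrative, not taken from the paper:

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """InfoNCE-style contrastive loss over a batch of embedding pairs.

    anchors, positives: (batch, dim) arrays; row i of `positives` is the
    matching pair for row i of `anchors` (e.g. a synthetic translation),
    and all other rows in the batch act as in-batch negatives.
    """
    # L2-normalize so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sims = scale * (a @ p.T)  # (batch, batch) scaled similarity matrix
    # Cross-entropy with the diagonal (true pairs) as the correct class
    logsumexp = np.log(np.exp(sims).sum(axis=1))
    return float(np.mean(logsumexp - np.diag(sims)))
```

Minimizing this loss pulls each title toward its translation and pushes it away from the other titles in the batch, which is what aligns the languages in a shared embedding space.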

📝 Abstract
We introduce JobBERT-V3, a contrastive learning-based model for cross-lingual job title matching. Building on the state-of-the-art monolingual JobBERT-V2, our approach extends support to English, German, Spanish, and Chinese by leveraging synthetic translations and a balanced multilingual dataset of over 21 million job titles. The model retains the efficiency-focused architecture of its predecessor while enabling robust alignment across languages without requiring task-specific supervision. Extensive evaluations on the TalentCLEF 2025 benchmark demonstrate that JobBERT-V3 outperforms strong multilingual baselines and achieves consistent performance across both monolingual and cross-lingual settings. While not the primary focus, we also show that the model can be effectively used to rank relevant skills for a given job title, demonstrating its broader applicability in multilingual labor market intelligence. The model is publicly available: https://huggingface.co/TechWolf/JobBERT-v3.
Problem

Research questions and friction points this paper is trying to address.

Cross-lingual job title matching across multiple languages
Improving multilingual job title alignment without supervision
Enhancing labor market intelligence with multilingual capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning-based multilingual job title matching
Synthetic translations and balanced multilingual dataset
Efficiency-focused architecture without task-specific supervision
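The balanced multilingual dataset is a key ingredient above, but the summary does not specify the balancing procedure. A minimal sketch, assuming simple per-language downsampling to the smallest language bucket so that no language dominates the contrastive training batches (function and variable names are hypothetical):

```python
import random
from collections import defaultdict

def balance_by_language(titles, seed=0):
    """Downsample each language bucket to the size of the smallest one.

    titles: iterable of (title, language_code) pairs, e.g. ("data engineer", "en").
    Returns a list of (title, language_code) pairs with equal counts per language.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for title, lang in titles:
        buckets[lang].append(title)
    target = min(len(bucket) for bucket in buckets.values())
    balanced = []
    for lang, items in buckets.items():
        balanced.extend((t, lang) for t in rng.sample(items, target))
    return balanced
```

In practice one might upsample low-resource languages with synthetic translations instead of discarding data, which is closer in spirit to the translation-augmentation strategy described above.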