📝 Abstract
We introduce JobBERT-V3, a contrastive learning-based model for cross-lingual job title matching. Building on the state-of-the-art monolingual JobBERT-V2, our approach extends support to English, German, Spanish, and Chinese by leveraging synthetic translations and a balanced multilingual dataset of over 21 million job titles. The model retains the efficiency-focused architecture of its predecessor while enabling robust alignment across languages without requiring task-specific supervision. Extensive evaluations on the TalentCLEF 2025 benchmark demonstrate that JobBERT-V3 outperforms strong multilingual baselines and achieves consistent performance across both monolingual and cross-lingual settings. While not the primary focus, we also show that the model can effectively rank relevant skills for a given job title, demonstrating its broader applicability in multilingual labor market intelligence. The model is publicly available at https://huggingface.co/TechWolf/JobBERT-v3.
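To illustrate the contrastive objective that underpins models of this kind, here is a minimal in-batch contrastive (InfoNCE-style) loss sketch in plain NumPy. This is an assumption about the general training signal, not the paper's exact loss or hyperparameters; the temperature value and function names are illustrative only:

```python
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.05) -> float:
    """In-batch contrastive loss: each anchor's positive is the same-index
    row of `positives`; every other row in the batch acts as a negative."""
    # L2-normalize so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (batch, batch) similarity matrix
    # Row-wise log-softmax; the matching pair sits on the diagonal
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
# Matched anchor/positive pairs yield a much lower loss than random pairs
low = info_nce_loss(emb, emb)
high = info_nce_loss(emb, rng.normal(size=(8, 16)))
```

In training a bilingual pair (e.g. an English job title and its synthetic translation) would play the roles of anchor and positive, pulling translations together in embedding space while pushing apart unrelated titles in the same batch.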