ArabJobs: A Multinational Corpus of Arabic Job Ads

📅 2025-09-26
🤖 AI Summary
Arabic job advertisement datasets that capture regional dialectal variation and support fairness-aware NLP and labour market research are scarce, which hampers progress in this area. To address this, we introduce ArabJobs, the first cross-national Arabic job advertisement corpus, covering Egypt, Jordan, Saudi Arabia, and the UAE and comprising over 8,500 job postings and more than 550,000 words. We annotate language variety, geographic origin, salary information, occupational categories, and gendered linguistic representations. Our analysis covers dialect distribution, occupational structure, and gender representation in Arabic recruitment texts. Using ArabJobs, we demonstrate salary estimation and job category normalisation with large language models, and we provide benchmark tasks for gender bias detection and profession classification. The results show the corpus's utility for fairness-aware Arabic NLP and labour market analysis. The dataset is publicly released to support reproducible research.

📝 Abstract
ArabJobs is a publicly available corpus of Arabic job advertisements collected from Egypt, Jordan, Saudi Arabia, and the United Arab Emirates. Comprising over 8,500 postings and more than 550,000 words, the dataset captures linguistic, regional, and socio-economic variation in the Arab labour market. We present analyses of gender representation and occupational structure, and highlight dialectal variation across ads, which offers opportunities for future research. We also demonstrate applications such as salary estimation and job category normalisation using large language models, alongside benchmark tasks for gender bias detection and profession classification. The findings show the utility of ArabJobs for fairness-aware Arabic NLP and labour market research. The dataset is publicly available on GitHub: https://github.com/drelhaj/ArabJobs.
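As a rough illustration of what a gender bias detection baseline over job ads might look like, the sketch below labels an ad by the explicitly gendered Arabic word forms it contains. It is not the authors' method: the cue lists, the function names `gender_cues` and `label_ad`, and the four labels are assumptions made purely for illustration, and the approach ignores prefixed forms such as words carrying the definite article.

```python
import re
from collections import Counter

# Hypothetical cue lists for illustration only; not taken from ArabJobs.
MASCULINE_CUES = {"مطلوب", "مهندس", "محاسب", "سائق"}
FEMININE_CUES = {"مطلوبة", "مهندسة", "محاسبة", "سكرتيرة"}

# Runs of basic Arabic letters (U+0621 to U+064A); whole-token matching
# keeps "مطلوبة" from also counting as the masculine "مطلوب".
TOKEN_RE = re.compile(r"[\u0621-\u064A]+")


def gender_cues(ad_text: str) -> Counter:
    """Count masculine and feminine cue words in one ad."""
    counts = Counter()
    for token in TOKEN_RE.findall(ad_text):
        if token in MASCULINE_CUES:
            counts["masculine"] += 1
        elif token in FEMININE_CUES:
            counts["feminine"] += 1
    return counts


def label_ad(ad_text: str) -> str:
    """Return 'masculine', 'feminine', 'both', or 'unmarked'."""
    counts = gender_cues(ad_text)
    if counts["masculine"] and counts["feminine"]:
        return "both"
    if counts["masculine"]:
        return "masculine"
    if counts["feminine"]:
        return "feminine"
    return "unmarked"
```

A lexicon baseline like this only catches surface marking; a learned classifier trained on the corpus annotations would be needed to capture implicit bias.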
Problem

Research questions and friction points this paper is trying to address.

Analyzing gender representation and occupational structure in Arab labor markets
Detecting dialectal variations across multinational Arabic job advertisements
Developing fairness-aware NLP applications for Arabic labor market analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Corpus of Arabic job ads from four countries
Analysis of gender representation and occupational structure
Salary estimation using large language models