🤖 AI Summary
Manual construction of occupational taxonomies is slow, and existing automated approaches either fail to adapt to dynamic regional labor markets or struggle to derive coherent hierarchical structures from noisy job posting data. To address both problems, this paper proposes CLIMB, a framework for learning region-adaptive occupational taxonomies. CLIMB first extracts core occupational clusters via semantic embedding and global hierarchical clustering; it then employs a reflective multi-agent system that iteratively refines hierarchical relationships through multi-round negotiation and feedback. This enables bottom-up, end-to-end generation of scalable, semantically consistent, and region-specific occupational taxonomies directly from raw job descriptions. Experiments on three real-world job posting datasets indicate that CLIMB outperforms baseline methods in classification coherence, hierarchical plausibility, and capture of regional characteristics. The code and datasets are publicly available.
📝 Abstract
Creating robust occupation taxonomies, vital for applications ranging from job recommendation to labor market intelligence, is challenging. Manual curation is slow, while existing automated methods either fail to adapt to dynamic regional markets (top-down) or struggle to build coherent hierarchies from noisy data (bottom-up). We introduce CLIMB (CLusterIng-based Multi-agent taxonomy Builder), a framework that fully automates the creation of high-quality, data-driven taxonomies from raw job postings. CLIMB uses global semantic clustering to distill core occupations, then employs a reflection-based multi-agent system to iteratively build a coherent hierarchy. On three diverse, real-world datasets, we show that CLIMB produces taxonomies that are more coherent and scalable than those of existing methods and that successfully capture unique regional characteristics. We release our code and datasets at https://anonymous.4open.science/r/CLIMB.
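
To make the first stage more concrete, the following is a minimal sketch of distilling core occupations from raw postings by clustering. It is not CLIMB's implementation. As assumptions for illustration only, it swaps the semantic embedding for a bag-of-words count vector and swaps the global clustering for a greedy single-pass assignment by cosine similarity; the function names (`embed`, `cluster_postings`, `core_occupation`) and the `threshold` parameter are hypothetical. The multi-agent hierarchy-building stage is not shown.

```python
from collections import Counter
from math import sqrt


def embed(text):
    """Toy stand-in for a semantic embedding (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_postings(postings, threshold=0.5):
    """Greedy single-pass clustering (assumption, not the paper's algorithm).

    Each posting joins the most similar existing cluster if the similarity
    to that cluster's summed vector reaches `threshold`; otherwise it
    starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for text in postings:
        vec = embed(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [text]})
        else:
            best["centroid"].update(vec)  # add the new posting's counts
            best["members"].append(text)
    return clusters


def core_occupation(cluster):
    """Pick the member closest to the cluster's summed vector as its label."""
    return max(
        cluster["members"],
        key=lambda m: cosine(embed(m), cluster["centroid"]),
    )
```

Under these assumptions, calling `cluster_postings` on a handful of job titles groups near-duplicate titles together, and `core_occupation` returns one representative title per group. In a real pipeline, the representatives would be the input to the hierarchy-building stage, and the toy `embed` would be replaced by a learned sentence embedding.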