Building Data-Driven Occupation Taxonomies: A Bottom-Up Multi-Stage Approach via Semantic Clustering and Multi-Agent Collaboration

📅 2025-09-19
🤖 AI Summary
Manual construction of occupational taxonomies is slow, and existing automated approaches either fail to adapt to dynamic regional labor markets or struggle to derive coherent hierarchical structures from noisy job posting data. To address both problems, this paper proposes CLIMB, a framework for learning region-adaptive occupational taxonomies. CLIMB first extracts core occupational clusters via semantic embedding and global hierarchical clustering; it then employs a reflective multi-agent system that iteratively refines hierarchical relationships through multi-round negotiation and feedback. The result is a bottom-up, fully automated pipeline that generates semantically consistent, region-specific occupational taxonomies directly from raw job postings. Experiments on three real-world job posting datasets indicate that CLIMB produces more coherent and scalable taxonomies than existing methods and captures regional characteristics. The code and datasets are publicly available.

📝 Abstract
Creating robust occupation taxonomies, vital for applications ranging from job recommendation to labor market intelligence, is challenging. Manual curation is slow, while existing automated methods are either not adaptive to dynamic regional markets (top-down) or struggle to build coherent hierarchies from noisy data (bottom-up). We introduce CLIMB (CLusterIng-based Multi-agent taxonomy Builder), a framework that fully automates the creation of high-quality, data-driven taxonomies from raw job postings. CLIMB uses global semantic clustering to distill core occupations, then employs a reflection-based multi-agent system to iteratively build a coherent hierarchy. On three diverse, real-world datasets, we show that CLIMB produces taxonomies that are more coherent and scalable than existing methods and successfully capture unique regional characteristics. We release our code and datasets at https://anonymous.4open.science/r/CLIMB.

Problem

Research questions and friction points this paper is trying to address.

Automating creation of occupation taxonomies from job data
Overcoming limitations of manual curation and existing methods
Building coherent hierarchies that capture regional characteristics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic clustering for core occupations
Multi-agent system for hierarchy building
Automated data-driven taxonomy creation
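
To make the first contribution above concrete, here is a minimal sketch of grouping job titles into candidate core occupations by similarity. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real semantic embedding model, the average-linkage merge rule and the `threshold` value of 0.4 are assumptions chosen for illustration, and the sample titles are invented. The multi-agent refinement stage is not shown.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(titles, threshold=0.4):
    """Average-linkage agglomerative clustering over all titles at once.

    Repeatedly merges the most similar pair of clusters and stops when
    no pair reaches `threshold` (an illustrative value, not from the paper).
    """
    vecs = [embed(t) for t in titles]
    clusters = [[i] for i in range(len(titles))]

    def sim(c1, c2):
        # Mean pairwise similarity between members of the two clusters.
        total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (a, b) = max(
            (
                (sim(clusters[a], clusters[b]), (a, b))
                for a in range(len(clusters))
                for b in range(a + 1, len(clusters))
            ),
            key=lambda item: item[0],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    return [[titles[i] for i in c] for c in clusters]


# Hypothetical sample data, invented for this sketch.
titles = [
    "senior software engineer",
    "software engineer",
    "registered nurse",
    "staff nurse",
    "software developer",
]
groups = cluster(titles)
for group in groups:
    print(group)
```

In a realistic setting the count vectors would be replaced by dense embeddings from a language model, and the resulting clusters would serve only as the starting point that a later stage organizes into a hierarchy.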