Building Data-Driven Occupation Taxonomies: A Bottom-Up Multi-Stage Approach via Semantic Clustering and Multi-Agent Collaboration

📅 2025-09-19

🤖 AI Summary

Manual construction of occupational taxonomies is slow, and existing automated approaches either fail to adapt to dynamic regional labor markets or cannot derive coherent hierarchies from noisy job posting data. This paper proposes CLIMB, a framework for learning region-adaptive occupational taxonomies. CLIMB first extracts core occupational clusters via semantic embedding and global hierarchical clustering. It then uses a reflective multi-agent system that refines the hierarchical relationships over multiple rounds of negotiation and feedback. The result is a bottom-up, fully automated pipeline that builds region-specific taxonomies directly from raw job descriptions. In experiments on three real-world job posting datasets, CLIMB produces taxonomies that are more coherent and scalable than those of baseline methods and that capture regional characteristics. The code and datasets are publicly available.

📝 Abstract

Creating robust occupation taxonomies, vital for applications ranging from job recommendation to labor market intelligence, is challenging. Manual curation is slow, while existing automated methods are either not adaptive to dynamic regional markets (top-down) or struggle to build coherent hierarchies from noisy data (bottom-up). We introduce CLIMB (CLusterIng-based Multi-agent taxonomy Builder), a framework that fully automates the creation of high-quality, data-driven taxonomies from raw job postings. CLIMB uses global semantic clustering to distill core occupations, then employs a reflection-based multi-agent system to iteratively build a coherent hierarchy. On three diverse, real-world datasets, we show that CLIMB produces taxonomies that are more coherent and scalable than existing methods and successfully capture unique regional characteristics. We release our code and datasets at https://anonymous.4open.science/r/CLIMB.
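
The first stage described above, grouping raw postings into core occupations, can be illustrated with a small sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real semantic embedding model, the greedy threshold clustering is swapped in for the paper's global clustering step, and the function names, the `threshold` value, and the sample postings are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real semantic embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_postings(titles, threshold=0.5):
    """Greedy single-pass clustering (an assumed simplification).

    Each title joins the first cluster whose summed vector is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (summed vector, member titles)
    for title in titles:
        vec = embed(title)
        for centroid, members in clusters:
            if cosine(vec, centroid) >= threshold:
                members.append(title)
                centroid.update(vec)  # cosine is scale-invariant, so a sum works
                break
        else:
            clusters.append((Counter(vec), [title]))
    return [members for _, members in clusters]


# Hypothetical sample postings, not taken from the paper's datasets.
postings = [
    "senior software engineer",
    "software engineer backend",
    "registered nurse icu",
    "icu nurse practitioner",
    "staff software engineer",
]
groups = cluster_postings(postings)
for group in groups:
    print(group)
```

With these sample titles the sketch separates the engineering postings from the nursing postings. A real pipeline would replace `embed` with a learned sentence encoder so that titles with no shared words can still be grouped.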

Problem

Research questions and friction points this paper is trying to address.

Automating creation of occupation taxonomies from job data

Overcoming limitations of manual curation and existing methods

Building coherent hierarchies that capture regional characteristics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic clustering for core occupations

Multi-agent system for hierarchy building

Automated data-driven taxonomy creation
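
The multi-agent hierarchy building listed above can be pictured as a propose-and-critique loop. The sketch below shows only that control flow under loud assumptions: this summary does not specify the paper's agent roles, prompts, or stopping rule, so `refine`, `toy_propose`, `toy_critique`, the keyword hints, and the category names are all hypothetical, and the two rule-based functions stand in for what would be LLM-backed agents.

```python
from typing import Callable, Dict, List

Hierarchy = Dict[str, str]  # occupation -> parent category
Feedback = Dict[str, str]   # occupation -> parent suggested by the critic


def refine(
    occupations: List[str],
    propose: Callable[[List[str], Feedback], Hierarchy],
    critique: Callable[[Hierarchy], Feedback],
    max_rounds: int = 5,
) -> Hierarchy:
    """Alternate proposal and critique until the critic has no objections."""
    feedback: Feedback = {}
    hierarchy: Hierarchy = {}
    for _ in range(max_rounds):
        hierarchy = propose(occupations, feedback)
        new_feedback = critique(hierarchy)
        if not new_feedback:
            break  # critic accepted the current hierarchy
        # Accumulate feedback so fixes from earlier rounds persist.
        feedback = {**feedback, **new_feedback}
    return hierarchy


def toy_propose(occupations: List[str], feedback: Feedback) -> Hierarchy:
    """Hypothetical proposer: default to 'General', then apply critic suggestions."""
    return {occ: feedback.get(occ, "General") for occ in occupations}


def toy_critique(hierarchy: Hierarchy) -> Feedback:
    """Hypothetical critic: flag occupations whose parent ignores a keyword hint."""
    hints = {"nurse": "Healthcare", "engineer": "Engineering"}
    objections: Feedback = {}
    for occ, parent in hierarchy.items():
        for keyword, wanted in hints.items():
            if keyword in occ and parent != wanted:
                objections[occ] = wanted
    return objections


result = refine(
    ["software engineer", "icu nurse", "barista"], toy_propose, toy_critique
)
print(result)
```

Here the first round places everything under a generic parent, the critic objects to two placements, and the second round passes review. Capping the loop with `max_rounds` is a design choice of this sketch that keeps it terminating even if the two agents never agree.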
