🤖 AI Summary
Existing general-purpose embedding models perform inconsistently on work-related tasks and lack a standardized evaluation under realistic workplace challenges, including long-tailed label distributions, extreme multi-label target spaces, and data scarcity.
Method: We introduce WorkBench, the first unified evaluation suite covering six work-related tasks, each formulated as a ranking problem. Building on this benchmark, we propose Unified Work Embeddings (UWE), a task-agnostic bi-encoder trained with a many-to-many InfoNCE contrastive objective over task-specific bipartite graphs composed from real-world data and synthetically enriched through grounding, and scored with token-level embeddings via soft late interaction. UWE supports zero-shot ranking on unseen target spaces and low-latency inference by caching target-space embeddings.
Results: On WorkBench, UWE significantly outperforms general-purpose embedding models, with substantial gains in macro-averaged Mean Average Precision (MAP) and RP@10. These results indicate effective cross-task knowledge transfer and practical deployability in real-world workplace applications.
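The many-to-many InfoNCE objective mentioned above can be sketched as follows. This is a minimal illustration of one common multi-positive formulation, not the paper's exact loss: each query may have several positives in the batch, and the loss pushes the total probability mass on the positives above that of the in-batch negatives. The function name and the batch layout are assumptions for illustration.

```python
import math

def many_to_many_infonce(sim, pos_mask, tau=0.05):
    """Multi-positive (many-to-many) InfoNCE over a batch similarity matrix.

    sim[i][j]      : similarity between query i and target j
    pos_mask[i][j] : True where target j is a known positive for query i
    For each query, the loss is -log of the probability mass assigned to
    its positives under a temperature-scaled softmax over all targets.
    """
    losses = []
    for i, row in enumerate(sim):
        exps = [math.exp(s / tau) for s in row]
        denom = sum(exps)
        num = sum(e for e, p in zip(exps, pos_mask[i]) if p)
        losses.append(-math.log(num / denom))
    return sum(losses) / len(losses)
```

When positives score well above negatives the loss approaches zero; with uninformative similarities it approaches log of the target-space size over the number of positives.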
📝 Abstract
Workforce transformation across diverse industries has driven an increased demand for specialized natural language processing capabilities. Nevertheless, tasks derived from work-related contexts inherently reflect real-world complexities, characterized by long-tailed distributions, extreme multi-label target spaces, and scarce data availability. The rise of generalist embedding models prompts the question of their performance in the work domain, especially as progress in the field has focused mainly on individual tasks. To this end, we introduce WorkBench, the first unified evaluation suite spanning six work-related tasks formulated explicitly as ranking problems, establishing a common ground for multi-task progress. Based on this benchmark, we find significant positive cross-task transfer, and use this insight to compose task-specific bipartite graphs from real-world data, synthetically enriched through grounding. This leads to Unified Work Embeddings (UWE), a task-agnostic bi-encoder that exploits our training-data structure with a many-to-many InfoNCE objective, and leverages token-level embeddings with task-agnostic soft late interaction. UWE demonstrates zero-shot ranking performance on unseen target spaces in the work domain, enables low-latency inference by caching the task target space embeddings, and shows significant gains in macro-averaged MAP and RP@10 over generalist embedding models.
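The token-level scoring and caching described in the abstract can be sketched as below. This is an assumption-laden toy, not the paper's implementation: "soft late interaction" is illustrated here as a softmax-weighted relaxation of ColBERT-style MaxSim, the `soft_maxsim` and `rank` names are hypothetical, and token embeddings are passed in as plain vectors. The key deployment property survives the simplification: target-space token embeddings are computed once and cached, so only the query is embedded at inference time.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def soft_maxsim(query_toks, target_toks, tau=0.1):
    # Soft relaxation of MaxSim: for each query token, take a
    # softmax-weighted average of its similarities to target tokens,
    # then sum the per-token scores.
    score = 0.0
    for q in query_toks:
        sims = [dot(q, t) for t in target_toks]
        m = max(sims)
        ws = [math.exp((s - m) / tau) for s in sims]
        z = sum(ws)
        score += sum(w * s for w, s in zip(ws, sims)) / z
    return score

def rank(query_toks, cached_targets):
    # cached_targets: label -> precomputed token embeddings, built once
    # per target space; only the query is embedded per request.
    scores = {label: soft_maxsim(query_toks, toks)
              for label, toks in cached_targets.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

As tau shrinks, the soft weighting approaches the hard per-token max of late interaction; larger tau blends evidence across target tokens.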