JobBench: Aligning Agent Work With Human Will

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes JobBench, the first benchmark for AI occupational agents centered on human intent and job augmentation rather than economic value or human replacement. Built on tasks that experts identify as high priorities for delegation, JobBench covers 130 real-world workplace tasks across 35 occupations, each accompanied by heterogeneous reference files and multidimensional scoring criteria. Outputs are graded by a fact-anchored chain of rubrics averaging 35.6 binary criteria per task. Across 36 evaluated models, even the strongest, Claude Opus 4.7 under Claude Code, reaches only 45.9%, underscoring a substantial gap between current AI capabilities and the requirements of authentic professional workflows.

📝 Abstract

Current benchmarks for occupational AI agents are scoped primarily by economic values, telling a replacement story. We introduce JobBench, which evaluates AI agents on the workflows that experts identify as high-priority for delegation, empowering humans based on their needs instead of replacing them with GDP value. JobBench covers 130 agentic tasks across 35 occupations. Each task is packaged as a workspace of heterogeneous reference files, requiring the agent to reason through the cluttered information streams of real professional work. Outputs are graded by a fact-anchored chain of rubrics, averaging 35.6 binary criteria per task. We evaluate 36 models; the strongest, Claude Opus 4.7 under Claude Code, reaches only 45.9%. We hope JobBench shifts the community's target labour-market effect from replacement to enhancement: building agents that do what humans actually want delegated, not only what is most economically valuable.
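As a rough illustration of how binary rubric criteria could roll up into a percentage score, the sketch below assumes each task's score is the unweighted share of criteria passed and that the benchmark score is the plain mean over tasks. The abstract does not specify the aggregation rule, any weighting, or how the chain dependencies between criteria are handled, so `Criterion`, `task_score`, `benchmark_score`, and the sample tasks are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Criterion:
    """One binary rubric item (hypothetical structure, not from the paper)."""
    description: str
    passed: bool


def task_score(criteria: list[Criterion]) -> float:
    """Percentage of binary criteria passed for one task (assumed unweighted)."""
    if not criteria:
        raise ValueError("a task needs at least one criterion")
    return 100.0 * sum(c.passed for c in criteria) / len(criteria)


def benchmark_score(tasks: dict[str, list[Criterion]]) -> float:
    """Mean of per-task percentages (assumed aggregation across tasks)."""
    return mean(task_score(criteria) for criteria in tasks.values())


# Made-up tasks and judgments, for illustration only.
sample_tasks = {
    "hypothetical_task_a": [
        Criterion("uses the figure from the correct reference file", True),
        Criterion("states the deadline given in the workspace", True),
        Criterion("flags the conflicting record", False),
        Criterion("produces the requested output format", False),
    ],
    "hypothetical_task_b": [
        Criterion("identifies the relevant document", True),
        Criterion("reports the correct total", False),
    ],
}

print(f"{benchmark_score(sample_tasks):.1f}")
```

Under these assumptions, a model that passes half the criteria in every task scores 50, regardless of how many criteria each task has. A per-criterion (micro) average would instead weight tasks with longer rubrics more heavily.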

Problem

Research questions and friction points this paper is trying to address.

AI agent

human-centered AI

job delegation

benchmarking

work enhancement

Innovation

Methods, ideas, or system contributions that make the work stand out.

JobBench

human-centered AI

task delegation