JobBench: Aligning Agent Work With Human Will

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes JobBench, the first benchmark for AI occupational agents centered on human intent and job augmentation rather than economic value or human replacement. Built on tasks that experts identify as high-priority for delegation, JobBench covers 35 professions and 130 real-world workplace tasks, each accompanied by heterogeneous reference materials and multidimensional scoring criteria. Outputs are graded with a fact-anchored chain of rubrics averaging 35.6 binary criteria per task. An evaluation of 36 models shows that even the strongest, Claude Opus 4.7 under Claude Code, reaches only 45.9%, underscoring a substantial gap between current AI capabilities and the requirements of authentic professional workflows.
📝 Abstract
Current benchmarks for occupational AI agents are scoped primarily by economic values, telling a replacement story. We introduce JobBench, which evaluates AI agents on the workflows that experts identify as high-priority for delegation, empowering humans based on their needs instead of replacing them with GDP value. JobBench covers 130 agentic tasks across 35 occupations. Each task is packaged as a workspace of heterogeneous reference files, requiring the agent to reason through the cluttered information streams of real professional work. Outputs are graded by a fact-anchored chain of rubrics, averaging 35.6 binary criteria per task. We evaluate 36 models; the strongest, Claude Opus 4.7 under Claude Code, reaches only 45.9%. We hope JobBench shifts the community's target labour-market effect from replacement to enhancement: building agents that do what humans actually want delegated, not only what is most economically valuable.
Problem

Research questions and friction points this paper is trying to address.

AI agent
human-centered AI
job delegation
benchmarking
work enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

JobBench
human-centered AI
task delegation
agent evaluation
workplace augmentation
Yuetai Li
University of Washington
LLM Agent, LLM Reasoning, Post-training, Trustworthy AI
Yichen Feng
University of California, Santa Barbara
Financial Mathematics, Stochastic Differential Games, Mean Field Games, Systematic Risk, Portfolio Allocation
Zhangchen Xu
University of Washington
(^._.^)ノ, Synthetic Data, Post-Training, Safety, Federated Learning
Zixian Ma
University of Washington
Multi-modal models and agents, human-agent interaction and collaboration
Kaiyuan Zheng
University of Washington
Fengqing Jiang
University of Washington
Large Language Model, Post-training, Safety and Security, Reasoning, Reinforcement Learning
Xinghua Sun
Sun Yat-sen University
stochastic modeling of wireless networks, machine learning for networking
Rulin Shao
University of Washington
machine learning
Zichen Chen
UC Santa Barbara
Agentic LLM, Trustworthy AI, AI Safety, Synthetic Data
Yue Huang
PhD student, University of Notre Dame
trustworthy AI, generative model, machine learning, AI for science
Xinyang Han
Southern University of Science and Technology
Robot control, Embedded system
Brian Lee
University of Chicago
Kayla Xu
Northwestern University
Shenglai Zeng
Michigan State University
Large language models, Retrieval-augmented Generation, Information retrieval, AI safety
Hang Hua
University of Rochester
Computer Vision, Natural Language Processing, Machine Learning
Xiangliang Zhang
Leonard C. Bettex Collegiate Professor, Computer Science and Engineering, University of Notre Dame
Machine Learning, AI for Science
Basel Alomair
King Abdulaziz City for Science and Technology & University of Washington
Information Security and Cryptography
Ranjay Krishna
University of Washington, Allen Institute for AI
Computer Vision, Natural Language Processing, Machine Learning, Human Computer Interaction
Luke Zettlemoyer
University of Washington; Meta
Natural Language Processing, Semantics, Machine Learning, Artificial Intelligence
Pang Wei Koh
University of Washington; Allen Institute for AI
Machine learning, Natural language processing, Computational biology
Bhaskar Ramasubramanian
Western Washington University
reinforcement learning, ML security, CPS security, formal methods, control theory
Luyao Niu
University of Washington
CPS security, trustworthy machine learning, game theory and optimization
Xiang Yue
Carnegie Mellon University
Natural Language Processing, Large Language Models, Machine Learning
Radha Poovendran
Professor of ECE, University of Washington
Security, Games, Learning, Networks, CPS