🤖 AI Summary
This study addresses the lack of systematic, quantitative evaluation of AI’s ability to automate remote work. We introduce the Remote Labor Index (RLI), a multi-sector, economically grounded benchmark for assessing remote labor automation. RLI establishes an end-to-end evaluation framework built on real-world, economically valuable remote projects and jointly measures AI agents’ practical productivity along three dimensions (knowledge understanding, logical reasoning, and tool execution) using two complementary metrics: task completion rate and automation rate. Empirical evaluation shows that AI agents perform near the floor on RLI: the highest-performing agent achieves an automation rate of only 2.5%, indicating that current capability in authentic remote work settings remains nascent. By offering a reproducible, scalable, and economically meaningful way to quantify automation, RLI helps bridge the gap between research-benchmark assessment and analysis of real-world labor market impact.
📝 Abstract
AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.