🤖 AI Summary
This study challenges the prevailing view that structured human-data labor is merely a transitional phase in AI development, arguing instead that it is an enduring, necessary factor of production. The paper starts from a tractable baseline model with dynamic steady-state analysis. It then microfounds aggregate capability with a portfolio of task families featuring diminishing marginal returns, frontier entry, and complementarities, adds a Roy-type mechanism for wage dispersion within structured work, and maps model objects to observable proxies in standard data layers. The result is a "no last mile" theory: even as models keep improving, structured human input remains indispensable, because the AI capability stock it builds depreciates as tasks, contexts, and standards drift. The analysis shows a strictly positive steady-state labor share for structured human-data work whenever depreciation is positive, and a conservative calibration puts that share at roughly 5%–7% in the long run. The framework treats AI capability as a reusable but depreciating stock that must be continually maintained.
📝 Abstract
The standard framing treats structured human-data work as transitional, a bridge between today's imperfect models and a future state in which automation is complete. We challenge this view by modeling structured human data as a persistent production input: evaluation, rubric-based judgment, auditing, exception handling, and continual updates that convert raw model capability into dependable, deployable performance. These activities accumulate into a reusable AI capability stock that raises productivity in two ways: by improving reliability on existing tasks and by expanding the frontier of task families for which AI can be used with high confidence. Crucially, this capability stock depreciates as tasks and contexts drift, standards evolve, and new edge cases emerge. In a tractable baseline model, an interior steady state implies a closed-form, strictly positive long-run labor share devoted to structured human-data work whenever depreciation is positive. This is a "no last mile" result: maintenance demand persists even as models improve. We then microfound aggregate capability with a portfolio of task families featuring diminishing returns, frontier entry, and complementarity; this generates reallocation toward low-maturity and bottleneck families and a Roy-style mechanism for wage dispersion within structured work. Finally, we map model objects to observable proxies using standard data layers and provide a conservative calibration suggesting a long-run steady-state structured labor share of 5–7%.
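The steady-state logic behind the "no last mile" result can be illustrated with a deliberately simple toy model. This is a hypothetical sketch, not the paper's baseline model: the functional forms (Cobb-Douglas output, linear capability accumulation, log utility) and the symbols alpha, rho, delta, A, and B are our own assumptions. Labor is normalized to one, a share s goes to structured human-data work, capability evolves as dK/dt = B*s - delta*K, and output is Y = A * K^alpha * (1 - s)^(1 - alpha). Combining the first-order condition for s, the steady-state costate condition (rho + delta)*lambda = alpha/K, and dK/dt = 0 gives s* = alpha*delta / (delta + (1 - alpha)*rho), which is strictly positive whenever delta > 0 and does not depend on A. The code checks this closed form against a bisection solve of the same steady-state conditions.

```python
"""Toy steady state for a depreciating AI capability stock.

Hypothetical setup (illustrative assumptions, not the paper's model):
  output      Y = A * K**alpha * (1 - s)**(1 - alpha)
  capability  dK/dt = B * s - delta * K
  planner     maximizes the integral of exp(-rho * t) * ln(Y) dt
where s in (0, 1) is the labor share devoted to structured human-data work.
"""


def closed_form_share(alpha: float, rho: float, delta: float) -> float:
    """Steady-state share s* = alpha * delta / (delta + (1 - alpha) * rho).

    Requires 0 < alpha < 1 and delta + (1 - alpha) * rho > 0. With delta = 0
    this returns 0, the no-depreciation limit in which maintenance demand vanishes.
    """
    return alpha * delta / (delta + (1.0 - alpha) * rho)


def steady_state_residual(s: float, alpha: float, rho: float, delta: float, B: float) -> float:
    """Residual of the costate condition (rho + delta) * lam - alpha / K at share s.

    lam = (1 - alpha) / (B * (1 - s)) comes from the first-order condition in s,
    and K = B * s / delta from setting dK/dt = 0. Requires delta > 0.
    """
    lam = (1.0 - alpha) / (B * (1.0 - s))
    K = B * s / delta
    return (rho + delta) * lam - alpha / K


def numeric_share(alpha: float, rho: float, delta: float, B: float = 1.0, tol: float = 1e-12) -> float:
    """Solve the steady-state conditions for s by bisection (requires delta > 0).

    The residual increases in s: it is negative near s = 0 and positive near s = 1.
    """
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if steady_state_residual(mid, alpha, rho, delta, B) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not a calibration.
    alpha, rho = 0.2, 0.04
    for delta in (0.05, 0.10, 0.20):
        for B in (1.0, 5.0):  # B: productivity of structured labor (assumed)
            s_cf = closed_form_share(alpha, rho, delta)
            s_num = numeric_share(alpha, rho, delta, B)
            print(f"delta={delta:.2f} B={B:.1f}  closed form={s_cf:.6f}  bisection={s_num:.6f}")
    print("delta=0.00 closed form =", closed_form_share(alpha, rho, 0.0))
```

In this toy, raising model quality A or structured-labor productivity B increases steady-state output but leaves s* unchanged, while letting delta fall toward zero drives s* toward zero. The parameter values in the usage block are arbitrary and are not the paper's calibration.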