The Observability Gap: Why Output-Level Human Feedback Fails for LLM Coding Agents

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that large language model–based coding agents struggle to correct deep logical errors when relying solely on output-level human feedback, achieving 0% full-scene success in a complex Blender 3D scene generation task. The study formally identifies and characterizes the "observability gap": a structural bottleneck in which bugs originate in code logic and execution state while human evaluation occurs only at the output layer, and the many-to-one mapping from internal states to visible outcomes prevents symptom-level feedback from identifying root causes. The authors study an "earned autonomy" setting in which an agent starts with zero pre-defined functions and incrementally builds a reusable function library from lightweight feedback on visual output alone; although the agent rediscovers core utility functions, output-only feedback produces persistent failure-mode oscillation rather than convergence. A diagnostic intervention that injected minimal code-level knowledge restored convergence, supporting the interpretation that the bottleneck lies in feedback observability rather than programming competence, and underscoring the critical role of intermediate observability in effective human–AI collaboration.
📝 Abstract
Large language model (LLM) multi-agent coding systems typically fix agent capabilities at design time. We study an alternative setting, earned autonomy, in which a coding agent starts with zero pre-defined functions and incrementally builds a reusable function library through lightweight human feedback on visual output alone. We evaluate this setup in a Blender-based 3D scene generation task requiring both spatial reasoning and programmatic geometric control. Although the agent rediscovered core utility functions comparable to a human reference implementation, it achieved 0% full-scene success under output-only feedback across multiple instruction granularities, where success required satisfying object completeness, ground contact, collision avoidance, and scale plausibility simultaneously. Our analysis identifies a structural observability gap: bugs originate in code logic and execution state, while human evaluation occurs only at the output layer, and the many-to-one mapping from internal states to visible outcomes prevents symptom-level feedback from reliably identifying root causes. This mismatch leads to persistent failure mode oscillation rather than convergence. A diagnostic intervention that injected minimal code-level knowledge restored convergence, strongly supporting the interpretation that the main bottleneck lies in feedback observability rather than programming competence. We formalize this phenomenon as a feedback paradox in domains with deep causal chains between internal code logic and perceptual outcomes, and argue that effective human-agent collaboration in such settings requires intermediate observability beyond output-only evaluation.
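The many-to-one mapping at the heart of the observability gap can be illustrated with a minimal sketch. This is a hypothetical example, not code from the paper: the root-cause names and the floating-object symptom are assumptions chosen to mirror the Blender setting the abstract describes.

```python
# Hypothetical sketch: the observability gap as a many-to-one mapping
# from distinct code-level root causes to a single visible symptom.

# Distinct internal errors an agent's scene-generation script might contain.
ROOT_CAUSES = [
    "pivot_not_at_base",    # object origin sits above the mesh's base
    "unit_mismatch",        # metres vs. scene units inflate the z offset
    "rescale_after_place",  # scaling applied after placement lifts the base
]

def visible_symptom(root_cause: str) -> str:
    """The render-level observation a human evaluator can actually see."""
    # All three internal states produce the same output-level failure.
    return "object floats above the ground plane"

symptoms = {visible_symptom(cause) for cause in ROOT_CAUSES}
assert len(symptoms) == 1  # one symptom, three candidate causes
```

Because output-only feedback ("the object is floating") underdetermines the fix, the agent can only guess among candidate causes, which is consistent with the failure-mode oscillation the abstract reports; injecting code-level knowledge collapses the candidate set and restores convergence.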
Problem

Research questions and friction points this paper is trying to address.

observability gap
human feedback
LLM coding agents
output-level evaluation
feedback paradox
Innovation

Methods, ideas, or system contributions that make the work stand out.

observability gap
earned autonomy
output-only feedback
feedback paradox
LLM coding agents