daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

📅 2026-02-02
🤖 AI Summary
This work addresses a challenge that large language models face in long-horizon agent tasks: the scarcity of high-quality, scalable training data that captures cross-stage long-term dependencies and evolutionary dynamics. The study introduces a structured data synthesis paradigm grounded in real-world software development, using sequences of Pull Requests (PRs) as natural supervision signals. Through progressive task decomposition, long-term consistency constraints, and verifiable bug-fix trajectories, the method constructs training episodes with explicit causal dependencies and iterative refinement structure. The synthesized trajectories are long, averaging 85k tokens and 116 tool calls. Fine-tuning GLM-4.6 on only 239 such samples yields broad improvements across benchmarks, including a 47% relative gain on Toolathlon, which the authors present as evidence of data efficiency.

📝 Abstract
While Large Language Models (LLMs) excel at short-term tasks, scaling them to long-horizon agentic workflows remains challenging. The core bottleneck is the scarcity of training data that captures authentic long-dependency structures and cross-stage evolutionary dynamics: existing synthesis methods are either confined to single-feature scenarios constrained by the model's own distribution, or incur prohibitive human annotation costs, and so fail to provide scalable, high-quality supervision. We address this by reconceptualizing data synthesis through the lens of real-world software evolution. Our key insight is that Pull Request (PR) sequences naturally embody the supervision signals for long-horizon learning. They decompose complex objectives into verifiable submission units, maintain functional coherence across iterations, and encode authentic refinement patterns through bug-fix histories. Building on this, we propose daVinci-Agency, which systematically mines structured supervision from chains of PRs through three interlocking mechanisms: (1) progressive task decomposition via continuous commits, (2) long-term consistency enforcement through unified functional objectives, and (3) verifiable refinement from authentic bug-fix trajectories. Unlike synthetic trajectories that treat each step independently, daVinci-Agency's PR-grounded structure inherently preserves the causal dependencies and iterative refinements essential for teaching persistent goal-directed behavior, and it aligns naturally with project-level, full-cycle task modeling. The resulting trajectories are substantial, averaging 85k tokens and 116 tool calls, yet training on them is remarkably data-efficient: fine-tuning GLM-4.6 on 239 daVinci-Agency samples yields broad improvements across benchmarks, notably achieving a 47% relative gain on Toolathlon. Beyond benchmark performance, our analysis confirms...
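
To make the chain-of-PRs idea more concrete, here is a minimal Python sketch of how linked PRs might be grouped into ordered episodes. It is an illustration only, not the paper's pipeline: the `PullRequest` fields, the `depends_on` link, and the helpers `build_chains` and `has_refinement` are all hypothetical names introduced here, and the abstract does not say how daVinci-Agency actually links or filters PRs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PullRequest:
    """A hypothetical PR record; the field names are assumptions."""
    number: int
    title: str
    is_bug_fix: bool = False
    depends_on: Optional[int] = None  # number of the earlier PR this one builds on


def build_chains(prs: List[PullRequest]) -> List[List[PullRequest]]:
    """Group PRs into ordered chains by following depends_on links.

    Assumes each chain is linear. If a PR has several dependents, only the
    one with the lowest number is followed and the others are dropped.
    """
    by_number: Dict[int, PullRequest] = {pr.number: pr for pr in prs}
    children: Dict[int, List[PullRequest]] = {}
    for pr in prs:
        if pr.depends_on is not None and pr.depends_on in by_number:
            children.setdefault(pr.depends_on, []).append(pr)

    # A root is a PR with no known predecessor in the input.
    roots = [
        pr for pr in prs
        if pr.depends_on is None or pr.depends_on not in by_number
    ]

    chains: List[List[PullRequest]] = []
    for root in roots:
        chain = [root]
        current = root
        while current.number in children:
            current = min(children[current.number], key=lambda p: p.number)
            chain.append(current)
        chains.append(chain)
    return chains


def has_refinement(chain: List[PullRequest]) -> bool:
    """True if a multi-PR chain contains at least one bug-fix PR."""
    return len(chain) > 1 and any(pr.is_bug_fix for pr in chain)


if __name__ == "__main__":
    # Made-up example data, not taken from the paper.
    prs = [
        PullRequest(1, "Add parser skeleton"),
        PullRequest(2, "Implement tokenizer", depends_on=1),
        PullRequest(3, "Fix tokenizer edge case", is_bug_fix=True, depends_on=2),
        PullRequest(7, "Update README"),
    ]
    for chain in build_chains(prs):
        print([pr.number for pr in chain], has_refinement(chain))
```

In this sketch a chain stands in for one long-horizon episode: the ordering reflects progressive decomposition, and the bug-fix check is a rough proxy for the "verifiable refinement" signal the abstract describes.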

Problem

Research questions and friction points this paper is trying to address.

long-horizon agency
data efficiency
training data scarcity
cross-stage dynamics
long-dependency structures

Innovation

Methods, ideas, or system contributions that make the work stand out.

long-horizon agency
data-efficient synthesis
Pull Request sequences
task decomposition
bug-fix trajectories