Automating Android Build Repair: Bridging the Reasoning-Execution Gap in LLM Agents with Domain-Specific Tools

📅 2025-10-08
🤖 AI Summary
Large language models (LLMs) struggle to reliably translate high-level reasoning into precise low-level operations when automatically repairing Android build errors. To address this, the paper proposes GradleFixer, an LLM agent augmented with domain-specific tools. Its core idea is “Tool Bridging”: replacing generic shell commands with Gradle-aware, API-like tools, which constrains the action space to relevant operations and makes execution more reliable. GradleFixer is evaluated on AndroidBuildBench, a benchmark of 1,019 real-world Android build failures curated from the commit histories of 43 open-source projects. It achieves an 81.4% resolve rate (pass@1), significantly outperforming a state-of-the-art coding agent that relies on a general-purpose shell. The results suggest that LLMs already have the high-level knowledge needed to fix these failures but struggle to act on it through a general-purpose shell.
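To make the Tool Bridging idea concrete, here is a minimal Python sketch of what one bridged tool might look like. Instead of having the model compose an escaped `sed` command against a `build.gradle` file, the tool exposes a typed operation with three plain arguments. The function name `set_dependency_version`, its signature, and the regex-based editing are hypothetical illustrations, not GradleFixer's actual tools. The sketch also assumes dependencies are declared as quoted `'group:artifact:version'` strings and ignores version catalogs and map-style notation.

```python
import re
from typing import Tuple


def set_dependency_version(
    build_script: str, group: str, artifact: str, version: str
) -> Tuple[str, int]:
    """Hypothetical bridged tool: pin every `group:artifact:<version>` coordinate to `version`.

    Handles only quoted string coordinates such as 'g:a:1.0' or "g:a:1.0";
    version catalogs and map-style notation are out of scope for this sketch.
    Returns the rewritten script and the number of coordinates changed.
    """
    # Group 1 captures the opening quote, so the closing quote (\1) must match it.
    pattern = re.compile(
        r"(['\"])" + re.escape(f"{group}:{artifact}:") + r"[^'\"]+\1"
    )

    def repl(match: "re.Match[str]") -> str:
        # A function replacement avoids backslash-escape processing of `version`.
        quote = match.group(1)
        return f"{quote}{group}:{artifact}:{version}{quote}"

    return pattern.subn(repl, build_script)
```

For example, calling `set_dependency_version(script, "androidx.core", "core-ktx", "1.12.0")` on a script containing `implementation "androidx.core:core-ktx:1.9.0"` rewrites only that coordinate and reports a count of 1, leaving `androidx.core:core` untouched. The model only has to choose the right arguments, not the right shell quoting, which is the kind of API-like interface the abstract credits for more reliable use.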


📝 Abstract
Android is the largest mobile platform, yet automatically building applications remains a practical challenge. While Large Language Models (LLMs) show promise for code repair, their use for fixing Android build errors remains underexplored. To address this gap, we first introduce AndroidBuildBench, a benchmark of 1,019 build failures curated from the commit histories of 43 open-source Android projects. Each problem is paired with a verified solution from a subsequent commit, ensuring that fixes are feasible. Second, we propose GradleFixer, an LLM agent with domain-specific tools for inspecting and manipulating the Gradle build environment. GradleFixer achieves a resolve rate of 81.4% (pass@1), significantly outperforming a state-of-the-art coding agent that relies on a general-purpose shell. GradleFixer's success suggests that while LLMs possess the high-level knowledge to solve these failures, they struggle to translate this knowledge into effective low-level actions using a general-purpose shell. We demonstrate the effectiveness of a strategy we term Tool Bridging, which replaces general-purpose shell commands with domain-aware abstractions. We hypothesize this approach works through two mechanisms: 1) it provides tools in an API-like format that LLMs use more reliably, and 2) it constrains the action space to relevant operations. This approach bridges the gap between the model's high-level reasoning and effective low-level execution.
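The abstract reports the resolve rate as pass@1, but the number of attempts per problem is not stated here. The following Python sketch therefore assumes the widely used unbiased pass@k estimator, which for k = 1 reduces to the fraction of successful attempts on each problem, averaged over all problems. The function names and the `(attempts, successes)` input format are illustrative assumptions.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n attempts, c of which resolved the build."""
    if n - c < k:
        # Every size-k subset of attempts contains at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def resolve_rate(results: Sequence[Tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over benchmark problems, given one (attempts, successes) pair each."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Under a single-attempt assumption, an 81.4% resolve rate on the 1,019 AndroidBuildBench problems corresponds to 829 resolved problems (829 / 1019 ≈ 0.8135). With several attempts per problem, pass@1 becomes c / n averaged over problems instead.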
Problem

Research questions and friction points this paper is trying to address.

Automating Android build error repair using domain-specific tools
Bridging reasoning-execution gap in LLM agents for Gradle builds
Replacing general shell commands with specialized build inspection tools
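A build inspection tool can hand the agent the relevant part of a failure instead of the raw console log. The Python sketch below assumes Gradle's standard console failure report, in which each failure appears under a `* What went wrong:` header and is followed by a `* Try:` section and a `BUILD FAILED` status line. The function `extract_gradle_failures` and its line-based parsing are hypothetical and are not taken from GradleFixer.

```python
from typing import List, Optional


def extract_gradle_failures(console_output: str) -> List[str]:
    """Hypothetical inspection tool: return the text of each '* What went wrong:' block."""
    failures: List[str] = []
    current: Optional[List[str]] = None  # lines of the block being collected, if any

    def flush() -> None:
        # Store the block collected so far, skipping empty blocks.
        if current:
            text = "\n".join(current).strip()
            if text:
                failures.append(text)

    for line in console_output.splitlines():
        stripped = line.strip()
        if stripped == "* What went wrong:":
            flush()
            current = []
        elif current is not None:
            # A new "* Try:"-style section or the final status line ends the block.
            if stripped.startswith("* ") or stripped.startswith("BUILD FAILED"):
                flush()
                current = None
            else:
                current.append(line.rstrip())
    flush()
    return failures
```

On a typical single-failure log this returns one string such as `Execution failed for task ':app:compileDebugKotlin'.` followed by its `>` detail lines. On a successful build it returns an empty list.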
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-specific tools replace general-purpose shell commands
Tool Bridging strategy connects high-level reasoning to low-level execution
API-like abstractions constrain action space for reliability
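Constraining the action space can be sketched as a dispatcher that executes only registered tools called with exactly the expected arguments. This Python sketch assumes tools are invoked by name with keyword arguments, a common function-calling convention that the source does not confirm for GradleFixer. The `Tool` class, `dispatch`, the in-memory `FILES` project, and both registry entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Tool:
    func: Callable[..., str]
    params: FrozenSet[str]  # exact set of keyword arguments the tool accepts


def dispatch(registry: Dict[str, Tool], name: str, args: Dict[str, str]) -> str:
    """Run a tool call only if it names a registered tool with exactly the expected arguments."""
    tool = registry.get(name)
    if tool is None:
        return f"error: unknown tool '{name}'; available: {', '.join(sorted(registry))}"
    if set(args) != tool.params:
        return f"error: '{name}' expects arguments {sorted(tool.params)}, got {sorted(args)}"
    return tool.func(**args)


# Hypothetical in-memory project state and tools, for illustration only.
FILES = {"app/build.gradle": "android { compileSdk 33 }"}

REGISTRY = {
    "read_build_file": Tool(
        lambda path: FILES.get(path, f"error: no such file '{path}'"), frozenset({"path"})
    ),
    "list_build_files": Tool(lambda: "\n".join(sorted(FILES)), frozenset()),
}
```

A call such as `dispatch(REGISTRY, "bash", {"cmd": "rm -rf ."})` is rejected with a message listing the available tools rather than being executed. Returning errors as text, instead of raising, lets the agent read the message and retry with a valid action.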