Byam: Fixing Breaking Dependency Updates with Large Language Models

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
API upgrades often introduce breaking changes—such as deprecations, parameter modifications, or interface replacements—that trigger compilation errors in dependent client code. Method: This paper proposes the first systematic application of large language models (LLMs) to automate the repair of such compilation errors. We introduce a three-tiered repair framework—build-level, file-level, and error-level—that integrates build logs, API diff analysis, and stepwise reasoning prompts to enable context-aware, precise fixes. Our approach employs multi-model collaboration (Gemini-2.0 Flash, GPT-4o-mini, o3-mini, Qwen2.5-32B, DeepSeek V3) with context-enhanced prompt engineering. Results: Evaluated on the Java BUMP benchmark, o3-mini achieves a 27% full-build repair rate and a 78% single-error repair rate, demonstrating LLMs’ effectiveness and practical potential for maintaining software under dependency evolution.
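The context-enhanced prompting described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual implementation: the function `build_repair_prompt`, its parameters, and the example Java snippet are all invented; the paper's real prompts combine the same three context sources (buggy line, build-log error message, API diff) with step-by-step reasoning instructions.

```python
def build_repair_prompt(buggy_line, error_message, api_diff):
    """Assemble a context-enhanced repair prompt (hypothetical sketch).

    Combines the three context sources the paper describes -- the buggy
    client line, the compiler error from the build log, and the API diff
    between dependency versions -- plus a stepwise reasoning instruction.
    """
    return (
        "You are fixing a compilation error caused by a dependency upgrade.\n"
        f"Buggy line:\n{buggy_line}\n"
        f"Compiler error:\n{error_message}\n"
        f"API changes in the new dependency version:\n{api_diff}\n"
        "Think step by step: identify the changed API, map the old call "
        "to its replacement, then output only the fixed line."
    )

# Invented example inputs for illustration only.
prompt = build_repair_prompt(
    buggy_line="client.connect(host, port, timeout);",
    error_message="error: method connect(String,int,int) not found",
    api_diff="connect(String,int,int) removed; use connect(Endpoint) instead",
)
```

A prompt like this would then be sent to each of the five evaluated models, and the returned patch re-checked by rebuilding the project.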

📝 Abstract
Application Programming Interfaces (APIs) facilitate the integration of third-party dependencies within the code of client applications. However, changes to an API, such as deprecation, modification of parameter names or types, or complete replacement with a new API, can break existing client code. These changes are called breaking dependency updates; it is often tedious for API users to identify the cause of these breaks and update their code accordingly. In this paper, we explore the use of Large Language Models (LLMs) to automate client code updates in response to breaking dependency updates. We evaluate our approach on the BUMP dataset, a benchmark for breaking dependency updates in Java projects. Our approach leverages LLMs with advanced prompts, including information from the build process and from the breaking dependency analysis. We assess effectiveness at three granularity levels: at the build level, the file level, and the individual compilation error level. We experiment with five LLMs: Google Gemini-2.0 Flash, OpenAI GPT-4o-mini, OpenAI o3-mini, Alibaba Qwen2.5-32b-instruct, and DeepSeek V3. Our results show that LLMs can automatically repair breaking updates. Among the considered models, OpenAI's o3-mini is the best, able to completely fix 27% of the builds when using prompts that include contextual information such as the buggy line, API differences, error messages, and step-by-step reasoning instructions. It also fixes 78% of the individual compilation errors. Overall, our findings demonstrate the potential for LLMs to fix compilation errors due to breaking dependency updates, supporting developers in their efforts to stay up-to-date with changes in their dependencies.
Problem

Research questions and friction points this paper is trying to address.

Automating client code updates for breaking API changes
Evaluating LLMs to fix dependency-related compilation errors
Assessing repair effectiveness at build, file, and error levels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs for automated client code updates
Advanced prompts with build and dependency analysis
Evaluating effectiveness at multiple granularity levels
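The build-level versus error-level distinction behind the 27% and 78% figures can be made concrete with a small sketch. The data below is invented for illustration; a file-level rate would aggregate analogously, grouping errors per file instead of per build.

```python
# Hypothetical per-build repair results: each build lists its compilation
# errors and whether the model's patch fixed each one.
builds = [
    {"errors_fixed": [True, True]},          # fully repaired build
    {"errors_fixed": [True, False, True]},   # partially repaired build
    {"errors_fixed": [False]},               # unrepaired build
]

# Build-level rate: fraction of builds where every error is fixed,
# i.e. the project compiles again end to end.
build_rate = sum(all(b["errors_fixed"]) for b in builds) / len(builds)

# Error-level rate: fraction of individual compilation errors fixed,
# counted across all builds.
all_errors = [fixed for b in builds for fixed in b["errors_fixed"]]
error_rate = sum(all_errors) / len(all_errors)
```

This shows why the error-level rate is naturally higher than the build-level rate: a single unfixed error is enough to leave an entire build broken.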