🤖 AI Summary
This work addresses persistent challenges in software library upgrades, including high manual effort, error-prone interventions, and accumulating technical debt, by evaluating an automated migration approach based on GitHub Copilot's Agent Mode. The study systematically assesses its effectiveness on cross-version upgrades of SQLAlchemy across ten client applications. A key contribution is the definition of "migration coverage," a metric quantifying the proportion of API usage points correctly migrated, which enables an empirical assessment of Copilot Agent's performance on multi-step library migrations. Experimental results show a median migration coverage of 100%, indicating strong syntactic code-editing capability. However, functional correctness remains limited: the median test pass rate is only 39.75%, revealing weaknesses in semantic consistency and contextual reasoning in current LLM-based agents. The study delivers a reproducible evaluation framework for automated dependency maintenance and provides empirical evidence to guide future research and tool development.
📝 Abstract
Keeping software systems up to date is essential to avoid technical debt, security vulnerabilities, and the rigidity typical of legacy systems. However, updating libraries and frameworks remains a time-consuming and error-prone process. Recent advances in Large Language Models (LLMs) and agentic coding systems offer new opportunities for automating such maintenance tasks. In this paper, we evaluate the update of a well-known Python library, SQLAlchemy, across a dataset of ten client applications. For this task, we use GitHub Copilot's Agent Mode, an autonomous AI system capable of planning and executing multi-step migration workflows. To assess the effectiveness of the automated migration, we also introduce Migration Coverage, a metric that quantifies the proportion of API usage points correctly migrated. The results of our study show that the LLM agent was able to migrate functionalities and API usages between SQLAlchemy versions (median migration coverage: 100%) but failed to preserve application functionality, leading to a low test pass rate (median: 39.75%).