AI Summary
This work addresses entangled, unrelated changes in composite code commits, which significantly hinder code comprehension and maintenance. Existing approaches struggle to infer semantic intent accurately and lack mechanisms for iterative refinement. To overcome these limitations, we propose a large language model-based multi-agent collaborative framework that combines structural and semantic information through an Intent-Oriented Chain-of-Thought (IO-CoT) strategy to infer the intent behind each modification. The framework further pairs a grouper with a reviewer to form a human-like collaborative feedback loop that iteratively refines the grouping of changes. Evaluated on C# and Java datasets, our method outperforms the current state-of-the-art graph clustering approaches by over 6.0% and by 5.5% on average, respectively, with gains exceeding 16% on complex commits.
Abstract
Composite commits, which entangle multiple unrelated concerns, are prevalent in software development and significantly hinder program comprehension and maintenance. Existing automated untangling methods, particularly state-of-the-art graph clustering-based approaches, are fundamentally limited by two issues: (1) they over-rely on structural information and fail to grasp the semantic intent behind changes, and (2) they operate as "single-pass" algorithms, lacking the critical reflection and refinement inherent in human review processes. To overcome these challenges, we introduce Atomizer, a novel collaborative multi-agent framework for composite commit untangling. To address the semantic deficit, Atomizer employs an Intent-Oriented Chain-of-Thought (IO-CoT) strategy, which prompts large language models (LLMs) to infer the intent of each code change from both the structural and the semantic information of the code. To overcome the limitations of "single-pass" grouping, we employ two agents to establish a grouper-reviewer collaborative refinement loop, which mirrors human review practices by iteratively refining groupings until all changes in a cluster share the same underlying semantic intent. Extensive experiments on two benchmark datasets, one in C# and one in Java, demonstrate that Atomizer significantly outperforms several representative baselines. On average, it surpasses the state-of-the-art graph-based methods by over 6.0% on the C# dataset and by 5.5% on the Java dataset. This advantage is particularly pronounced on complex commits, where it widens to over 16%.
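The grouper-reviewer loop described in the abstract can be illustrated with a minimal sketch. This is not Atomizer's implementation: the names `Change`, `refine_grouping`, and `max_rounds`, and the three toy functions, are hypothetical. The toy functions are deterministic keyword rules that stand in for the LLM-backed intent inference, grouper, and reviewer agents, so the control flow can be run without a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One code change from a commit (hypothetical structure)."""
    change_id: str
    diff: str
    intent: str = ""  # filled in by the intent-inference step


Grouping = List[List[Change]]


def refine_grouping(
    changes: List[Change],
    infer_intent: Callable[[Change], str],
    group: Callable[[List[Change], Optional[str]], Grouping],
    review: Callable[[Grouping], Optional[str]],
    max_rounds: int = 5,  # assumed safety cap, not taken from the paper
) -> Grouping:
    """Group changes, then regroup until the reviewer has no objection."""
    annotated = [Change(c.change_id, c.diff, infer_intent(c)) for c in changes]
    grouping = group(annotated, None)  # first pass, no feedback yet
    for _ in range(max_rounds):
        feedback = review(grouping)
        if feedback is None:  # reviewer accepts the grouping
            return grouping
        grouping = group(annotated, feedback)  # regroup using the feedback
    return grouping


# Toy stand-ins for the agents (assumptions, for illustration only).
def toy_infer_intent(change: Change) -> str:
    return "bugfix" if "null" in change.diff else "refactor"


def toy_group(changes: List[Change], feedback: Optional[str]) -> Grouping:
    if feedback is None:
        return [list(changes)]  # naive first pass: one big group
    buckets: Dict[str, List[Change]] = {}
    for change in changes:
        buckets.setdefault(change.intent, []).append(change)
    return list(buckets.values())


def toy_review(grouping: Grouping) -> Optional[str]:
    for members in grouping:
        if len({c.intent for c in members}) > 1:
            return "a group mixes more than one intent"
    return None


if __name__ == "__main__":
    commit = [
        Change("c1", "add null check before dereference"),
        Change("c2", "rename foo to bar"),
        Change("c3", "guard against null input"),
    ]
    result = refine_grouping(commit, toy_infer_intent, toy_group, toy_review)
    print([[c.change_id for c in members] for members in result])
    # prints [['c1', 'c3'], ['c2']]
```

In this sketch the first pass puts every change into one group, the reviewer rejects it because the group mixes intents, and the second pass splits the changes by inferred intent. The stopping condition mirrors the one stated in the abstract: the loop ends once every group shares a single intent.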