🤖 AI Summary
This work addresses two problems in merging: traditional line-based algorithms often generate spurious conflicts during code refactoring or concurrent editing, and existing syntax- or semantics-aware approaches suffer from language specificity, formatting loss, and poor adaptability across heterogeneous files. The paper proposes Summer, a document-format-agnostic, token-level merging algorithm that decomposes text into tokens and models the changes made in one branch as string-rewriting rules and move rules. Without relying on language-specific parsers, Summer supports structured edits such as function extraction and inlining. Evaluated on the ConflictBench benchmark across both Java and non-Java files, Summer reproduces developers’ actual merge outcomes verbatim in 36% of cases, the highest accuracy among the evaluated tools, and ranks second in semantic accuracy. These results suggest that a text-level merging approach can balance generality with semantic awareness.
📝 Abstract
Merging is a core operation in version control systems such as Git, but traditional line-based algorithms often yield spurious conflicts, particularly in the presence of refactorings or parallel edits. While syntax- and semantics-aware merging approaches can reduce conflicts, they introduce drawbacks such as loss of formatting, dependence on language-specific parsers, and limited flexibility across heterogeneous artifacts. To address this gap, we present Summer, a novel textual token-based merge algorithm that is independent of document formats. After dividing text into tokens, our approach formulates the token-level changes in one branch as string-rewriting rules and move rules, and applies these rules to the text of the other branch to construct a merge. Although Summer is independent of programming languages, its move rules can model extracting and inlining functions. We evaluated Summer on ConflictBench, a large benchmark of real-world merge scenarios, comparing it with five pioneering merge tools across Java and non-Java files. Experimental results show that Summer achieved the highest accuracy, 36%, in reproducing merges that are verbatim identical to the developers' merges, and ranked second in semantic accuracy.
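The rewrite-rule idea described in the abstract can be illustrated with a minimal Python sketch. This is not Summer's actual algorithm: the tokenizer regex, the function names (`tokenize`, `rewrite_rules`, `merge`), the fixed `context` window, and the use of `difflib.SequenceMatcher` to find token-level changes are all assumptions made for illustration, and move rules are not modeled at all. The sketch only shows how changes from one branch can be expressed as token-level rewrite rules and then applied to the text of the other branch.

```python
import re
from difflib import SequenceMatcher

# Assumed tokenizer: words, whitespace runs, and single punctuation characters.
# Every character lands in exactly one token, so joining tokens restores the text.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Split text into tokens without losing any characters (formatting is kept)."""
    return TOKEN_RE.findall(text)


def rewrite_rules(base, branch, context=2):
    """Derive (old_tokens, new_tokens) rules from a base -> branch token diff.

    Each rule keeps up to `context` unchanged tokens on both sides so that the
    changed span can be located in another version of the text.
    """
    a, b = tokenize(base), tokenize(branch)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    rules = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        left = a[max(0, i1 - context):i1]
        right = a[i2:min(len(a), i2 + context)]
        rules.append((left + a[i1:i2] + right, left + b[j1:j2] + right))
    return rules


def find_all(tokens, pattern):
    """Return every index at which `pattern` occurs as a contiguous token run."""
    n = len(pattern)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == pattern]


def merge(base, ours, theirs):
    """Apply the base -> theirs rules to ours.

    Returns the merged text, or None when a rule cannot be placed
    unambiguously (treated here as a conflict).
    """
    tokens = tokenize(ours)
    for old, new in rewrite_rules(base, theirs):
        hits = find_all(tokens, old)
        if len(hits) != 1:
            return None
        start = hits[0]
        tokens[start:start + len(old)] = new
    return "".join(tokens)


# Both branches edit the same line, which a line-based merge reports as a conflict.
base = "int total = add(a, b);\n"
ours = "int total = add(a, b); // sum\n"
theirs = "int total = sum(a, b);\n"
print(repr(merge(base, ours, theirs)))  # 'int total = sum(a, b); // sum\n'
```

In this example one branch renames the called function while the other appends a comment on the same line. Because the rename is captured as a rule over a handful of tokens rather than a whole line, it can be applied to the other branch without disturbing the comment or the original spacing.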