A Universal Textual Merge Strategy Based on Tokens for Version Control Systems

📅 2026-04-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two problems: traditional line-based merge algorithms often generate spurious conflicts during code refactoring or concurrent editing, while existing syntax- or semantics-aware approaches suffer from language specificity, formatting loss, and limited flexibility across heterogeneous artifacts. The paper proposes Summer, a document-format-agnostic, token-level merge algorithm that decomposes text into tokens and models one branch's changes as string-rewriting rules and move rules, which it then applies to the other branch's text. Without relying on language-specific parsers, Summer's move rules can model refactorings such as function extraction and inlining. Evaluated on the ConflictBench benchmark against five other merge tools across Java and non-Java files, Summer achieves the highest accuracy (36%) in reproducing developers' actual merge results verbatim and ranks second in semantic accuracy, suggesting that a text-level merge approach can balance generality with semantic awareness.

📝 Abstract
Merging is a core operation in version control systems such as Git, but traditional line-based algorithms often yield spurious conflicts, particularly in the presence of refactorings or parallel edits. While syntax- and semantics-aware merging approaches can reduce conflicts, they introduce drawbacks such as loss of formatting, dependence on language-specific parsers, and limited flexibility across heterogeneous artifacts. To address this gap, we present Summer, a novel textual token-based merge algorithm that is independent of document formats. After dividing text into tokens, our approach formulates the token-level changes in one branch as string-rewriting rules and move rules, and applies these rules to the text of the other branch to construct a merge. Despite being independent of programming languages, our move rules can model extracting and inlining functions. We evaluated Summer on ConflictBench, a large benchmark of real-world merge scenarios, comparing it with five pioneering merge tools across Java and non-Java files. Experimental results show that Summer achieved the highest accuracy (36%) in reproducing merges verbatim identical to developers', and ranked second in semantic accuracy.
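
The rewrite-rule idea described in the abstract can be illustrated with a small sketch. This is not Summer's actual algorithm: the tokenizer regex, the fixed two-token context window, the use of `difflib.SequenceMatcher`, and the function names `tokenize`, `rewrite_rules`, and `apply_rules` are all assumptions made for illustration, and move rules are not modeled. It only shows how token-level changes from one branch might be turned into rules and replayed on the other branch, so that two edits to the same line need not conflict.

```python
import re
from difflib import SequenceMatcher

# Hypothetical language-agnostic tokenizer: words, whitespace runs, single symbols.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Split text into tokens without losing any character."""
    return TOKEN_RE.findall(text)


def rewrite_rules(base, branch, context=2):
    """Derive (lhs, rhs) token rules describing how `branch` changed `base`.

    Each rule carries up to `context` unchanged tokens on both sides so that
    it can be located in another version of the text.
    """
    a, b = tokenize(base), tokenize(branch)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    rules = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        pre = a[max(0, i1 - context):i1]
        post = a[i2:i2 + context]
        rules.append((pre + a[i1:i2] + post, pre + b[j1:j2] + post))
    return rules


def apply_rules(text, rules):
    """Apply each rule to `text`; raise ValueError if a rule does not match exactly once."""
    tokens = tokenize(text)
    for lhs, rhs in rules:
        n = len(lhs)
        hits = [k for k in range(len(tokens) - n + 1) if tokens[k:k + n] == lhs]
        if len(hits) != 1:
            raise ValueError("conflict: rule matched %d times" % len(hits))
        k = hits[0]
        tokens[k:k + n] = rhs
    return "".join(tokens)


base = "int total = count + 1;\n"
ours = "int total = count + 2;\n"     # one branch changes the constant
theirs = "long total = count + 1;\n"  # the other branch changes the type

print(repr(apply_rules(theirs, rewrite_rules(base, ours))))
# prints 'long total = count + 2;\n'
```

Both branches touch the same line, which a line-based merge would typically report as a conflict, whereas the token-level rule applies cleanly here. In this sketch, a rule whose left-hand side is missing or ambiguous in the other branch is reported as a conflict, and rules are applied one after another without reconciling overlapping contexts.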
Problem

Research questions and friction points this paper is trying to address.

merge conflicts
version control
token-based merging
refactoring
heterogeneous artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

token-based merging
version control
conflict resolution
language-agnostic
string-rewriting rules