Generalization of Variadic Structures with Binders: A Tool for Structural Code Comparison

📅 2025-09-29
🤖 AI Summary
This work addresses the generalization of variadic program structures with variable binders (e.g., ASTs), where comparison is sensitive to inserted or deleted code fragments and tends to overemphasize concrete bound-variable names. It proposes an anti-unification algorithm built on nominal techniques. Methodologically, the algorithm distinguishes atoms from two kinds of variables, term variables and hedge variables, and adds a parametrizable rigidity function, so that it can model binding explicitly and match variable-length sequences; the nominal treatment of binders handles α-equivalence rigorously. The algorithm computes best generalizations that preserve as much shared structure as possible while abstracting systematic differences, and it records enough information to reconstruct the original expressions and quantify how they differ. The rigidity function gives fine-grained control over similarity criteria and reduces nondeterminism. This information can support tasks such as code clone detection, refactoring, and program analysis, making the approach a formal, tunable basis for structural code comparison.

📝 Abstract
This paper introduces a novel anti-unification algorithm for the generalization of variadic structures with binders, designed as a flexible tool for structural code comparison. By combining nominal techniques for handling variable binding with support for variadic expressions (common in abstract syntax trees and programming languages), the approach addresses key challenges such as overemphasis on bound variable names and difficulty handling insertions or deletions in code fragments. The algorithm distinguishes between atoms and two kinds of variables (term and hedge variables) to compute best generalizations that maximally preserve structural similarities while abstracting systematic differences. It also provides detailed information to reconstruct original expressions and quantify structural differences. This information can be useful in tasks like code clone detection, refactoring, and program analysis. By introducing a parametrizable rigidity function, the technique offers fine-grained control over similarity criteria and reduces nondeterminism, enabling flexible adaptation to practical scenarios where trivial similarities should be discounted. Although demonstrated primarily in the context of code similarity detection, this framework is broadly applicable wherever precise comparison of variadic and binder-rich representations is required.
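
To make the idea of generalizing with term and hedge variables concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the paper's algorithm: binders are ignored, terms are plain tuples `(head, child, ...)`, children are aligned by a longest common subsequence of head symbols (a stand-in for a rigidity function), and the names `generalize`, `reconstruct`, `?X`, and `?H` are hypothetical.

```python
from itertools import count


def lcs_pairs(a, b):
    """Index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def generalize_hedge(h1, h2, fresh, store):
    """Generalize two hedges (tuples of terms), recording differences in store."""
    pairs = lcs_pairs([t[0] for t in h1], [t[0] for t in h2])
    out, i, j = [], 0, 0
    # The sentinel pair flushes whatever remains after the last aligned child.
    for p, q in pairs + [(len(h1), len(h2))]:
        gap1, gap2 = h1[i:p], h2[j:q]
        if gap1 or gap2:
            # One term on each side: term variable; otherwise a hedge variable.
            kind = "?X" if len(gap1) == len(gap2) == 1 else "?H"
            name = f"{kind}{next(fresh)}"
            store[name] = (gap1, gap2)
            out.append((name,))
        if p < len(h1):
            children = generalize_hedge(h1[p][1:], h2[q][1:], fresh, store)
            out.append((h1[p][0],) + children)
        i, j = p + 1, q + 1
    return tuple(out)


def generalize(t1, t2):
    """Return (generalization, store); store maps each variable to both sides."""
    store = {}
    (g,) = generalize_hedge((t1,), (t2,), count(), store)
    return g, store


def instantiate(hedge, store, side):
    out = []
    for t in hedge:
        if t[0] in store:
            out.extend(store[t[0]][side])  # splice the recorded hedge back in
        else:
            out.append((t[0],) + instantiate(t[1:], store, side))
    return tuple(out)


def reconstruct(g, store, side):
    """Rebuild the first (side=0) or second (side=1) original term."""
    return instantiate((g,), store, side)[0]


s = ("block", ("assign", ("x",), ("1",)), ("print", ("x",)))
t = ("block", ("assign", ("x",), ("2",)), ("log", ("start",)), ("print", ("x",)))
g, store = generalize(s, t)
print(g)
```

In this example the differing constants `1` and `2` are abstracted by a term variable, while the inserted `log(start)` statement is absorbed by a hedge variable that stands for an empty sequence on one side. The `store` plays the role of the recorded difference information: `reconstruct(g, store, 0)` and `reconstruct(g, store, 1)` give back the two inputs.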

Problem

Research questions and friction points this paper is trying to address.

Generalizing variadic structures with binders for structural code comparison
Addressing overemphasis on variable names and handling code insertions/deletions
Providing fine-grained similarity control for code clone detection and analysis
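
The point about bound-variable names can be illustrated with a small nominal-style check for α-equivalence, written with atom swapping and a freshness test. This is a sketch under assumptions: the `Atom`, `Abs`, and `App` classes and the function names are hypothetical, and the paper's own term language and rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Abs:  # binder: `atom` is bound in `body`
    atom: str
    body: object


@dataclass(frozen=True)
class App:  # variadic application: any number of arguments
    head: str
    args: tuple = ()


def swap(a, b, t):
    """Apply the swapping (a b) to every atom in t, bound or free."""
    def sw(n):
        return b if n == a else a if n == b else n
    if isinstance(t, Atom):
        return Atom(sw(t.name))
    if isinstance(t, Abs):
        return Abs(sw(t.atom), swap(a, b, t.body))
    return App(t.head, tuple(swap(a, b, u) for u in t.args))


def fresh(a, t):
    """True if atom a does not occur free in t."""
    if isinstance(t, Atom):
        return t.name != a
    if isinstance(t, Abs):
        return t.atom == a or fresh(a, t.body)
    return all(fresh(a, u) for u in t.args)


def alpha_eq(s, t):
    """Equality up to renaming of bound atoms."""
    if isinstance(s, Atom) and isinstance(t, Atom):
        return s.name == t.name
    if isinstance(s, Abs) and isinstance(t, Abs):
        if s.atom == t.atom:
            return alpha_eq(s.body, t.body)
        # a.s' ~ b.t'  iff  a is fresh for t' and s' ~ (a b) t'
        return fresh(s.atom, t.body) and alpha_eq(s.body, swap(s.atom, t.atom, t.body))
    if isinstance(s, App) and isinstance(t, App):
        return (s.head == t.head and len(s.args) == len(t.args)
                and all(alpha_eq(u, v) for u, v in zip(s.args, t.args)))
    return False


# fun x -> add(x, 1)  versus  fun y -> add(y, 1): same up to renaming
f1 = Abs("x", App("add", (Atom("x"), App("1"))))
f2 = Abs("y", App("add", (Atom("y"), App("1"))))
# fun x -> add(x, y)  versus  fun y -> add(y, y): y is free in the first only
f3 = Abs("x", App("add", (Atom("x"), Atom("y"))))
f4 = Abs("y", App("add", (Atom("y"), Atom("y"))))
print(alpha_eq(f1, f2), alpha_eq(f3, f4))
```

A comparison built on such a check treats `f1` and `f2` as identical, so a purely cosmetic renaming of a bound variable contributes nothing to the measured difference, while `f3` and `f4` remain distinct because a free name is involved.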

Innovation

Methods, ideas, or system contributions that make the work stand out.

Anti-unification algorithm for variadic structures with binders
Combines nominal techniques with variadic expression support
Parametrizable rigidity function controls similarity criteria
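
As a rough illustration of what a parametrizable rigidity function might look like, the sketch below defines two candidates over the sequences of head symbols of two hedges: a longest common subsequence and a longest common contiguous block, each with a minimum-length threshold. Both functions, their names, and the `min_len` parameter are assumptions made for illustration; the paper's actual definition may differ.

```python
from difflib import SequenceMatcher


def common_subsequence(w1, w2, min_len=1):
    """One longest common subsequence of w1 and w2, or [] if shorter than min_len."""
    best = [[[] for _ in range(len(w2) + 1)] for _ in range(len(w1) + 1)]
    for i in range(len(w1) - 1, -1, -1):
        for j in range(len(w2) - 1, -1, -1):
            if w1[i] == w2[j]:
                best[i][j] = [w1[i]] + best[i + 1][j + 1]
            else:
                best[i][j] = max(best[i + 1][j], best[i][j + 1], key=len)
    result = best[0][0]
    return result if len(result) >= min_len else []


def common_substring(w1, w2, min_len=1):
    """One longest common contiguous block, or [] if shorter than min_len."""
    sm = SequenceMatcher(None, w1, w2, autojunk=False)
    m = sm.find_longest_match(0, len(w1), 0, len(w2))
    block = list(w1[m.a:m.a + m.size])
    return block if m.size >= min_len else []


# Head symbols of the top-level statements in two hypothetical code fragments.
heads1 = ["assign", "call", "assign", "return"]
heads2 = ["assign", "assign", "log", "return"]

print(common_subsequence(heads1, heads2, min_len=2))
print(common_substring(heads1, heads2, min_len=2))
```

With `min_len=2`, the subsequence variant still aligns `assign`, `assign`, `return`, whereas the contiguous variant finds no block of length two and returns an empty alignment, so the two fragments would be abstracted wholesale. Raising the threshold is one way to discount trivial similarities, and choosing a stricter function leaves fewer alternative alignments to explore.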