Characterizing Multi-Hunk Patches: Divergence, Proximity, and LLM Repair Challenges

📅 2025-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Prior automated program repair (APR) research predominantly targets single-hunk patches and overlooks the coordination required when semantically interdependent changes span multiple code hunks. Method: This paper characterizes the structural properties of multi-hunk patches along two dimensions: *hunk divergence*, a metric that quantifies lexical, structural, and file-level differences among the hunks of a patch while accounting for the number of hunks, and *spatial proximity*, a classification of how hunks are distributed across the program hierarchy. Using HUNK4J, a dataset of multi-hunk patches derived from 372 real-world defects, the authors empirically evaluate six LLMs. Contribution/Results: LLM repair success rates decline as hunk divergence and spatial dispersion increase; when the LLM is used alone, no model succeeds on the most dispersed “Fragment” class. The findings expose a gap in LLM repair capability and motivate divergence-aware repair strategies.

📝 Abstract

Multi-hunk bugs, where fixes span disjoint regions of code, are common in practice, yet remain underrepresented in automated repair. Existing techniques and benchmarks predominantly target single-hunk scenarios, overlooking the added complexity of coordinating semantically related changes across the codebase. In this work, we characterize HUNK4J, a dataset of multi-hunk patches derived from 372 real-world defects. We propose hunk divergence, a metric that quantifies the variation among edits in a patch by capturing lexical, structural, and file-level differences, while incorporating the number of hunks involved. We further define spatial proximity, a classification that models how hunks are spatially distributed across the program hierarchy. Our empirical study spanning six LLMs reveals that model success rates decline with increased divergence and spatial dispersion. Notably, when using the LLM alone, no model succeeds in the most dispersed Fragment class. These findings highlight a critical gap in LLM capabilities and motivate divergence-aware repair strategies.
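
The abstract describes hunk divergence only in words, so the following is a minimal sketch of how such a metric could be assembled, not the paper's formula. Everything in it is an assumption made for illustration: the `Hunk` type, the Jaccard-based lexical and structural distances, the equal weighting of the three components, and the `log2` scaling by hunk count.

```python
from dataclasses import dataclass
from itertools import combinations
import math
import re


@dataclass(frozen=True)
class Hunk:
    file: str  # path of the file the hunk edits
    text: str  # changed lines of the hunk


def jaccard_distance(a: set, b: set) -> float:
    # 0.0 for identical sets, 1.0 for disjoint sets.
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def lexical_tokens(text: str) -> set:
    # Crude lexer: identifiers, numbers, and single punctuation characters.
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", text))


def skeleton_lines(text: str) -> set:
    # Crude structural proxy (assumption): mask identifiers and numbers so
    # that only the shape of each changed line remains.
    lines = set()
    for line in text.splitlines():
        line = re.sub(r"[A-Za-z_]\w*", "ID", line.strip())
        line = re.sub(r"\d+", "N", line)
        if line:
            lines.add(line)
    return lines


def pair_divergence(a: Hunk, b: Hunk) -> float:
    lexical = jaccard_distance(lexical_tokens(a.text), lexical_tokens(b.text))
    structural = jaccard_distance(skeleton_lines(a.text), skeleton_lines(b.text))
    file_level = 0.0 if a.file == b.file else 1.0
    # Equal weights are an arbitrary choice for this sketch.
    return (lexical + structural + file_level) / 3.0


def patch_divergence(hunks: list) -> float:
    # Mean pairwise divergence, scaled by log2 of the hunk count so that
    # patches with more hunks score higher (hypothetical scaling).
    if len(hunks) < 2:
        return 0.0
    pairs = list(combinations(hunks, 2))
    mean_pair = sum(pair_divergence(a, b) for a, b in pairs) / len(pairs)
    return mean_pair * math.log2(len(hunks))
```

Under these assumptions, two identical edits in the same file score 0, and the score grows as the edits differ in tokens, line shape, or file, and as the patch contains more hunks.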

Problem

Research questions and friction points this paper is trying to address.

Characterizing multi-hunk bug patches in real-world defects

Measuring hunk divergence and spatial proximity in patches

Evaluating LLM repair challenges with divergent and dispersed patches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Characterizes HUNK4J, a dataset of multi-hunk patches derived from 372 real-world defects

Proposes hunk divergence metric for patch variation

Defines spatial proximity classification for hunk distribution
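
One way to picture a spatial proximity classification is to report the narrowest program scope that contains every hunk of a patch. The sketch below does this under stated assumptions: the `Location` fields and all five labels are hypothetical, since the abstract names only Fragment as the most dispersed class and does not give the class boundaries. The `cross-package` bucket here merely plays the role of a most dispersed class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    # Where a hunk sits in the program hierarchy (hypothetical fields).
    package: str
    file: str
    cls: str
    method: str


def proximity_class(locations: list) -> str:
    """Return the narrowest scope shared by all hunks (labels are hypothetical)."""
    if not locations:
        raise ValueError("a patch needs at least one hunk")
    # Ordered from tightest to loosest scope.
    levels = [
        ("same-method", lambda l: (l.package, l.file, l.cls, l.method)),
        ("same-class", lambda l: (l.package, l.file, l.cls)),
        ("same-file", lambda l: (l.package, l.file)),
        ("same-package", lambda l: (l.package,)),
    ]
    for label, key in levels:
        # All hunks share this scope if they map to a single key.
        if len({key(loc) for loc in locations}) == 1:
            return label
    return "cross-package"  # most dispersed bucket in this sketch
```

Grouping repair results by such a label would make it possible to compare success rates across dispersion levels, which is the kind of breakdown the study reports.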

🔎 Similar Papers

A Systematic Literature Review on Large Language Models for Automated Program Repair