DPO-f+: Aligning Code Repair Feedback with Developers' Preferences

📅 2025-11-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited interpretability of large language model (LLM) outputs in code repair and the resulting inefficiency of human-AI collaboration, this paper proposes a developer-profile-aware feedback alignment framework. It introduces domain-adapted evaluation metrics for natural-language feedback, designs a DPO-based optimization mechanism that incorporates margin signals, and automatically generates high-quality preference datasets tailored to code-repair tasks. The approach shifts the paradigm from single-shot code generation to iterative, human-AI co-understanding. Experiments show that the method improves the top-1 pass rate by 5.71 percentage points on novice programming tasks and the issue-resolution rate by 4.67 percentage points on SWE-bench Lite. Feedback alignment quality also significantly surpasses existing baselines, validating the effectiveness of developer-centered feedback modeling and preference learning.

📝 Abstract
Large Language Models (LLMs) are increasingly applied to software engineering tasks, especially code repair. However, developers often struggle to interpret model outputs, limiting effective human-AI teaming. Prior work largely optimizes repaired code while under-addressing the natural-language feedback that enables comprehension and iterative improvement. We present DPO-f+, a novel framework that aligns code-repair feedback with developer needs and profiles. It (1) formalizes developer-profiled, domain-specific metrics for feedback alignment; (2) automatically constructs pairwise preference datasets from code-repair tasks; (3) fine-tunes using Direct Preference Optimization (DPO) augmented with a lightweight margin signal; and (4) provides an automated feedback evaluation protocol. Empirically, DPO-f+ outperforms both the baseline and standard DPO on generated-code accuracy and overall feedback alignment. On novice programming tasks, DPO-f+ raises the top-1 pass rate by 5.71 percentage points (pp) over the baseline and by 3.30 pp over DPO. On the more challenging SWE-bench Lite benchmark, it increases the issue-resolution rate by 1.67 pp over DPO and by 4.67 pp over the baseline. It also achieves the largest improvement in feedback alignment, outperforming DPO and the baseline. By aligning feedback more closely with developer needs, DPO-f+ turns LLM-assisted repair from one-shot outputs into a collaborative sensemaking workflow, providing a practical approach to enhancing code comprehension and fostering more effective human-AI teaming in software engineering.
Problem

Research questions and friction points this paper is trying to address.

Aligning code repair feedback with developer preferences and profiles
Improving human-AI teaming through better natural-language feedback interpretation
Enhancing code comprehension via collaborative sensemaking workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns code repair feedback with developer preferences
Uses Direct Preference Optimization with margin signal
Automatically constructs pairwise preference datasets
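The margin-augmented DPO objective mentioned above can be sketched in a few lines. The paper does not spell out the exact form of its "lightweight margin signal" here, so the additive margin term, the function name, and the toy log-probabilities below are illustrative assumptions, not the authors' implementation:

```python
import math

def dpo_margin_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected,
                    beta=0.1, margin=0.0):
    """DPO loss with an additive margin (a hypothetical sketch).

    logp_* are sequence log-probabilities of the preferred/rejected
    feedback under the policy being tuned; ref_logp_* are the same
    quantities under the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-probability ratios vs. the reference.
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Bradley-Terry-style negative log-sigmoid; the margin forces the
    # chosen feedback to beat the rejected one by at least `margin`.
    gap = r_chosen - r_rejected - margin
    return -math.log(1.0 / (1.0 + math.exp(-gap)))
```

With a positive margin, pairs where the chosen feedback barely outscores the rejected one still incur loss, pushing the policy toward a clearer preference separation.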
Zihan Fang
Vanderbilt University
Yifan Zhang
Vanderbilt University
Yueke Zhang
Vanderbilt University
Kevin Leach
Vanderbilt University
Artificial Intelligence · Software Engineering · Security
Yu Huang
Vanderbilt University