DPO-f+: Aligning Code Repair Feedback with Developers' Preferences

📅 2025-11-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited interpretability of large language model (LLM) outputs in code repair and the resulting inefficiency of human-AI collaboration, this paper proposes a developer-profile-aware feedback alignment framework. It introduces domain-adapted evaluation metrics for natural-language feedback, designs a DPO-based optimization mechanism that incorporates margin signals, and automatically generates high-quality preference datasets tailored to code-repair tasks. The approach shifts the paradigm from single-shot code generation to iterative, human-AI co-understanding. Experiments show that the method improves the top-1 pass rate by 5.71 percentage points on novice programming tasks and the issue-resolution rate by 4.67 percentage points on SWE-bench Lite. Feedback alignment quality also significantly surpasses existing baselines, validating the effectiveness of developer-centered feedback modeling and preference learning.

📝 Abstract
Large Language Models (LLMs) are increasingly applied to software engineering tasks, especially code repair. However, developers often struggle to interpret model outputs, limiting effective human-AI teaming. Prior work largely optimizes repaired code while under-addressing the natural-language feedback that enables comprehension and iterative improvement. We present DPO-f+, a novel framework that aligns code-repair feedback with developer needs and profiles. It (1) formalizes developer-profiled, domain-specific metrics for feedback alignment; (2) automatically constructs pairwise preference datasets from code-repair tasks; (3) fine-tunes using Direct Preference Optimization (DPO) augmented with a lightweight margin signal; and (4) provides an automated feedback evaluation protocol. Empirically, DPO-f+ outperforms both the baseline and standard DPO on generated-code accuracy and overall feedback alignment. On novice programming tasks, DPO-f+ raises the top-1 pass rate by 5.71 percentage points (pp) over the baseline and by 3.30 pp over DPO. On the more challenging SWE-bench Lite benchmark, it increases the issue-resolution rate by 1.67 pp over DPO and by 4.67 pp over the baseline. It also achieves the largest improvement in feedback alignment, outperforming DPO and the baseline. By aligning feedback more closely with developer needs, DPO-f+ turns LLM-assisted repair from one-shot outputs into a collaborative sensemaking workflow, providing a practical approach to enhancing code comprehension and fostering more effective human-AI teaming in software engineering.
Problem

Research questions and friction points this paper is trying to address.

Aligning code repair feedback with developer preferences and profiles
Improving human-AI teaming through better natural-language feedback interpretation
Enhancing code comprehension via collaborative sensemaking workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns code repair feedback with developer preferences
Uses Direct Preference Optimization with margin signal
Automatically constructs pairwise preference datasets
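The margin-augmented DPO objective mentioned above can be sketched in a few lines. The paper does not spell out the exact form of its "lightweight margin signal" here, so the additive margin term, the function name, and the toy log-probabilities below are illustrative assumptions, not the authors' implementation:

```python
import math

def dpo_margin_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected,
                    beta=0.1, margin=0.0):
    """DPO loss with an additive margin (a hypothetical sketch).

    logp_* are sequence log-probabilities of the preferred/rejected
    feedback under the policy being tuned; ref_logp_* are the same
    quantities under the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-probability ratios vs. the reference.
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Bradley-Terry-style negative log-sigmoid; the margin forces the
    # chosen feedback to beat the rejected one by at least `margin`.
    gap = r_chosen - r_rejected - margin
    return -math.log(1.0 / (1.0 + math.exp(-gap)))
```

With a positive margin, pairs where the chosen feedback barely outscores the rejected one still incur loss, pushing the policy toward a clearer preference separation.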
Zihan Fang
Vanderbilt University
Yifan Zhang
Vanderbilt University
Yueke Zhang
Vanderbilt University
Kevin Leach
Vanderbilt University
Artificial Intelligence · Software Engineering · Security
Yu Huang
Vanderbilt University