Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning

📅 2025-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) often generate functionally correct code that nonetheless suffers from quality issues such as poor style and low maintainability. Method: This paper proposes a lightweight, controllable approach to high-quality code generation. Its core is comparative prefix-tuning: a single property-specific prefix is trained (avoiding the redundancy of multi-prefix methods) with a novel sequence-level ranking loss over pairs of high-quality and low-quality code. An automated data-construction pipeline collects and annotates these code pairs. Contribution/Results: On Code Llama 7B, the approach improves code quality by over 100% in certain task categories while maintaining functional correctness. Ablation studies confirm the contribution of each component, and cross-task generalization experiments demonstrate strong transferability.

📝 Abstract
Large Language Models (LLMs) have been widely adopted in commercial code completion engines, significantly enhancing coding efficiency and productivity. However, LLMs may generate code with quality issues that violate coding standards and best practices, such as poor code style and maintainability, even when the code is functionally correct. This necessitates additional effort from developers to improve the code, potentially negating the efficiency gains provided by LLMs. To address this problem, we propose a novel comparative prefix-tuning method for controllable high-quality code generation. Our method introduces a single, property-specific prefix that is prepended to the activations of the LLM, serving as a lightweight alternative to fine-tuning. Unlike existing methods that require training multiple prefixes, our approach trains only one prefix and leverages pairs of high-quality and low-quality code samples, introducing a sequence-level ranking loss to guide the model's training. This comparative approach enables the model to better understand the differences between high-quality and low-quality code, focusing on aspects that impact code quality. Additionally, we design a data construction pipeline to collect and annotate pairs of high-quality and low-quality code, facilitating effective training. Extensive experiments on the Code Llama 7B model demonstrate that our method improves code quality by over 100% in certain task categories, while maintaining functional correctness. We also conduct ablation studies and generalization experiments, confirming the effectiveness of our method's components and its strong generalization capability.
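The abstract does not give the ranking objective in closed form; as a minimal sketch (the hinge margin, the helper names, and the use of summed token log-probabilities as the sequence-level score are assumptions, not the paper's exact formulation), the comparative loss over a high-/low-quality code pair might look like:

```python
def sequence_score(token_logprobs):
    """Sequence-level score: sum of per-token log-probabilities that the
    prefix-augmented model assigns to a code sample."""
    return sum(token_logprobs)

def ranking_loss(logp_high, logp_low, margin=1.0):
    """Hinge-style sequence-level ranking loss: zero once the high-quality
    sample outscores the low-quality one by at least `margin`."""
    return max(0.0, margin - (logp_high - logp_low))

# Toy pair: the model currently assigns a higher score to the
# low-quality sample, so the loss is positive and the gradient
# would push the (single, trainable) prefix to reverse the ranking.
high = sequence_score([-0.5, -1.0, -0.8])
low = sequence_score([-0.2, -0.3, -0.4])
loss = ranking_loss(high, low)
```

In training, both scores would come from the same frozen LLM with the trainable prefix prepended to its activations; only the prefix parameters receive gradients, which is what makes the method a lightweight alternative to full fine-tuning.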
Problem

Research questions and friction points this paper is trying to address.

LLM-generated code often violates coding standards and best practices despite being functionally correct
Developers must manually refine generated code, eroding the efficiency gains of LLMs
Models lack an explicit training signal distinguishing high-quality from low-quality code
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comparative prefix-tuning for code quality
Single property-specific prefix training
Sequence-level ranking loss for training
Yuan Jiang
Nanyang Technological University
Large Language Models · Reinforcement Learning · Combinatorial Optimization · Operations Research
Yujian Zhang
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001

Liang Lu
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001

Christoph Treude
Associate Professor of Computer Science, Singapore Management University
Software Engineering · Empirical Software Engineering · Human-AI Interaction · AI for Science · AI4SE

Xiaohong Su
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001

Shan Huang
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001

Tiantian Wang
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001