🤖 AI Summary
Large language models (LLMs) often generate functionally correct code that nonetheless suffers from quality issues such as poor style and low maintainability. Method: This paper proposes a lightweight, controllable approach to high-quality code generation. Its core innovation is a sequence-level ranking loss, trained on pairs of high-quality and low-quality code and integrated with comparative prefix-tuning using a single property-specific prefix, which avoids the redundancy of training multiple prefixes. The method couples this comparative learning objective with an automated pipeline for constructing and labeling code pairs. Contribution/Results: Evaluated on Code Llama 7B, the approach improves code quality by over 100% in certain task categories while maintaining functional correctness. Extensive ablation studies confirm the efficacy of each component, and cross-task generalization experiments demonstrate strong transferability.
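The summary mentions a sequence-level ranking loss over high-quality/low-quality code pairs. As an illustration only (the paper's exact formulation, normalization, and margin are not given here), a common way to realize such an objective is a margin ranking loss on length-normalized sequence log-probabilities; the function names and the `margin` value below are assumptions for the sketch:

```python
import torch
import torch.nn.functional as F

def sequence_log_prob(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Length-normalized log-probability of a token sequence.

    logits: (seq_len, vocab) pre-softmax scores from the model
    tokens: (seq_len,) target token ids
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_lp = log_probs[torch.arange(tokens.size(0)), tokens]
    return token_lp.mean()  # normalize by length so pairs of different sizes compare fairly

def ranking_loss(logits_hi, tokens_hi, logits_lo, tokens_lo, margin: float = 1.0):
    """Sequence-level margin ranking loss (illustrative, not the paper's exact loss):
    push the model's score for the high-quality sample above the low-quality
    sample's score by at least `margin`."""
    s_hi = sequence_log_prob(logits_hi, tokens_hi)
    s_lo = sequence_log_prob(logits_lo, tokens_lo)
    return F.relu(margin - (s_hi - s_lo))
```

When the high-quality sample is already scored well above the low-quality one, the loss is zero, so training focuses on pairs the model still ranks incorrectly.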
📝 Abstract
Large Language Models (LLMs) have been widely adopted in commercial code completion engines, significantly enhancing coding efficiency and productivity. However, LLMs may generate code with quality issues that violate coding standards and best practices, such as poor code style and maintainability, even when the code is functionally correct. This necessitates additional effort from developers to improve the code, potentially negating the efficiency gains provided by LLMs. To address this problem, we propose a novel comparative prefix-tuning method for controllable high-quality code generation. Our method introduces a single, property-specific prefix that is prepended to the activations of the LLM, serving as a lightweight alternative to fine-tuning. Unlike existing methods that require training multiple prefixes, our approach trains only one prefix and leverages pairs of high-quality and low-quality code samples, introducing a sequence-level ranking loss to guide the model's training. This comparative approach enables the model to better understand the differences between high-quality and low-quality code, focusing on aspects that impact code quality. Additionally, we design a data construction pipeline to collect and annotate pairs of high-quality and low-quality code, facilitating effective training. Extensive experiments on the Code Llama 7B model demonstrate that our method improves code quality by over 100% in certain task categories, while maintaining functional correctness. We also conduct ablation studies and generalization experiments, confirming the effectiveness of our method's components and its strong generalization capability.
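The abstract describes prepending a single trainable prefix to the activations of a frozen LLM. A minimal single-head sketch of that mechanism is below; this is not the paper's implementation (real prefix-tuning injects prefix key/value vectors at every layer of the model, e.g. via cached key/value states), and the class and parameter names are made up for illustration:

```python
import torch
import torch.nn as nn

class PrefixAttention(nn.Module):
    """Toy single-head attention with a trainable prefix prepended to the
    key/value activations. The base projections are frozen, so only the
    prefix parameters are updated during tuning."""

    def __init__(self, d_model: int, prefix_len: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)
        for proj in (self.q, self.k, self.v):
            proj.weight.requires_grad_(False)  # frozen base model
        # The only trainable parameters: the property-specific prefix.
        self.prefix_k = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (seq, d_model)
        q = self.q(x)
        # Prepend the prefix to the keys and values; queries attend over it.
        k = torch.cat([self.prefix_k, self.k(x)], dim=0)
        v = torch.cat([self.prefix_v, self.v(x)], dim=0)
        attn = torch.softmax(q @ k.T / x.size(-1) ** 0.5, dim=-1)
        return attn @ v
```

Because only `prefix_k` and `prefix_v` carry gradients, the trainable parameter count is `2 * prefix_len * d_model`, which is what makes the approach a lightweight alternative to full fine-tuning.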