FAIT: Fault-Aware Fine-Tuning for Better Code Generation

📅 2025-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Modern instruction-tuned large language models (LLMs) frequently generate syntactically correct but functionally incorrect, plausible-looking code, largely because standard supervised fine-tuning (SFT) weights all tokens equally and overlooks error-sensitive code segments. To address this, the paper proposes Fault-Aware Fine-Tuning (FAIT), which identifies multi-granularity (line- and token-level) error-sensitive segments by diffing correct implementations against similar incorrect variants, and dynamically reweights the loss on those segments during SFT. Evaluated on seven mainstream LLMs across three code-generation benchmarks, FAIT achieves an average relative improvement of 6.9% in pass@1 with just one epoch of training, with some enhanced 6.7B models surpassing closed-source models such as GPT-3.5-Turbo. The technique also generalizes well, yielding gains of 3.8% to 19.1% across diverse instruction-tuned LLMs.

📝 Abstract
Modern instruction-tuned large language models (LLMs) have made remarkable progress in code generation. However, these LLMs fine-tuned with standard supervised fine-tuning (SFT) sometimes generate plausible-looking but functionally incorrect code variants. This issue likely stems from the limitation of standard SFT, which treats all tokens equally during optimization and fails to emphasize the error-sensitive segments, i.e., the specific code differences between correct implementations and similar incorrect variants. To address this problem, we propose Fault-Aware Fine-Tuning (FAIT), a novel fine-tuning technique that enhances LLMs' code generation by (1) extracting multi-granularity (line/token-level) differences between correct and incorrect yet similar implementations to identify error-sensitive segments, and (2) dynamically prioritizing those segments during training via dynamic loss weighting. Through extensive experiments on seven LLMs across three widely-used benchmarks, our method achieves an average relative improvement of 6.9% on pass@1 with just one epoch of training, with some enhanced 6.7B LLMs outperforming closed-source models, e.g., GPT-3.5-Turbo. Furthermore, our fine-tuning technique demonstrates strong generalization with performance improvements ranging from 3.8% to 19.1% across diverse instruction-tuned LLMs, and our ablation studies confirm the contributions of different granularities of differences and loss function components.
Problem

Research questions and friction points this paper is trying to address.

LLMs fine-tuned with standard SFT generate plausible-looking but functionally incorrect code
Uniform token-level loss weighting fails to emphasize error-sensitive code segments
Fine-tuning improvements need to generalize across diverse instruction-tuned LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-granularity code difference extraction
Dynamic loss weighting for error-sensitive segments
Enhanced code generation via fault-aware tuning
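The two core ideas (diff-based identification of error-sensitive tokens and dynamic loss reweighting) can be illustrated with a minimal toy sketch. This is not the paper's implementation; the function names, the weighting formula `1 + (alpha - 1) * mask`, and the use of `difflib` for token alignment are illustrative assumptions.

```python
import difflib

def error_sensitive_mask(correct_tokens, incorrect_tokens):
    """Token-level mask: 1.0 where the correct implementation differs
    from a similar-but-incorrect variant, 0.0 where they agree.
    (A line-level mask could be built the same way over lines.)"""
    mask = [0.0] * len(correct_tokens)
    sm = difflib.SequenceMatcher(a=incorrect_tokens, b=correct_tokens)
    for tag, _i1, _i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":  # replaced/inserted tokens are error-sensitive
            for j in range(j1, j2):
                mask[j] = 1.0
    return mask

def reweighted_loss(per_token_losses, mask, alpha=2.0):
    """Toy dynamic loss weighting (assumed formula): upweight the loss
    on error-sensitive tokens by alpha, then normalize by total weight
    so the loss scale stays comparable to uniform cross-entropy."""
    weights = [1.0 + (alpha - 1.0) * m for m in mask]
    total = sum(w * l for w, l in zip(weights, per_token_losses))
    return total / sum(weights)

# An off-by-one boundary bug: only the ">=" token separates the
# correct implementation from the plausible-looking incorrect one.
correct = ["if", "x", ">=", "0", ":", "return", "x"]
wrong   = ["if", "x", ">",  "0", ":", "return", "x"]
mask = error_sensitive_mask(correct, wrong)  # 1.0 only at the ">=" token
```

In a real training loop, `per_token_losses` would be the per-token cross-entropy from the model, and the mask would be precomputed from paired correct/incorrect solutions in the fine-tuning data.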
Lishui Fan
The State Key Laboratory of Blockchain and Data Security, Zhejiang University, China

Zhongxin Liu
Zhejiang University
Software Engineering, Large Language Models

Haoye Wang
Hangzhou City University
Software Engineering

Lingfeng Bao
Zhejiang University
Software Engineering

Xin Xia
Huawei, China

Shanping Li
The State Key Laboratory of Blockchain and Data Security, Zhejiang University, China