🤖 AI Summary
Code large language models (LLMs) lack explicit guidance from human programmer attention, and existing eye-tracking data are costly to collect and used only in fragmented ways. Method: We propose the first end-to-end attention-augmented training framework, integrating eye-movement trajectory enhancement, learnable attention motif abstraction, and reward-guided supervised fine-tuning atop the CodeT5 architecture to enable human-AI collaborative attention modeling. For the first time, fine-grained eye-tracking signals are systematically embedded throughout the supervised fine-tuning pipeline of code LLMs, moving beyond conventional text-only supervision. Contribution/Results: On the CodeXGLUE code summarization benchmark, our approach achieves a +7.16 CodeBLEU improvement, demonstrating the substantial benefit of human attention priors for program semantic understanding.
📝 Abstract
Human attention provides valuable yet underexploited signals for training code LLMs, offering a perspective beyond purely machine-driven attention. Beyond the complexity and cost of collecting eye-tracking data, progress in systematically using these signals for code LLM training has also been limited. To address both issues, we propose a cohesive pipeline spanning data augmentation and reward-based fine-tuning. Specifically, we introduce (1) an eye-tracking path augmentation method that expands programmer attention datasets, (2) a pattern abstraction step that refines raw fixations into learnable attention motifs, and (3) a reward-guided strategy that integrates these insights directly into CodeT5 supervised fine-tuning. Our experiments yield a +7.16 CodeBLEU improvement on the CodeXGLUE code summarization benchmark, underscoring how uniting human and machine attention can boost code intelligence. We hope this work encourages broader exploration of human-centric methods in next-generation AI4SE.
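The reward-guided fine-tuning step (3) is described only at a high level here. One plausible instantiation is to re-weight the per-token cross-entropy loss by normalized human fixation counts, so that tokens programmers dwelled on contribute more to the objective. The sketch below illustrates this idea; the weighting scheme, the `alpha` hyperparameter, and all function names are our assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def attention_reward_weights(fixation_counts, alpha=0.5):
    """Map per-token human fixation counts to loss weights in [1, 1 + alpha].

    Hypothetical scheme: tokens fixated more often by programmers receive
    proportionally higher weight; alpha bounds the maximum up-weighting.
    """
    counts = np.asarray(fixation_counts, dtype=float)
    if counts.max() == 0:  # no eye-tracking signal -> plain cross-entropy
        return np.ones_like(counts)
    return 1.0 + alpha * (counts / counts.max())

def reward_guided_loss(token_ce_losses, fixation_counts, alpha=0.5):
    """Attention-weighted SFT objective: weighted mean of per-token CE losses."""
    weights = attention_reward_weights(fixation_counts, alpha)
    ce = np.asarray(token_ce_losses, dtype=float)
    return float((weights * ce).sum() / weights.sum())
```

Under this scheme, a model is penalized relatively more for mispredicting tokens that attracted human attention, which is one simple way to fold a fixation-derived reward into a standard supervised fine-tuning loop.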