Enhancing Code LLM Training with Programmer Attention

πŸ“… 2025-03-19
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Problem: Code large language models (LLMs) lack explicit guidance from human programmer attention, and existing eye-tracking data are used in a fragmented, costly manner. Method: We propose the first end-to-end attention-augmented training framework, integrating eye-movement trajectory enhancement, learnable attention motif abstraction, and reward-guided supervised fine-tuning atop the CodeT5 architecture to enable human–AI collaborative attention modeling. For the first time, fine-grained eye-tracking signals are systematically embedded throughout the supervised fine-tuning pipeline of a code LLM, moving beyond conventional text-only supervision. Contribution/Results: On the CodeXGlue code summarization benchmark, our approach achieves a +7.16-point improvement in CodeBLEU, demonstrating the substantial benefit of human attention priors for program semantic understanding.
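The "learnable attention motif abstraction" step can be illustrated with a minimal sketch: recurring token n-grams that programmers fixate consecutively across many scanpaths are promoted to reusable motifs. The function name, data shapes, and thresholds below are hypothetical simplifications, not the paper's actual method.

```python
from collections import Counter

def extract_attention_motifs(scanpaths, n=2, min_count=2):
    """Abstract raw fixation sequences into recurring 'attention motifs'.

    `scanpaths` is a list of scanpaths, each a list of (token, duration_ms)
    fixations. A motif is a token n-gram fixated consecutively in at least
    `min_count` scanpaths-worth of occurrences. This is an illustrative
    frequency-based stand-in for the paper's learnable abstraction.
    """
    counts = Counter()
    for path in scanpaths:
        tokens = [tok for tok, _dur in path]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return [motif for motif, c in counts.most_common() if c >= min_count]

scanpaths = [
    [("def", 100), ("sum", 200), ("return", 150)],
    [("def", 90), ("sum", 210), ("a", 100)],
]
print(extract_attention_motifs(scanpaths))  # the ("def", "sum") bigram recurs
```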

πŸ“ Abstract
Human attention provides valuable yet underexploited signals for code LLM training, offering a perspective beyond purely machine-driven attention. Eye-tracking data are complex and costly to collect, and progress in systematically using these signals for code LLM training has likewise been limited. To address both issues, we propose a cohesive pipeline spanning augmentation and reward-based fine-tuning. Specifically, we introduce (1) an eye-tracking path augmentation method to expand programmer attention datasets, (2) a pattern abstraction step that refines raw fixations into learnable attention motifs, and (3) a reward-guided strategy for integrating these insights directly into a CodeT5 supervised fine-tuning process. Our experiments yield a +7.16 gain in CodeBLEU on the CodeXGlue code summarization benchmark, underscoring how uniting human and machine attention can boost code intelligence. We hope this work encourages broader exploration of human-centric methods in next-generation AI4SE.
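Step (1), eye-tracking path augmentation, can be sketched as generating plausible variants of a recorded scanpath. The jitter and swap parameters below are illustrative assumptions (the abstract does not specify the augmentation operators), meant only to show how a small attention dataset could be expanded.

```python
import random

def augment_scanpath(fixations, n_variants=3, jitter=0.15, swap_prob=0.1, seed=0):
    """Generate synthetic variants of an eye-tracking scanpath.

    `fixations` is a list of (token_index, duration_ms) pairs. Each variant
    jitters fixation durations and occasionally swaps adjacent fixations,
    mimicking natural reading variability (e.g., regressions). All names
    and parameters are hypothetical, not the paper's actual method.
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        path = list(fixations)
        # Occasionally swap adjacent fixations to simulate regressions.
        for i in range(len(path) - 1):
            if rng.random() < swap_prob:
                path[i], path[i + 1] = path[i + 1], path[i]
        # Jitter each duration by up to +/- `jitter` fraction, keeping it >= 1 ms.
        path = [(tok, max(1, round(dur * (1 + rng.uniform(-jitter, jitter)))))
                for tok, dur in path]
        variants.append(path)
    return variants

scanpath = [(0, 220), (3, 180), (1, 250), (4, 300)]
for variant in augment_scanpath(scanpath):
    print(variant)
```

Because the augmentation only reorders and reweights fixations, each variant preserves the set of tokens the programmer actually attended to.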
Problem

Research questions and friction points this paper is trying to address.

Enhancing Code LLM training using human attention signals.
Developing methods to expand and refine programmer attention datasets.
Integrating human attention insights into CodeT5 fine-tuning process.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Eye-tracking path augmentation for dataset expansion
Pattern abstraction to refine raw fixations
Reward-guided strategy for CodeT5 fine-tuning
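The reward-guided fine-tuning idea can be sketched as a reweighted training objective: tokens that carry a higher human-attention reward contribute more to the loss. The weighting scheme and `alpha` parameter below are assumptions for illustration; the paper's actual reward formulation is not given in this summary.

```python
def attention_weighted_loss(token_logprobs, attention_reward, alpha=1.0):
    """Negative log-likelihood reweighted by a per-token human-attention reward.

    `token_logprobs` are the model's log-probabilities for each target token;
    `attention_reward` holds values in [0, 1] derived from programmer fixations.
    Tokens humans attended to are upweighted by a factor (1 + alpha * reward).
    Illustrative sketch only, not the paper's exact objective.
    """
    assert len(token_logprobs) == len(attention_reward)
    total, norm = 0.0, 0.0
    for lp, r in zip(token_logprobs, attention_reward):
        w = 1.0 + alpha * r   # upweight tokens that drew human attention
        total += -lp * w
        norm += w
    return total / norm

# Example: the second token gets full reward, so its NLL counts double.
print(attention_weighted_loss([-0.1, -2.0], [0.0, 1.0]))
```

With `alpha=0` this reduces to the ordinary mean negative log-likelihood, so the attention signal acts purely as a soft prior on top of standard supervised fine-tuning.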
πŸ”Ž Similar Papers
No similar papers found.