🤖 AI Summary
Standard supervised fine-tuning minimizes only the generation loss, discarding the model's intrinsic learning signals and precluding human-like reflective improvement. To address this, the Transformer Copilot framework introduces a "Mistake Log" that records the primary model's (the Pilot's) prediction deviations during fine-tuning, and trains an auxiliary Copilot model to rectify the Pilot's logits at inference time, enabling collaborative, reflective learning. The framework couples a joint training paradigm with inference-time logits fusion while adding only marginal computational overhead to the Pilot. Evaluated across 12 benchmarks spanning commonsense reasoning, arithmetic, and recommendation tasks, it improves performance by up to 34.5% and exhibits strong scalability and transferability. Core contributions: (i) a Mistake-Log-driven self-reflective learning paradigm; (ii) a Pilot-Copilot logits rectification mechanism; and (iii) a seamless training-inference co-design.
📝 Abstract
Large language models are typically adapted to downstream tasks through supervised fine-tuning on domain-specific data. While standard fine-tuning focuses on minimizing generation loss to optimize model parameters, we take a deeper step by retaining and leveraging the model's own learning signals, analogous to how human learners reflect on past mistakes to improve future performance. We first introduce the concept of Mistake Log to systematically track the model's learning behavior and recurring errors throughout fine-tuning. Treating the original transformer-based model as the Pilot, we correspondingly design a Copilot model to refine the Pilot's inference performance via logits rectification. We name the overall Pilot-Copilot framework the Transformer Copilot, which introduces (i) a novel Copilot model design, (ii) a joint training paradigm where the Copilot continuously learns from the evolving Mistake Log alongside the Pilot, and (iii) a fused inference paradigm where the Copilot rectifies the Pilot's logits for enhanced generation. We provide both theoretical and empirical analyses on our new learning framework. Experiments on 12 benchmarks spanning commonsense, arithmetic, and recommendation tasks demonstrate that Transformer Copilot consistently improves performance by up to 34.5%, while introducing marginal computational overhead to Pilot models and exhibiting strong scalability and transferability.
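To make the fused inference paradigm concrete, here is a minimal sketch of logits rectification. The abstract does not specify the fusion rule, so the additive form below (and the `alpha` weight) is an assumption for illustration, not the paper's exact mechanism:

```python
def rectify_logits(pilot_logits, copilot_logits, alpha=1.0):
    """Fuse Pilot and Copilot logits for next-token prediction.

    Additive fusion with weight `alpha` is an assumed form;
    the actual rectification rule is defined in the paper.
    """
    return [p + alpha * c for p, c in zip(pilot_logits, copilot_logits)]

# Toy vocabulary of 3 tokens: the Pilot alone favors token 0,
# but the Copilot's correction shifts the decision to token 1.
pilot = [2.0, 0.5, -1.0]
copilot = [-1.5, 0.75, 0.25]

fused = rectify_logits(pilot, copilot)      # [0.5, 1.25, -0.75]
next_token = max(range(len(fused)), key=fused.__getitem__)  # 1
```

In this toy example the Copilot's correction demotes a token the Pilot over-scores, which is the intended effect of learning from the Mistake Log: deviations recorded during fine-tuning inform a correction applied at decode time.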