🤖 AI Summary
Current large language models (LLMs) face significant bottlenecks in autonomously generating high-performance code, primarily because they lack interpretable, performance-oriented supervision. To address this, we propose a reinforcement fine-tuning (RFT) paradigm that combines preference alignment driven by runtime measurements with supervision from human-readable optimization trajectories, enabling input-aware, policy-level optimization decisions and interpretable, actionable feedback. We further introduce a planner-and-optimizer collaborative architecture that supports end-to-end, iteration-free translation from source code to optimized code. Fine-tuned on a curated dataset of real-world optimization trajectories, our approach achieves state-of-the-art performance on the PIE benchmark, outperforming all existing methods. It substantially enhances the optimization capability of both 32B open-weight models and GPT-5, achieving the highest measured speedup ratios and effective optimization rates.
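As a rough illustration of the runtime-measurement-driven preference signal described above, the sketch below ranks two candidate rewrites of a program by their measured speedup over the source and labels the faster one as the preferred response. This is a minimal sketch under stated assumptions: the function names, measurement protocol, and reward shape are hypothetical, not the paper's exact procedure.

```python
import subprocess
import time

def measure_runtime(binary: str, test_input: bytes, repeats: int = 5) -> float:
    """Median wall-clock time of one compiled candidate on a single test input."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([binary], input=test_input,
                       stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]

def preference_pair(src: str, cand_a: str, cand_b: str, test_input: bytes):
    """Build a (chosen, rejected) pair for preference alignment, scoring each
    candidate by its measured speedup over the unoptimized source program.
    All three arguments are paths to compiled binaries (hypothetical setup)."""
    t_src = measure_runtime(src, test_input)
    speedup = {c: t_src / measure_runtime(c, test_input) for c in (cand_a, cand_b)}
    chosen = max(speedup, key=speedup.get)
    rejected = cand_b if chosen == cand_a else cand_a
    return chosen, rejected, speedup[chosen]
```

In a preference-alignment setup such as DPO or a reward-based RFT loop, pairs produced this way would ground the training signal in measured runtimes rather than model judgments.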
📝 Abstract
Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code, a critical requirement in real-world software systems, remains limited. We argue that current LLMs struggle not only because of data scarcity but, more importantly, because they lack supervision that guides interpretable and effective performance improvements. In this work, we introduce PerfCoder, a family of LLMs specifically designed to generate performance-enhanced code from source code via interpretable, customized optimizations. PerfCoder is fine-tuned on a curated collection of real-world optimization trajectories with human-readable annotations, and preference-aligned through reinforcement fine-tuning driven by runtime measurements, enabling it to propose input-specific improvement strategies and apply them directly without relying on iterative refinement. On the PIE code performance benchmark, PerfCoder surpasses all existing models in both runtime speedup and effective optimization rate, demonstrating that performance optimization cannot be achieved by scale alone but requires awareness of optimization strategies. In addition, PerfCoder can generate interpretable feedback about the source code, which, when provided as input to a larger LLM in a planner-and-optimizer cooperative workflow, further improves outcomes. Specifically, we elevate the performance of 32B models and GPT-5 to new levels on code optimization, substantially surpassing their original results.
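The planner-and-optimizer cooperative workflow can be pictured as a single-pass pipeline: PerfCoder acts as the planner, emitting interpretable optimization feedback, and a larger LLM applies that plan in one shot, with no iterative refinement loop. The sketch below is a minimal rendering of that idea, assuming generic `planner` and `optimizer` completion functions; the prompts and interfaces are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable

# Hypothetical prompt templates; the paper's actual prompts are not shown here.
PLANNER_PROMPT = (
    "Analyze this program and list concrete, human-readable performance "
    "optimizations (algorithmic complexity, data structures, I/O):\n{code}"
)
OPTIMIZER_PROMPT = (
    "Rewrite the program applying this optimization plan while preserving "
    "its input/output behavior.\nPlan:\n{plan}\nProgram:\n{code}"
)

def planner_optimizer(code: str,
                      planner: Callable[[str], str],
                      optimizer: Callable[[str], str]) -> str:
    """Single-pass planner-and-optimizer workflow: the planner (e.g. PerfCoder)
    produces an interpretable optimization plan, and the optimizer (a larger
    LLM such as GPT-5) applies it end-to-end in one generation."""
    plan = planner(PLANNER_PROMPT.format(code=code))
    return optimizer(OPTIMIZER_PROMPT.format(plan=plan, code=code))
```

The design choice worth noting is the division of labor: the smaller, strategy-aware model supplies the optimization plan, so the larger model's capacity is spent on faithful code transformation rather than on discovering what to optimize.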