🤖 AI Summary
To bridge the gap between static LLM pretraining and dynamic agent execution, this paper proposes KAT-Coder, a deployable agentic coding model. Methodologically: (1) it introduces a multi-stage curriculum training pipeline, incorporating reflection-augmented mid-term training; (2) it proposes multi-ground-truth reward modeling and error-masked supervised fine-tuning to improve RL efficiency and robustness; and (3) it adopts tree-structured trajectory training to align with production-grade IDE environments. Contributions include the first framework to jointly optimize long-context reasoning, instruction alignment, and reliable tool invocation. The open-sourced 32B variant, KAT-Dev, supports twenty programming languages and significantly outperforms baselines in realistic development scenarios, providing a deployable foundation for intelligent programming agents in industrial settings.
📄 Abstract
Recent advances in large language models (LLMs) have enabled progress in agentic coding, where models autonomously reason, plan, and act within interactive software development workflows. However, bridging the gap between static text-based training and dynamic real-world agentic execution remains a core challenge. In this technical report, we present KAT-Coder, a large-scale agentic code model trained through a multi-stage curriculum encompassing Mid-Term Training, Supervised Fine-Tuning (SFT), Reinforcement Fine-Tuning (RFT), and Reinforcement-to-Deployment Adaptation. The Mid-Term stage enhances reasoning, planning, and reflection capabilities through a corpus of real software engineering data and synthetic agentic interactions. The SFT stage constructs a million-sample dataset balancing twenty programming languages, ten development contexts, and ten task archetypes. The RFT stage introduces a novel multi-ground-truth reward formulation for stable and sample-efficient policy optimization. Finally, the Reinforcement-to-Deployment phase adapts the model to production-grade IDE environments using Error-Masked SFT and Tree-Structured Trajectory Training. Together, these stages enable KAT-Coder to achieve robust tool-use reliability, instruction alignment, and long-context reasoning, forming a deployable foundation for real-world intelligent coding agents. Our KAT series 32B model, KAT-Dev, has been open-sourced at https://huggingface.co/Kwaipilot/KAT-Dev.
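To make two of the abstract's ideas concrete, here is a minimal sketch, assuming details the report does not specify: a multi-ground-truth reward that scores a candidate patch against the best of several acceptable references (rather than one canonical answer), and an error-masked loss mask that excludes tokens from failed tool calls from the SFT objective. Function names, the Jaccard-overlap similarity stand-in, and the span-based mask representation are all illustrative assumptions, not the paper's actual formulation.

```python
def multi_ground_truth_reward(candidate: str, references: list[str]) -> float:
    """Score a candidate solution against several acceptable ground truths.

    Taking the max over all valid references (instead of comparing to a
    single one) avoids penalizing correct-but-different solutions, which
    is the intuition behind a multi-ground-truth reward. The similarity
    metric here is a simple token-level Jaccard overlap, chosen only for
    illustration.
    """
    def overlap(a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

    return max(overlap(candidate, ref) for ref in references)


def error_masked_loss_mask(num_tokens: int,
                           failed_spans: list[tuple[int, int]]) -> list[int]:
    """Build a 0/1 supervision mask that drops tokens inside failed tool calls.

    Tokens emitted during erroneous tool invocations get mask value 0 so
    the SFT loss does not train the model to imitate its own mistakes;
    all other tokens keep mask value 1. Spans are half-open [start, end).
    """
    mask = [1] * num_tokens
    for start, end in failed_spans:
        for i in range(start, end):
            mask[i] = 0
    return mask
```

In a training loop, the mask would typically be multiplied into the per-token cross-entropy before averaging, and the reward would feed the RL advantage estimate; both hooks are omitted here to keep the sketch self-contained.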