KAT-Coder Technical Report

πŸ“… 2025-10-21
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To bridge the gap between static LLM pretraining and dynamic agent execution, this paper presents KAT-Coder, a large-scale agentic code model for deployable intelligent coding. Methodologically: (1) it introduces a multi-stage curriculum training pipeline incorporating reflection-augmented mid-term training; (2) it proposes multi-ground-truth reward modeling and error-masked supervised fine-tuning to improve RL efficiency and robustness; and (3) it adopts tree-structured trajectory training to align the model with production-grade IDE environments. Contributions include a framework that jointly optimizes long-context reasoning, instruction alignment, and reliable tool invocation. The resulting KAT-Dev 32B model supports twenty programming languages and outperforms baselines in realistic development scenarios; the model has been open-sourced, enabling industrial deployment of intelligent programming agents.

πŸ“ Abstract
Recent advances in large language models (LLMs) have enabled progress in agentic coding, where models autonomously reason, plan, and act within interactive software development workflows. However, bridging the gap between static text-based training and dynamic real-world agentic execution remains a core challenge. In this technical report, we present KAT-Coder, a large-scale agentic code model trained through a multi-stage curriculum encompassing Mid-Term Training, Supervised Fine-Tuning (SFT), Reinforcement Fine-Tuning (RFT), and Reinforcement-to-Deployment Adaptation. The Mid-Term stage enhances reasoning, planning, and reflection capabilities through a corpus of real software engineering data and synthetic agentic interactions. The SFT stage constructs a million-sample dataset balancing twenty programming languages, ten development contexts, and ten task archetypes. The RFT stage introduces a novel multi-ground-truth reward formulation for stable and sample-efficient policy optimization. Finally, the Reinforcement-to-Deployment phase adapts the model to production-grade IDE environments using Error-Masked SFT and Tree-Structured Trajectory Training. In summary, these stages enable KAT-Coder to achieve robust tool-use reliability, instruction alignment, and long-context reasoning, forming a deployable foundation for real-world intelligent coding agents. Our KAT series 32B model, KAT-Dev, has been open-sourced on https://huggingface.co/Kwaipilot/KAT-Dev.
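The abstract's multi-ground-truth reward formulation is not specified in detail here, but the core idea of rewarding a rollout against several admissible reference solutions can be sketched as follows. The function names and the token-overlap metric are illustrative assumptions, not the paper's actual formulation:

```python
# Hedged sketch: a "multi-ground-truth" reward scores a candidate against
# every acceptable reference solution and keeps the best match, so the
# policy is not penalized for producing any one of several valid fixes.

def overlap_score(candidate: str, reference: str) -> float:
    """Token-level F1 between a candidate and one reference solution."""
    cand, ref = set(candidate.split()), set(reference.split())
    if not cand or not ref:
        return 0.0
    common = len(cand & ref)
    if common == 0:
        return 0.0
    precision = common / len(cand)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

def multi_ground_truth_reward(candidate: str, references: list[str]) -> float:
    """Best score over all admissible ground truths."""
    return max(overlap_score(candidate, ref) for ref in references)

# Example: two equally valid fixes for the same bug.
refs = ["return sorted(items)", "items.sort()\nreturn items"]
reward = multi_ground_truth_reward("return sorted(items)", refs)  # -> 1.0
```

Taking the maximum rather than the mean avoids punishing the policy when the task admits multiple correct solutions, which the abstract credits for more stable and sample-efficient optimization.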
Problem

Research questions and friction points this paper is trying to address.

Bridging static text training with dynamic agentic coding execution
Enhancing reasoning, planning, and reflection for software development
Achieving robust tool-use reliability and instruction alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage curriculum training for agentic coding
Multi-ground-truth reward formulation for policy optimization
Error-masked SFT and tree-structured trajectory training
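Error-masked SFT, as described above, trains on agent trajectories while excluding erroneous steps from the loss. A minimal sketch of the label construction, assuming a span-annotated trajectory and the conventional `-100` ignore index used by common cross-entropy implementations (both assumptions, not details from the report):

```python
# Hedged sketch of error-masked SFT label construction: tokens belonging
# to failed tool calls are masked out of the loss so the model is not
# trained to imitate its own errors. The span annotation format is an
# illustrative assumption.

IGNORE_INDEX = -100  # conventional ignore index for cross-entropy losses

def build_sft_labels(token_ids: list[int],
                     error_spans: list[tuple[int, int]]) -> list[int]:
    """Copy token ids as labels, masking half-open [start, end) error spans."""
    labels = list(token_ids)
    for start, end in error_spans:
        for i in range(start, min(end, len(labels))):
            labels[i] = IGNORE_INDEX
    return labels

# Tokens 1-2 belong to a failed tool call and contribute no gradient.
labels = build_sft_labels([5, 6, 7, 8, 9], [(1, 3)])  # -> [5, -100, -100, 8, 9]
```

The rest of the trajectory still supervises the model, so successful recovery steps after a failure remain part of the training signal.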
πŸ”Ž Similar Papers
No similar papers found.
Authors: Zizheng Zhan, Ken Deng, Xiaojiang Zhang, Jinghui Wang, Huaixi Tang, Zhiyi Lai, Haoyang Huang, Wen Xiang, Kun Wu, Wenhao Zhuang, Minglei Zhang, Shaojie Wang, Shangpeng Yan, Kepeng Lei, Zongxian Feng, Huiming Wang, Zheng Lin, Mengtong Li, Mengfei Xie, Yinghan Cui, Xuxing Chen, Chao Wang, Weihao Li, Wenqiang Zhu, Jiarong Zhang, Jingxuan Xu, Songwei Yu, Yifan Yao, Xinping Lei, Han Li, Junqi Xiong, Zuchen Gao, Dailin Li, Haimo Li, Jiaheng Liu, Yuqun Zhang, Junyi Peng, Haotian Zhang, Bin Chen

Kwaipilot Team, Kuaishou Technology