KAT-V1: Kwai-AutoThink Technical Report

📅 2025-07-11
🤖 AI Summary
To address the pervasive “overthinking” problem—redundant chain-of-thought invocation on simple inputs—in complex reasoning tasks, this paper introduces KAT-V1, the first open-source 40B large language model supporting dynamic reasoning-mode switching. Its core innovation is the AutoThink training paradigm: an intent-aware prompting mechanism estimates task complexity; majority-voting signals guide mode selection between reasoning and direct-response; and a dual-modality data construction strategy, cold-start initialization, and Step-SRPO reinforcement learning jointly enable fine-grained control and efficient transfer across modes. Experiments demonstrate that KAT-V1 matches or surpasses DeepSeek-R1-0528 and Qwen3-235B-A22B on multiple reasoning benchmarks while reducing reasoning token consumption by ~30%. The model has been deployed in Kwai’s programming assistant, Kwaipilot.
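The majority-vote mode-selection signal described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper `model.answer_direct`, the sample count, and the threshold are all assumptions introduced here.

```python
def label_reasoning_mode(query, model, n_samples=8, threshold=0.75):
    """Sketch: decide a query's training-mode label by majority vote
    over direct (non-reasoning) answers.

    `model.answer_direct` is an assumed helper returning a
    (final_answer, is_correct) pair for one sampled completion.
    """
    results = [model.answer_direct(query) for _ in range(n_samples)]
    n_correct = sum(1 for _, ok in results if ok)
    # If most direct answers already succeed, the query is simple
    # enough for the non-reasoning regime; otherwise it needs thinking.
    return "no-think" if n_correct / n_samples >= threshold else "think"
```

Under this scheme, easy queries are routed to the direct-response regime during data construction, reserving chain-of-thought for queries where direct answers fail.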

📝 Abstract
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks, in which an automatic thinking training paradigm dynamically switches between reasoning and non-reasoning modes based on task complexity. First, we construct a dual-regime dataset using a novel tagging pipeline and a multi-agent synthesis strategy, and then apply Multi-Token Prediction (MTP)-enhanced knowledge distillation, enabling efficient, fine-grained reasoning transfer at minimal pretraining cost. Second, we implement a cold-start initialization strategy that introduces mode-selection priors via majority-vote signals and intent-aware prompting. Finally, we propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework, providing structured guidance over both reasoning-mode selection and response accuracy. Extensive experiments across multiple benchmarks demonstrate that KAT consistently matches or even outperforms current state-of-the-art models, including DeepSeek-R1-0528 and Qwen3-235B-A22B, on a wide range of reasoning-intensive tasks while reducing token usage by up to roughly 30%. Beyond academic evaluation, KAT has been successfully deployed in Kwaipilot (Kuaishou's internal coding assistant), where it improves real-world development workflows with high accuracy, efficiency, and controllable reasoning behavior. Moreover, we are actively training a 200B Mixture-of-Experts (MoE) model with 40B activated parameters, whose early-stage results already show promising gains in performance and efficiency, further demonstrating the scalability of the AutoThink paradigm.
Problem

Research questions and friction points this paper is trying to address.

Addresses overthinking in reasoning tasks with dynamic mode switching
Enhances reasoning transfer via multi-token prediction knowledge distillation
Improves efficiency and accuracy with structured RL-guided reasoning selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic thinking training paradigm
Multi-Token Prediction knowledge distillation
Step-SRPO reinforcement learning algorithm
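The Step-SRPO idea above, intermediate supervision over mode selection layered on GRPO's group-relative advantages, can be sketched roughly as below. The reward weights, the two-term reward shape, and the function names are illustrative assumptions, not the paper's actual formulation.

```python
def step_srpo_reward(chose_think, target_mode, answer_correct,
                     w_mode=0.3, w_answer=0.7):
    """Sketch of a Step-SRPO-style shaped reward (weights are
    illustrative): an intermediate term for selecting the intended
    reasoning mode plus a final term for answer correctness."""
    # Intermediate supervision: did the rollout pick the intended mode?
    mode_r = 1.0 if chose_think == (target_mode == "think") else 0.0
    # Final supervision: is the answer correct?
    ans_r = 1.0 if answer_correct else 0.0
    return w_mode * mode_r + w_answer * ans_r

def group_advantages(rewards):
    """GRPO-style group-relative advantage: normalize each rollout's
    reward against its own sampling group's mean and std."""
    mu = sum(rewards) / len(rewards)
    std = (sum((r - mu) ** 2 for r in rewards) / len(rewards)) ** 0.5
    std = std if std > 0 else 1.0  # guard against a zero-variance group
    return [(r - mu) / std for r in rewards]
```

Shaping the reward this way lets the policy receive credit for choosing the right mode even before answer correctness is resolved, which is the structured guidance the bullets above refer to.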
👥 Authors

Zizheng Zhan (Kwaipilot Team)
Ken Deng (Kwaipilot Team, Kuaishou Technology)
Huaixi Tang (Kwaipilot Team)
Wen Xiang (Kwaipilot Team)
Kun Wu (Kwaipilot Team)
Weihao Li (Research Fellow, Australian National University)
Wenqiang Zhu (Kwaipilot Team)
Jingxuan Xu (Kwaipilot Team)
Lecheng Huang (Kwaipilot Team)
Zongxian Feng (Kwaipilot Team)
Shaojie Wang (Kwaipilot Team)
Shangpeng Yan (Kwaipilot Team)
Jiaheng Liu (Kwaipilot Team)
Zhongyuan Peng (Fudan University)
Zuchen Gao (PhD Candidate, The Hong Kong Polytechnic University)
Haoyang Huang (JD Explore Academy)
Ziqi Zhan (Kwaipilot Team)
Yanan Wu (Kwaipilot Team)
Yuanxing Zhang (Kuaishou Technology)
Jian Yang (Kwaipilot Team)
Guang Chen (Kwaipilot Team)
Haotian Zhang (Kwaipilot Team)
Bin Chen (Kwaipilot Team)
Bing Yu (Kwaipilot Team)