AI Summary
This work proposes Code-Flow, a multi-stage training paradigm designed to model the dynamic evolution of code in software development and enhance large language models' capabilities in intelligent programming, agent-based software engineering, and complex tool invocation. The approach integrates pretraining, an intermediate training phase grounded in agent execution trajectories, and a bifurcated post-training strategy comprising a reasoning-driven reinforcement learning path (Thinking) and a general-purpose instruction-tuning path (Instruct). A Loop architecture is introduced to balance performance gains with deployment overhead. Trained with extended context windows of 32k and 128k tokens, the resulting IQuest-Coder-V1 series achieves state-of-the-art performance on critical benchmarks spanning agent-driven software engineering, competitive programming, and sophisticated tool usage.
Abstract
In this report, we introduce the IQuest-Coder-V1 series (7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic across the phases of the training pipeline. Our models are developed through an evolutionary pipeline that starts with pre-training on code facts, repository, and completion data. We then apply a specialized mid-training stage that integrates reasoning and agentic trajectories at 32k context and repository-scale data at 128k context to forge deep logical foundations. The models are finalized with post-training for specialized coding capabilities, which is bifurcated into two paths: the thinking path (utilizing reasoning-driven RL) and the instruct path (optimized for general assistance). IQuest-Coder-V1 achieves state-of-the-art performance among competitive models across critical dimensions of code intelligence: agentic software engineering, competitive programming, and complex tool use. To address deployment constraints, the IQuest-Coder-V1-Loop variant introduces a recurrent mechanism designed to optimize the trade-off between model capacity and deployment footprint, offering an architectural route to balancing efficacy and efficiency. We believe the release of the IQuest-Coder-V1 series, including the complete white-box chain of checkpoints from pre-training bases to the final thinking and instruction models, will advance research in autonomous code intelligence and real-world agentic systems.
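The stage ordering described in the abstract can be sketched as a small data structure. This is only an illustrative outline, not the authors' training code: the names `Stage`, `SHARED_STAGES`, `POST_TRAINING`, and `checkpoint_chain` are hypothetical, and the stage labels are paraphrased from the abstract. Context lengths are kept as the strings used in the text ("32k", "128k") because exact token counts are not given.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    """One training stage as described in the abstract (hypothetical type)."""
    name: str
    data: Tuple[str, ...]
    context: Optional[str] = None  # None where the abstract states no context length


# Stages shared by both post-training paths, in the order the abstract lists them.
SHARED_STAGES = (
    Stage("pre-training", ("code facts", "repository", "completion")),
    Stage("mid-training-32k", ("reasoning trajectories", "agentic trajectories"), "32k"),
    Stage("mid-training-128k", ("repository-scale data",), "128k"),
)

# Post-training is bifurcated into two paths.
POST_TRAINING = {
    "thinking": Stage("post-training-thinking", ("reasoning-driven RL",)),
    "instruct": Stage("post-training-instruct", ("general assistance",)),
}


def checkpoint_chain(path: str) -> List[str]:
    """Return the ordered stage names leading to the final model on `path`."""
    if path not in POST_TRAINING:
        raise ValueError(f"unknown post-training path: {path!r}")
    return [stage.name for stage in SHARED_STAGES] + [POST_TRAINING[path].name]


if __name__ == "__main__":
    for path in POST_TRAINING:
        print(path, "->", " -> ".join(checkpoint_chain(path)))
```

Both paths share the same pre-training and mid-training prefix and differ only in the final stage, which mirrors the "white-box chain of checkpoints" the abstract says is released.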