IQuest-Coder-V1 Technical Report

📅 2026-03-17

🤖 AI Summary

This work proposes Code-Flow, a multi-stage training paradigm designed to model the dynamic evolution of code in software development and enhance large language models’ capabilities in intelligent programming, agent-based software engineering, and complex tool invocation. The approach integrates pretraining, an intermediate training phase grounded in agent execution trajectories, and a bifurcated post-training strategy comprising a reasoning-driven reinforcement learning path (Thinking) and a general-purpose instruction-tuning path (Instruct). A Loop architecture is introduced to balance performance gains with deployment overhead. Trained with extended context windows of 32k and 128k tokens, the resulting IQuest-Coder-V1 series achieves state-of-the-art performance on critical benchmarks spanning agent-driven software engineering, competitive programming, and sophisticated tool usage.

📝 Abstract

In this report, we introduce the IQuest-Coder-V1 series (7B/14B/40B/40B-Loop), a new family of code large language models (LLMs). Moving beyond static code representations, we propose the code-flow multi-stage training paradigm, which captures the dynamic evolution of software logic across the phases of the pipeline. Our models are developed through an evolutionary pipeline that starts with pre-training on code facts, repository, and completion data. We then apply a specialized mid-training stage that integrates reasoning and agentic trajectories at 32k context and repository-scale data at 128k context to forge deep logical foundations. Post-training then finalizes specialized coding capabilities along two paths: the thinking path (using reasoning-driven RL) and the instruct path (optimized for general assistance). IQuest-Coder-V1 achieves state-of-the-art performance among competitive models across critical dimensions of code intelligence: agentic software engineering, competitive programming, and complex tool use. To address deployment constraints, the IQuest-Coder-V1-Loop variant introduces a recurrent mechanism designed to optimize the trade-off between model capacity and deployment footprint, offering an architectural route to balancing efficacy and efficiency. We believe the release of the IQuest-Coder-V1 series, including the complete white-box chain of checkpoints from pre-training bases to the final thinking and instruction models, will advance research in autonomous code intelligence and real-world agentic systems.
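The stage ordering described in the abstract can be sketched as a small data structure. This is only an illustrative outline, not the authors' training code: the names `Stage`, `SHARED_STAGES`, `POST_TRAINING`, and `checkpoint_chain` are hypothetical, and the stage labels simply restate the abstract. The abstract gives no context length for pre-training or post-training, so those are left unset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, List


@dataclass(frozen=True)
class Stage:
    """One training stage as named in the abstract (hypothetical container)."""
    name: str
    data: Tuple[str, ...]
    # Context length in thousands of tokens; None where the abstract states none.
    context_k: Optional[int] = None


# Stages shared by every model before post-training splits into two paths.
SHARED_STAGES = (
    Stage("pre-training", ("code facts", "repository", "completion")),
    Stage("mid-training-32k", ("reasoning", "agentic trajectories"), 32),
    Stage("mid-training-128k", ("repository-scale",), 128),
)

# Post-training is bifurcated: one stage per path.
POST_TRAINING = {
    "thinking": Stage("post-training-thinking", ("reasoning-driven RL",)),
    "instruct": Stage("post-training-instruct", ("general assistance",)),
}


def checkpoint_chain(path: str) -> List[str]:
    """Return the ordered stage names leading to the given final model path."""
    if path not in POST_TRAINING:
        raise ValueError(f"unknown post-training path: {path!r}")
    return [s.name for s in SHARED_STAGES] + [POST_TRAINING[path].name]


if __name__ == "__main__":
    for path in POST_TRAINING:
        print(path, "->", " -> ".join(checkpoint_chain(path)))
```

Both paths share the same three earlier stages and differ only in the final one, which mirrors the "white-box chain of checkpoints" the abstract says is released.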

Problem

Research questions and friction points this paper is trying to address.

code intelligence

agentic software engineering

dynamic code evolution

model deployment efficiency

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

code-flow training

agentic trajectories

reasoning-driven RL