🤖 AI Summary
This work introduces Step 3.5 Flash, a sparse mixture-of-experts (MoE) model with 196B total parameters, of which only 11B are activated per token, aimed at improving agent reasoning efficiency and execution speed while retaining frontier-level capability. The design combines attention layers interleaved at a 3:1 sliding-window-to-full ratio, multi-token prediction (MTP-3), and a scalable reinforcement learning framework that mixes verifiable signals with preference feedback and remains stable under large-scale off-policy training. The model reports 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, results the authors describe as comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro.
📝 Abstract
We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
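To make the architectural figures above concrete, the following is a minimal sketch of what a 3:1 sliding-window/full attention interleave and the 11B-of-196B activation ratio imply. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify where the full-attention layer sits within each group of four, the window size, or the layer count, so the placement (full attention last in each group), the `window` value, and the function names `layer_schedule`, `causal_mask`, and `active_fraction` are all hypothetical.

```python
from typing import List, Optional


def layer_schedule(num_layers: int, swa_per_full: int = 3) -> List[str]:
    """Attention type for each layer.

    Assumption: each group is `swa_per_full` sliding-window layers followed
    by one full-attention layer. The abstract only states the 3:1 ratio.
    """
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives full causal attention. Otherwise each query sees only
    the most recent `window` positions, itself included.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def active_fraction(total_b: float = 196.0, active_b: float = 11.0) -> float:
    """Share of parameters activated per token (figures from the abstract)."""
    return active_b / total_b


if __name__ == "__main__":
    print(layer_schedule(8))
    print(causal_mask(4, window=2))  # window size chosen only for display
    print(f"{active_fraction():.1%} of parameters active per token")
```

In a sliding-window layer the number of keys each query attends to is capped at the window size, so attention cost in those layers grows linearly with sequence length, while the periodic full-attention layers preserve access to the entire context. With the abstract's figures, roughly 5.6% of the parameters are active for any given token.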