Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model with 196B total parameters, of which only 11B are activated per token, aiming to make agent reasoning and execution faster and cheaper while retaining frontier-level intelligence. The design combines interleaved sliding-window and full attention at a 3:1 ratio, Multi-Token Prediction (MTP-3), and a scalable reinforcement learning framework that combines verifiable signals with preference feedback and remains stable under large-scale off-policy training. The model reports 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, results the authors describe as comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro.

📝 Abstract

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
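To make the efficiency claims in the abstract concrete, here is a minimal Python sketch of a 3:1 interleaved attention layout and the sparsity ratio implied by 11B active out of 196B total parameters. It is an illustration under stated assumptions, not the authors' implementation. The abstract gives only the 3:1 ratio and the parameter counts, so the order of layers within each group, the layer count, and the window size are hypothetical, as are the function names.

```python
from typing import List, Optional


def layer_schedule(num_layers: int, swa_per_full: int = 3) -> List[str]:
    """Assign an attention type to each layer.

    Assumption: each group is `swa_per_full` sliding-window layers followed
    by one full-attention layer. The source only states the 3:1 ratio.
    """
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Build a causal mask where mask[q][k] is True if query q may attend to key k.

    window=None gives full causal attention; otherwise each query sees only
    the most recent `window` positions (itself included).
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def attended_pairs(mask: List[List[bool]]) -> int:
    """Count the (query, key) pairs a mask allows, a rough proxy for attention cost."""
    return sum(sum(row) for row in mask)


# Hypothetical sizes chosen only to keep the example readable.
schedule = layer_schedule(num_layers=8)
full_cost = attended_pairs(causal_mask(seq_len=6))
swa_cost = attended_pairs(causal_mask(seq_len=6, window=2))

# Fraction of parameters active per token, from the figures in the abstract.
active_fraction = 11 / 196

print(schedule)
print(full_cost, swa_cost)
print(f"{active_fraction:.1%}")
```

In this sketch, the cost of a sliding-window layer grows linearly with sequence length for a fixed window, while a full-attention layer grows quadratically. That is the usual motivation for mixing the two in long, multi-round agentic contexts.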

Problem

Research questions and friction points this paper is trying to address.

agentic intelligence

computational efficiency

Mixture-of-Experts

multi-round interaction

frontier-level intelligence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts

Multi-Token Prediction

Sliding-Window Attention