DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

📅 2026-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of training a high-performing small deep research agent for edge deployment using only about 10K open-source training samples. The authors propose a two-stage recipe: first, agentic supervised fine-tuning (SFT) on cleaned and resampled long-horizon trajectories; second, agentic reinforcement learning built on the IGPO algorithm, with turn-level rewards based on information gain and format-aware regularization. This approach improves the execution reliability of a 4B-parameter model on long-horizon deep research tasks. The resulting model, DR-Venus-4B, significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks and narrows the gap to much larger 30B-class systems, pointing to the potential of compact models in resource-constrained edge scenarios.

📝 Abstract
Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open data samples, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
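
The abstract does not spell out how the turn-level rewards are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "information gain" at a turn is the change in the probability the policy assigns to the reference answer after that turn, and that "format-aware regularization" is a fixed penalty on turns with malformed output. The function name `turn_level_rewards` and the parameters `answer_probs`, `format_ok`, `initial_prob`, and `format_penalty` are all hypothetical.

```python
from typing import List


def turn_level_rewards(
    answer_probs: List[float],
    format_ok: List[bool],
    initial_prob: float = 0.0,
    format_penalty: float = 0.1,
) -> List[float]:
    """Sketch of a dense per-turn reward (assumed form, not from the paper).

    answer_probs[t]: hypothetical probability of the reference answer
                     under the policy after turn t.
    format_ok[t]:    whether the output of turn t was well-formed.
    """
    if len(answer_probs) != len(format_ok):
        raise ValueError("answer_probs and format_ok must have equal length")

    rewards = []
    prev = initial_prob
    for prob, ok in zip(answer_probs, format_ok):
        gain = prob - prev                     # assumed information-gain term
        penalty = 0.0 if ok else format_penalty  # assumed format regularizer
        rewards.append(gain - penalty)
        prev = prob
    return rewards


# Example: three turns, the second one malformed.
example = turn_level_rewards([0.1, 0.4, 0.3], [True, False, True], initial_prob=0.05)
print(example)
```

Under this assumed form, every turn receives its own signal rather than a single outcome reward at the end of the trajectory, which is one way to obtain the denser supervision and turn-level credit assignment the abstract mentions. The gain terms telescope, so without penalties they sum to the final probability minus the initial one.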
Problem

Research questions and friction points this paper is trying to address.

edge-scale agents
small language models
limited open data
deep research
data efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

edge-scale agents
agentic reinforcement learning
data-efficient training
information-gain rewards
small language models