QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence

📅 2026-04-14

🤖 AI Summary

This work addresses the limited performance of foundation models on deep search tasks in the Chinese medical domain by proposing a full-pipeline optimization framework. The approach combines a large-scale medical knowledge graph with online exploration to synthesize multi-hop medical training data, then strengthens the agent's planning, tool-use, and reflection capabilities through two-stage post-training: supervised fine-tuning (SFT) followed by reinforcement learning (RL). The study also establishes the QuarkMedSearch Benchmark, the first expert-validated Chinese medical deep search evaluation benchmark. On this benchmark the proposed method achieves state-of-the-art performance among open-source models of comparable scale, while remaining competitive on general-purpose benchmarks.

📝 Abstract

As agentic foundation models continue to evolve, how to further improve their performance in vertical domains has become an important challenge. To this end, building upon Tongyi DeepResearch, a powerful agentic foundation model, we focus on the Chinese medical deep search scenario and propose QuarkMedSearch, systematically exploring a full-pipeline approach spanning medical multi-hop data construction, training strategies, and evaluation benchmarks to further push and assess its performance upper bound in vertical domains. Specifically, for data synthesis, to address the scarcity of deep search training data in the medical domain, we combine a large-scale medical knowledge graph with real-time online exploration to construct long-horizon medical deep search training data; for post-training, we adopt a two-stage SFT and RL training strategy that progressively enhances the model's planning, tool invocation, and reflection capabilities required for deep search, while maintaining search efficiency; for evaluation, we collaborate with medical experts to construct the QuarkMedSearch Benchmark through rigorous manual verification. Experimental results demonstrate that QuarkMedSearch achieves state-of-the-art performance among open-source models of comparable scale on the QuarkMedSearch Benchmark, while also maintaining strong competitiveness on general benchmarks.
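
The abstract does not spell out how multi-hop training data is derived from the knowledge graph. As a rough illustration only, and not the authors' pipeline, the sketch below samples a relation path from a toy graph and turns it into a multi-hop example skeleton. The triples, entity names, and the helpers `build_adjacency`, `sample_path`, and `path_to_example` are all hypothetical, and the online-exploration step is omitted entirely.

```python
import random

# Toy knowledge graph as (head, relation, tail) triples.
# All entities and relations are hypothetical placeholders.
TRIPLES = [
    ("drug_A", "treats", "disease_B"),
    ("disease_B", "has_symptom", "symptom_C"),
    ("symptom_C", "assessed_by", "test_D"),
    ("drug_A", "interacts_with", "drug_E"),
]


def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
    return adj


def sample_path(adj, start, hops, rng):
    """Random walk of exactly `hops` edges without revisiting a node.

    Returns a list of (head, relation, tail) triples, or None if the
    walk reaches a dead end before `hops` edges are collected.
    """
    path, node, visited = [], start, {start}
    for _ in range(hops):
        candidates = [(r, t) for r, t in adj.get(node, []) if t not in visited]
        if not candidates:
            return None
        rel, nxt = rng.choice(candidates)
        path.append((node, rel, nxt))
        visited.add(nxt)
        node = nxt
    return path


def path_to_example(path):
    """Turn a sampled path into a multi-hop question skeleton."""
    return {
        "start": path[0][0],
        "relation_chain": " -> ".join(rel for _, rel, _ in path),
        "answer": path[-1][2],
        "hops": len(path),
    }


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    path = sample_path(adj, "disease_B", hops=2, rng=random.Random(0))
    if path is not None:
        print(path_to_example(path))
```

In a sketch like this, the answer entity is reachable only by following every hop in order, which is what forces a search agent to plan several tool calls rather than answer from a single lookup.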

Problem

Research questions and friction points this paper is trying to address.

agentic foundation models

vertical domains

medical deep search

long-horizon reasoning

performance upper bound

Innovation

Methods, ideas, or system contributions that make the work stand out.

long-horizon deep search

medical knowledge graph

two-stage SFT and RL

multi-hop data synthesis

domain-specific benchmark
