DocDancer: Towards Agentic Document-Grounded Information Seeking

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing document-based question answering agents, which typically rely on closed-source models and lack effective tool utilization capabilities, hindering efficient and open-ended access to document information. The study formulates document QA as an information-seeking task and proposes the first end-to-end trainable, open-source, tool-augmented agent framework that explicitly models the processes of document exploration and comprehension. To mitigate the scarcity of high-quality training data, the authors design an exploration-synthesis data generation pipeline and integrate a long-context understanding model to enhance performance. The approach achieves state-of-the-art results on the MMLongBench-Doc and DocBench benchmarks, demonstrating its effectiveness while offering novel insights into agent tool design and synthetic data construction for document understanding tasks.

📝 Abstract
Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and largely rely on closed-source models. In this work, we introduce DocDancer, an end-to-end trained open-source Doc agent. We formulate DocQA as an information-seeking problem and propose a tool-driven agent framework that explicitly models document exploration and comprehension. To enable end-to-end training of such agents, we introduce an Exploration-then-Synthesis data synthesis pipeline that addresses the scarcity of high-quality training data for DocQA. Models trained on the synthesized data prove effective on two long-context document understanding benchmarks, MMLongBench-Doc and DocBench. Further analysis provides valuable insights into agentic tool design and synthetic data construction.
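To make the information-seeking framing concrete, here is a minimal toy sketch of a tool-driven loop over a document. It is not DocDancer's implementation: the abstract does not specify the tool set or the policy, so the `search` and `read` tools, the `Document` and `Trajectory` types, and the scripted `seek` policy are all hypothetical stand-ins for what would be a trained language-model agent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    pages: list[str]  # one string per page (assumed representation)


def search(doc: Document, keyword: str) -> list[int]:
    """Hypothetical 'search' tool: indices of pages mentioning the keyword."""
    return [i for i, text in enumerate(doc.pages) if keyword.lower() in text.lower()]


def read(doc: Document, page: int) -> str:
    """Hypothetical 'read' tool: full text of a single page."""
    return doc.pages[page]


@dataclass
class Trajectory:
    # Each step is (tool name, tool argument, observation returned by the tool).
    steps: list[tuple[str, str, str]] = field(default_factory=list)


def seek(doc: Document, keyword: str) -> tuple[Optional[str], Trajectory]:
    """Scripted stand-in for a learned policy: explore with search, then read the first hit."""
    traj = Trajectory()
    hits = search(doc, keyword)
    traj.steps.append(("search", keyword, str(hits)))
    if not hits:
        return None, traj  # nothing in the document grounds an answer
    evidence = read(doc, hits[0])
    traj.steps.append(("read", str(hits[0]), evidence))
    return evidence, traj


if __name__ == "__main__":
    doc = Document(pages=["Intro text", "Revenue in 2023 was 5M", "Appendix"])
    evidence, traj = seek(doc, "revenue")
    print(evidence)
    for tool, arg, obs in traj.steps:
        print(f"{tool}({arg!r}) -> {obs}")
```

The sketch separates exploration (locating candidate pages) from comprehension (reading the evidence), and it records every tool call. A recorded trace of this kind is one plausible form for agent training data, though how the Exploration-then-Synthesis pipeline actually builds its examples is not described in the abstract.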
Problem

Research questions and friction points this paper is trying to address.

Document Question Answering
tool utilization
open-source models
training data scarcity
agentic information seeking
Innovation

Methods, ideas, or system contributions that make the work stand out.

DocDancer
tool-driven agent
document-grounded QA
end-to-end training
synthetic data pipeline