DocDancer: Towards Agentic Document-Grounded Information Seeking

📅 2026-01-08

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing document-based question answering agents, which typically rely on closed-source models and lack effective tool utilization capabilities, hindering efficient and open-ended access to document information. The study formulates document QA as an information-seeking task and proposes the first end-to-end trainable, open-source, tool-augmented agent framework that explicitly models the processes of document exploration and comprehension. To mitigate the scarcity of high-quality training data, the authors design an exploration-synthesis data generation pipeline and integrate a long-context understanding model to enhance performance. The approach achieves state-of-the-art results on the MMLongBench-Doc and DocBench benchmarks, demonstrating its effectiveness while offering novel insights into agent tool design and synthetic data construction for document understanding tasks.

📝 Abstract

Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and largely rely on closed-source models. In this work, we introduce DocDancer, an end-to-end trained open-source Doc agent. We formulate DocQA as an information-seeking problem and propose a tool-driven agent framework that explicitly models document exploration and comprehension. To enable end-to-end training of such agents, we introduce an Exploration-then-Synthesis data synthesis pipeline that addresses the scarcity of high-quality training data for DocQA. Models trained on the synthesized data prove effective on two long-context document understanding benchmarks, MMLongBench-Doc and DocBench. Further analysis provides valuable insights into agentic tool design and synthetic data construction.
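
To make the "tool-driven" framing concrete, here is a minimal sketch of an agent loop that separates exploration from comprehension. It is not the DocDancer implementation: the abstract does not specify the tools, so the names `search`, `read`, `seek`, `Document`, and `Trajectory` are all hypothetical, and a fixed search-then-read policy stands in for the trained model that would normally choose each tool call.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """A document split into titled sections (hypothetical structure)."""
    sections: dict[str, str]


def search(doc: Document, keyword: str) -> list[str]:
    """Exploration tool (assumed): titles of sections mentioning the keyword."""
    kw = keyword.lower()
    return [title for title, text in doc.sections.items() if kw in text.lower()]


def read(doc: Document, title: str) -> str:
    """Comprehension tool (assumed): full text of one section, or '' if absent."""
    return doc.sections.get(title, "")


@dataclass
class Trajectory:
    """Ordered record of (tool, argument) calls made while seeking."""
    steps: list[tuple[str, str]] = field(default_factory=list)


def seek(doc: Document, keyword: str, max_reads: int = 3) -> tuple[list[str], Trajectory]:
    """Fixed policy standing in for a model: search once, then read the hits."""
    traj = Trajectory()
    traj.steps.append(("search", keyword))
    hits = search(doc, keyword)
    evidence = []
    for title in hits[:max_reads]:
        traj.steps.append(("read", title))
        evidence.append(read(doc, title))
    return evidence, traj


# Toy document for illustration only.
doc = Document(sections={
    "Intro": "DocQA answers questions grounded in documents.",
    "Data": "Training data for DocQA is scarce.",
    "Results": "Evaluated on two benchmarks.",
})
evidence, traj = seek(doc, "docqa")
for step in traj.steps:
    print(step)
```

The recorded trajectory is the kind of tool-call trace that a data synthesis pipeline could, in principle, pair with a question and answer to form a training example. How DocDancer actually builds such examples is not described in the abstract.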

Problem

Research questions and friction points this paper is trying to address.

Document Question Answering

tool utilization

open-source models

training data scarcity

agentic information seeking

Innovation

Methods, ideas, or system contributions that make the work stand out.

DocDancer

tool-driven agent

document-grounded QA