AI Summary
Existing deep research frameworks focus predominantly on web-scale search and neglect private, heterogeneous scientific data, which leads to inefficient retrieval, poor compliance with the FAIR principles, and limited reusability. To address this, we propose IoDResearch, a framework centered on private data and grounded in the Internet of Data (IoD) paradigm. It encapsulates heterogeneous data as FAIR-compliant digital objects and decomposes them into atomic knowledge units and dynamic knowledge graphs. By integrating digital object architecture, heterogeneous graph indexing, and multi-agent collaboration, IoDResearch enables trustworthy question answering and automated scientific report generation. Experiments show significant improvements over state-of-the-art RAG and deep research baselines on retrieval, QA, and report generation tasks. Furthermore, we release the first IoD DeepResearch benchmark to advance automated, reusable, private-data-driven scientific discovery.
Abstract
The rapid growth of multi-source, heterogeneous, and multimodal scientific data has increasingly exposed the limitations of traditional data management. Most existing DeepResearch (DR) efforts focus primarily on web search while overlooking local private data. Consequently, these frameworks retrieve private data inefficiently and do not comply with the FAIR principles (findable, accessible, interoperable, reusable), which limits reusability. To this end, we propose IoDResearch (Internet of Data Research), a private-data-centric Deep Research framework that operationalizes the Internet of Data (IoD) paradigm. IoDResearch encapsulates heterogeneous resources as FAIR-compliant digital objects and further refines them into atomic knowledge units and knowledge graphs, forming a heterogeneous graph index for multi-granularity retrieval. On top of this representation, a multi-agent system supports both reliable question answering and structured scientific report generation. Furthermore, we establish the IoD DeepResearch Benchmark to systematically evaluate both data representation and Deep Research capabilities in IoD scenarios. Experimental results on retrieval, QA, and report-writing tasks show that IoDResearch consistently surpasses representative RAG and Deep Research baselines. Overall, IoDResearch demonstrates the feasibility of private-data-centric Deep Research under the IoD paradigm, paving the way toward more trustworthy, reusable, and automated scientific discovery.
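The representation described above (digital objects refined into atomic knowledge units and linked in a heterogeneous graph index) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the IoDResearch implementation: the class names `DigitalObject` and `HeterogeneousGraphIndex`, the `do:` identifier scheme, the sentence-level splitting, and the token-overlap scoring, which stands in for whatever retrieval model the framework actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DigitalObject:
    """Hypothetical FAIR-style wrapper around one private resource."""
    pid: str        # persistent identifier, so the resource is findable
    metadata: dict  # descriptive metadata, e.g. a title
    content: str    # raw text payload


class HeterogeneousGraphIndex:
    """Toy index with two node types: 'object' (coarse) and 'unit' (fine)."""

    def __init__(self):
        self.nodes = {}                # node id -> (node type, text)
        self.edges = defaultdict(set)  # node id -> neighbouring node ids

    def add_object(self, obj):
        # Coarse-grained node: the digital object, described by its title.
        self.nodes[obj.pid] = ("object", obj.metadata.get("title", ""))
        # Fine-grained nodes: one "atomic unit" per sentence (naive split).
        sentences = (s.strip() for s in obj.content.split(".") if s.strip())
        for i, sentence in enumerate(sentences):
            uid = f"{obj.pid}#u{i}"
            self.nodes[uid] = ("unit", sentence)
            self.edges[obj.pid].add(uid)
            self.edges[uid].add(obj.pid)

    def search(self, query, node_type, top_k=3):
        """Rank nodes of one type by token overlap (placeholder scoring)."""
        query_tokens = set(query.lower().split())
        scored = []
        for node_id, (kind, text) in self.nodes.items():
            if kind != node_type:
                continue
            score = len(query_tokens & set(text.lower().split()))
            if score > 0:
                scored.append((score, node_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [node_id for _, node_id in scored[:top_k]]

    def parent_objects(self, unit_id):
        """Follow edges from a unit back to the object(s) it came from."""
        return sorted(
            n for n in self.edges[unit_id] if self.nodes[n][0] == "object"
        )


index = HeterogeneousGraphIndex()
index.add_object(DigitalObject(
    "do:001",
    {"title": "Perovskite solar cell measurements"},
    "Sample A reached 21 percent efficiency. Sample B degraded after 200 hours.",
))
index.add_object(DigitalObject(
    "do:002",
    {"title": "Battery cycling logs"},
    "Cell C retained 90 percent capacity after 500 cycles.",
))

# Fine-grained retrieval over atomic units, then provenance via graph edges.
unit_hits = index.search("efficiency of sample A", node_type="unit")
print(unit_hits, index.parent_objects(unit_hits[0]))

# Coarse-grained retrieval over whole digital objects.
object_hits = index.search("battery logs", node_type="object")
print(object_hits)
```

Keeping both node types in one graph is what makes retrieval multi-granular in this sketch: a query can target whole objects or individual units, and every unit can be traced back to its source object, which is the kind of provenance that trustworthy question answering depends on.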