IoDResearch: Deep Research on Private Heterogeneous Data via the Internet of Data

📅 2025-10-01
🤖 AI Summary
Existing deep research frameworks predominantly focus on web-scale search and neglect private, heterogeneous scientific data, which results in inefficient retrieval, poor compliance with the FAIR principles, and limited reusability. To address this, we propose IoDResearch, a framework centered on private data and grounded in the Internet of Data (IoD) paradigm. It encapsulates heterogeneous data as FAIR-compliant digital objects and decomposes them into atomic knowledge units and dynamic knowledge graphs. By integrating digital object architecture, heterogeneous graph indexing, and multi-agent collaboration, IoDResearch enables trustworthy question answering and automated scientific report generation. Experiments demonstrate significant improvements over state-of-the-art RAG and deep research baselines across retrieval, QA, and report generation tasks. Furthermore, we release the first IoD DeepResearch benchmark to advance automated, reusable, private-data-driven scientific discovery.

📝 Abstract
The rapid growth of multi-source, heterogeneous, and multimodal scientific data has increasingly exposed the limitations of traditional data management. Most existing DeepResearch (DR) efforts focus primarily on web search while overlooking local private data. Consequently, these frameworks exhibit low retrieval efficiency for private data and fail to comply with the FAIR principles, ultimately resulting in inefficiency and limited reusability. To this end, we propose IoDResearch (Internet of Data Research), a private data-centric Deep Research framework that operationalizes the Internet of Data paradigm. IoDResearch encapsulates heterogeneous resources as FAIR-compliant digital objects, and further refines them into atomic knowledge units and knowledge graphs, forming a heterogeneous graph index for multi-granularity retrieval. On top of this representation, a multi-agent system supports both reliable question answering and structured scientific report generation. Furthermore, we establish the IoD DeepResearch Benchmark to systematically evaluate both data representation and Deep Research capabilities in IoD scenarios. Experimental results on retrieval, QA, and report-writing tasks show that IoDResearch consistently surpasses representative RAG and Deep Research baselines. Overall, IoDResearch demonstrates the feasibility of private-data-centric Deep Research under the IoD paradigm, paving the way toward more trustworthy, reusable, and automated scientific discovery.
Problem

Research questions and friction points this paper is trying to address.

Enhancing retrieval efficiency for private heterogeneous data
Ensuring FAIR compliance in deep research frameworks
Enabling automated scientific discovery with trustworthy data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Encapsulates heterogeneous resources as FAIR digital objects
Refines data into atomic knowledge units and graphs
Uses multi-agent system for QA and report generation
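The representation described above (digital objects refined into atomic knowledge units and indexed as a heterogeneous graph) can be illustrated with a toy sketch. This is not the authors' implementation: the class names `DigitalObject`, `KnowledgeUnit`, and `GraphIndex`, the sentence-level splitting, and the keyword matching are all assumptions chosen only to show how retrieval at two granularities might work.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical FAIR-style wrapper: identifier, metadata, raw content."""
    identifier: str
    metadata: dict
    content: str


@dataclass
class KnowledgeUnit:
    """Hypothetical atomic unit: one sentence traced back to its source object."""
    unit_id: str
    source_id: str
    text: str


@dataclass
class GraphIndex:
    """Toy heterogeneous graph with object nodes, unit nodes, and term nodes."""
    objects: dict = field(default_factory=dict)
    units: dict = field(default_factory=dict)
    term_to_units: dict = field(default_factory=dict)

    def add(self, obj: DigitalObject) -> None:
        """Register an object and split it into sentence-level units (an assumption)."""
        self.objects[obj.identifier] = obj
        sentences = [s.strip() for s in obj.content.split(".") if s.strip()]
        for i, sentence in enumerate(sentences):
            unit = KnowledgeUnit(f"{obj.identifier}#{i}", obj.identifier, sentence)
            self.units[unit.unit_id] = unit
            # Term -> unit edges; unit -> object edges live in unit.source_id.
            for term in set(sentence.lower().split()):
                self.term_to_units.setdefault(term, set()).add(unit.unit_id)

    def retrieve(self, query: str) -> dict:
        """Return matches at two granularities: units and their parent objects."""
        hits = set()
        for term in query.lower().split():
            hits |= self.term_to_units.get(term, set())
        unit_ids = sorted(hits)
        object_ids = sorted({self.units[u].source_id for u in unit_ids})
        return {"units": unit_ids, "objects": object_ids}


index = GraphIndex()
index.add(DigitalObject("do:1", {"type": "csv", "license": "CC-BY"},
                        "Sample A melts at 80 C. Sample B is stable"))
index.add(DigitalObject("do:2", {"type": "pdf", "license": "internal"},
                        "The reactor log shows stable pressure"))
result = index.retrieve("stable")
print(result)
```

A real system would presumably replace the keyword lookup with embedding or graph-based retrieval and attach richer FAIR metadata; the sketch only shows that each retrieved unit stays linked to the digital object it came from.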
Zhuofan Shi
National Key Laboratory of Data Space Technology and System, Peking University
Zijie Guo
National Key Laboratory of Data Space Technology and System, Peking University
Xinjian Ma
National Key Laboratory of Data Space Technology and System, Peking University
Gang Huang
National Key Laboratory of Data Space Technology and System, Peking University
Yun Ma
Assistant Professor, Peking University
Web, Mobile Computing, Software Engineering, Service
Xiang Jing
National Key Laboratory of Data Space Technology and System, Peking University