🤖 AI Summary
Existing general-purpose network protocols do little to break down scientific data silos, which limits data interoperability and accessibility under the AI4Science paradigm. This work proposes the Data Access and Collaboration Protocol (DACP), a protocol designed specifically for scientific collaboration, with the Streaming Data Frame (SDF) as its core data model. DACP combines unified resource identification, columnar stream framing, and a reverse supply mechanism to support cross-domain data discovery, in-situ computation, and streaming return of results. The authors also introduce faird, a reference server implementation of DACP, and present the work as a viable path toward scalable, collaborative scientific data infrastructure across distributed data centers.
📝 Abstract
Scientific computing is rapidly entering a data-intensive era. However, existing general-purpose network protocol stacks are limited in their ability to eliminate data silos and to improve data accessibility and interoperability, which makes it difficult to meet the demands of emerging paradigms such as AI4Science. To address these challenges, we propose the Data Access and Collaboration Protocol (DACP). DACP defines the Streaming Data Frame (SDF) as its core data model. Through unified resource identification, columnar stream framing, and a reverse supply mechanism, DACP enables data discovery, in-situ computation, and the streaming return of results across scientific data centers, thereby facilitating efficient cross-domain collaboration. Furthermore, this paper introduces faird, a reference server implementation of DACP. This work provides a viable path for building scalable and collaborative scientific data infrastructures.
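To make the idea of columnar stream framing with in-situ computation more concrete, the following is a minimal illustrative sketch in plain Python. It is not the DACP wire format or the faird API, neither of which is specified in the abstract. The class `StreamingDataFrame`, the helper `frames`, the method names, and the sample data are all hypothetical and chosen only for illustration. The sketch assumes that a dataset is cut into small columnar batches, that operations are applied lazily batch by batch where the data lives, and that only the reduced result is streamed back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

# A columnar batch maps each column name to a list of values of equal length.
Batch = Dict[str, List]


@dataclass
class StreamingDataFrame:
    """Hypothetical stand-in for an SDF: column names plus a lazy stream of batches."""
    columns: List[str]
    batches: Iterable[Batch]

    def select(self, *names: str) -> "StreamingDataFrame":
        # Projection in a columnar layout just drops the unused columns per batch.
        def gen() -> Iterator[Batch]:
            for batch in self.batches:
                yield {name: batch[name] for name in names}
        return StreamingDataFrame(list(names), gen())

    def filter(self, column: str, pred: Callable[[object], bool]) -> "StreamingDataFrame":
        # Row filtering is evaluated one batch at a time, never on the full dataset.
        def gen() -> Iterator[Batch]:
            for batch in self.batches:
                keep = [i for i, value in enumerate(batch[column]) if pred(value)]
                yield {name: [values[i] for i in keep] for name, values in batch.items()}
        return StreamingDataFrame(self.columns, gen())

    def collect(self) -> Batch:
        # Consumer side: concatenate the streamed batches into one result.
        out: Batch = {name: [] for name in self.columns}
        for batch in self.batches:
            for name in self.columns:
                out[name].extend(batch[name])
        return out


def frames(rows: Sequence[Tuple], columns: List[str], batch_size: int) -> Iterator[Batch]:
    """Cut row tuples into columnar frames of at most batch_size rows each."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {col: [row[i] for row in chunk] for i, col in enumerate(columns)}


# Made-up sample records: (id, temperature, site).
rows = [(1, 20.5, "a"), (2, 31.0, "b"), (3, 18.2, "c"), (4, 35.7, "d"), (5, 29.9, "e")]
cols = ["id", "temp", "site"]

sdf = StreamingDataFrame(cols, frames(rows, cols, batch_size=2))
result = sdf.filter("temp", lambda t: t > 25).select("id", "temp").collect()
print(result)  # {'id': [2, 4, 5], 'temp': [31.0, 35.7, 29.9]}
```

Because `filter` and `select` only wrap generators, no batch is touched until `collect` pulls on the stream. In this toy setting, that is the property that would let a server evaluate a request next to the data and return only the matching columns and rows frame by frame.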