🤖 AI Summary
This study addresses the low-latency, high-throughput, and scalable data-transfer requirements of cross-facility (edge-to-HPC) workflows in AI-HPC convergence scenarios. Method: We systematically compare three streaming architectures, Direct Streaming (DTS), Proxied Streaming (PRS), and Managed Service Streaming (MSS), implemented with the Data Streaming to HPC (DS2HPC) architectural framework and SciStream, a memory-to-memory streaming toolkit, and evaluated under scientific-workflow messaging patterns including work sharing, work sharing with feedback, and broadcast and gather. Contribution/Results: Experiments on the production-grade Advanced Computing Ecosystem (ACE) infrastructure at the Oak Ridge Leadership Computing Facility (OLCF) show that DTS's minimal-hop path yields the highest throughput and lowest latency but is constrained in deployment; MSS offers the greatest deployment feasibility and multi-user scalability yet incurs significant overhead; PRS lies in between, matching DTS performance in most cases while remaining scalable and flexible to deploy, making it a practical trade-off between efficiency and deployability. Our work provides empirical evidence and methodological guidance for designing cross-facility scientific data-streaming architectures.
📝 Abstract
In this paper, we investigate three cross-facility data streaming architectures: Direct Streaming (DTS), Proxied Streaming (PRS), and Managed Service Streaming (MSS). We examine their architectural variations in data flow paths and deployment feasibility, and detail their implementation using the Data Streaming to HPC (DS2HPC) architectural framework and the SciStream memory-to-memory streaming toolkit on the production-grade Advanced Computing Ecosystem (ACE) infrastructure at the Oak Ridge Leadership Computing Facility (OLCF). We present a workflow-specific evaluation of these architectures using three synthetic workloads derived from the streaming characteristics of scientific workflows. Through simulated experiments, we measure streaming throughput, round-trip time, and overhead under the work-sharing, work-sharing-with-feedback, and broadcast-and-gather messaging patterns commonly found in AI-HPC communication motifs. Our study shows that DTS offers a minimal-hop path, resulting in higher throughput and lower latency, whereas MSS provides greater deployment feasibility and scalability across multiple users but incurs significant overhead. PRS lies in between, offering a scalable architecture whose performance matches that of DTS in most cases.
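To make the three messaging patterns concrete, the sketch below models them with plain Python functions, where an edge-side "producer" hands items to HPC-side "workers". This is purely illustrative: the function names and structure are assumptions for exposition and do not reflect SciStream's or DS2HPC's actual APIs.

```python
# Hypothetical illustration of the three AI-HPC messaging patterns; this is
# NOT the SciStream API, just a minimal in-process model of the data flow.

def work_sharing(items, n_workers):
    """Work sharing: the producer's items are split across workers
    (round-robin here), so each worker handles a disjoint share."""
    shares = [[] for _ in range(n_workers)]
    for i, item in enumerate(items):
        shares[i % n_workers].append(item)
    return shares

def work_sharing_with_feedback(items, worker, cap):
    """Work sharing with feedback: each worker result flows back to the
    producer, which adapts (here: stops dispatching once a result exceeds
    the cap, a stand-in for steering the stream)."""
    results = []
    for item in items:
        result = worker(item)
        results.append(result)
        if result > cap:  # feedback loop: producer reacts to worker output
            break
    return results

def broadcast_and_gather(item, workers):
    """Broadcast and gather: the same item is sent to every worker, and the
    producer gathers all of their results."""
    return [w(item) for w in workers]
```

In a real cross-facility deployment these exchanges would traverse the DTS, PRS, or MSS data path rather than in-process calls; the paper's round-trip-time and throughput measurements capture the cost each path adds to exactly these exchange shapes.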