🤖 AI Summary
Emerging requirements in X-ray science (AI training on high-speed data streams, femtosecond-level time-of-flight analysis, and distributed crystallographic structure determination) demand a scalable, secure, and low-latency experimental data infrastructure. Method: This paper introduces the first end-to-end experimental data streaming framework that integrates cloud-native microservices with traditional HPC batch processing. It combines a RESTful API-driven request service, OAuth 2.0 mutual authentication, Kafka-based high-throughput messaging, containerized microservices, and HPC job schedulers to provide high-throughput data buffering and secure cross-institutional sharing. Contribution/Results: The framework achieves millisecond-scale real-time data distribution, supports customizable visualization and distributed structure determination, and has been validated via the LCLStreamer prototype deployed across multiple synchrotron facilities. It improves data access efficiency by 3–5× and significantly enhances multi-center collaborative research, advancing synchrotron science toward a “streaming experiment” paradigm.
📝 Abstract
We describe a new end-to-end experimental data streaming framework designed from the ground up to support new types of applications: AI training, extremely high-rate X-ray time-of-flight analysis, crystal structure determination with distributed processing, and custom data science applications and visualizers yet to be created. Throughout, our design choices merge cloud microservices with traditional HPC batch execution models for security and flexibility. This project makes a unique contribution to the DOE Integrated Research Infrastructure (IRI) landscape. By creating a flexible, API-driven data request service, we address a significant need for high-speed data streaming sources in the X-ray science data analysis community. By combining a data request API, a mutual-authentication web security framework, a job queue system, and a high-rate data buffer, and by complementing existing facility infrastructure, the LCLStreamer framework prototypes and implements several new paradigms critical for future-generation experiments.
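As a rough illustration of how a data request, a job queue, and a bounded data buffer might fit together, the following is a minimal single-process sketch using only the Python standard library. It is not LCLStreamer's actual API: the names `DataRequest`, `produce_events`, and `serve_requests`, and all field names, are hypothetical, and the authentication layer and any network transport are omitted entirely.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class DataRequest:
    """Hypothetical request payload; the field names are illustrative only."""
    experiment: str
    run: int
    detectors: list = field(default_factory=list)


def produce_events(request, buffer, n_events):
    """Stand-in for a batch job that reads events and pushes them into the buffer."""
    for i in range(n_events):
        # put() blocks when the buffer is full, which applies back-pressure
        # to the producer instead of letting memory grow without bound.
        buffer.put({"experiment": request.experiment, "run": request.run, "event": i})
    buffer.put(None)  # sentinel marking the end of the stream


def serve_requests(job_queue, n_events=5, buffer_size=2):
    """Drain the job queue in FIFO order and stream each request's events."""
    results = {}
    while not job_queue.empty():
        request = job_queue.get()
        buffer = queue.Queue(maxsize=buffer_size)  # bounded data buffer
        worker = threading.Thread(
            target=produce_events, args=(request, buffer, n_events)
        )
        worker.start()
        received = []
        while True:
            item = buffer.get()
            if item is None:
                break
            received.append(item)
        worker.join()
        results[(request.experiment, request.run)] = received
    return results


def demo():
    """Submit two hypothetical requests and return the streamed events."""
    jobs = queue.Queue()
    jobs.put(DataRequest("demo_experiment", 1, ["detector_a"]))
    jobs.put(DataRequest("demo_experiment", 2, ["detector_a"]))
    return serve_requests(jobs, n_events=5, buffer_size=2)
```

Calling `demo()` returns a dictionary keyed by `(experiment, run)`, each holding the events in the order they were produced. The bounded `queue.Queue` is the part most relevant to the high-rate buffer idea: a slow consumer throttles the producer rather than dropping data, which is one possible policy among several.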