The LCLStream Ecosystem for Multi-Institutional Dataset Exploration

📅 2025-10-04
🤖 AI Summary
Problem: Emerging requirements in X-ray science, including AI training on high-speed data streams, femtosecond-level time-of-flight analysis, and distributed crystallographic structure determination, demand a scalable, secure, and low-latency experimental data infrastructure.

Method: This paper introduces the first end-to-end experimental data stream framework integrating cloud-native microservices with traditional HPC batch processing. It combines RESTful API-driven request services, OAuth2.0 mutual authentication, Kafka-based high-throughput messaging, containerized microservices, and HPC job schedulers to provide high-throughput data buffering and secure cross-institutional sharing.

Contribution/Results: The framework achieves millisecond-scale real-time data distribution, supports customizable visualization and distributed structure solution, and has been validated via the LCLStreamer prototype deployed across multiple synchrotron facilities. It improves data access efficiency by 3–5× and significantly enhances multi-center collaborative research, advancing synchrotron science toward a “streaming experiment” paradigm.

📝 Abstract
We describe a new end-to-end experimental data streaming framework designed from the ground up to support new types of applications -- AI training, extremely high-rate X-ray time-of-flight analysis, crystal structure determination with distributed processing, and custom data science applications and visualizers yet to be created. Throughout, we use design choices merging cloud microservices with traditional HPC batch execution models for security and flexibility. This project makes a unique contribution to the DOE Integrated Research Infrastructure (IRI) landscape. By creating a flexible, API-driven data request service, we address a significant need for high-speed data streaming sources for the X-ray science data analysis community. With the combination of data request API, mutual authentication web security framework, job queue system, high-rate data buffer, and complementary nature to facility infrastructure, the LCLStreamer framework has prototyped and implemented several new paradigms critical for future generation experiments.
Problem

Research questions and friction points this paper is trying to address.

Supports AI training and high-rate X-ray analysis
Enables distributed crystal structure determination workflows
Provides API-driven high-speed data streaming for science
Innovation

Methods, ideas, or system contributions that make the work stand out.

API-driven data request service for flexible streaming
Cloud microservices merged with HPC batch execution
High-rate data buffer with mutual authentication security
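
The interplay of a data request, a queued job, and a bounded buffer can be illustrated with a minimal sketch. It is not the LCLStreamer implementation: the `DataRequest` fields, the `producer` and `serve` functions, and the use of an in-process `queue.Queue` as the buffer are all assumptions made for illustration, using only the Python standard library.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequest:
    """Hypothetical request payload; field names are illustrative only."""
    experiment: str
    run: int
    max_events: int


def producer(request, buffer):
    """Stand-in for a batch job that reads events and fills the buffer."""
    for i in range(request.max_events):
        payload = f"{request.experiment}:{request.run}:{i}".encode()
        # put() blocks while the buffer is full, which gives backpressure.
        buffer.put(payload)
    buffer.put(None)  # sentinel marking the end of the stream


def serve(request, buffer_size=4):
    """Start the producer for one request and drain the bounded buffer."""
    buffer = queue.Queue(maxsize=buffer_size)
    worker = threading.Thread(target=producer, args=(request, buffer))
    worker.start()

    events = []
    while True:
        item = buffer.get()
        if item is None:  # end-of-stream sentinel
            break
        events.append(item)

    worker.join()
    return events


if __name__ == "__main__":
    received = serve(DataRequest(experiment="demo", run=1, max_events=10))
    print(len(received), "events received")
```

The bounded queue is the key design choice in this sketch: a slow consumer stalls the producer instead of letting memory grow without limit. A real deployment would replace the thread with a scheduled HPC job and the queue with a networked buffer, and would authenticate the request before any job is started.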
Authors
David Rogers
Research Engineer, Vanderbilt University
Valerio Mariani
LCLS, SLAC National Accelerator Laboratory, Menlo Park, California, USA
Cong Wang
LCLS, SLAC National Accelerator Laboratory, Menlo Park, California, USA
Ryan Coffee
LCLS-SLAC National Accelerator Lab
Molecular Physics, Ultrafast X-Ray Spectroscopy, Material Response to Electronic Excitation
Wilko Kroeger
LCLS, SLAC National Accelerator Laboratory, Menlo Park, California, USA
Murali Shankar
LCLS, SLAC National Accelerator Laboratory, Menlo Park, California, USA
Hans Thorsten Schwander
LCLS, SLAC National Accelerator Laboratory, Menlo Park, California, USA
Tom Beck
NCCS, Oak Ridge Leadership Computing Facility, supported by the US DOE Office of Science under Contract No. DE-AC05-00OR22725. Oak Ridge, Tennessee, USA
Frédéric Poitevin
SLAC National Accelerator Laboratory
Structural Biology, Machine Learning, Computer Vision
Jana Thayer
LCLS, SLAC National Accelerator Laboratory, supported by the US DOE Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515. Menlo Park, California, USA