FiLark: a streaming-first software framework for end-to-end exploration, annotation, and algorithm integration in distributed acoustic sensing

📅 2026-05-19

🤖 AI Summary
This work addresses the challenge that continuous, high-channel-count data streams from distributed acoustic sensing (DAS) are poorly supported by conventional batch-processing frameworks, which hinders interactive exploration, scalable annotation, and real-time algorithm integration. To overcome this, the paper proposes FiLark, a DAS analysis framework built entirely around a “streaming-first” abstraction. FiLark presents multi-file or continuous recordings as a single unified stream, enabling interactive browsing of arbitrarily long records with constant memory usage, in-stream generation of machine-learning-ready labels, and a direct transition from development to production environments. Implemented in Python, FiLark integrates OpenGL-based ring-buffer rendering, time, space, and frequency signal operators with CPU implementations and GPU-accelerated variants, stateful chunked execution, and a standardized monitor interface, with the aim of improving the efficiency, scalability, and end-to-end reproducibility of DAS analytics.
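The constant-memory browsing idea mentioned above can be illustrated with a plain ring buffer. This is only a minimal sketch under stated assumptions, not FiLark's renderer: the names `RingBuffer`, `push`, and `view` are hypothetical, and a NumPy array stands in for whatever GPU-side buffer the OpenGL renderer actually uses. The point it shows is that memory is fixed by the window capacity, not by the length of the recording.

```python
import numpy as np


class RingBuffer:
    """Hypothetical fixed-capacity window over a (time, channel) stream."""

    def __init__(self, capacity, n_channels, dtype=np.float32):
        # Memory is allocated once and never grows with the stream length.
        self.data = np.zeros((capacity, n_channels), dtype=dtype)
        self.capacity = capacity
        self.write_pos = 0  # index of the next row to overwrite
        self.filled = 0     # number of valid rows currently stored

    def push(self, chunk):
        """Append a chunk of shape (n_samples, n_channels), overwriting the oldest rows."""
        chunk = np.asarray(chunk, dtype=self.data.dtype)
        if chunk.shape[0] > self.capacity:
            # Only the newest `capacity` rows can survive anyway.
            chunk = chunk[-self.capacity:]
        n = chunk.shape[0]
        idx = (self.write_pos + np.arange(n)) % self.capacity
        self.data[idx] = chunk
        self.write_pos = (self.write_pos + n) % self.capacity
        self.filled = min(self.capacity, self.filled + n)

    def view(self):
        """Return the valid rows ordered from oldest to newest."""
        if self.filled < self.capacity:
            return self.data[:self.filled].copy()
        # Once full, the oldest row sits at write_pos; rotate it to the front.
        return np.roll(self.data, -self.write_pos, axis=0)


# Usage: stream many chunks through a small window.
window = RingBuffer(capacity=1000, n_channels=64)
for _ in range(50):
    window.push(np.random.randn(256, 64))
latest = window.view()  # always at most 1000 rows, however long the stream
```

A real renderer would typically avoid the copy in `view` by drawing the buffer in two segments; the rotation here is kept only to make the ordering easy to check.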
📝 Abstract
Distributed acoustic sensing (DAS) systems generate continuous, ultra-high-channel-count data streams at rates that exceed the capabilities of conventional batch-oriented analysis frameworks. As a result, essential tasks such as interactive exploration of long-duration recordings, scalable event annotation, and real-time algorithm-in-the-loop monitoring remain inadequately supported by workflows built around manually selected data segments and offline processing. This paper presents FiLark (Fiber Lark), a Python framework that applies a streaming-first principle uniformly across data access, signal processing, visualization, and monitoring for DAS. Instead of operating on manually selected data segments, FiLark presents any DAS source, including continuous multi-file recordings, as a unified stream and builds all system components around that abstraction. An OpenGL-based ring-buffer renderer enables interactive browsing and visualization of arbitrarily long recordings with constant memory usage. An integrated annotation interface supports event labeling directly within continuous data streams, facilitating the creation of reproducible machine-learning-ready labeled datasets without offline preprocessing. The signal processing library includes temporal, spatial, spectral, and decomposition-based operators, with both CPU implementations and GPU-accelerated variants via PyTorch, alongside stateful chunked execution that preserves processing continuity and application semantics across segment boundaries. A standardized monitor interface further integrates streaming detectors and learning-based models into the visualization workflow. By sharing a common streaming abstraction across all layers, FiLark allows processing configurations and workflows developed interactively to transfer directly to scalable production pipelines without modification.
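To make "stateful chunked execution" concrete, the sketch below filters a stream chunk by chunk while carrying state across chunk boundaries, so the result matches filtering the whole record at once. It is an illustrative assumption, not FiLark's API: the class `StatefulFIR` and its `process` method are hypothetical names, and a simple causal FIR filter in NumPy stands in for the framework's operators.

```python
import numpy as np


class StatefulFIR:
    """Hypothetical causal FIR filter that keeps state between chunks."""

    def __init__(self, taps, n_channels):
        self.taps = np.asarray(taps, dtype=np.float64)
        # State: the last (n_taps - 1) input rows, zeros at the start of the stream.
        self.tail = np.zeros((len(self.taps) - 1, n_channels))

    def process(self, chunk):
        """Filter one chunk of shape (n_samples, n_channels) along the time axis."""
        chunk = np.asarray(chunk, dtype=np.float64)
        if chunk.shape[0] == 0:
            return chunk
        # Prepend the saved history so the first outputs see earlier samples.
        padded = np.concatenate([self.tail, chunk], axis=0)
        out = np.empty_like(chunk)
        for ch in range(chunk.shape[1]):
            out[:, ch] = np.convolve(padded[:, ch], self.taps, mode="valid")
        keep = len(self.taps) - 1
        if keep:
            # Save the newest rows as history for the next chunk.
            self.tail = padded[-keep:]
        return out


# Usage: uneven chunk sizes, as when a record is split across several files.
rng = np.random.default_rng(0)
record = rng.standard_normal((200, 4))
fir = StatefulFIR(taps=[0.25, 0.5, 0.25], n_channels=4)
chunks = [record[:30], record[30:31], record[31:150], record[150:]]
streamed = np.concatenate([fir.process(c) for c in chunks], axis=0)
```

Without the saved `tail`, each chunk would start from a zero history and the output would show a discontinuity at every file or segment boundary, which is the artifact that boundary-preserving execution is meant to avoid.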
Problem

Research questions and friction points this paper is trying to address.

distributed acoustic sensing
streaming data
interactive exploration
event annotation
real-time monitoring
Innovation

Methods, ideas, or system contributions that make the work stand out.

streaming-first
distributed acoustic sensing
real-time annotation
GPU-accelerated signal processing
continuous data stream