Real-Time Inference for Distributed Multimodal Systems under Communication Delay Uncertainty

📅 2025-11-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor real-time inference robustness of distributed multimodal systems under uncertain communication latency, this paper proposes a neuro-inspired non-blocking inference paradigm. Departing from reliance on a reference modality, our approach introduces a latency-aware framework integrating online latency estimation, adaptive temporal windowing for dynamic ensemble, and asynchronous multimodal fusion—enabling fine-grained accuracy–latency trade-off control across heterogeneous data streams. The core innovation is a learnable temporal integration window that adaptively adjusts fusion timing according to each modality’s real-time latency distribution, significantly enhancing system resilience to network fluctuations. Evaluated on audio-visual event localization, our method achieves a 5.2% improvement in mean average precision (mAP) while maintaining low end-to-end latency, outperforming state-of-the-art approaches in both inference stability and cross-scenario generalization.

Technology Category

Application Category

📝 Abstract
Connected cyber-physical systems perform inference based on real-time inputs from multiple data streams. Uncertain communication delays across data streams challenge the temporal flow of the inference process. State-of-the-art (SotA) non-blocking inference methods rely on a reference-modality paradigm, requiring one modality input to be fully received before processing, while depending on costly offline profiling. We propose a novel, neuro-inspired non-blocking inference paradigm that primarily employs adaptive temporal windows of integration (TWIs) to dynamically adjust to stochastic delay patterns across heterogeneous streams while relaxing the reference-modality requirement. Our communication-delay-aware framework achieves robust real-time inference with finer-grained control over the accuracy-latency tradeoff. Experiments on the audio-visual event localization (AVEL) task demonstrate superior adaptability to network dynamics compared to SotA approaches.
Problem

Research questions and friction points this paper is trying to address.

Handling uncertain communication delays in distributed multimodal inference systems
Eliminating dependency on reference-modality requirements for real-time processing
Achieving robust inference with adaptive control over accuracy-latency tradeoffs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive temporal windows adjust to stochastic delays
Neuro-inspired paradigm relaxes reference-modality requirement
Communication-delay-aware framework controls accuracy-latency tradeoff
🔎 Similar Papers
No similar papers found.