AI Summary
This work addresses two problems: the incompatibility between continuous audio streams on edge devices and the discrete batch requirements of contrastive learning, and the difficulty of balancing accuracy, latency, and bandwidth under dynamic resource constraints. To this end, the authors propose StreamSplit, a streaming contrastive learning framework for heterogeneous ARM-based edge platforms. The framework decouples representation quality from local batch size through distribution modeling. It combines uncertainty-guided adaptive computation partitioning with a lightweight reinforcement learning policy to optimize the accuracy-latency trade-off at runtime. It also introduces a hybrid loss that supports high-quality representation learning under sparse updates. In experiments on devices ranging from the Raspberry Pi 4 to the Apple M2, StreamSplit reduces per-sample latency by up to 4.7×, bandwidth usage by 77.1%, and energy consumption by 52.3% relative to server-centric baselines, while keeping accuracy within 2.2% of server-centric models.
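The summary does not define the splitter's state, actions, or reward. As a hedged sketch under our own assumptions, the snippet below scores embedding ambiguity as the normalized entropy of a softmax over similarities to a few anchor embeddings. It then discretizes (CPU load, bandwidth, ambiguity) into a state and uses an epsilon-greedy contextual bandit to pick how many layers run on-device. The bandit is a deliberately simple stand-in for the paper's RL policy. All names (`ambiguity`, `bucket`, `SplitPolicy`, `toy_reward`), the bucket edges, and the cost model are hypothetical.

```python
import math
import random
from collections import defaultdict


def ambiguity(similarities, temperature=0.1):
    """Normalized entropy (0..1) of a softmax over similarity scores.

    Near 1: the embedding is about equally close to every anchor (ambiguous).
    Near 0: one anchor clearly dominates (confident).
    """
    scaled = [s / temperature for s in similarities]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs)) if len(probs) > 1 else 0.0


def bucket(x, edges):
    """Index of the first edge that x is below, or len(edges) if none."""
    for i, edge in enumerate(edges):
        if x < edge:
            return i
    return len(edges)


class SplitPolicy:
    """Epsilon-greedy contextual bandit choosing how many layers run on-device."""

    def __init__(self, split_points, epsilon=0.1, seed=0):
        self.split_points = list(split_points)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # (state, split) -> running mean reward
        self.count = defaultdict(int)

    def state(self, cpu_load, bandwidth_mbps, amb):
        # Hypothetical bucket edges: CPU load fraction, link Mbit/s, ambiguity.
        return (bucket(cpu_load, [0.5, 0.8]),
                bucket(bandwidth_mbps, [1.0, 10.0]),
                bucket(amb, [0.3, 0.7]))

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.split_points)  # explore
        return max(self.split_points, key=lambda a: self.value[(state, a)])

    def update(self, state, split, reward):
        key = (state, split)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]


def toy_reward(split, cpu_load, bandwidth_mbps, amb, depth=8):
    """Hypothetical negative cost; not a model of any real device."""
    local_ms = split * 5.0 * (1.0 + cpu_load)             # on-device compute
    transfer_ms = (depth - split) * 4.0 / bandwidth_mbps  # assumed upload cost
    accuracy_penalty = amb * split * 2.0  # ambiguous clips assumed better offloaded
    return -(local_ms + transfer_ms + accuracy_penalty)


if __name__ == "__main__":
    env = random.Random(1)
    policy = SplitPolicy(split_points=[0, 2, 4, 6, 8])
    for _ in range(2000):
        cpu, bw = env.random(), env.uniform(0.2, 20.0)
        amb = ambiguity([env.uniform(-1.0, 1.0) for _ in range(4)])
        s = policy.state(cpu, bw, amb)
        split = policy.act(s)
        policy.update(s, split, toy_reward(split, cpu, bw, amb))
    print("busy CPU, slow link, clear clip ->", policy.act(policy.state(0.9, 0.5, 0.1)))
```

A tabular bandit keeps each decision to a dictionary lookup, which suits a Raspberry Pi-class device. A full RL policy would additionally account for how the current split affects future states.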
Abstract
Large-batch Contrastive Learning (CL), the foundation of modern representation learning, is fundamentally incompatible with the volatile resource constraints of edge devices. This conflict creates a dilemma: small on-device batches degrade model fidelity, while offloading to the cloud incurs unacceptable latency and bandwidth costs. Existing solutions often resort to static model compression, which cannot adapt to the runtime volatility of edge environments. To bridge this gap, we present StreamSplit, a novel framework that makes streaming CL practical across heterogeneous ARM client platforms. StreamSplit resolves the conflict between the continuous nature of ambient audio and the discrete batch requirements of models such as CLAP and COLA. We introduce (1) a distribution-based streaming framework that decouples representation quality from local batch size, using a tractable Hybrid Loss to maintain fidelity despite sparse updates; and (2) an Uncertainty-Guided Adaptive Splitter that uses a lightweight Reinforcement Learning (RL) policy to dynamically partition computation. Uniquely, this policy combines real-time resource monitoring with embedding ambiguity to optimize the accuracy-latency trade-off on the fly. We evaluate StreamSplit on diverse hardware, from the resource-constrained Raspberry Pi 4 to the high-performance Apple M2. Compared to server-centric baselines, StreamSplit reduces per-sample latency by up to 4.7x and cuts bandwidth and energy use by 77.1% and 52.3%, respectively. Crucially, it keeps accuracy within 2.2% of server-centric models, showing that adaptive, distributed learning is a viable path for the modern edge ecosystem.
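The abstract leaves the distribution model and the exact form of the Hybrid Loss unspecified. The sketch below illustrates, under our own assumptions, one way the number of contrastive negatives can be decoupled from local batch size. A running diagonal Gaussian (exponentially weighted mean and variance) summarizes past embeddings, and `k` synthetic negatives are sampled from it no matter how many clips the device currently holds. The loss then mixes InfoNCE with a moment-matching penalty. `RunningGaussian`, `hybrid_loss`, `k`, `alpha`, and the moment-matching term are hypothetical illustrations, not StreamSplit's formulation.

```python
import math
import random


def normalize(v):
    """Scale v to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class RunningGaussian:
    """Diagonal Gaussian summary of past embeddings (exponentially weighted)."""

    def __init__(self, dim, momentum=0.99):
        self.mean = [0.0] * dim
        self.var = [1.0] * dim
        self.momentum = momentum

    def update(self, batch):
        rate = 1.0 - self.momentum
        for z in batch:
            for i, x in enumerate(z):
                delta = x - self.mean[i]
                self.mean[i] += rate * delta
                # Incremental exponentially weighted variance update.
                self.var[i] = (1.0 - rate) * (self.var[i] + rate * delta * delta)

    def sample(self, k, rng):
        """Draw k synthetic unit-norm negatives, independent of batch size."""
        return [normalize([rng.gauss(mu, math.sqrt(v))
                           for mu, v in zip(self.mean, self.var)])
                for _ in range(k)]


def info_nce(anchor, positive, negatives, tau=0.1):
    """-log of the softmax probability assigned to the positive pair."""
    logits = [dot(anchor, positive) / tau] + [dot(anchor, n) / tau for n in negatives]
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[0]


def hybrid_loss(anchors, positives, stats, k=64, alpha=0.8, tau=0.1, rng=None):
    """alpha * InfoNCE + (1 - alpha) * drift of the batch mean from the running mean."""
    rng = rng or random.Random(0)
    synthetic = stats.sample(k, rng)
    contrastive = 0.0
    for i, (a, p) in enumerate(zip(anchors, positives)):
        in_batch = [q for j, q in enumerate(positives) if j != i]
        contrastive += info_nce(a, p, synthetic + in_batch, tau)
    contrastive /= len(anchors)
    dim = len(anchors[0])
    batch_mean = [sum(z[d] for z in anchors) / len(anchors) for d in range(dim)]
    drift = sum((b - m) ** 2 for b, m in zip(batch_mean, stats.mean))
    return alpha * contrastive + (1.0 - alpha) * drift


if __name__ == "__main__":
    rng = random.Random(0)
    stats = RunningGaussian(dim=8)
    for _ in range(100):
        # Hypothetical 2-clip window; positives are lightly perturbed copies.
        anchors = [normalize([rng.gauss(0.0, 1.0) for _ in range(8)]) for _ in range(2)]
        positives = [normalize([x + rng.gauss(0.0, 0.05) for x in a]) for a in anchors]
        loss = hybrid_loss(anchors, positives, stats, rng=rng)
        stats.update(anchors)  # fold this window into the distribution summary
    print(f"last-window loss: {loss:.3f}")
```

Because `k` is fixed, a one-clip window still sees 64 negatives. The trade-off is that synthetic negatives only approximate the true embedding distribution, and a diagonal Gaussian is a coarse model of it.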