🤖 AI Summary
This work addresses the challenge of efficiently processing high-throughput matrix streams under stringent resource constraints. Existing approaches suffer from poor update efficiency because they perform frequent matrix decompositions, each of which costs cubic time under tight error bounds. The authors propose AeroSketch, a framework that leverages randomized numerical linear algebra (RandNLA) to construct compact matrix sketches for persistent, sliding-window, and distributed streaming settings. AeroSketch is the first method to simultaneously achieve optimal communication and space complexity while reducing the update time complexity from cubic to quadratic under tight error bounds, which is near-optimal (within logarithmic factors). Experiments on both synthetic and real-world datasets show that AeroSketch significantly improves update throughput while maintaining comparable approximation accuracy and optimal communication and space costs.
📝 Abstract
Many real-world matrix datasets arrive as high-throughput vector streams, making it impractical to store or process them in their entirety. To enable real-time analytics under limited computational, memory, and communication resources, matrix sketching techniques have been developed over recent decades to provide compact approximations of such streaming data. Some algorithms have achieved optimal space and communication complexity. However, these approaches often require frequent, time-consuming matrix factorizations. In particular, under tight approximation error bounds, each matrix factorization incurs cubic time complexity, which limits update efficiency. In this paper, we introduce AeroSketch, a novel matrix sketching framework that leverages recent advances in randomized numerical linear algebra (RandNLA). AeroSketch achieves optimal communication and space costs while delivering near-optimal update time complexity (within logarithmic factors) across persistent, sliding-window, and distributed streaming scenarios. In particular, under tight approximation error constraints, AeroSketch reduces the cubic time complexity to the quadratic level. Extensive experiments on both synthetic and real-world datasets demonstrate that AeroSketch consistently outperforms state-of-the-art methods in update throughput, while maintaining comparable approximation quality and retaining optimal communication and space costs.
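To make the factorization bottleneck concrete, here is a minimal sketch of a classical deterministic streaming sketch, Frequent Directions. It is not AeroSketch, and the abstract does not name it; it is used here only as an assumed, representative example of a sketch that calls an SVD whenever its buffer fills. The class name `FrequentDirections`, the helper `covariance_error`, and the parameter `ell` (sketch size) are illustrative choices, not taken from the paper.

```python
import numpy as np


class FrequentDirections:
    """Streaming sketch B (at most 2*ell rows) with B.T @ B approximating A.T @ A."""

    def __init__(self, d: int, ell: int):
        if not 1 <= ell <= d:
            raise ValueError("need 1 <= ell <= d")
        self.d, self.ell = d, ell
        self.B = np.zeros((2 * ell, d))  # buffer of twice the sketch size
        self.next_row = 0                # index of the first free row

    def update(self, row: np.ndarray) -> None:
        """Absorb one incoming row vector of length d."""
        if self.next_row == 2 * self.ell:
            self._shrink()               # buffer full: pay for a factorization
        self.B[self.next_row] = row
        self.next_row += 1

    def _shrink(self) -> None:
        # SVD of a (2*ell x d) buffer costs O(d * ell^2), i.e. O(d^3) once ell ~ d.
        _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
        delta = s[self.ell - 1] ** 2     # squared ell-th largest singular value
        shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        keep = self.ell - 1              # rows from index ell-1 onward shrink to zero
        self.B[:] = 0.0
        self.B[:keep] = shrunk[:keep, None] * Vt[:keep]
        self.next_row = keep

    def sketch(self) -> np.ndarray:
        """Return the rows currently held in the sketch."""
        return self.B[: self.next_row].copy()


def covariance_error(A: np.ndarray, B: np.ndarray) -> float:
    """Spectral norm of A.T @ A - B.T @ B."""
    return float(np.linalg.norm(A.T @ A - B.T @ B, 2))
```

For this variant the covariance error is bounded by the squared Frobenius norm of the stream divided by `ell`, so a tighter error target forces a larger `ell`, and the SVD cost grows toward cubic in `d` as `ell` approaches `d`. In this plain setting the cost is amortized over the rows inserted between shrinks; the abstract's concern is with settings where such factorizations must run frequently, which is the cost a randomized approach would aim to avoid.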