AeroSketch: Near-Optimal Time Matrix Sketch Framework for Persistent, Sliding Window, and Distributed Streams

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently processing high-throughput matrix streams under stringent resource constraints, where existing approaches suffer from poor update efficiency due to frequent cubic-time matrix decompositions under tight error bounds. The authors propose AeroSketch, a framework that leverages randomized numerical linear algebra (RandNLA) to construct compact matrix sketches suitable for persistent, sliding-window, and distributed streaming settings. AeroSketch is the first method to simultaneously achieve optimal communication and space complexity while reducing the per-update time complexity from cubic to quadratic, thereby attaining near-optimal (within logarithmic factors) update performance. Experimental results on both synthetic and real-world datasets demonstrate that AeroSketch significantly improves throughput while maintaining comparable approximation accuracy and optimal resource consumption.

📝 Abstract
Many real-world matrix datasets arrive as high-throughput vector streams, making it impractical to store or process them in their entirety. To enable real-time analytics under limited computational, memory, and communication resources, matrix sketching techniques have been developed over recent decades to provide compact approximations of such streaming data. Some algorithms have achieved optimal space and communication complexity. However, these approaches often require frequent time-consuming matrix factorization operations. In particular, under tight approximation error bounds, each matrix factorization computation incurs cubic time complexity, thereby limiting their update efficiency. In this paper, we introduce AeroSketch, a novel matrix sketching framework that leverages recent advances in randomized numerical linear algebra (RandNLA). AeroSketch achieves optimal communication and space costs while delivering near-optimal update time complexity (within logarithmic factors) across persistent, sliding window, and distributed streaming scenarios. Extensive experiments on both synthetic and real-world datasets demonstrate that AeroSketch consistently outperforms state-of-the-art methods in update throughput. In particular, under tight approximation error constraints, AeroSketch reduces the cubic time complexity to the quadratic level. Meanwhile, it maintains comparable approximation quality while retaining optimal communication and space costs.
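
To make the cost described above concrete, here is a minimal sketch of a classic streaming matrix sketch, Frequent Directions. This is an assumption chosen for illustration: the abstract does not name the baseline algorithms, and the code is not AeroSketch itself. The function name `frequent_directions` and the parameter `ell` (the sketch size) are hypothetical. The sketch shows where the repeated factorization occurs: every time the buffer fills, a full SVD is computed before new rows can be absorbed.

```python
import numpy as np


def frequent_directions(rows, ell):
    """Sketch the rows of a stream into a (2*ell, d) buffer B.

    Illustrative baseline only (Frequent Directions), not AeroSketch.
    """
    rows = np.asarray(rows, dtype=float)
    d = rows.shape[1]
    B = np.zeros((2 * ell, d))
    next_free = 0  # index of the next empty row in the buffer
    for a in rows:
        if next_free == 2 * ell:
            # Buffer is full: factorize it. This SVD is the expensive step.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            # Shrink every squared singular value by the (ell+1)-th largest one.
            delta = s[ell] ** 2 if s.size > ell else 0.0
            shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = np.zeros((2 * ell, d))
            B[: s.size] = shrunk[:, None] * Vt
            next_free = ell  # rows ell and beyond are now zero
        B[next_free] = a
        next_free += 1
    return B


# Usage on synthetic data: compare the covariance error with the known bound.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
B = frequent_directions(A, ell=5)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = np.linalg.norm(A, "fro") ** 2 / 5
print(err <= bound)
```

For this variant, the spectral-norm covariance error is known to be at most the squared Frobenius norm of the input divided by `ell`, so a tighter error target forces a larger `ell` and a larger matrix to factorize at each shrink. That growth is the update-time pressure that the abstract says AeroSketch relieves by using RandNLA techniques.
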
Problem

Research questions and friction points this paper is trying to address.

matrix sketching
streaming data
update efficiency
approximation error
time complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

matrix sketching
randomized numerical linear algebra
near-optimal update time
streaming data
distributed streams