🤖 AI Summary
High-throughput streaming aggregation, particularly sliding-window aggregation (SWAG), faces critical challenges in grouped and time-series analytics, including excessive hardware overhead, bottlenecks in hash-based state management, and heavy reliance on off-chip memory. To address these, this paper proposes a sorting-based FPGA pipeline architecture. It introduces the first DRAM-free, sorting-driven SWAG paradigm, integrating a hardware sorting network, adaptive group-level scheduling, and optimized on-chip memory mapping to jointly improve resource utilization, throughput, and window capacity. Experimental evaluation shows that the design achieves up to a 476× speedup over the CPU of the same platform, delivers 7.14× higher throughput than the state of the art, supports windows four times larger, and uses a fraction of the FPGA resources.
📝 Abstract
Aggregation queries are a class of computationally demanding analytics operations on grouped and time-series data. They include tasks such as computing the sum or the median of the items that share a group ID and, for sliding window aggregation (SWAG), restricting that computation to a specified number of the most recently observed tuples. They have a wide range of applications, including database analytics, operating systems, bank security, and medical sensors. A key challenge is the hardware complexity of efficiently handling per-group state with hash-based approaches. This paper presents a pipelined and adaptable approach for calculating a wide range of aggregation queries with high throughput. The approach is then adapted for SWAG, achieving up to 476x speedup over the CPU of the same platform. Compared with the state of the art, it processes 7.14x more tuples per second and supports 4x larger window sizes, using a fraction of the resources and no DRAM.
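To make the SWAG semantics described above concrete, the following is a minimal software reference model, not the paper's hardware design. It keeps the last `window_size` tuples, sorts them by group ID so that members of a group become adjacent (the summary describes the design as sorting-based), and applies an aggregate to each run. The function name `sliding_window_aggregate`, the `(group_id, value)` tuple layout, and the sample stream are all assumptions made for illustration.

```python
from collections import deque
from itertools import groupby
from operator import itemgetter
from statistics import median


def sliding_window_aggregate(stream, window_size, aggregate):
    """Return, for each arriving tuple, a dict of per-group aggregates
    computed over the last `window_size` tuples.

    Hypothetical reference model: `stream` is an iterable of
    (group_id, value) pairs and `aggregate` is any function of a list.
    """
    window = deque(maxlen=window_size)  # oldest tuple is evicted automatically
    results = []
    for item in stream:
        window.append(item)
        # Sorting by group ID places each group's tuples in one contiguous run,
        # so no per-group hash table is needed.
        ordered = sorted(window, key=itemgetter(0))
        results.append({
            group_id: aggregate([value for _, value in run])
            for group_id, run in groupby(ordered, key=itemgetter(0))
        })
    return results


# Illustrative input: five tuples, window of the last three.
stream = [("a", 1), ("b", 10), ("a", 3), ("a", 5), ("b", 20)]
sums = sliding_window_aggregate(stream, 3, sum)
medians = sliding_window_aggregate(stream, 3, median)
```

Re-sorting the whole window on every arrival is deliberately naive and costs O(w log w) per tuple. It is only meant to pin down what result a SWAG query is expected to produce, for example when checking a faster implementation against it.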