🤖 AI Summary
This work addresses the problem of counting users whose cumulative dwell time in a specified set of regions is at least a threshold $k$ in large-scale mobility data. The paper proposes several data structures that support both exact and approximate queries. Its key contributions include an exact solution with a tunable space-time trade-off, approximate methods combining sampling and sketching techniques, and improved exact structures for the geometric setting, where regions are points in $\mathbb{R}^d$ and queries are hyperrectangles. Theoretical analysis establishes conditional and unconditional lower bounds, while experiments demonstrate that the proposed approach significantly outperforms existing baselines in both general and geometric settings, achieving a favorable balance between accuracy and efficiency.
📝 Abstract
This paper addresses the Counting Long Aggregated Visits problem, which is defined as follows. We are given $n$ users and $m$ regions, where each user spends some time visiting some regions. For a parameter $k$ and a query consisting of a subset of $r$ regions, the task is to count the number of distinct users whose aggregate time spent visiting the query regions is at least $k$. This problem is motivated by queries arising in the analysis of large-scale mobility datasets. We present several exact and approximate data structures for answering Counting Long Aggregated Visits queries, as well as conditional and unconditional lower bounds. First, we describe an exact data structure that exhibits a space-time tradeoff, as well as efficient approximate solutions based on sampling and sketching techniques. We then study the problem in geometric settings where regions are points in $\mathbb{R}^d$ and queries are hyperrectangles, and derive exact data structures that achieve improved performance in these structured spaces.
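To make the problem definition concrete, here is a minimal Python sketch of the naive baselines, written under our own assumptions. The per-user dwell times are stored as a dictionary from region id to aggregate time. The sketch includes a linear-scan exact count, a reduction of the geometric case (collect the region points inside the query hyperrectangle, then count over them), and a simple uniform user-sampling estimator. The function names (`count_long_visits`, `regions_in_box`, `estimate_by_sampling`) and the data layout are hypothetical. None of these is the paper's data structure, and the sampling estimator is a generic technique rather than the authors' method.

```python
import random
from typing import Dict, Iterable, List, Sequence

# Hypothetical layout (assumption): visits[user][region] = aggregate time that user spent in region.
Visits = Dict[str, Dict[int, float]]


def count_long_visits(visits: Visits, query: Iterable[int], k: float) -> int:
    """Exact answer by linear scan: users whose total time over the query regions is >= k."""
    q = set(query)
    count = 0
    for per_region in visits.values():
        # Sum this user's time over the query regions (0 for regions never visited).
        total = sum(per_region.get(r, 0.0) for r in q)
        if total >= k:
            count += 1
    return count


def regions_in_box(points: Dict[int, Sequence[float]],
                   lo: Sequence[float], hi: Sequence[float]) -> List[int]:
    """Geometric case: ids of region points lying inside the closed box [lo, hi]."""
    return [rid for rid, p in points.items()
            if all(l <= x <= h for x, l, h in zip(p, lo, hi))]


def estimate_by_sampling(visits: Visits, query: Iterable[int], k: float,
                         sample_size: int, seed: int = 0) -> float:
    """Approximate count: evaluate a uniform sample of s users, then scale hits by n / s."""
    users = list(visits)
    s = min(sample_size, len(users))
    if s == 0:
        return 0.0
    sample = random.Random(seed).sample(users, s)  # uniform, without replacement
    q = set(query)
    hits = sum(1 for u in sample if sum(visits[u].get(r, 0.0) for r in q) >= k)
    return hits * len(users) / s


# Toy example data (made up for illustration).
visits: Visits = {
    "alice": {1: 30.0, 2: 20.0},
    "bob": {2: 10.0, 3: 50.0},
    "carol": {1: 5.0, 3: 5.0},
}
points = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (5.0, 5.0)}

print(count_long_visits(visits, {1, 2}, k=40))                               # 1 (alice)
print(count_long_visits(visits, regions_in_box(points, (0, 0), (2, 2)), 40))   # 1
print(estimate_by_sampling(visits, {1, 2, 3}, k=10, sample_size=2))            # 3.0
```

The exact scan touches every user on every query, costing about $O(nr)$ time. This per-query cost is what a preprocessed data structure is meant to avoid. The sampling estimator trades accuracy for speed. It scales the number of qualifying users in a uniform sample of $s$ users by $n/s$, which yields an unbiased estimate of the true count.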