AI Summary
Count-Min (CM) sketches suffer from inter-group unfairness in streaming frequency estimation. Their additive error is of similar magnitude for every element, so groups of low-frequency elements incur much higher expected relative error than groups of high-frequency elements. To address this, we propose Fair-Count-Min, which combines group-aware semi-uniform hashing with column partitioning. This eliminates hash collisions between elements of different groups and thereby guarantees equal expected approximation factors (relative error) across all groups. We also formally analyze the accuracy cost of enforcing fairness (the "price of fairness"). Experiments on real-world and synthetic datasets show that Fair-Count-Min achieves strong inter-group fairness with minimal additional estimation error, while keeping time and space efficiency competitive with standard CM sketches.
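The gap between additive and relative error can be made concrete with a minimal standard Count-Min sketch. Several details are hypothetical: the stream, the group sizes, and the width/depth values. Python's built-in `hash()` over a salted tuple also stands in for a proper pairwise-independent hash family. Every key shares the same counters, so a collision of similar absolute size is a tiny fraction of a heavy key's count but can exceed a light key's count outright.

```python
import random


class CountMin:
    """Standard Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One salt per row; hash((salt, key)) is a stand-in (assumption) for a
        # pairwise-independent hash family.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _col(self, row: int, key: int) -> int:
        return hash((self.salts[row], key)) % self.width

    def update(self, key: int, count: int = 1) -> None:
        for r in range(self.depth):
            self.table[r][self._col(r, key)] += count

    def query(self, key: int) -> int:
        # Collisions only add mass, so every row over-estimates; take the min.
        return min(self.table[r][self._col(r, key)] for r in range(self.depth))


# Hypothetical stream: a "heavy" group of 10 keys seen 1000 times each and a
# "light" group of 100 keys seen 5 times each.
true_counts = {k: 1000 for k in range(10)}
true_counts.update({k: 5 for k in range(100, 200)})

cm = CountMin(width=64, depth=3)
for key, c in true_counts.items():
    cm.update(key, c)


def mean_relative_error(keys) -> float:
    # Average of (estimate - truth) / truth over the given keys.
    return sum((cm.query(k) - true_counts[k]) / true_counts[k] for k in keys) / len(keys)


heavy_err = mean_relative_error(range(10))
light_err = mean_relative_error(range(100, 200))
print(f"heavy group: {heavy_err:.3f}, light group: {light_err:.3f}")
```

Both groups see over-estimates drawn from the same shared counters. The light group's mean relative error is typically far larger, which is the inter-group unfairness the summary describes.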
Abstract
Frequency estimation in streaming data often relies on sketches like Count-Min (CM) to provide approximate answers with sublinear space. However, CM sketches introduce additive errors that disproportionately impact low-frequency elements, creating fairness concerns across different groups of elements. We introduce Fair-Count-Min, a frequency estimation sketch that guarantees equal expected approximation factors across element groups, thus addressing the unfairness issue. We propose a column partitioning approach with group-aware semi-uniform hashing to eliminate collisions between elements from different groups. We provide theoretical guarantees for fairness, analyze the price of fairness, and validate our theoretical findings through extensive experiments on real-world and synthetic datasets. Our experimental results show that Fair-Count-Min achieves fairness with minimal additional error and maintains competitive efficiency compared to standard CM sketches.
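The column-partitioning idea can be sketched under explicit assumptions. Each group gets its own disjoint block of columns in every row, and keys of that group hash uniformly within that block only. This is one plausible reading of "group-aware semi-uniform hashing", not the paper's exact construction. The class name `PartitionedCountMin`, the even 32/32 split, and the stream are all hypothetical. The fairness guarantee in the abstract depends on how the block widths are chosen, and that rule is not reproduced here.

```python
import random


class PartitionedCountMin:
    """Hypothetical column-partitioned Count-Min: each group owns a disjoint
    block of columns, and its keys hash uniformly within that block only."""

    def __init__(self, widths: dict, depth: int, seed: int = 0) -> None:
        # `widths` maps group id -> number of columns reserved for that group;
        # the paper's rule for sizing these blocks is not reproduced here.
        self.depth = depth
        self.blocks = {}
        start = 0
        for group, w in widths.items():
            self.blocks[group] = (start, w)
            start += w
        self.width = start
        rng = random.Random(seed)
        # hash((salt, key)) is a stand-in for a pairwise-independent family.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * self.width for _ in range(depth)]

    def _col(self, row: int, group, key: int) -> int:
        start, w = self.blocks[group]
        # Uniform over the group's own block; never lands in another group's columns.
        return start + hash((self.salts[row], key)) % w

    def update(self, group, key: int, count: int = 1) -> None:
        for r in range(self.depth):
            self.table[r][self._col(r, group, key)] += count

    def query(self, group, key: int) -> int:
        return min(self.table[r][self._col(r, group, key)] for r in range(self.depth))


# Hypothetical stream: 10 heavy keys (1000 each), 100 light keys (5 each);
# total width 64, split evenly between the two groups.
heavy = {k: 1000 for k in range(10)}
light = {k: 5 for k in range(100, 200)}
pcm = PartitionedCountMin(widths={"heavy": 32, "light": 32}, depth=3)
for k, c in heavy.items():
    pcm.update("heavy", k, c)
for k, c in light.items():
    pcm.update("light", k, c)

# A light key can only collide with other light keys, so its over-estimate is
# bounded by the light group's remaining mass (500 - 5), whatever the hash.
light_errors = [pcm.query("light", k) - c for k, c in light.items()]
print(max(light_errors))
```

Light keys never share a counter with heavy keys, so their error comes only from their own group. The trade-off is that each group hashes into fewer columns. This is plausibly one source of the additional error the abstract refers to as the price of fairness.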