Analyzing Deviations from Monotonic Trends through Database Repair

📅 2025-12-09
🤖 AI Summary
This paper addresses violations of expected monotonic trends in aggregate values within datasets. We introduce Aggregate Order Dependencies (AODs), which require that the aggregate value (e.g., mean) of a target attribute monotonically increases or decreases along a total order on the grouping attribute. AODs are the first aggregation-centric variant of order dependencies. We characterize the computational complexity of the associated repair problem as NP-hard and develop a generic algorithmic framework complemented by efficient heuristic strategies. Our approach combines database repair, combinatorial optimization, and statistical group analysis, and supports several aggregate functions (e.g., AVG, SUM, COUNT). Experiments on real and synthetic datasets show that the algorithms are efficient and scalable, with the heuristics yielding substantial speedups. Case studies detect and interpret non-monotonic anomalies in domain-specific relationships (such as education investment vs. enrollment rates, housing prices vs. neighborhood rankings, and disease incidence vs. age), exposing underlying data quality issues.

📝 Abstract
Datasets often exhibit violations of expected monotonic trends: for example, higher education level correlating with higher average salary, newer homes being more expensive, or diabetes prevalence increasing with age. We address the problem of quantifying how far a dataset deviates from such trends. To this end, we introduce Aggregate Order Dependencies (AODs), an aggregation-centric extension of the previously studied order dependencies. An AOD specifies that the aggregated value of a target attribute (e.g., mean salary) should monotonically increase or decrease with the grouping attribute (e.g., education level). We formulate the AOD repair problem as finding the smallest set of tuples to delete from a table so that the given AOD is satisfied. We analyze the computational complexity of this problem and propose a general algorithmic template for solving it. We instantiate the template for common aggregation functions, introduce optimization techniques that substantially improve the runtime of the template instances, and develop efficient heuristic alternatives. Our experimental study, carried out on both real-world and synthetic datasets, demonstrates the practical efficiency of the algorithms and provides insight into the performance of the heuristics. We also present case studies that uncover and explain unexpected AOD violations using our framework.
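
To make the repair problem in the abstract concrete, here is a minimal sketch that checks an AOD on a toy table and finds a smallest deletion set by exhaustive search. It is not the paper's algorithm: the function names `satisfies_aod` and `minimum_repair`, the `(group_key, value)` row layout, the non-strict reading of "monotonically", and the choice to ignore groups that become empty are all assumptions made for illustration. The search is exponential in the table size and is only meant for tiny examples.

```python
from itertools import combinations
from statistics import mean


def satisfies_aod(rows, agg=mean, increasing=True):
    """Check whether per-group aggregates are monotone in the group order.

    rows: list of (group_key, value) pairs; groups are ordered by group_key.
    Assumption: monotonicity is non-strict, and groups with no remaining
    tuples impose no constraint.
    """
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    aggregates = [agg(groups[key]) for key in sorted(groups)]
    pairs = list(zip(aggregates, aggregates[1:]))
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


def minimum_repair(rows, agg=mean, increasing=True):
    """Return a smallest list of tuples whose deletion satisfies the AOD.

    Brute force: try deletion sets of size 0, 1, 2, ... and stop at the
    first one that works. Deleting every tuple always succeeds, so the
    loop is guaranteed to return.
    """
    n = len(rows)
    for size in range(n + 1):
        for deleted in combinations(range(n), size):
            removed = set(deleted)
            kept = [row for i, row in enumerate(rows) if i not in removed]
            if satisfies_aod(kept, agg, increasing):
                return [rows[i] for i in deleted]


# Toy data: (education level, salary in thousands). Hypothetical values.
rows = [(1, 30), (1, 40), (2, 50), (2, 5), (3, 45), (3, 55)]
violated = not satisfies_aod(rows)   # group means are 35, 27.5, 50
repair = minimum_repair(rows)        # deleting (2, 5) gives 35, 50, 50
```

On this toy table the mean salary dips at level 2, and the only single-tuple deletion that restores the trend is `(2, 5)`. Passing `agg=sum` or `agg=len` instead of `mean` checks the same data under SUM or COUNT, which can lead to a different repair.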

Problem

Research questions and friction points this paper is trying to address.

Quantifying how far a dataset deviates from an expected monotonic trend.
Modeling such monotonic relationships with Aggregate Order Dependencies.
Developing efficient algorithms that repair a dataset by deleting the smallest possible set of tuples.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Aggregate Order Dependencies for trend analysis.
Formulates the repair problem as deleting the smallest set of tuples that makes the AOD hold.
Develops optimized algorithms and heuristics for efficiency.