Data-Semantics-Aware Recommendation of Diverse Pivot Tables

📅 2025-07-08

🤖 AI Summary

To address the inefficiency and redundancy inherent in manual exploration of meaningful pivot table combinations in high-dimensional data, this paper proposes SAGE—a novel system that introduces data semantic modeling into pivot table recommendation for the first time. SAGE jointly optimizes per-table insightfulness (utility) and set-level diversity while supporting user-behavior adaptation. It achieves this through a data-semantic-aware utility model and a semantics-guided greedy algorithm that drastically reduces the combinatorial search space, enabling efficient and diverse recommendations over large-scale, high-dimensional datasets. Experiments on three real-world datasets demonstrate that SAGE significantly outperforms state-of-the-art methods in recommendation quality, computational efficiency, and scalability. A case study further confirms its superiority over leading commercial tools and large language models in practical pivot table discovery.

📝 Abstract

Data summarization is essential to discover insights from large datasets. In spreadsheets, pivot tables offer a convenient way to summarize tabular data by computing aggregates over some attributes, grouped by others. However, identifying attribute combinations that will result in useful pivot tables remains a challenge, especially for high-dimensional datasets. We formalize the problem of automatically recommending insightful and interpretable pivot tables, eliminating the tedious manual process. A crucial aspect of recommending a set of pivot tables is to diversify them. Prior work does not adequately address table diversification, which leads us to consider the problem of pivot table diversification. We present SAGE, a data-semantics-aware system for recommending k-budgeted diverse pivot tables, overcoming the shortcomings of prior work on top-k recommendations that cause redundancy. SAGE ensures that each pivot table is insightful, interpretable, and adaptive to the user's actions and preferences, while also guaranteeing that the pivot tables in the set differ from one another, offering a diverse recommendation. We make two key technical contributions: (1) a data-semantics-aware model to measure the utility of a single pivot table and the diversity of a set of pivot tables, and (2) a scalable greedy algorithm that efficiently selects a set of diverse, high-utility pivot tables by leveraging data semantics to significantly reduce the combinatorial search space. Our extensive experiments on three real-world datasets show that SAGE outperforms alternative approaches and scales efficiently to high-dimensional datasets. Additionally, we present several case studies to highlight SAGE's qualitative effectiveness over commercial software and Large Language Models (LLMs).
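To make the notion of a pivot table concrete, here is a minimal sketch of "computing aggregates over some attributes, grouped by others" in plain Python. The `pivot` helper, the `sales` rows, and the attribute names are hypothetical and serve only as illustration; they are not part of SAGE.

```python
from collections import defaultdict

def pivot(rows, group_by, value, agg=sum):
    """Group rows by the `group_by` attributes and aggregate the `value` attribute."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in group_by)
        buckets[key].append(row[value])
    return {key: agg(vals) for key, vals in buckets.items()}

# Hypothetical toy dataset (illustration only).
sales = [
    {"region": "East", "product": "A", "revenue": 100},
    {"region": "East", "product": "B", "revenue": 50},
    {"region": "West", "product": "A", "revenue": 70},
    {"region": "East", "product": "A", "revenue": 30},
]

by_region = pivot(sales, ["region"], "revenue")
by_region_product = pivot(sales, ["region", "product"], "revenue")
```

Every choice of grouping attributes, value attribute, and aggregate yields a different pivot table, so the number of candidates grows combinatorially with the number of attributes. That growth is what makes manual exploration tedious on high-dimensional data.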

Problem

Research questions and friction points this paper is trying to address.

Automate recommendation of insightful pivot tables

Diversify pivot tables to avoid redundancy

Adapt recommendations to user preferences efficiently

Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-semantics-aware model for pivot utility

Scalable greedy algorithm for diversity

Efficient adaptation to user preferences
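The general idea of greedily selecting k items that balance utility against diversity can be sketched as follows. This is not SAGE's algorithm or its utility model: it is a generic marginal-relevance-style greedy, and the utility scores, the Jaccard distance over grouping attributes, and the `trade_off` weight are all assumptions chosen for illustration.

```python
def jaccard_distance(a, b):
    """Distance between two attribute sets: 0 when identical, 1 when disjoint."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def greedy_diverse(candidates, k, trade_off=0.5):
    """Pick up to k attribute sets, trading utility against novelty.

    candidates: dict mapping a frozenset of grouping attributes to an
    assumed utility score in [0, 1] (hypothetical values, not SAGE's model).
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def score(attrs):
            # Novelty is the distance to the closest already-selected table.
            novelty = min(
                (jaccard_distance(attrs, s) for s in selected), default=1.0
            )
            return (1 - trade_off) * remaining[attrs] + trade_off * novelty
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

# Hypothetical candidates with made-up utility scores.
candidates = {
    frozenset({"region"}): 0.9,
    frozenset({"region", "product"}): 0.85,
    frozenset({"month"}): 0.6,
}
picks = greedy_diverse(candidates, k=2)
top_k = sorted(candidates, key=candidates.get, reverse=True)[:2]
```

On this toy input, plain top-k by utility returns the two overlapping region-based tables, while the greedy pass returns the region table and the month table. That illustrates the kind of redundancy that diversification is meant to avoid.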
