LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes

📅 2025-04-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the poor interpretability of data discovery results and the high cognitive load they impose on users in data lake environments. It introduces a new task, “visualization recommendation for data discovery results,” and proposes LakeVisage, an end-to-end framework that automatically generates a candidate visualization space from the output of a data discovery engine. The framework integrates heuristic visualization construction, semantic-aware interest modeling, and interest-driven pruning to recommend high-insight, user-relevant visualizations. Experiments on real data lakes show an order-of-magnitude speedup in visualization recommendation. A user study indicates that the approach helps users get started with analysis tasks quickly and explore flexibly, improving the usability of data discovery applications.

📝 Abstract

Data discovery from data lakes is an essential application in modern data science. While many previous studies focused on improving the efficiency and effectiveness of data discovery, little attention has been paid to the usability of such applications. In particular, exploring data discovery results can be cumbersome due to the cognitive load involved in understanding raw tabular results and identifying insights to draw conclusions. To address this challenge, we introduce a new problem, visualization recommendation for data discovery over data lakes, which aims at automatically identifying visualizations that highlight relevant or desired trends in the results returned by data discovery engines. We propose LakeVisage, an end-to-end framework, as the first solution to this problem. Given a data lake, a data discovery engine, and a user-specified query table, LakeVisage intelligently explores the space of visualizations and recommends the most useful and "interesting" visualization plans. To this end, we developed (i) approaches to smartly construct the candidate visualization plans from the results of the data discovery engine and (ii) effective pruning strategies to filter out less interesting plans so as to accelerate the visual analysis. Experimental results on real data lakes show that our proposed techniques can lead to an order of magnitude speedup in visualization recommendation. We also conduct a comprehensive user study to demonstrate that LakeVisage offers convenience to users in real data analysis applications by enabling them to get started with tasks seamlessly and perform explorations flexibly.
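
The two steps named in the abstract, constructing candidate visualization plans and pruning the less interesting ones, can be illustrated with a small sketch. This is not LakeVisage's actual algorithm: the `Plan` type, the bar-chart-only plan space, the group-count filter, and the spread-of-group-means interest score are all assumptions chosen for illustration, and the tables are toy in-memory dictionaries rather than results from a real data discovery engine.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean, pstdev
import heapq

# Hypothetical in-memory table: column name -> list of values.
Table = dict[str, list]

@dataclass(frozen=True)
class Plan:
    """A hypothetical bar-chart plan: mean of y grouped by x."""
    table: str
    x: str  # categorical column used for grouping
    y: str  # numeric column aggregated with mean

def is_numeric(values):
    return all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in values)

def candidate_plans(tables):
    """Enumerate every (categorical x, numeric y) pair in each table."""
    for name, cols in tables.items():
        cat = [c for c, v in cols.items() if not is_numeric(v)]
        num = [c for c, v in cols.items() if is_numeric(v)]
        for x, y in product(cat, num):
            yield Plan(name, x, y)

def cheap_prune(plan, tables, max_groups):
    """Assumed heuristic: keep plans with between 2 and max_groups bars."""
    n_groups = len(set(tables[plan.table][plan.x]))
    return 2 <= n_groups <= max_groups

def interest(plan, tables):
    """Toy interest score: spread of group means relative to the overall mean."""
    cols = tables[plan.table]
    groups = {}
    for key, value in zip(cols[plan.x], cols[plan.y]):
        groups.setdefault(key, []).append(value)
    group_means = [mean(vs) for vs in groups.values()]
    overall = mean(cols[plan.y])
    return pstdev(group_means) / (abs(overall) or 1.0)

def recommend(tables, k=3, max_groups=20):
    """Filter cheaply first, then score only the survivors and keep the top k."""
    survivors = (p for p in candidate_plans(tables)
                 if cheap_prune(p, tables, max_groups))
    return heapq.nlargest(k, survivors, key=lambda p: interest(p, tables))

if __name__ == "__main__":
    tables = {
        "sales": {
            "region": ["N", "N", "S", "S"],
            "id": ["a", "b", "c", "d"],
            "revenue": [10, 12, 30, 28],
            "units": [5, 5, 5, 5],
        }
    }
    for plan in recommend(tables, k=2, max_groups=3):
        print(plan, round(interest(plan, tables), 3))
```

The point of the ordering in `recommend` is that the inexpensive filter runs before the comparatively costly scoring step, so plans that would never be shown are not scored at all. The pruning strategies in the paper are likely more sophisticated than this single threshold.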

Problem

Research questions and friction points this paper is trying to address.

Recommends visualizations for data discovery results

Reduces cognitive load in understanding raw tabular data

Accelerates visual analysis via smart pruning strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automates visualization recommendation for data lakes

Smartly constructs candidate visualization plans

Effective pruning for faster visual analysis

🔎 Similar Papers

MQRLD: A Multimodal Data Retrieval Platform with Query-aware Feature Representation and Learned Index Based on Data Lake