🤖 AI Summary
This work addresses the limited semantic interpretability of clustering results in high-dimensional data after dimensionality reduction, and the high expertise barrier imposed by existing visualization techniques. To bridge this gap, the authors propose an interactive framework that integrates large language models (LLMs) with visual analytics, using LLMs to automatically generate human-readable semantic descriptions of clusters while incorporating external contextual knowledge. This approach substantially lowers the barrier for non-experts to understand clustering outcomes, making cluster interpretation in data analysis more accessible and reliable. The method is validated through systematic evaluation, and the accompanying tool has been publicly released as open-source software.
📝 Abstract
Dimensionality reduction is a powerful technique for revealing structure and potential clusters in data. However, because the resulting axes are complex, non-linear combinations of features, they often lack semantic interpretability. Existing visual analytics (VA) methods support cluster interpretation through feature comparison and interactive exploration, but they require technical expertise and intensive human effort. We present *LangLasso*, a novel method that complements VA approaches through interactive, natural-language descriptions of clusters using large language models (LLMs). It produces human-readable descriptions that make cluster interpretation accessible to non-experts and allow the integration of external contextual knowledge beyond the dataset. We systematically evaluate the reliability of these explanations and demonstrate that LangLasso provides an effective first step toward engaging broader audiences in cluster interpretation. The tool is available at https://langlasso.vercel.app