Automated Machine Learning for Unsupervised Tabular Tasks

📅 2025-10-08

🤖 AI Summary

Problem: Unsupervised tabular tasks such as anomaly detection and clustering lack automated, cross-dataset model selection methods. Method: This paper proposes LOTUS, a framework that uses optimal transport (OT) distances for unsupervised multi-task model recommendation. LOTUS measures distributional similarity between unlabeled datasets and combines it with a memory of historical pipeline performance to recommend machine learning pipelines for new datasets with a single unified method. Contribution/Results: LOTUS applies optimal transport to unsupervised AutoML, covering both anomaly detection and clustering with one recommendation method. Experiments against strong baselines indicate that LOTUS is a promising first step toward model selection across multiple unsupervised tasks.

📝 Abstract

In this work, we present LOTUS (Learning to Learn with Optimal Transport for Unsupervised Scenarios), a simple yet effective method to perform model selection for multiple unsupervised machine learning (ML) tasks such as outlier detection and clustering. Our intuition is that a machine learning pipeline will perform well on a new dataset if it previously worked well on datasets with a similar underlying data distribution. We use Optimal Transport distances to measure this similarity between unlabeled tabular datasets and recommend machine learning pipelines with a single unified method for two downstream unsupervised tasks: outlier detection and clustering. We demonstrate the effectiveness of our approach with experiments against strong baselines and show that LOTUS is a very promising first step toward model selection for multiple unsupervised ML tasks.
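To make the idea of an Optimal Transport distance between two unlabeled datasets concrete, here is a minimal sketch. It is not the authors' implementation: it assumes both datasets share the same number of features, treats each row as a point with uniform weight, and uses an entropy-regularized (Sinkhorn) approximation rather than whatever OT variant the paper uses. The function name `sinkhorn_distance` and the parameters `reg` and `n_iters` are illustrative choices.

```python
import numpy as np

def sinkhorn_distance(X, Y, reg=0.05, n_iters=500):
    """Approximate OT cost between two samples with uniform row weights.

    Assumption: X and Y have the same number of columns (features).
    This is an entropic-regularized approximation, so the value is
    slightly larger than the exact OT cost.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    # Pairwise squared Euclidean cost between rows of X and rows of Y.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    scale = C.max()
    if scale == 0:
        return 0.0  # every point coincides, nothing to transport

    # Normalize costs to [0, 1] so the kernel below stays well conditioned.
    K = np.exp(-(C / scale) / reg)

    a = np.full(len(X), 1.0 / len(X))  # uniform mass on rows of X
    b = np.full(len(Y), 1.0 / len(Y))  # uniform mass on rows of Y

    # Sinkhorn iterations: alternately rescale to match both marginals.
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)

    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum())      # cost in the original units


# Usage: a dataset drawn from a similar distribution should be closer
# than one drawn from a shifted distribution.
rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, size=(60, 3))
near = rng.normal(0.0, 1.0, size=(50, 3))
far = rng.normal(5.0, 1.0, size=(50, 3))
d_near = sinkhorn_distance(base, near)
d_far = sinkhorn_distance(base, far)
```

In this sketch a smaller value means the two datasets look more alike in distribution, which is the signal the abstract describes using to decide which past datasets are relevant to a new one.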

Problem

Research questions and friction points this paper is trying to address.

Selecting optimal ML pipelines for unsupervised tabular tasks automatically

Using Optimal Transport to measure similarity between unlabeled datasets

Recommending pipelines for outlier detection and clustering tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Optimal Transport for dataset similarity

Recommends pipelines for outlier detection and clustering tasks

Applies a single unified method to multiple unsupervised learning tasks
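The recommendation step implied by these points can be sketched as a nearest-neighbor lookup: given distances from a new dataset to previously seen datasets, return the pipeline that worked best on the closest one. This is a simplified illustration under stated assumptions, not the paper's procedure. The function `recommend_pipeline`, the dataset names, and the pipeline labels are all hypothetical.

```python
def recommend_pipeline(distances, best_pipeline):
    """Return (nearest historical dataset, its best known pipeline).

    distances:     dict mapping historical dataset name -> distance to
                   the new dataset (for example, an OT distance).
    best_pipeline: dict mapping historical dataset name -> label of the
                   pipeline that performed best on it in the past.
    """
    if not distances:
        raise ValueError("no historical datasets to compare against")
    nearest = min(distances, key=distances.get)
    return nearest, best_pipeline[nearest]


# Hypothetical history: names and pipeline labels are made up.
history = {"sensors_a": "isolation_forest", "sales_b": "kmeans_k5"}
dists = {"sensors_a": 0.9, "sales_b": 0.2}
choice = recommend_pipeline(dists, history)
```

Because the lookup only depends on dataset-to-dataset distances, the same function serves outlier detection and clustering alike; only the stored pipeline labels differ per task.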
