🤖 AI Summary
This work addresses semi-supervised few-shot learning on tabular data under label scarcity, where existing approaches often rely on data augmentation strategies that are ill-suited to tabular structure. The authors propose SeBA (Separated-at-Birth Alignment), an augmentation-free joint-embedding framework in which the input features are split into two complementary views, and the representation of one view is aligned so that it preserves the nearest-neighbor relationships observed in the other. By combining view separation, nearest-neighbor modeling, and joint-embedding alignment, SeBA strengthens the relationship between features and labels, a claim the authors support with both theoretical analysis and experiments. Across a range of tabular benchmarks, SeBA achieves state-of-the-art performance in the majority of cases, opening a new avenue for few-shot learning tailored to tabular data.
📝 Abstract
Learning from scarce labeled data together with a larger pool of unlabeled samples, known as semi-supervised few-shot learning (SS-FSL), remains critical for applications involving tabular data in domains such as medicine, finance, and science. Existing SS-FSL methods often rely on self-supervised learning (SSL) frameworks developed for vision or language, which assume that a natural form of data augmentation is available. For tabular data, defining meaningful augmentations is non-trivial and can easily distort semantics, limiting the effectiveness of conventional SSL. In this work, we rethink SSL for tabular data and propose Separated-at-Birth Alignment (SeBA), a joint-embedding framework for SS-FSL that eliminates the dependence on augmentations. Our core idea is to separate the data into two independent but complementary views and to align the representations of one view so that they mirror the nearest-neighbor correspondence of the data in the second view. Our experimental evaluation, supported by a theoretical analysis, shows that SeBA produces an output space that improves the feature-label relationship. An experimental study on various benchmark datasets demonstrates that SeBA achieves state-of-the-art performance in the majority of cases, opening a new avenue for the SS-FSL paradigm in the domain of tabular data.
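The core idea described above (split the features into two complementary views, then make the representations of one view reproduce the nearest-neighbor structure of the other) can be illustrated with a small NumPy sketch. Everything concrete here is an assumption rather than SeBA's published method: the random half/half column split, the uniform k-nearest-neighbor targets, the linear encoder `W`, the temperature `tau`, and the helper names `split_views`, `knn_targets`, and `alignment_loss` are all hypothetical.

```python
import numpy as np


def split_views(X, rng):
    """Partition feature columns into two disjoint, complementary views.

    Assumption: a random half/half column split; the abstract does not
    specify how SeBA actually separates the features.
    """
    perm = rng.permutation(X.shape[1])
    half = X.shape[1] // 2
    return X[:, perm[:half]], X[:, perm[half:]]


def knn_targets(V, k):
    """Row-stochastic target matrix: weight 1/k on each sample's k nearest
    neighbours (squared Euclidean distance) in view V, self excluded."""
    n = V.shape[0]
    d = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)  # (n, n) distances
    np.fill_diagonal(d, np.inf)                               # a sample is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]                        # k nearest per row
    T = np.zeros((n, n))
    T[np.arange(n)[:, None], idx] = 1.0 / k
    return T


def alignment_loss(Z, T, tau=0.1):
    """Cross-entropy between neighbour targets T (from the other view) and a
    softmax over cosine similarities of the embeddings Z (from this view)."""
    Zn = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    S = Zn @ Zn.T / tau
    np.fill_diagonal(S, -1e9)                 # mask self-similarity (finite, so 0 * logP stays 0)
    S = S - S.max(axis=1, keepdims=True)      # numerical stability before exp
    logP = S - np.log(np.exp(S).sum(axis=1, keepdims=True))
    return float(-(T * logP).sum(axis=1).mean())


# Toy usage with a hypothetical linear encoder W for view A.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))          # 32 unlabeled rows, 10 tabular features
Xa, Xb = split_views(X, rng)
T = knn_targets(Xb, k=5)               # neighbour structure observed in view B
W = rng.normal(size=(Xa.shape[1], 8))  # stand-in for a trainable encoder
loss = alignment_loss(Xa @ W, T)       # how well view-A embeddings mirror view-B neighbours
print(f"alignment loss: {loss:.4f}")
```

In this sketch no augmented copy of a row is ever created: both views are fixed column subsets of the same sample, and the training signal comes only from the neighbor structure of the other view. Minimizing the loss with respect to `W` would pull each view-A embedding toward the rows that are its neighbors in view B; a symmetric variant could also align view B to the neighbors found in view A.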