๐ค AI Summary
This work addresses the problem of unsupervised continuous severity scoring under label scarcity or ambiguous label definitionsโe.g., ill-defined clinical disease progression criteria. We propose a novel unsupervised scoring framework integrating representation learning, side-information modeling, and metric learning. For the first time, we formalize clinical symptoms and domain-specific constraints as semantic constraints or auxiliary signals, and design an end-to-end trainable semantic triplet architecture that eliminates reliance on explicit labels. Our method introduces a constraint-aware loss function that jointly optimizes structured side-information encoding and pairwise/triplet metric learning. Evaluated on standard benchmarks and real-world biomedical electronic health records, the approach significantly outperforms baselines: the learned severity scores achieve high concordance with clinical assessments (Pearson *r* > 0.82), while demonstrating strong interpretability and cross-institutional generalizability.
๐ Abstract
Common machine learning settings range from supervised tasks, where accurately labeled data is accessible, through semi-supervised and weakly-supervised tasks, where target labels are scant or noisy, to unsupervised tasks where labels are unobtainable. In this paper we study a scenario where the target labels are not available but additional related information is at hand. This information, referred to as Side Information, is either correlated with the unknown labels or imposes constraints on the feature space. We formulate the problem as an ensemble of three semantic components: representation learning, side information and metric learning. The proposed scoring model is advantageous for multiple use-cases. For example, in the healthcare domain it can be used to create a severity score for diseases where the symptoms are known but the criteria for the disease progression are not well defined. We demonstrate the utility of the suggested scoring system on well-known benchmark data-sets and bio-medical patient records.