On the sample complexity of semi-supervised multi-objective learning

📅 2025-08-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the sample complexity of semi-supervised multi-objective learning (MOL), where a single model must trade off several possibly competing prediction tasks. Known MOL bounds depend on the complexity of the shared hypothesis class $\mathcal{G}$, which may need more capacity than any individual task requires and therefore inflates the statistical cost. The authors first show that, for some losses, this cost is unavoidable even in an idealized semi-supervised setting in which the learner has access to the Bayes-optimal solution of each task and the marginal distributions over the covariates. For Bregman losses, in contrast, the complexity of $\mathcal{G}$ enters only through the unlabeled data, which can substantially reduce the number of labeled samples required. The paper establishes sample complexity upper bounds that specify when and how unlabeled data alleviate the need for labeled data, and shows that these rates are achieved by a simple pseudo-labeling algorithm.
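
For readers unfamiliar with the term, the block below recalls what a Bregman loss is and why it pairs naturally with pseudo-labeling. Both displays are standard facts about Bregman divergences rather than formulas quoted from the paper; the symbols $\phi$ (a differentiable, strictly convex generator), $f^\star$, and $g$ are notation assumed here for illustration.

$$
D_\phi(y, \hat{y}) = \phi(y) - \phi(\hat{y}) - \langle \nabla\phi(\hat{y}),\, y - \hat{y} \rangle
$$

Choosing $\phi(y) = \|y\|_2^2$ recovers the squared loss $\|y - \hat{y}\|_2^2$. Writing $f^\star(x) = \mathbb{E}[Y \mid X = x]$ for the conditional mean, the expected loss of any predictor $g$ decomposes as

$$
\mathbb{E}\big[D_\phi(Y, g(X))\big] = \mathbb{E}\big[D_\phi(Y, f^\star(X))\big] + \mathbb{E}\big[D_\phi(f^\star(X), g(X))\big]
$$

The first term does not depend on $g$, and the second involves only the covariates and $f^\star$. This suggests, at least heuristically, why a good estimate of $f^\star$ plus unlabeled covariates can stand in for labeled data when fitting $g$.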

📝 Abstract

In multi-objective learning (MOL), several possibly competing prediction tasks must be solved jointly by a single model. Achieving good trade-offs may require a model class $\mathcal{G}$ with larger capacity than what is necessary for solving the individual tasks. This, in turn, increases the statistical cost, as reflected in known MOL bounds that depend on the complexity of $\mathcal{G}$. We show that this cost is unavoidable for some losses, even in an idealized semi-supervised setting, where the learner has access to the Bayes-optimal solutions for the individual tasks as well as the marginal distributions over the covariates. On the other hand, for objectives defined with Bregman losses, we prove that the complexity of $\mathcal{G}$ may come into play only in terms of unlabeled data. Concretely, we establish sample complexity upper bounds, showing precisely when and how unlabeled data can significantly alleviate the need for labeled data. These rates are achieved by a simple, semi-supervised algorithm via pseudo-labeling.
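
The pseudo-labeling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm as stated: it assumes the squared loss (one particular Bregman loss), least-squares fits over linear feature maps, and a fixed weighting `lam` of the tasks. All names here (`fit_linear`, `pseudo_label_mol`, `task_feats`, `shared_feats`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def fit_linear(X, y):
    # Least-squares fit: empirical risk minimization for the squared loss
    # over predictors that are linear in the given features.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def pseudo_label_mol(labeled, unlabeled, lam, task_feats, shared_feats):
    """Illustrative pseudo-labeling sketch (assumed setup, squared loss).

    labeled:      list of (X_k, y_k), one small labeled sample per task
    unlabeled:    list of X_k arrays, one large unlabeled sample per task
    lam:          nonnegative trade-off weights, one per task
    task_feats:   feature map of the (simple) per-task model class
    shared_feats: feature map of the (larger) shared class G
    """
    rows, targets = [], []
    for (X_l, y_l), X_u, lam_k in zip(labeled, unlabeled, lam):
        # Step 1: fit a task-specific model using labeled data only.
        w_k = fit_linear(task_feats(X_l), y_l)
        # Step 2: pseudo-label the unlabeled covariates of this task.
        pseudo = task_feats(X_u) @ w_k
        # Step 3 (setup): scale rows so the stacked least-squares problem
        # equals sum_k lam_k * mean squared error to the pseudo-labels.
        s = np.sqrt(lam_k / len(X_u))
        rows.append(s * shared_feats(X_u))
        targets.append(s * pseudo)
    # Step 3: fit the shared model on pseudo-labeled data only.
    return fit_linear(np.vstack(rows), np.concatenate(targets))


# Toy example with two conflicting tasks: y = 2x and y = -x, plus noise.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(20, 1)), rng.normal(size=(20, 1))
y1 = 2.0 * X1[:, 0] + 0.1 * rng.normal(size=20)
y2 = -1.0 * X2[:, 0] + 0.1 * rng.normal(size=20)
X_u = rng.normal(size=(2000, 1))  # shared pool of unlabeled covariates

w_shared = pseudo_label_mol(
    labeled=[(X1, y1), (X2, y2)],
    unlabeled=[X_u, X_u],
    lam=[0.5, 0.5],
    task_feats=lambda X: X,                       # per-task class: linear in x
    shared_feats=lambda X: np.hstack([X, X**2]),  # shared class: x and x^2
)
print(w_shared)
```

In this toy setup only 20 labeled points per task are used, and only for the one-parameter per-task fits; the larger shared class is fit entirely on the 2000 pseudo-labeled points. With equal weights, the shared model should land near the compromise slope 0.5 with a negligible quadratic coefficient.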

Problem

Research questions and friction points this paper is trying to address.

Analyzing sample complexity in semi-supervised multi-objective learning with competing tasks

Determining when unlabeled data reduces labeled data requirements for Bregman losses

Establishing sample complexity upper bounds for a semi-supervised pseudo-labeling algorithm

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised algorithm via pseudo-labeling

Bregman losses reduce the need for labeled data

Unlabeled data alleviates model complexity cost
