Cold Start Active Preference Learning in Socio-Economic Domains

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the cold-start problem caused by label scarcity in preference learning for socio-economic applications, this paper proposes an active preference learning framework that requires no initial labeled data. Methodologically, it introduces a self-supervised pre-training phase, novel in this domain, that uses PCA to uncover the data's intrinsic structure and generate pseudo-labels, enabling model initialization without any oracle interaction. The model is then refined by an uncertainty-driven active querying strategy that requests labels from a simulated noisy oracle, emulating imperfect human feedback. Experiments on multiple real-world socio-economic datasets show that the approach achieves higher accuracy and sample efficiency with fewer pairwise annotations, substantially reducing reliance on expert labeling. The core contribution is the integration of self-supervised pre-training with active learning to resolve the cold-start challenge in preference learning.
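The authors' exact procedure is in their released repository; as an illustrative sketch only (not their implementation), the PCA-based pseudo-labeling idea can be approximated by projecting items onto the first principal component and pseudo-labeling each pair by comparing projections. The function name `pca_pseudo_labels` and the use of the top component alone are assumptions for this sketch:

```python
import numpy as np

def pca_pseudo_labels(X, pairs):
    """Pseudo-label pairwise preferences from the first principal component.

    X: (n, d) feature matrix; pairs: list of (i, j) index pairs.
    Returns 1 if item i is pseudo-preferred over item j, else 0.
    Note: the sign of a principal axis is arbitrary, so the induced
    ordering may be globally flipped; it is still internally consistent.
    """
    Xc = X - X.mean(axis=0)
    # Top right singular vector of the centered data = first principal axis.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]
    return [1 if scores[i] > scores[j] else 0 for i, j in pairs]
```

These pseudo-labels would then warm-start the preference model before any oracle query is spent.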

📝 Abstract
Active preference learning is a powerful paradigm for efficiently modeling preferences, yet it suffers from the cold-start problem: a significant drop in performance when no initial labeled data is available. This challenge is particularly acute in computational social systems and economic analysis, where labeled data is often scarce, expensive, and subject to expert noise. To address this gap, we propose a novel framework for cold-start active preference learning. Our method initiates the learning process through a self-supervised pre-training phase, utilizing Principal Component Analysis (PCA) to derive initial pseudo-labels from the data's inherent structure, thereby creating a cold-start model without any initial oracle interaction. Subsequently, the model is refined through an active learning loop that strategically queries a simulated noisy oracle for labels. We conduct extensive experiments on diverse datasets from different domains, including financial credibility, career success rate, and socio-economic status. The results demonstrate that our cold-start approach outperforms standard active learning strategies that begin from a blank slate, achieving higher accuracy with substantially fewer labeled pairs. Our framework offers a practical and effective solution to mitigate the cold-start problem, enhancing the sample efficiency and applicability of preference learning in data-constrained environments. We release our code at https://github.com/Dan-A2/cold-start-preference-learning
Problem

Research questions and friction points this paper is trying to address.

Addresses cold-start issue in active preference learning
Improves accuracy with fewer labeled data pairs
Enhances preference learning in data-scarce domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised pre-training with PCA
Active learning with noisy oracle queries
Cold-start model without initial labels
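The bullets above describe an active loop that queries a noisy oracle on uncertain pairs. A minimal sketch of that loop, assuming a logistic (Bradley-Terry-style) preference model on feature differences, uncertainty measured as distance of the predicted probability from 0.5, and a fixed label-flip rate for the simulated oracle (none of these specifics are stated in the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def query_most_uncertain(w, X, candidate_pairs):
    # Select the pair whose predicted preference probability is nearest 0.5.
    probs = np.array([sigmoid((X[i] - X[j]) @ w) for i, j in candidate_pairs])
    return candidate_pairs[int(np.argmin(np.abs(probs - 0.5)))]

def noisy_oracle(true_w, X, pair, flip_prob=0.1, rng=rng):
    # Ground-truth comparison, flipped with probability flip_prob
    # to emulate expert noise.
    i, j = pair
    label = int((X[i] - X[j]) @ true_w > 0)
    return 1 - label if rng.random() < flip_prob else label

def sgd_update(w, X, pair, label, lr=0.3):
    # One logistic-loss gradient step on the pairwise difference feature.
    i, j = pair
    d = X[i] - X[j]
    return w + lr * (label - sigmoid(d @ w)) * d
```

Each round would call `query_most_uncertain`, label the returned pair with `noisy_oracle`, and refine the weights with `sgd_update`, starting from the PCA-warm-started model rather than a blank slate.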
Mojtaba Fayaz-Bakhsh
Department of Computer Engineering, Sharif University of Technology
Danial Ataee
Department of Mathematical Sciences, Sharif University of Technology
MohammadAmin Fazli
Sharif University of Technology
Complex Networks · Data Analysis · Software Engineering · Computational Business & Economics · Operations Research