Offline Clustering of Preference Learning with Active-data Augmentation

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses multi-user preference clustering in offline preference learning under challenging conditions: scarce pairwise feedback and severe data imbalance across users and preference dimensions. The goal is to improve utility estimation for unseen test users. We propose a joint "offline clustering + active-data augmentation" framework, the first to integrate preference clustering with active sampling. We theoretically characterize the tradeoff between sample noise and bias, and derive an error bound that guides selection of the most informative samples to mitigate dimensional imbalance. Our method unifies pairwise preference modeling, spectral clustering, and theory-driven active learning. Extensive experiments on synthetic and real-world datasets demonstrate significant improvements over purely offline baselines, confirming gains in both data efficiency and generalization.
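The summary names the framework's building blocks: pairwise preference modeling and spectral clustering of users by preference similarity. A minimal sketch of those two pieces, assuming a Bradley-Terry model for pairwise feedback and a cosine-similarity graph over estimated preference vectors, might look like the following. The function names, learning rate, graph construction, and k-means initialization are all illustrative assumptions, not the paper's actual Off-C$^2$PL algorithm:

```python
import numpy as np

def fit_preference_vector(comparisons, dim, lr=0.1, epochs=200):
    """Estimate a user's preference vector theta from pairwise feedback
    under a Bradley-Terry model: P(a preferred over b) = sigmoid(theta @ (a - b)).
    `comparisons` is a list of (x_a, x_b, y) with y = 1 if a was preferred."""
    theta = np.zeros(dim)
    for _ in range(epochs):
        for x_a, x_b, y in comparisons:
            z = x_a - x_b
            p = 1.0 / (1.0 + np.exp(-theta @ z))
            theta += lr * (y - p) * z  # gradient ascent on the log-likelihood
    return theta

def spectral_cluster(thetas, k):
    """Group users by preference similarity: spectral embedding of a
    cosine-similarity graph over their estimated preference vectors,
    followed by a simple k-means with farthest-point initialization."""
    T = np.array(thetas)
    T = T / (np.linalg.norm(T, axis=1, keepdims=True) + 1e-12)
    S = np.clip(T @ T.T, 0.0, None)          # nonnegative similarity matrix
    L = np.diag(S.sum(axis=1)) - S           # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    emb = vecs[:, :k]                        # k smallest eigenvectors
    centers = [emb[0]]                       # greedy farthest-point init
    for _ in range(1, k):
        d2 = np.min([((emb - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(emb[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(50):                      # Lloyd iterations
        labels = np.argmin(((emb[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels
```

Users whose estimated vectors point in similar directions land in the same cluster, so their offline data can be pooled when estimating the test user's utility.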

📝 Abstract
Preference learning from pairwise feedback is a widely adopted framework in applications such as reinforcement learning with human feedback and recommendations. In many practical settings, however, user interactions are limited or costly, making offline preference learning necessary. Moreover, real-world preference learning often involves users with different preferences. For example, annotators from different backgrounds may rank the same responses differently. This setting presents two central challenges: (1) identifying similarity across users to effectively aggregate data, especially under scenarios where offline data is imbalanced across dimensions, and (2) handling the imbalanced offline data where some preference dimensions are underrepresented. To address these challenges, we study the Offline Clustering of Preference Learning problem, where the learner has access to fixed datasets from multiple users with potentially different preferences and aims to maximize utility for a test user. To tackle the first challenge, we first propose Off-C$^2$PL for the pure offline setting, where the learner relies solely on offline data. Our theoretical analysis provides a suboptimality bound that explicitly captures the tradeoff between sample noise and bias. To address the second challenge of imbalanced data, we extend our framework to the setting with active-data augmentation, where the learner is allowed to select a limited number of additional active samples for the test user based on the cluster structure learned by Off-C$^2$PL. In this setting, our second algorithm, A$^2$-Off-C$^2$PL, actively selects samples that target the least-informative dimensions of the test user's preference. We prove that these actively collected samples contribute more effectively than offline ones. Finally, we validate our theoretical results through simulations on synthetic and real-world datasets.
Problem

Research questions and friction points this paper is trying to address.

Addresses offline preference learning with multiple users having diverse preferences
Tackles data imbalance across preference dimensions in offline datasets
Enhances learning through active-data augmentation for underrepresented dimensions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline clustering groups users by preference similarity
Active-data augmentation targets underrepresented preference dimensions
Algorithms explicitly balance the tradeoff between sample noise and bias
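The second innovation, targeting underrepresented preference dimensions, can be sketched as follows: treat the sum of outer products of past comparison difference vectors as an information matrix, and query along its minimum-eigenvalue direction. This is a generic D-optimal-style heuristic consistent with the summary, not the paper's A$^2$-Off-C$^2$PL selection rule; both function names and the scoring rule are assumptions:

```python
import numpy as np

def least_informative_direction(Z, reg=1e-6):
    """Given past comparison feature differences Z (n x d), return the
    preference dimension the data says least about: the minimum-eigenvalue
    eigenvector of the information matrix sum_i z_i z_i^T."""
    info = Z.T @ Z + reg * np.eye(Z.shape[1])
    vals, vecs = np.linalg.eigh(info)
    return vecs[:, 0], vals[0]  # direction and its accumulated information

def select_active_query(candidates, Z):
    """Pick, from candidate pairs (x_a, x_b), the one whose difference
    vector aligns best with the least-informative direction, so the new
    label is maximally informative about the underrepresented dimension."""
    u, _ = least_informative_direction(Z)
    diffs = np.array([x_a - x_b for x_a, x_b in candidates])
    return int(np.argmax(np.abs(diffs @ u)))
```

If the offline data compares items mostly along one dimension, this rule deliberately spends the limited active-query budget on the dimensions that the offline data leaves uncertain.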