🤖 AI Summary
To address the scarcity of human preference data and the high cost of annotation in reward modeling for large language models (LLMs), this paper proposes LENS, a latent-space preference data synthesis framework. Unlike conventional methods that generate response pairs directly in text space, LENS performs preference synthesis in the model's latent space: it employs a variational autoencoder (VAE) to learn structured representations of response embeddings and generates semantically coherent, order-preserving synthetic preference pairs via controlled perturbations of the latent variables. Theoretically, the synthesized pairs are shown to approximately preserve the original preference ordering and to improve reward model generalization. Experiments demonstrate that LENS significantly outperforms text-based data augmentation on standard benchmarks, achieving an 18× synthesis speedup with a 16,000× smaller generator model while improving both reward model accuracy and training efficiency.
📝 Abstract
Reward modeling, crucial for aligning large language models (LLMs) with human preferences, is often bottlenecked by the high cost of preference data. Existing textual data synthesis methods are computationally expensive. We propose LENS, a novel framework for synthesizing preference data directly in the LLM's latent embedding space. Our method employs a Variational Autoencoder (VAE) to learn a structured latent representation of response embeddings. By performing controlled perturbations in this latent space and decoding back to the embedding space, we efficiently generate diverse, semantically consistent synthetic preference pairs, bypassing costly text generation and annotation. We provide theoretical guarantees that our synthesized pairs approximately preserve the original preference ordering and improve reward model generalization. Empirically, our latent-space synthesis significantly outperforms text-based augmentation on standard benchmarks, achieving superior results while being 18x faster in generation and using a 16,000x smaller model. Our work offers a scalable and effective alternative for enhancing reward modeling through efficient data augmentation. Code is publicly available at https://github.com/deeplearning-wisc/lens
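The pipeline described above — encode a response embedding into a VAE latent, apply a controlled perturbation, decode back to embedding space, and pair the results — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the `sigma` noise scale, and the linear `encode`/`decode` stand-ins (a trained VAE would be used in practice) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): response embeddings in R^d,
# VAE latent space in R^k with k << d.
d, k = 64, 8

# Stand-ins for a trained VAE encoder/decoder -- here just fixed linear maps.
W_enc = rng.normal(size=(k, d)) / np.sqrt(d)
W_dec = rng.normal(size=(d, k)) / np.sqrt(k)

def encode(e):
    """Map a response embedding to its latent code (mean only; variance omitted)."""
    return W_enc @ e

def decode(z):
    """Map a latent code back to a synthetic response embedding."""
    return W_dec @ z

def synthesize_pair(e_chosen, e_rejected, sigma=0.1):
    """Perturb each response's latent code and decode back, yielding a
    synthetic preference pair intended to keep the original ordering
    (chosen stays chosen, rejected stays rejected)."""
    z_c, z_r = encode(e_chosen), encode(e_rejected)
    z_c_new = z_c + sigma * rng.normal(size=k)  # controlled perturbation
    z_r_new = z_r + sigma * rng.normal(size=k)
    return decode(z_c_new), decode(z_r_new)

# An original (embedding-space) preference pair, here just random vectors.
e_chosen, e_rejected = rng.normal(size=d), rng.normal(size=d)
syn_chosen, syn_rejected = synthesize_pair(e_chosen, e_rejected)
print(syn_chosen.shape, syn_rejected.shape)  # (64,) (64,)
```

Because the perturbation happens in a low-dimensional latent space and decoding is a cheap forward pass (no autoregressive text generation or re-annotation), each synthetic pair costs only a few matrix multiplies — the source of the reported speed and model-size advantages over text-space augmentation.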