MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning

📅 2026-05-08
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing contrastive planning methods learn a single latent geometry, so they cannot distinguish among multiple valid behaviors that trade task efficiency against risk exposure for the same start-goal query. This work proposes MoMo, a preference-conditioned contrastive planner in which a scalar user preference continuously modulates plan conservativeness at inference time, without retraining. MoMo conditions the representation geometry via Feature-wise Linear Modulation (FiLM) and the latent prediction operator via low-rank neural modulation. The formulation preserves the probability density ratio encoded in the representation space, which inference-driven contrastive planning requires, and retains its inference-time efficiency. Across six environments, MoMo smoothly adapts plan safety to user preferences and improves temporal and preferential consistency over state-augmentation baselines.
📝 Abstract
Temporally contrastive representation learning induces a latent structure capable of reducing long-horizon planning to inference in a low-dimensional linear system. However, existing contrastive planning work learns a single latent geometry which cannot distinguish multiple valid behaviors trading task efficiency against risk exposure for the same start-goal query. We introduce MoMo, a preference-conditioned contrastive planner allowing a scalar user preference to continuously modulate plan conservativeness at inference time, without retraining. MoMo learns a joint conditioning of the representation geometry and latent prediction operator via Feature-Wise Linear Modulation and low-rank neural modulation, respectively. We show that our formulation preserves the probability density ratio encoded in the representation space that is required for inference-driven contrastive planning, further retaining its inference-time efficiency. Across six environments, MoMo smoothly adapts plan safety according to user preferences, yielding improved temporal and preferential consistency over state augmentation baselines.
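The two conditioning mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the paper's implementation: the linear maps from the preference to the FiLM parameters, the tanh gain on the low-rank update, and every name below (film, modulated_operator, w_gamma, w_beta, w_s, A0, U, V) are hypothetical choices made for the example. All weights are random rather than learned.

```python
import numpy as np

def film(features, pref, w_gamma, w_beta):
    """FiLM: per-feature scale and shift predicted from the scalar preference.

    Assumption: gamma and beta are linear in pref, so pref = 0 is the identity.
    """
    gamma = 1.0 + w_gamma * pref   # per-feature scale, shape (d,)
    beta = w_beta * pref           # per-feature shift, shape (d,)
    return gamma * features + beta

def modulated_operator(A0, U, V, w_s, pref):
    """Base latent operator plus a rank-r update gated by the preference.

    Assumption: the r gains are tanh(w_s * pref); the update is U diag(s) V^T.
    """
    s = np.tanh(w_s * pref)        # gains, shape (r,)
    return A0 + (U * s) @ V.T      # U * s scales the columns of U; result is (d, d)

# Hypothetical toy sizes and random (untrained) parameters.
rng = np.random.default_rng(0)
d, r = 4, 2
features = rng.normal(size=d)      # stand-in for an encoder output
w_gamma, w_beta = rng.normal(size=d), rng.normal(size=d)
A0 = np.eye(d)                     # stand-in for the base prediction operator
U, V = rng.normal(size=(d, r)), rng.normal(size=(d, r))
w_s = rng.normal(size=r)

# The same state yields different one-step latent predictions as pref varies.
for pref in (0.0, 0.5, 1.0):
    z = film(features, pref, w_gamma, w_beta)
    z_next = modulated_operator(A0, U, V, w_s, pref) @ z
    print(pref, np.round(z_next, 3))
```

In this sketch the preference enters only through a handful of extra parameters (2d for FiLM, 2dr + r for the operator), and the latent prediction remains a single matrix-vector product for any fixed preference, which is consistent with the abstract's claim that inference-time efficiency is retained.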
Problem

Research questions and friction points this paper is trying to address.

contrastive representation learning
preference modulation
long-horizon planning
latent geometry
behavioral trade-offs
Innovation

Methods, ideas, or system contributions that make the work stand out.

preference-conditioned planning
contrastive representation learning
Feature-Wise Linear Modulation
low-rank neural modulation
inference-driven planning