🤖 AI Summary
Existing contrastive planning methods learn a single latent geometry, so they cannot distinguish among multiple valid strategies that trade off efficiency against risk for the same start-goal query. This work proposes MoMo, a preference-conditioned contrastive planning framework in which a scalar user-specified preference continuously modulates planning conservativeness at inference time, without retraining. The approach uses Feature-wise Linear Modulation (FiLM) to condition the representation geometry and low-rank neural modulation to condition the latent prediction operator. It preserves the probability density ratio that inference-driven contrastive planning relies on, and so retains that planner's inference-time efficiency. Across six environments, the method smoothly adapts plan safety to user preferences and improves temporal and preferential consistency over state-augmentation baselines.
📝 Abstract
Temporally contrastive representation learning induces a latent structure capable of reducing long-horizon planning to inference in a low-dimensional linear system. However, existing contrastive planning work learns a single latent geometry, which cannot distinguish multiple valid behaviors that trade task efficiency against risk exposure for the same start-goal query. We introduce MoMo, a preference-conditioned contrastive planner that allows a scalar user preference to continuously modulate plan conservativeness at inference time, without retraining. MoMo jointly conditions the representation geometry and the latent prediction operator via Feature-wise Linear Modulation (FiLM) and low-rank neural modulation, respectively. We show that our formulation preserves the probability density ratio encoded in the representation space, which is required for inference-driven contrastive planning, and that it retains the inference-time efficiency of such planning. Across six environments, MoMo smoothly adapts plan safety according to user preferences, yielding improved temporal and preferential consistency over state-augmentation baselines.
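The two conditioning mechanisms named in the abstract can be illustrated with a toy sketch. This is not MoMo's implementation: the abstract does not say how the modulation parameters are generated or where they are applied, so the linear generators below (`1 + w * g`, `w * b`, and the `w * U V^T` update), the function names, and all numeric values are assumptions chosen only to show the general shape of FiLM and of a low-rank operator update driven by a scalar preference `w`.

```python
def film(h, w, g, b):
    """FiLM: scale and shift each feature by preference-dependent gamma and beta.

    Assumption: gamma(w) = 1 + w * g and beta(w) = w * b. In practice these
    would typically come from a learned network that takes w as input.
    """
    gamma = [1.0 + w * gi for gi in g]
    beta = [w * bi for bi in b]
    return [ga * hi + be for ga, hi, be in zip(gamma, h, beta)]


def modulated_operator(A, U, V, w):
    """Return A + w * U V^T, a rank-r update of a base latent operator A.

    Assumption: the preference scales a fixed low-rank term linearly.
    A is n x n; U and V are n x r.
    """
    n, r = len(A), len(U[0])
    return [
        [A[i][j] + w * sum(U[i][k] * V[j][k] for k in range(r)) for j in range(n)]
        for i in range(n)
    ]


def predict(A_w, z):
    """One linear latent prediction step: z_next = A_w z."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A_w]


# Toy values, hypothetical and for illustration only.
h = [1.0, -2.0]               # unconditioned features
g = [0.5, 0.0]                # FiLM scale direction
b = [0.0, 1.0]                # FiLM shift direction
A = [[1.0, 0.0], [0.0, 1.0]]  # base latent operator
U = [[1.0], [0.0]]            # rank-1 factors
V = [[0.0], [1.0]]

for w in (0.0, 0.5, 1.0):
    z = film(h, w, g, b)
    z_next = predict(modulated_operator(A, U, V, w), z)
    print(w, z, z_next)
```

With `w = 0` both modulations vanish in this sketch, so the features and the operator reduce to their unconditioned values. Increasing `w` moves both continuously, which is the property that lets a single model cover a range of preferences without retraining.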