🤖 AI Summary
In stochastic environments, existing temporal distance definitions violate the triangle inequality, impairing generalization in path planning and goal-conditioned reinforcement learning and yielding inaccurate shortest-path estimates.
Method: We propose a novel framework integrating contrastive learning, successor representations, and quasimetric theory to estimate structured temporal distances in high-dimensional stochastic settings. Specifically, we apply a carefully designed variable transformation to contrastively learned successor features.
Contribution/Results: We formally prove that the resulting transformed features induce a temporal distance satisfying the triangle inequality—resolving a long-standing challenge in constructing metric-like structures for stochastic dynamical systems. Empirically, the learned distance enables compositional generalization (e.g., path concatenation), accelerates policy learning significantly, and outperforms state-of-the-art quasimetric baselines across diverse stochastic benchmarks.
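To make the triangle-inequality claim concrete, here is a small numerical sketch (not the paper's implementation) for a toy tabular Markov chain. It assumes a plausible form of the transformed distance, d(x, y) = log M[y, y] − log M[x, y], where M is the discounted successor representation that contrastive successor features estimate; the chain, discount, and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 6, 0.9

# Random stochastic transition matrix for a toy Markov chain (illustrative).
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)

# Discounted successor representation: M[x, y] proportional to sum_t gamma^t P^t[x, y].
M = np.linalg.inv(np.eye(n) - gamma * P)

# Assumed change of variables: d(x, y) = log M[y, y] - log M[x, y].
D = np.log(np.diag(M))[None, :] - np.log(M)

# d behaves like a quasimetric: nonnegative, zero on the diagonal,
# and it satisfies the triangle inequality d(x,z) <= d(x,y) + d(y,z).
assert np.all(D >= -1e-9)
assert np.allclose(np.diag(D), 0.0)
for x in range(n):
    for y in range(n):
        for z in range(n):
            assert D[x, z] <= D[x, y] + D[y, z] + 1e-9
print("triangle inequality holds for all triples")
```

Even though individual transitions are stochastic, every triple of states passes the check: the log-ratio transformation turns discounted occupancies into a directed (asymmetric) distance, which is what enables path concatenation ("stitching").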
📝 Abstract
Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning that involve reaching goals, allowing one to estimate the transit time between two states. However, prior attempts to define such temporal distances in stochastic settings have been stymied by an important limitation: these prior approaches do not satisfy the triangle inequality. This is not merely a definitional concern, but translates to an inability to generalize and find shortest paths. In this paper, we build on prior work in contrastive learning and quasimetrics to show how successor features learned by contrastive learning (after a change of variables) form a temporal distance that does satisfy the triangle inequality, even in stochastic settings. Importantly, this temporal distance is computationally efficient to estimate, even in high-dimensional and stochastic settings. Experiments in controlled settings and benchmark suites demonstrate that an RL algorithm based on these new temporal distances exhibits combinatorial generalization (i.e., "stitching") and can sometimes learn more quickly than prior methods, including those based on quasimetrics.