🤖 AI Summary
This work addresses the disconnect in traditional AI between perception (state representation learning) and planning (search-based temporal reasoning over action sequences). To bridge this gap, the authors propose Combinatorial Representations for Temporal Reasoning (CRTR), a method that jointly captures perceptual and temporal structure within a temporal contrastive learning framework. Standard temporal contrastive learning often latches onto spurious features; CRTR counters this with a negative sampling scheme that provably removes such features, enabling temporal reasoning directly in representation space. The key contributions are threefold: (1) CRTR is, to the authors' knowledge, the first method to efficiently solve arbitrary Rubik's Cube configurations using only learned representations, without an external search algorithm; (2) the learned representations generalize across all initial Cube states; and (3) CRTR achieves strong results on domains with complex temporal structure, including Sokoban and Rubik's Cube, solving the Cube with fewer search steps than Best-First Search (BestFS), albeit with longer solutions. This work points toward unifying perception and reasoning in AI.
📝 Abstract
In classical AI, perception relies on learning state-based representations, while planning, which can be thought of as temporal reasoning over action sequences, is typically achieved through search. We study whether such reasoning can instead emerge from representations that capture both perceptual and temporal structure. We show that standard temporal contrastive learning, despite its popularity, often fails to capture temporal structure due to its reliance on spurious features. To address this, we introduce Combinatorial Representations for Temporal Reasoning (CRTR), a method that uses a negative sampling scheme to provably remove these spurious features and facilitate temporal reasoning. CRTR achieves strong results on domains with complex temporal structure, such as Sokoban and Rubik's Cube. In particular, for the Rubik's Cube, CRTR learns representations that generalize across all initial states and allow it to solve the puzzle using fewer search steps than BestFS, though with longer solutions. To our knowledge, this is the first method that efficiently solves arbitrary Cube states using only learned representations, without relying on an external search algorithm.
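The abstract does not spell out the training objective, but temporal contrastive learning is typically instantiated as an InfoNCE-style loss: an anchor state embedding is pulled toward the embedding of a nearby future state and pushed away from sampled negatives. The sketch below illustrates that standard setup in NumPy; it is a generic illustration, not CRTR's actual objective, and the specific negative sampling scheme CRTR uses to remove spurious features is not reproduced here.

```python
import numpy as np

def info_nce_temporal(anchors, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style temporal contrastive loss (illustrative only).

    anchors:   (B, D) embeddings of states s_t
    positives: (B, D) embeddings of nearby future states s_{t+k}
    negatives: (B, M, D) embeddings of M negative states per anchor
    Returns the mean negative log-probability of picking the positive
    over the negatives under a softmax of cosine similarities.
    """
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = unit(anchors), unit(positives), unit(negatives)
    pos_logit = np.sum(a * p, axis=-1, keepdims=True) / temperature    # (B, 1)
    neg_logits = np.einsum("bd,bmd->bm", a, n) / temperature           # (B, M)
    logits = np.concatenate([pos_logit, neg_logits], axis=1)           # (B, 1+M)
    # Numerically stable log-softmax; the positive sits at index 0.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[:, 0].mean()
```

As the abstract notes, minimizing such a loss with naively sampled negatives can let the encoder rely on spurious features that separate positives from negatives without encoding temporal structure; CRTR's contribution is a negative sampling scheme designed so that those features no longer help.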