🤖 AI Summary
This work addresses compositional generalization in sequential decision-making: deciding which parts of prior rollouts can be reused for a new task. Existing methods reuse skills or predictive models but often overlook the local transition geometry and dynamics within trajectories. The authors propose Matrix-Space Reinforcement Learning (MSRL), which represents trajectory segments as positive semidefinite matrix descriptors that aggregate first- and second-order statistics of lifted one-step transitions. The descriptor is proven to be well defined up to coordinate gauge, complete for the induced low-order additive signal class, additive under valid segment composition, and minimally sufficient among admissible additive descriptors, which enables algebraic composition and transfer in an abstract matrix space. Conditioning value functions on the segment matrix yields a first-order smooth approximation of action values, so matrix-to-value mappings learned on source tasks can bootstrap learning on new tasks. The framework plugs into standard model-free and model-based algorithms and uses obstruction filtering to reject implausible compositions. Under limited training budgets, MSRL achieves the best average target AUC of 0.73, outperforming MSRL trained from scratch (0.65), TD-MPC-PT+FT (0.63), and TD-MPC (0.57).
📝 Abstract
Compositional generalization in sequential decision-making requires identifying which parts of prior rollouts remain useful for new tasks. Existing methods reuse skills or predictive models, but often overlook rich local transition geometry and dynamics. We propose Matrix-Space Reinforcement Learning (MSRL), a geometric abstraction that represents trajectory segments through positive semidefinite matrix descriptors aggregating first- and second-order statistics of lifted one-step transitions. These descriptors expose shared hidden structure, support algebraic composition in an abstract matrix space, and reveal opportunities for transfer. We prove that the descriptor is well defined up to coordinate gauge, complete for the induced low-order additive signal class, additive under valid segment composition, and minimally sufficient among admissible additive descriptors. We further show that conditioning value functions on the trajectory-segment matrix yields a first-order smooth approximation of action values, enabling source-learned matrix-to-value mappings to bootstrap learning in new tasks. MSRL is plug-in compatible with standard model-free and model-based methods, while obstruction filtering rejects implausible compositions. Empirically, MSRL achieves the best average finite-budget target AUC of 0.73, outperforming MSRL from scratch (0.65), TD-MPC-PT+FT (0.63), and TD-MPC (0.57).
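To make the descriptor idea concrete, here is a minimal NumPy sketch of one way such a matrix could be built. The abstract does not specify the lift or the exact descriptor, so everything here is an assumption for illustration: the hypothetical `lift` function stacks a constant 1, the state, the action, and the state difference, and the hypothetical `segment_descriptor` sums the outer products of the lifted transitions. Under that assumed construction, the matrix is positive semidefinite, its first row holds the transition count and first-order sums, the remaining block holds second-order moments, and descriptors of adjacent segments add.

```python
import numpy as np

def lift(s, a, s_next):
    # Hypothetical lift (an assumption, not the paper's definition):
    # constant term, state, action, and one-step state difference.
    return np.concatenate(([1.0], s, a, s_next - s))

def segment_descriptor(states, actions):
    # Sum of outer products z z^T over the segment's transitions.
    # states has one more row than actions (it includes the final state).
    d = 1 + 2 * states.shape[1] + actions.shape[1]
    M = np.zeros((d, d))
    for t in range(len(actions)):
        z = lift(states[t], actions[t], states[t + 1])
        M += np.outer(z, z)
    return M

# Synthetic rollout: T transitions, 3-dim states, 2-dim actions.
rng = np.random.default_rng(0)
T, k = 10, 4
states = rng.normal(size=(T + 1, 3))
actions = rng.normal(size=(T, 2))

M_full = segment_descriptor(states, actions)
# Split at step k; the two segments share the boundary state states[k].
M_a = segment_descriptor(states[: k + 1], actions[:k])
M_b = segment_descriptor(states[k:], actions[k:])

is_additive = np.allclose(M_full, M_a + M_b)
min_eig = np.linalg.eigvalsh(M_full).min()
```

In this sketch, additivity under composition falls out of the sum-of-outer-products form, and `M_full[0, 0]` equals the number of transitions. Whether the paper's descriptor takes this form, and how obstruction filtering decides which compositions are valid, is not stated in the abstract.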