🤖 AI Summary
To address low sample efficiency and the need for manually specified task similarities in real-world multi-task reinforcement learning (RL), this paper proposes a low-rank approximation method based on higher-order Q-tensors, the first to incorporate tensor decomposition into value-function modeling for multi-task RL. By representing the shared Q-function structure via low-rank tensorization, the method jointly infers task similarities and learns policies during optimization, eliminating reliance on prior assumptions about task relatedness. Integrated with stochastic optimization, it significantly improves data efficiency. Evaluated on two realistic benchmarks, cart-pole control and multi-device wireless communication, the method achieves 37%–52% higher sample efficiency than state-of-the-art approaches while demonstrating substantially improved cross-task generalization.
📝 Abstract
In pursuit of reinforcement learning systems that can train in physical environments, we investigate multi-task approaches as a means to alleviate the need for massive data acquisition. In a tabular scenario where the Q-functions are collected across tasks, we model our learning problem as optimizing a higher-order tensor structure. Recognizing that closely related tasks may require similar actions, our proposed method imposes a low-rank condition on this aggregated Q-tensor. The rationale behind this approach to multi-task learning is that the low-rank structure enforces the notion of similarity without the need to explicitly prescribe which tasks are similar, instead inferring this information from a reduced amount of data simultaneously with the stochastic optimization of the Q-tensor. The efficiency of our low-rank tensor approach to multi-task learning is demonstrated in two numerical experiments, first in a benchmark environment formed by a collection of inverted pendulums, and then in a practical scenario involving multiple wireless communication devices.