Can Tabular Foundation Models Guide Exploration in Robot Policy Learning?

2026-04-30
Citations: 0
Influential: 0

AI Summary
This work addresses high-dimensional continuous control, where existing policy optimization methods often get stuck in local optima, are sensitive to initialization, and find global exploration prohibitively expensive. The authors propose TFM-S3, which introduces a pretrained tabular foundation model into robotic policy learning for the first time. TFM-S3 combines high-frequency local updates with intermittent global search inside a dynamically constructed low-dimensional policy subspace, so that exploration stays efficient and sample cost stays low. The method integrates singular value decomposition, surrogate-model-guided optimization, and the TD3 framework. On standard continuous control benchmarks it accelerates early-stage convergence and outperforms both TD3 and population-based methods under identical interaction budgets.
๐Ÿ“ Abstract
Policy optimization in high-dimensional continuous control for robotics remains a challenging problem. Predominant methods are inherently local and often require extensive tuning and carefully chosen initial guesses for good performance, whereas more global and less initialization-sensitive search methods typically incur high rollout costs. We propose TFM-S3, a tabular hybrid local-global method for improving global exploration in robot policy learning with limited rollout cost. We interleave high-frequency local updates with intermittent rounds of global search. In each search round, we construct a dynamically updated low-dimensional policy subspace via SVD and perform iterative surrogate-guided refinement within this space. A pretrained tabular foundation model predicts candidate returns from a small context set, enabling large-scale screening with limited rollout cost. Experiments on continuous control benchmarks show that TFM-S3 consistently accelerates early-stage convergence and improves final performance compared to TD3 and population-based baselines under an identical rollout budget. These results demonstrate that foundation models are a powerful new tool for creating sample-efficient policy learning methods for continuous control in robotics.
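The global search round described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, hyperparameters, and sampling scheme are hypothetical, the TD3 local updates are omitted, and a small RBF kernel ridge regressor stands in for the pretrained tabular foundation model. It only shows the general shape of the idea: build a subspace from recent policy parameters via SVD, score many candidates with a surrogate fitted on a small context set, and spend real rollouts only on the top-ranked few.

```python
import numpy as np


def build_subspace(param_history, k):
    """Top-k SVD directions of centered policy-parameter snapshots.

    param_history: array of shape (n_snapshots, n_params).
    Returns (center, basis), where basis has k orthonormal rows.
    """
    center = param_history.mean(axis=0)
    _, _, vt = np.linalg.svd(param_history - center, full_matrices=False)
    return center, vt[:k]


def surrogate_predict(z_ctx, y_ctx, z_query, length_scale=1.0, lam=1e-3):
    """ASSUMPTION: RBF kernel ridge regression used as a stand-in for the
    pretrained tabular foundation model (context set in, predictions out)."""
    def rbf(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * length_scale ** 2))

    y_mean = y_ctx.mean()
    gram = rbf(z_ctx, z_ctx) + lam * np.eye(len(z_ctx))
    alpha = np.linalg.solve(gram, y_ctx - y_mean)
    return y_mean + rbf(z_query, z_ctx) @ alpha


def global_search_round(param_history, rollout_return, k=2, n_context=16,
                        n_candidates=2000, n_eval=4, n_iters=3,
                        scale=1.0, seed=0):
    """One hypothetical search round in the low-dimensional subspace.

    rollout_return: callable mapping a full parameter vector to a return
    (in practice an environment rollout; every call counts as one rollout).
    Returns (best_params, best_return, rollouts_used).
    """
    rng = np.random.default_rng(seed)
    center, basis = build_subspace(param_history, k)

    # Small context set: the only points evaluated before screening starts.
    z_ctx = rng.normal(scale=scale, size=(n_context, k))
    y_ctx = np.array([rollout_return(center + z @ basis) for z in z_ctx])

    for _ in range(n_iters):
        # Sample many candidates around the best point found so far.
        best_z = z_ctx[np.argmax(y_ctx)]
        z_cand = best_z + rng.normal(scale=scale, size=(n_candidates, k))

        # Screen all candidates with the surrogate; no rollouts spent here.
        pred = surrogate_predict(z_ctx, y_ctx, z_cand)
        top = z_cand[np.argsort(pred)[-n_eval:]]

        # Real rollouts only for the top-ranked candidates, then add them
        # to the context set for the next refinement iteration.
        y_top = np.array([rollout_return(center + z @ basis) for z in top])
        z_ctx = np.vstack([z_ctx, top])
        y_ctx = np.concatenate([y_ctx, y_top])

    i = int(np.argmax(y_ctx))
    return center + z_ctx[i] @ basis, y_ctx[i], len(y_ctx)
```

With the default settings above, a round screens 6000 candidates in total but uses only 16 + 3 × 4 = 28 rollouts, which is the kind of trade-off the abstract refers to as large-scale screening with limited rollout cost. In a full hybrid loop, the returned parameters would presumably seed or replace the current policy before local updates resume; that coupling is not specified here.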
Problem

Research questions and friction points this paper is trying to address.

robot policy learning
high-dimensional continuous control
global exploration
sample efficiency
rollout cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tabular Foundation Model
Policy Optimization
Global Exploration
Sample Efficiency
Low-dimensional Subspace