Can Tabular Foundation Models Guide Exploration in Robot Policy Learning?

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenges of high-dimensional continuous control, where existing policy optimization methods often suffer from local optima, sensitivity to initialization, and prohibitively expensive global exploration. The authors propose TFM-S3, a novel approach that introduces a pretrained tabular foundation model into robotic policy learning for the first time. By dynamically constructing a low-dimensional policy subspace and combining high-frequency local updates with intermittent global search, TFM-S3 achieves efficient exploration and rapid convergence at low sample cost. The method integrates singular value decomposition, surrogate-model-guided optimization, and the TD3 framework, demonstrating significantly accelerated early-stage convergence on standard continuous control benchmarks and outperforming both TD3 and population-based methods under identical interaction budgets.

📝 Abstract

Policy optimization in high-dimensional continuous control for robotics remains a challenging problem. Predominant methods are inherently local and often need extensive tuning and carefully chosen initial guesses to perform well, whereas more global, less initialization-sensitive search methods typically incur high rollout costs. We propose TFM-S3, a hybrid local-global method that uses a tabular foundation model to improve global exploration in robot policy learning at limited rollout cost. The method interleaves high-frequency local updates with intermittent rounds of global search. In each search round, we construct a dynamically updated low-dimensional policy subspace via SVD and perform iterative surrogate-guided refinement within this space. A pretrained tabular foundation model predicts candidate returns from a small context set, so a large number of candidates can be screened while only a few are actually rolled out. Experiments on continuous control benchmarks show that TFM-S3 consistently accelerates early-stage convergence and improves final performance compared to TD3 and population-based baselines under an identical rollout budget. These results suggest that tabular foundation models are a promising tool for building sample-efficient policy learning methods for continuous control in robotics.
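
The global search round described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the subspace is built from a hypothetical history of flattened policy parameter vectors, candidates are sampled from a Gaussian around the best context point, and a small ridge regression stands in for the pretrained tabular foundation model. The names `build_subspace`, `ridge_surrogate`, and `global_search_round` are likewise hypothetical.

```python
import numpy as np

def build_subspace(param_history, k):
    """Return (mean, basis); basis columns are the top-k SVD directions.

    Assumption: the subspace comes from a history of flattened policy
    parameters. The abstract only states that SVD is used.
    """
    P = np.asarray(param_history, dtype=float)      # (n_policies, n_params)
    mean = P.mean(axis=0)
    # Rows of vt are orthonormal directions in parameter space.
    _, _, vt = np.linalg.svd(P - mean, full_matrices=False)
    return mean, vt[:k].T                           # basis: (n_params, k)

def ridge_surrogate(z_ctx, r_ctx, z_query, lam=1e-3):
    """Stand-in surrogate (ridge regression), NOT a tabular foundation model.

    It plays the same role: predict returns for query points from a small
    context set of (coordinates, return) pairs.
    """
    A = np.hstack([z_ctx, np.ones((len(z_ctx), 1))])          # add bias column
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ r_ctx)
    return np.hstack([z_query, np.ones((len(z_query), 1))]) @ w

def global_search_round(param_history, rollout, k=2, n_ctx=16, n_cand=2000,
                        n_iters=3, top_m=4, sigma=1.0, seed=0):
    """One search round: screen many candidates, roll out only a few."""
    rng = np.random.default_rng(seed)
    mean, basis = build_subspace(param_history, k)

    # Small context set: subspace coordinates with true (rolled-out) returns.
    z_ctx = rng.normal(scale=sigma, size=(n_ctx, k))
    r_ctx = np.array([rollout(mean + basis @ z) for z in z_ctx])

    for _ in range(n_iters):
        # Sample many cheap candidates around the current best context point.
        center = z_ctx[np.argmax(r_ctx)]
        z_cand = center + rng.normal(scale=sigma, size=(n_cand, k))
        # Screen all candidates with the surrogate; no rollouts are used here.
        pred = ridge_surrogate(z_ctx, r_ctx, z_cand)
        best = z_cand[np.argsort(pred)[-top_m:]]
        # Spend real rollouts only on the top-ranked candidates.
        r_new = np.array([rollout(mean + basis @ z) for z in best])
        z_ctx = np.vstack([z_ctx, best])
        r_ctx = np.concatenate([r_ctx, r_new])

    i = int(np.argmax(r_ctx))
    return mean + basis @ z_ctx[i], float(r_ctx[i]), len(r_ctx)
```

With the default arguments, each round screens 6000 candidates with the surrogate but spends only 28 rollouts (16 for the context set plus 4 per refinement iteration). In a full hybrid scheme, the returned parameters would presumably be handed back to the local learner (TD3 in the paper) between rounds; how that handoff works is not specified in the abstract and is left out of this sketch.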

Problem

Research questions and friction points this paper is trying to address.

robot policy learning

high-dimensional continuous control

global exploration

sample efficiency

rollout cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tabular Foundation Model

Policy Optimization

Global Exploration

Sample Efficiency

Low-dimensional Subspace

🔎 Similar Papers

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

2024-04-28 | arXiv.org | Citations: 15

ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models

2024-03-14 | arXiv.org | Citations: 6
