TabQL: In-Context Q-Learning with Tabular Foundation Models

📅 2026-05-18
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the limited sample efficiency and slow adaptation to new environments of traditional deep Q-networks (DQNs). The authors propose TabQL, a framework that, for the first time, brings a tabular foundation model with in-context learning capabilities into Q-learning. TabQL represents state–action–Q-value tuples as sequences processed by a sequence-to-sequence model, enabling zero-shot or few-shot Q-value inference. A DQN-guided warm-start mechanism improves the quality of the context, while online adaptation is achieved through Bellman updates. Theoretical analysis shows that TabQL converges under mild assumptions and has reduced sample complexity. Experiments on multiple benchmark tasks show that TabQL outperforms DQN with substantially improved sample efficiency.
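The idea of storing tuples as table rows and inferring Q-values by conditioning on them, rather than by training weights, can be illustrated with a small sketch. It is not the authors' model: the pretrained tabular foundation model is replaced by a hypothetical kernel-regression predictor (`in_context_q`), and `Row`, `q_values`, `greedy_action`, and `bandwidth` are illustrative names that do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

# One context row: (state features, action index, Q-value label).
Row = Tuple[Tuple[float, ...], int, float]

def in_context_q(context: Sequence[Row], state: Sequence[float], action: int,
                 bandwidth: float = 0.5) -> float:
    """Kernel-weighted average of the Q labels of context rows with the same action.

    Hypothetical stand-in for a tabular foundation model: it has no trained
    weights, so its prediction depends only on the rows it is conditioned on.
    """
    num = den = 0.0
    for s, a, q in context:
        if a != action:
            continue
        sq_dist = sum((x - y) ** 2 for x, y in zip(s, state))
        w = math.exp(-sq_dist / (2.0 * bandwidth ** 2))  # Gaussian kernel weight
        num += w * q
        den += w
    return num / den if den > 0.0 else 0.0  # 0.0 if the action is unseen

def q_values(context: Sequence[Row], state: Sequence[float], n_actions: int,
             bandwidth: float = 0.5) -> List[float]:
    # Query every action for the same state (one "query row" per action).
    return [in_context_q(context, state, a, bandwidth) for a in range(n_actions)]

def greedy_action(context: Sequence[Row], state: Sequence[float], n_actions: int,
                  bandwidth: float = 0.5) -> int:
    qs = q_values(context, state, n_actions, bandwidth)
    return max(range(n_actions), key=lambda a: qs[a])

context = [((0.0,), 0, 1.0), ((0.0,), 1, 0.0), ((1.0,), 0, 0.0), ((1.0,), 1, 2.0)]
print(greedy_action(context, (0.0,), 2))  # 0
# "Updating" the predictor is just appending a row; no gradient step is taken.
context.append(((0.0,), 1, 5.0))
print(greedy_action(context, (0.0,), 2))  # 1
```

The last two lines show the property the summary relies on: new experience changes the greedy action immediately, without any parameter update.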
πŸ“ Abstract
We propose Tabular Q-Learning (TabQL), a reinforcement learning framework that replaces the conventional parametric Q-network of Deep Q-Learning (DQN) with a tabular foundation model endowed with in-context learning capabilities. The key idea is to represent Q-values through a sequence-to-sequence foundation model operating over a tabularized representation of state-action-Q-value tuples, enabling rapid adaptation from limited online interaction by conditioning on recent experience. TabQL departs from classical DQN by leveraging (i) zero- or few-shot Q-value inference via in-context updates and (ii) a warm-up phase that uses standard DQN to bootstrap high-quality context. In particular, to improve context quality, new transitions are generated by executing the actions output by TabQL, with Q-values predicted by DQN. We formalize TabQL, analyze its convergence and sample complexity under mild assumptions, and show that TabQL interpolates between vanilla Q-learning and DQN with in-context learning. Our analysis shows that TabQL improves efficiency over DQN by amortizing Bellman updates through in-context learning. Extensive numerical experiments on several benchmarks demonstrate the effectiveness of the proposed TabQL.
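The warm-up and Bellman-update steps can be pictured as repeatedly relabeling the stored context with targets r + γ·max_a' Q(s', a'), where Q is itself computed in-context from the current labels. The sketch below is an illustration under stated assumptions, not the authors' algorithm: `nn_q` is a hypothetical 1-nearest-neighbour stand-in for the foundation model, `warm_q` is a hypothetical function standing in for the DQN trained during warm-up, relabeling is done in synchronous sweeps, and action selection and new-transition generation are omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, float, State, bool]  # (s, a, r, s_next, done)

def nn_q(rows: Sequence[Tuple[State, int, float]], state: State, action: int) -> float:
    """Hypothetical stand-in predictor: Q label of the nearest context row with
    the same action (0.0 if that action never appears in the context)."""
    candidates = [(math.dist(s, state), q) for s, a, q in rows if a == action]
    return min(candidates, key=lambda c: c[0])[1] if candidates else 0.0

def bellman_sweep(transitions: Sequence[Transition], labels: Sequence[float],
                  n_actions: int, gamma: float) -> List[float]:
    # Context = each stored (s, a) paired with its current Q label.
    rows = [(s, a, y) for (s, a, _r, _sn, _d), y in zip(transitions, labels)]
    # Relabel every transition with its in-context Bellman target.
    return [
        r + (0.0 if done else gamma * max(nn_q(rows, s_next, b) for b in range(n_actions)))
        for s, _a, r, s_next, done in transitions
    ]

def tabql_sketch(transitions: Sequence[Transition], warm_q: Callable[[State, int], float],
                 n_actions: int, gamma: float = 0.9, sweeps: int = 10) -> List[float]:
    # Warm start: initial labels come from the (hypothetical) warm-up Q-function.
    labels = [warm_q(s, a) for s, a, _r, _sn, _d in transitions]
    for _ in range(sweeps):
        labels = bellman_sweep(transitions, labels, n_actions, gamma)
    return labels

# Two-state chain: action 1 moves right, action 0 ends the episode with no reward.
chain = [
    ((0.0,), 1, 0.0, (1.0,), False),
    ((0.0,), 0, 0.0, (0.0,), True),
    ((1.0,), 1, 1.0, (1.0,), True),   # reaching the goal pays 1
    ((1.0,), 0, 0.0, (0.0,), True),
]
print(tabql_sketch(chain, warm_q=lambda s, a: 0.0, n_actions=2))  # [0.9, 0.0, 1.0, 0.0]
```

On this toy chain the labels reach the fixed point within two sweeps from either a zero or an overly optimistic warm start; the abstract's argument for a DQN warm-up is about supplying higher-quality context, which this sketch does not attempt to measure.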
Problem

reinforcement learning
Q-learning
in-context learning
tabular foundation models
sample efficiency
Innovation

in-context learning
tabular foundation model
Q-learning
reinforcement learning
sample efficiency