Is Temporal Difference Learning the Gold Standard for Stitching in RL?

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional wisdom in reinforcement learning posits that temporal-difference (TD) methods are the gold standard for experience stitching, while Monte Carlo (MC) methods inherently lack the ability to recombine experience.

Method: This work systematically investigates the experience-stitching capacity of MC and TD under function approximation, evaluating their generalization performance across neural networks of varying capacity.

Contribution/Results: MC's experience-stitching ability improves markedly with model capacity: its performance gap relative to TD shrinks substantially and is dwarfed by the gains from scaling. When the critic is sufficiently large, the generalization gap between MC and TD vanishes almost entirely. These results indicate that TD's temporal inductive bias becomes less critical for experience stitching in large-scale function approximation settings; instead, scaling model capacity alone unlocks this RL-specific form of generalization. This challenges the long-held view of MC/TD generalization mechanisms and offers a fresh perspective for large-model-driven reinforcement learning.
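The TD-versus-MC distinction above comes down to what target the critic regresses toward. A minimal toy sketch (not the paper's code; all function names here are hypothetical) contrasting the two targets for a single trajectory fragment:

```python
def mc_targets(rewards, gamma=0.99):
    """Monte Carlo: each state regresses toward its full discounted return."""
    G = 0.0
    returns = []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    return list(reversed(returns))

def td_targets(rewards, values, gamma=0.99):
    """TD(0): each state regresses toward r + gamma * V(s'), bootstrapping
    from the current critic estimate of the next state. In tabular settings,
    this bootstrapping is the classic mechanism that propagates value across
    intersecting trajectories (i.e., stitching)."""
    targets = []
    for t, r in enumerate(rewards):
        bootstrap = values[t + 1] if t + 1 < len(values) else 0.0
        targets.append(r + gamma * bootstrap)
    return targets

rewards = [0.0, 0.0, 1.0]   # a short trajectory fragment
values = [0.5, 0.7, 0.9]    # current critic estimates V(s_0), V(s_1), V(s_2)

print(mc_targets(rewards))          # full-return targets
print(td_targets(rewards, values))  # bootstrapped targets
```

The paper's finding, in these terms, is that a critic trained on MC's full-return targets can generalize (and hence stitch) nearly as well as one trained on TD's bootstrapped targets, provided the critic has enough capacity.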

📝 Abstract
Reinforcement learning (RL) promises to solve long-horizon tasks even when training data contains only short fragments of the behaviors. This experience stitching capability is often viewed as the purview of temporal difference (TD) methods. However, outside of small tabular settings, trajectories never intersect, calling into question this conventional wisdom. Moreover, the common belief is that Monte Carlo (MC) methods should not be able to recombine experience, yet it remains unclear whether function approximation could result in a form of implicit stitching. The goal of this paper is to empirically study whether the conventional wisdom about stitching actually holds in settings where function approximation is used. We empirically demonstrate that Monte Carlo (MC) methods can also achieve experience stitching. While TD methods do achieve slightly stronger capabilities than MC methods (in line with conventional wisdom), that gap is significantly smaller than the gap between small and large neural networks (even on quite simple tasks). We find that increasing critic capacity effectively reduces the generalization gap for both the MC and TD methods. These results suggest that the traditional TD inductive bias for stitching may be less necessary in the era of large models for RL and, in some cases, may offer diminishing returns. Additionally, our results suggest that stitching, a form of generalization unique to the RL setting, might be achieved not through specialized algorithms (temporal difference learning) but rather through the same recipe that has provided generalization in other machine learning settings (via scale). Project website: https://michalbortkiewicz.github.io/golden-standard/
Problem

Research questions and friction points this paper is trying to address.

Challenges conventional wisdom about temporal difference learning for experience stitching
Empirically studies whether Monte Carlo methods can achieve experience stitching
Investigates if function approximation enables implicit stitching in reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monte Carlo methods achieve experience stitching with function approximation
Increasing critic capacity reduces generalization gap for MC and TD
Stitching achieved through scale rather than specialized TD algorithms