🤖 AI Summary
This work addresses error accumulation in long-horizon sequence learning under model misspecification in autoregressive modeling. By unifying the training objective, evaluation metric, and approximation measure under the joint KL divergence, it characterizes how the sequence horizon H affects both approximation and estimation errors, giving what is, to the authors' knowledge, the first complete characterization of long-horizon error behavior under this objective. On the approximation side, joint KL admits a horizon-free approximation factor, in contrast to Hellinger-based analyses that exhibit an Ω(H) dependence for computationally efficient methods; this isolates the choice of divergence as the source of approximation amplification. On the estimation side, an information-theoretic lower bound of order Ω(H), valid for both decomposable policy classes and fully shared policies, matches the Õ(H) upper bounds achieved by computationally efficient algorithms. The joint-KL guarantees further imply policy learning regret bounds at rates matching the prior imitation learning literature.
📝 Abstract
We study the fundamental and timely problem of learning long sequences in autoregressive modeling and next-token prediction under model misspecification, measured by the joint Kullback–Leibler (KL) divergence. Our goal is to characterize how the sequence horizon \(H\) affects both approximation and estimation errors in this joint-distribution, sequence-level regime. By establishing matching upper and lower bounds, we provide, to our knowledge, the first complete characterization of long-horizon error behavior under the natural joint KL objective, with improved rates and optimality justification relative to existing work. On the approximation side, we show that joint KL admits a horizon-free approximation factor, in sharp contrast to Hellinger-based analyses that exhibit an \(\Omega(H)\) dependence for computationally efficient methods; this isolates the choice of divergence as the source of approximation amplification. On the estimation side, we prove a fundamental information-theoretic lower bound of order \(\Omega(H)\) that holds for both decomposable policy classes and fully shared policies, matching the \(\widetilde O(H)\) upper bounds achieved by computationally efficient algorithms. Our analysis clarifies the landscape of recent autoregressive learning results by aligning the log-loss training objective, the sequence-level evaluation metric, and the approximation metric through a sharp joint-KL oracle theory. We further show that these joint-KL guarantees imply policy learning regret bounds at rates matching prior imitation learning literature.
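As an illustration of why next-token log-loss and a sequence-level joint KL metric can be aligned, recall the standard chain rule for KL divergence: for autoregressive distributions \(P\) and \(Q\) over length-\(H\) sequences, the joint KL equals the sum over steps of the expected next-token KL, with prefixes drawn from \(P\). The sketch below checks this identity numerically on small random tabular models. It is not taken from the paper; the helper names (`make_model`, `joint_kl`, `summed_conditional_kl`) and the tabular setup are hypothetical choices made only for this example.

```python
import itertools
import math
import random


def random_conditional(rng, vocab):
    """Return a random, strictly positive probability vector over `vocab` tokens."""
    weights = [rng.random() + 0.1 for _ in range(vocab)]
    total = sum(weights)
    return [w / total for w in weights]


def make_model(seed, vocab, horizon):
    """Hypothetical tabular autoregressive model: prefix tuple -> next-token distribution."""
    rng = random.Random(seed)
    model = {}
    for h in range(horizon):
        for prefix in itertools.product(range(vocab), repeat=h):
            model[prefix] = random_conditional(rng, vocab)
    return model


def sequence_prob(model, seq):
    """Probability of a sequence (or prefix) via the autoregressive factorization."""
    prob = 1.0
    for h, token in enumerate(seq):
        prob *= model[seq[:h]][token]
    return prob


def joint_kl(p, q, vocab, horizon):
    """KL(P || Q) computed directly by enumerating all vocab**horizon sequences."""
    total = 0.0
    for seq in itertools.product(range(vocab), repeat=horizon):
        ps = sequence_prob(p, seq)
        total += ps * math.log(ps / sequence_prob(q, seq))
    return total


def summed_conditional_kl(p, q, vocab, horizon):
    """Sum over steps of the expected next-token KL, with prefixes drawn from P."""
    total = 0.0
    for h in range(horizon):
        for prefix in itertools.product(range(vocab), repeat=h):
            prefix_prob = sequence_prob(p, prefix)
            step_kl = sum(
                pt * math.log(pt / qt) for pt, qt in zip(p[prefix], q[prefix])
            )
            total += prefix_prob * step_kl
    return total


# Small illustrative sizes so that full enumeration stays cheap.
VOCAB, HORIZON = 3, 4
p = make_model(seed=0, vocab=VOCAB, horizon=HORIZON)
q = make_model(seed=1, vocab=VOCAB, horizon=HORIZON)

direct = joint_kl(p, q, VOCAB, HORIZON)
stepwise = summed_conditional_kl(p, q, VOCAB, HORIZON)
print(f"joint KL = {direct:.6f}, summed per-step KL = {stepwise:.6f}")
```

The two printed values should agree up to floating-point error. The enumeration is exponential in the horizon, so this check is only practical for toy sizes; it illustrates the identity and says nothing about the paper's rates or algorithms.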