Contextual Online Decision Making with Infinite-Dimensional Functional Regression

📅 2025-01-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In contextual sequential decision-making, simultaneously optimizing multiple statistical objectives—such as expectation, variance, and quantiles—remains challenging. Method: We propose the first infinite-dimensional functional regression framework for online learning of the full conditional cumulative distribution function (CDF), directly modeling the response's conditional CDF rather than relying on conventional single-statistic parametric models. Contributions: (i) We establish a tight regret bound of $\tilde{\mathcal{O}}(T^{\frac{3\gamma+2}{2(\gamma+2)}})$, where $\gamma$ characterizes the eigenvalue decay rate of the associated integral operator—recovering the optimal finite-dimensional rate when $\gamma = 0$; (ii) we derive a computationally tractable spectral estimation procedure for practical implementation; and (iii) we achieve the first theoretical generalization from finite- to infinite-dimensional contextual bandits, preserving optimality. This work unifies multi-objective optimization under distributional uncertainty with provably optimal nonparametric learning.
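The stated bound makes the trade-off concrete: the exponent $\frac{3\gamma+2}{2(\gamma+2)}$ interpolates between the finite-dimensional rate and a slower nonparametric rate as the eigenvalue decay weakens. A minimal sketch (not from the paper; the sample values of $\gamma$ are illustrative) that evaluates this exponent:

```python
# Hedged sketch: evaluate the regret exponent (3*gamma + 2) / (2*(gamma + 2))
# from the bound O~(T^{(3γ+2)/(2(γ+2))}) for a few illustrative decay rates γ.
def regret_exponent(gamma: float) -> float:
    """Exponent of T in the utility-regret bound for eigenvalue decay rate gamma."""
    return (3 * gamma + 2) / (2 * (gamma + 2))

# gamma = 0 recovers the finite-dimensional optimal rate T^{1/2};
# as gamma grows (slower eigenvalue decay), the exponent approaches 3/2.
for gamma in (0.0, 0.5, 1.0, 2.0):
    print(f"gamma={gamma}: regret ~ T^{regret_exponent(gamma):.3f}")
```

At $\gamma = 0$ this prints an exponent of $0.500$, i.e. the familiar $\sqrt{T}$ rate for finite-dimensional contextual bandits.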

📝 Abstract
Contextual sequential decision-making problems play a crucial role in machine learning, encompassing a wide range of downstream applications such as bandits, sequential hypothesis testing and online risk control. These applications often require different statistical measures, including expectation, variance and quantiles. In this paper, we provide a universal admissible algorithm framework for dealing with all kinds of contextual online decision-making problems that directly learns the whole underlying unknown distribution instead of focusing on individual statistics. This is much more difficult because the dimension of the regression is uncountably infinite, and any existing linear contextual bandits algorithm will result in infinite regret. To overcome this issue, we propose an efficient infinite-dimensional functional regression oracle for contextual cumulative distribution functions (CDFs), where each data point is modeled as a combination of context-dependent CDF basis functions. Our analysis reveals that the decay rate of the eigenvalue sequence of the design integral operator governs the regression error rate and, consequently, the utility regret rate. Specifically, when the eigenvalue sequence exhibits a polynomial decay of order $\frac{1}{\gamma}\ge 1$, the utility regret is bounded by $\tilde{\mathcal{O}}\Big(T^{\frac{3\gamma+2}{2(\gamma+2)}}\Big)$. By setting $\gamma=0$, this recovers the existing optimal regret rate for contextual bandits with finite-dimensional regression and is optimal under a stronger exponential decay assumption. Additionally, we provide a numerical method to compute the eigenvalue sequence of the integral operator, enabling the practical implementation of our framework.
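The abstract's last point—numerically computing the eigenvalue sequence of the design integral operator—can be illustrated with a standard Nyström-style discretization: eigenvalues of the kernel Gram matrix on $n$ grid points, scaled by $1/n$, approximate the operator's leading eigenvalues. A minimal sketch under illustrative assumptions (an RBF kernel on $[0,1]$ and power iteration for the top eigenvalue; this is not the paper's specific construction):

```python
import math

def rbf_kernel(x: float, y: float, bandwidth: float = 0.5) -> float:
    """Illustrative RBF kernel; stands in for the design integral operator's kernel."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))

def top_eigenvalue(matrix: list[list[float]], iters: int = 200) -> float:
    """Estimate the largest eigenvalue of a symmetric PSD matrix via power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / lam for wi in w]
    return lam

# Nyström discretization: eigenvalues of the (1/n)-scaled Gram matrix on an
# n-point grid approximate the leading eigenvalues of the integral operator.
n = 100
grid = [(i + 0.5) / n for i in range(n)]
gram_over_n = [[rbf_kernel(x, y) / n for y in grid] for x in grid]
print(f"leading operator eigenvalue ≈ {top_eigenvalue(gram_over_n):.4f}")
```

Repeating with deflation (or a full symmetric eigendecomposition, e.g. `numpy.linalg.eigh`) yields the whole sequence, whose polynomial decay order determines $\gamma$ in the regret bound.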
Problem

Research questions and friction points this paper is trying to address.

Online Decision Making
Complex Mathematical Models
Machine Learning Algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online Decision Making
Complex Mathematical Model Handling
Regret Minimization
Haichen Hu
Center for Computational Science and Engineering, MIT; Department of Civil and Environmental Engineering, MIT
Rui Ai
Massachusetts Institute of Technology
reinforcement learning, game theory
Stephen Bates
Assistant Professor, MIT EECS
Statistics, Machine Learning, Artificial Intelligence, Uncertainty Quantification
David Simchi-Levi
Institute for Data, Systems, and Society, MIT; Department of Civil and Environmental Engineering, MIT