Contextual Linear Optimization with Partial Feedback

📅 2024-05-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper studies contextual linear optimization (CLO) under partial feedback. Instead of the full cost vector, the decision maker observes only the total latency of each chosen path (bandit feedback) or the latencies of the individual edges on that path (semi-bandit feedback). To handle this setting, the authors propose a unified class of offline learning algorithms built on the induced empirical risk minimization (IERM) framework, which integrates estimation and optimization. They prove a novel fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of estimation methods, and they design computationally tractable surrogate losses for solving the partial-feedback IERM problem. The approach covers both bandit and semi-bandit feedback, and the methods are compared numerically on stochastic shortest path examples with simulated and real data. As a byproduct, the analysis also yields a fast-rate regret bound for IERM with full feedback and a misspecified policy class.
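
To make the feedback types concrete, here is a minimal Python sketch on a hypothetical five-edge graph whose three feasible paths are written as 0/1 edge-incidence vectors. The graph, the travel times, and the helper names `best_path` and `feedback` are illustrative assumptions, not taken from the paper. The decision is the path with the smallest total cost, and the branches of `feedback` show what gets recorded under full, bandit, and semi-bandit feedback.

```python
import numpy as np

# Hypothetical toy graph with 5 edges; each feasible decision is a
# source-to-sink path encoded as a 0/1 edge-incidence vector.
PATHS = np.array([
    [1, 1, 0, 0, 0],  # s -> a -> t
    [0, 0, 1, 1, 0],  # s -> b -> t
    [1, 0, 0, 1, 1],  # s -> a -> b -> t
])


def best_path(cost):
    """Index of the path minimizing total cost for a given edge-cost vector."""
    return int(np.argmin(PATHS @ cost))


def feedback(path_idx, cost, kind):
    """What the decision maker records after choosing a path.

    full:   the whole edge-cost vector;
    bandit: only the total cost of the chosen path;
    semi:   edge costs on the chosen path, NaN for unobserved edges.
    """
    z = PATHS[path_idx]
    if kind == "full":
        return cost.copy()
    if kind == "bandit":
        return float(z @ cost)
    if kind == "semi":
        return np.where(z == 1, cost, np.nan)
    raise ValueError(f"unknown feedback type: {kind}")


cost = np.array([2.0, 3.0, 1.0, 1.5, 0.5])  # hypothetical realized travel times
k = best_path(cost)
print(k, feedback(k, cost, "bandit"), feedback(k, cost, "semi"))
```

Here `best_path(cost)` selects path 1 (total time 2.5). Bandit feedback for that path is just the number 2.5, while semi-bandit feedback also reveals its edge times 1.0 and 1.5 and leaves the other edges missing.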

Abstract
Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients in the objective and thereby improve decision-making performance. A canonical example is the stochastic shortest path problem with random edge costs (e.g., travel time) and contextual features (e.g., lagged traffic, weather). While existing work on CLO assumes fully observed cost coefficient vectors, in many applications the decision maker observes only partial feedback corresponding to each chosen decision in the history. In this paper, we study both a bandit-feedback setting (e.g., only the overall travel time of each historical path is observed) and a semi-bandit-feedback setting (e.g., travel times of the individual segments on each chosen path are additionally observed). We propose a unified class of offline learning algorithms for CLO with different types of feedback, following a powerful induced empirical risk minimization (IERM) framework that integrates estimation and optimization. We provide a novel fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of estimation methods. To solve the partial-feedback IERM, we also tailor computationally tractable surrogate losses. A byproduct of our theory of independent interest is the fast-rate regret bound for IERM with full feedback and a misspecified policy class. We compare the performance of different methods numerically using stochastic shortest path examples on simulated and real data and provide practical insights from the empirical results.
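
In IERM, a cost-prediction model is chosen by the quality of the decisions it induces rather than by its prediction error. The sketch below illustrates this idea with full feedback on a hypothetical toy graph. The model class is a small, made-up family indexed by a scalar `theta`, the data-generating process is invented for illustration, and `ierm` simply enumerates the finite class. It is not the authors' algorithm, which handles partial feedback and general model classes.

```python
import numpy as np

# Hypothetical toy graph: rows are 0/1 edge-incidence vectors of the feasible paths.
PATHS = np.array([[1, 1, 0, 0, 0], [0, 0, 1, 1, 0], [1, 0, 0, 1, 1]])


def induced_decision(pred_cost):
    """Path induced by a cost prediction: the one with smallest predicted total cost."""
    return PATHS[np.argmin(PATHS @ pred_cost)]


def ierm(models, X, Y):
    """Full-feedback IERM over a finite model class.

    Returns the index of the model whose induced decisions minimize the
    empirical cost (1/n) * sum_i Y_i^T z(f(X_i)), plus all empirical risks.
    """
    risks = [float(np.mean([y @ induced_decision(f(x)) for x, y in zip(X, Y)]))
             for f in models]
    return int(np.argmin(risks)), risks


# Hypothetical data: a scalar context (e.g., a congestion index) slows edges 2 and 3.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0.0, 1.0, size=n)
base = np.array([1.0, 1.0, 0.6, 0.6, 0.3])
slope = np.array([0.0, 0.0, 1.5, 1.5, 0.0])
Y = base + np.outer(X, slope) + rng.normal(0.0, 0.1, size=(n, 5))

# Hypothetical model class f_theta(x) = base + theta * x * slope.
thetas = [0.0, 0.5, 1.0, 2.0]
models = [lambda x, t=t: base + t * x * slope for t in thetas]
best, risks = ierm(models, X, Y)
print(f"selected theta = {thetas[best]}, empirical risks = {np.round(risks, 3)}")
```

Because `ierm` scores each model only by the average realized cost of its induced paths, a model with biased cost predictions can still be selected if it ranks the paths well, which is the motivation for integrating estimation with optimization.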
Problem

Research questions and friction points this paper is trying to address.

Addresses contextual linear optimization when only partial feedback on past decisions is observed
Develops offline learning algorithms for bandit and semi-bandit feedback settings
Provides fast-rate regret bounds for misspecified model classes
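
With bandit feedback, only the total cost of each logged path is available, so the induced cost of a candidate policy has to be estimated. One standard option, swapped in here purely for illustration, is inverse propensity weighting (IPW), under the assumption that the logging policy's path probabilities are known. The paper's own estimators, and its semi-bandit variants, may differ. The toy graph, the data, the uniform logging policy, and the `plug_in_policy` family are all hypothetical.

```python
import numpy as np

# Hypothetical toy graph and cost model (illustrative assumptions).
PATHS = np.array([[1, 1, 0, 0, 0], [0, 0, 1, 1, 0], [1, 0, 0, 1, 1]])
base = np.array([1.0, 1.0, 0.6, 0.6, 0.3])
slope = np.array([0.0, 0.0, 1.5, 1.5, 0.0])


def plug_in_policy(theta):
    """Policy choosing the path with smallest predicted cost base + theta * x * slope."""
    return lambda x: int(np.argmin(PATHS @ (base + theta * x * slope)))


def ipw_value(policy, X, A, C, propensity):
    """IPW estimate of a policy's expected cost from bandit-feedback logs.

    A[i]: logged path index; C[i]: its observed total cost;
    propensity[i]: assumed-known probability that the logger chose A[i].
    """
    chosen = np.array([policy(x) for x in X])
    weights = (chosen == A) / propensity
    return float(np.mean(weights * C))


# Hypothetical logged data: the logger picks each path uniformly at random.
rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(0.0, 1.0, size=n)
Y = base + np.outer(X, slope) + rng.normal(0.0, 0.1, size=(n, 5))  # never fully observed
A = rng.integers(0, 3, size=n)
propensity = np.full(n, 1.0 / 3.0)
C = np.einsum("ij,ij->i", PATHS[A], Y)  # bandit feedback: total cost of the logged path only

estimates = {t: ipw_value(plug_in_policy(t), X, A, C, propensity) for t in (0.0, 1.0)}

# For comparison only: the in-sample value computed from the full (unobservable) costs.
pi = plug_in_policy(1.0)
oracle = float(np.mean([Y[i] @ PATHS[pi(X[i])] for i in range(n)]))
print(estimates, oracle)
```

The `oracle` value uses the full cost vectors, which a bandit-feedback learner never sees; it appears here only to check that the IPW estimate lands close to it.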
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integration of estimation and optimization via the induced empirical risk minimization (IERM) framework
Tailored surrogate losses for partial feedback settings
Fast-rate regret bounds for misspecified model classes
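
The induced decision cost is piecewise constant and non-convex in the predictions, which is why tractable surrogate losses matter. As a point of reference, the sketch below implements SPO+ (Elmachtoub and Grigas), a well-known convex surrogate for full-feedback CLO, over an enumerated set of paths. It is not one of the partial-feedback surrogates tailored in this paper, and the toy graph and numbers are hypothetical.

```python
import numpy as np

# Hypothetical toy graph: rows are 0/1 edge-incidence vectors of the feasible paths.
PATHS = np.array([[1, 1, 0, 0, 0], [0, 0, 1, 1, 0], [1, 0, 0, 1, 1]])


def optimal_path(cost):
    """Incidence vector of the path minimizing cost^T z."""
    return PATHS[np.argmin(PATHS @ cost)]


def spo_plus(pred, cost):
    """SPO+ loss for min_z cost^T z over the enumerated paths.

    l(pred, cost) = max_z (cost - 2*pred)^T z + 2*pred^T z*(cost) - cost^T z*(cost).
    It is convex in pred and upper-bounds the regret computed below.
    """
    z_star = optimal_path(cost)
    return float(np.max(PATHS @ (cost - 2 * pred)) + 2 * pred @ z_star - cost @ z_star)


def regret(pred, cost):
    """Excess realized cost of the path induced by pred over the optimal path."""
    return float(cost @ optimal_path(pred) - cost @ optimal_path(cost))


cost = np.array([2.0, 3.0, 1.0, 1.5, 0.5])
pred = np.array([1.0, 1.0, 3.0, 3.0, 1.0])
print(regret(pred, cost), spo_plus(pred, cost), spo_plus(cost, cost))
```

For these numbers the induced path has regret 2.5 while SPO+ equals 10.5, consistent with SPO+ upper-bounding the regret, and a perfect prediction gives SPO+ loss 0.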
Yichun Hu
Cornell University
Nathan Kallus
Cornell University
Optimization under uncertainty, Causal inference, Bandits, RL, Fairness
Xiaojie Mao
Tsinghua University, 100084 Beijing, China
Yanchen Wu
Tsinghua University, 100084 Beijing, China