Contextual Linear Optimization with Partial Feedback

📅 2024-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies contextual linear optimization (CLO) under partial feedback, where the decision maker observes only the total cost of each chosen decision (bandit feedback, e.g., the overall travel time of a path) or the costs of its individual components (semi-bandit feedback, e.g., travel times of the edges on that path), not the full cost vector. To address this setting, the authors propose a unified class of offline learning algorithms based on the induced empirical risk minimization (IERM) framework, which integrates estimation and optimization. They prove a novel fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of estimation methods, and they design computationally tractable surrogate losses for solving the partial-feedback IERM problem. Experiments on stochastic shortest path problems with simulated and real data compare the methods numerically. As a byproduct, the analysis also yields a fast-rate regret bound for IERM with full feedback and a misspecified policy class.

📝 Abstract

Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients in the objective and thereby improve decision-making performance. A canonical example is the stochastic shortest path problem with random edge costs (e.g., travel time) and contextual features (e.g., lagged traffic, weather). While existing work on CLO assumes fully observed cost coefficient vectors, in many applications the decision maker observes only partial feedback corresponding to each chosen decision in the history. In this paper, we study both a bandit-feedback setting (e.g., only the overall travel time of each historical path is observed) and a semi-bandit-feedback setting (e.g., travel times of the individual segments on each chosen path are additionally observed). We propose a unified class of offline learning algorithms for CLO with different types of feedback, following a powerful induced empirical risk minimization (IERM) framework that integrates estimation and optimization. We provide a novel fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of estimation methods. To solve the partial-feedback IERM, we also tailor computationally tractable surrogate losses. A byproduct of our theory of independent interest is the fast-rate regret bound for IERM with full feedback and a misspecified policy class. We compare the performance of different methods numerically using stochastic shortest path examples on simulated and real data and provide practical insights from the empirical results.
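The three feedback regimes described in the abstract can be made concrete with a small sketch. The toy network, the path incidence vectors, and the `observe` helper below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical toy network with 4 edges; each feasible path is a 0/1
# incidence vector over the edges (all numbers are illustrative).
paths = np.array([
    [1, 1, 0, 0],   # path A uses edges 0 and 1
    [0, 0, 1, 1],   # path B uses edges 2 and 3
])

def observe(cost, z, feedback):
    """Return what the decision maker sees after choosing path z."""
    if feedback == "full":
        return cost.copy()                     # the whole cost vector
    if feedback == "semi-bandit":
        # costs of the traversed edges only; the rest stay unobserved (NaN)
        return np.where(z == 1, cost, np.nan)
    if feedback == "bandit":
        return float(cost @ z)                 # total travel time only
    raise ValueError(f"unknown feedback type: {feedback}")

cost = np.array([3.0, 2.0, 4.0, 0.5])          # realized edge travel times
z = paths[0]                                   # the path that was chosen

full = observe(cost, z, "full")
semi = observe(cost, z, "semi-bandit")
total = observe(cost, z, "bandit")
```

In this sketch the bandit observation is the sum of the semi-bandit observations on the chosen path, and neither reveals anything about the edges of the path that was not taken.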

Problem

Research questions and friction points this paper is trying to address.

Studies contextual linear optimization when only partial feedback on each chosen decision is observed, rather than the full cost vector

Develops offline learning algorithms for bandit and semi-bandit feedback settings

Provides fast-rate regret bounds for misspecified model classes
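One common way to formalize the problem above is sketched below. The notation is assumed for illustration and may differ from the paper's: $X$ is the context, $C$ the random cost vector, $\mathcal{Z}$ the set of feasible decisions, $Z$ the logged decision, and $f_0(x) = \mathbb{E}[C \mid X = x]$.

```latex
% Optimal policy and regret of a policy \pi (standard CLO formulation)
\pi^*(x) \in \arg\min_{z \in \mathcal{Z}} f_0(x)^\top z,
\qquad
\mathrm{Reg}(\pi) = \mathbb{E}\big[ f_0(X)^\top \pi(X) \big]
                  - \mathbb{E}\big[ \min_{z \in \mathcal{Z}} f_0(X)^\top z \big].

% What is logged for each historical decision Z
\text{full feedback: } C, \qquad
\text{semi-bandit: } \{ C_j : Z_j = 1 \}, \qquad
\text{bandit: } C^\top Z.
```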

Innovation

Methods, ideas, or system contributions that make the work stand out.

Induced empirical risk minimization (IERM) framework that integrates estimation and optimization

Tailored surrogate losses for partial feedback settings

Fast-rate regret bounds for misspecified model classes
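The idea of choosing a policy by minimizing estimated decision cost over an induced policy class can be sketched as follows under bandit feedback. This is a minimal illustration under strong assumptions, not the paper's algorithm: the simulated data, the plug-in least-squares estimate of the cost model, and the threshold-based candidate class `candidate` are all hypothetical, and no surrogate loss is used because the class is small enough to search exhaustively.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feasible decisions: two hypothetical paths over 4 edges.
Z = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
n, d, p = 500, 4, 2            # samples, edges, context features

# Simulated logged data (assumption): linear conditional mean costs.
B_true = np.array([[1.0, 2.0], [1.0, 0.0], [2.0, 0.0], [0.5, 1.0]])
X = np.column_stack([np.ones(n), rng.uniform(0, 2, n)])
Zlog = Z[rng.integers(0, len(Z), n)]            # historical decisions
C = X @ B_true.T + rng.normal(0, 0.1, (n, d))   # never shown to the learner
y = np.sum(C * Zlog, axis=1)                    # bandit feedback: total cost

# Step 1 (estimation): since y = z^T B x + noise, regress y on the
# interaction features z_e * x_j. Individual edge coefficients are not
# identified from path totals; lstsq returns the minimum-norm solution,
# which still predicts the total cost of every feasible path.
Phi = (Zlog[:, :, None] * X[:, None, :]).reshape(n, d * p)
theta_hat = np.linalg.lstsq(Phi, y, rcond=None)[0].reshape(d, p)
F_hat = X @ theta_hat.T                         # estimated cost vectors

def induced_policy(F_vals):
    """pi_f(x) = argmin over z in Z of f(x)^T z, row by row."""
    return Z[np.argmin(F_vals @ Z.T, axis=1)]

def candidate(t, X):
    """Hypothetical model f_t: its induced policy takes path A iff x_2 < t."""
    u = X[:, 1]
    zeros = np.zeros_like(u)
    return np.column_stack([u, zeros, np.full_like(u, t), zeros])

# Step 2 (optimization): pick the candidate whose induced policy has the
# lowest estimated average cost on the logged contexts.
thresholds = np.linspace(0.0, 2.0, 21)
risks = [
    np.mean(np.sum(F_hat * induced_policy(candidate(t, X)), axis=1))
    for t in thresholds
]
t_best = float(thresholds[int(np.argmin(risks))])
```

In this simulation path A is cheaper in expectation exactly when the second context feature is below 0.5, so a selected threshold close to 0.5 indicates that the policy was recovered from total path costs alone. The candidate class is deliberately not a model of the true costs, which mirrors the misspecified setting only loosely.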

🔎 Similar Papers

Unsupervised Machine Learning Hybrid Approach Integrating Linear Programming in Loss Function: A Robust Optimization Technique