🤖 AI Summary
This paper studies online objective inference in inverse linear optimization: recovering an agent's hidden linear objective from sequential observations of its input-output pairs. Methodologically, it connects the online learning approach of Bärmann et al. (2017) to online convex optimization with *Fenchel–Young losses* and develops a gap-dependent regret analysis. Its main contributions are threefold: (1) the Fenchel–Young perspective yields a simple understanding of the online learning approach; (2) as a byproduct, it gives an offline guarantee on the *suboptimality loss* that does not assume the optimality of the agent's responses; (3) assuming a gap between optimal and suboptimal objective values in the agent's decision problems, it proves an upper bound independent of the time horizon $T$ on the sum of the suboptimality and *estimate losses*, improving on the standard $O(\sqrt{T})$ regret rate even though neither the loss functions nor their domains enjoy properties such as strong convexity.
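For concreteness, the two losses can be written as follows. This is a hedged reading under a minimization convention; the notation is ours, not necessarily the paper's: $X_t$ is the agent's feasible set at round $t$, $c$ its hidden objective, $x_t$ its observed response, $\hat{c}_t$ the learner's current prediction, and $\hat{x}_t \in \arg\min_{x \in X_t} \langle \hat{c}_t, x \rangle$ the solution recommended by that prediction:

$$
\ell^{\mathrm{sub}}_t(\hat{c}_t) \;=\; \langle \hat{c}_t, x_t \rangle - \min_{x \in X_t} \langle \hat{c}_t, x \rangle,
\qquad
\ell^{\mathrm{est}}_t(\hat{c}_t) \;=\; \langle c, \hat{x}_t \rangle - \langle c, x_t \rangle.
$$

The suboptimality loss is nonnegative for any feasible response, while the estimate loss is nonnegative whenever $x_t$ is optimal for $c$. Up to sign conventions, the suboptimality loss is a Fenchel–Young loss generated by the support function of $X_t$, which is what links the problem to online convex optimization.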
📝 Abstract
This paper revisits the online learning approach to inverse linear optimization studied by Bärmann et al. (2017), where the goal is to infer an unknown linear objective function of an agent from sequential observations of the agent's input-output pairs. First, we provide a simple understanding of the online learning approach through its connection to online convex optimization of *Fenchel–Young losses*. As a byproduct, we present an offline guarantee on the *suboptimality loss*, which measures how well predicted objectives explain the agent's choices, without assuming the optimality of the agent's choices. Second, assuming that there is a gap between optimal and suboptimal objective values in the agent's decision problems, we obtain an upper bound independent of the time horizon $T$ on the sum of suboptimality and *estimate losses*, where the latter measures the quality of solutions recommended by predicted objectives. Interestingly, our gap-dependent analysis achieves a faster rate than the standard $O(\sqrt{T})$ regret bound by exploiting structures specific to inverse linear optimization, even though neither the loss functions nor their domains enjoy desirable properties, such as strong convexity.
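As a minimal sketch of how the online learning approach can be instantiated, the following runs online subgradient descent on the suboptimality loss defined above. Everything here is an illustrative assumption rather than the paper's prescription: random box-shaped feasible sets, the step size `eta`, and the unit-ball projection are our choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 200

# Hidden objective the agent minimizes (unknown to the learner).
c_true = rng.standard_normal(d)
c_true /= np.linalg.norm(c_true)

def argmin_on_box(c, lo, hi):
    """Minimize <c, x> over the box [lo, hi]: take lo_i where c_i > 0, else hi_i."""
    return np.where(c > 0, lo, hi)

c_hat = np.zeros(d)      # learner's running estimate of the objective
eta = 1.0 / np.sqrt(T)   # illustrative step size
total_subopt = 0.0

for t in range(T):
    # A fresh feasible region each round (a random box, purely for illustration).
    lo = rng.uniform(-1.0, 0.0, d)
    hi = rng.uniform(0.0, 1.0, d)

    x_agent = argmin_on_box(c_true, lo, hi)  # agent's (optimal) response
    x_pred = argmin_on_box(c_hat, lo, hi)    # solution recommended by the estimate

    # Suboptimality loss <c_hat, x_agent> - min_x <c_hat, x>;
    # x_agent - x_pred is a subgradient of this loss in c_hat.
    total_subopt += c_hat @ (x_agent - x_pred)

    # Online subgradient step, then projection onto the unit ball.
    c_hat -= eta * (x_agent - x_pred)
    norm = np.linalg.norm(c_hat)
    if norm > 1.0:
        c_hat /= norm

print(f"average suboptimality loss: {total_subopt / T:.4f}")
```

In this sketch, a standard online-convex-optimization analysis would make the average suboptimality loss vanish at an $O(\sqrt{T})/T$ rate; the paper's gap-dependent analysis strengthens this to a $T$-independent bound on the cumulative suboptimality and estimate losses.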