🤖 AI Summary
This work addresses online inverse linear optimization over M-convex action sets, a broad class that includes matroids. It establishes regret bounds polynomial in the dimension $d$, both with uncorrupted feedback and with adversarially corrupted feedback. By combining a structural characterization of optimal solutions on M-convex sets with a geometric volume argument, the authors obtain a finite regret bound of $O(d \log d)$. This is the first finite bound polynomial in $d$ for this setting, and it improves on the previous finite bound of $\exp(O(d \log d))$. They further extend the approach to feedback that is adversarially corrupted in up to $C$ rounds. By monitoring directed graphs induced by the observed feedback to detect corruptions adaptively, the algorithm achieves a regret bound of $O((C+1) d \log d)$ without prior knowledge of $C$.
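The "structure of optimal solutions" mentioned above can be illustrated with a standard fact about M-convex sets: a point $x$ maximizes a linear objective $c^\top x$ over an M-convex set if and only if no feasible single exchange $x - e_i + e_j$ improves it, i.e. $c_i \ge c_j$ for every such pair. The sketch below assumes the feasible set is small enough to enumerate and uses the bases of a uniform matroid as the example. Reading each feasible exchange $(i, j)$ as a directed edge certifying $c_i \ge c_j$ is one illustrative way an observed optimal action constrains the hidden vector. It is not necessarily the graph construction used in the paper, and all function names are hypothetical.

```python
from itertools import combinations


def uniform_matroid_bases(n, k):
    """Hypothetical helper: all bases of the uniform matroid U_{k,n} as 0/1 tuples.
    The set of bases of a matroid is an M-convex set."""
    return [tuple(1 if i in s else 0 for i in range(n)) for s in combinations(range(n), k)]


def exchange_edges(x, feasible):
    """Return directed edges (i, j) such that x - e_i + e_j is still feasible.
    If x maximizes c.x, each edge certifies the inequality c[i] >= c[j]."""
    fset = set(feasible)
    edges = []
    for i in range(len(x)):
        for j in range(len(x)):
            if i == j:
                continue
            y = list(x)
            y[i] -= 1  # remove one unit from coordinate i
            y[j] += 1  # add one unit to coordinate j
            if tuple(y) in fset:
                edges.append((i, j))
    return edges


def is_optimal(x, feasible, c):
    """Local exchange test. It is exact only because the feasible set is assumed
    M-convex; for arbitrary sets, local optimality does not imply global optimality."""
    return all(c[i] >= c[j] for i, j in exchange_edges(x, feasible))


if __name__ == "__main__":
    bases = uniform_matroid_bases(4, 2)
    c = (4, 1, 3, 2)        # hidden objective (toy values)
    x_obs = (1, 0, 1, 0)    # observed optimal action
    print(exchange_edges(x_obs, bases))  # [(0, 1), (0, 3), (2, 1), (2, 3)]
    print(is_optimal(x_obs, bases, c))   # True
```

Each observed optimal action contributes edges like these. Accumulating them over rounds yields a directed graph of inequalities on the unknown objective.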
📝 Abstract
We study online inverse linear optimization, also known as contextual recommendation, where a learner sequentially infers an agent's hidden objective vector from observed optimal actions over feasible sets that change over time. The learner aims to recommend actions that perform well under the agent's true objective, and the performance is measured by the regret, defined as the cumulative gap between the agent's optimal values and those achieved by the learner's recommended actions. Prior work has established a regret bound of $O(d\log T)$, as well as a finite but exponentially large bound of $\exp(O(d\log d))$, where $d$ is the dimension of the optimization problem and $T$ is the time horizon, while a regret lower bound of $\Omega(d)$ is known (Gollapudi et al. 2021; Sakaue et al. 2025). Whether a finite regret bound polynomial in $d$ is achievable or not has remained an open question. We partially resolve this by showing that when the feasible sets are M-convex -- a broad class that includes matroids -- a finite regret bound of $O(d\log d)$ is possible. We achieve this by combining a structural characterization of optimal solutions on M-convex sets with a geometric volume argument. Moreover, we extend our approach to adversarially corrupted feedback in up to $C$ rounds. We obtain a regret bound of $O((C+1)d\log d)$ without prior knowledge of $C$, by monitoring directed graphs induced by the observed feedback to detect corruptions adaptively.
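The regret described in the abstract is $R_T = \sum_{t=1}^{T} \bigl(\max_{x \in X_t} c^{*\top} x - c^{*\top} \hat{x}_t\bigr)$, where $c^*$ is the agent's hidden objective, $X_t$ is the round-$t$ feasible set, and $\hat{x}_t$ is the learner's recommendation. The sketch below computes this quantity under the assumption that each $X_t$ is small enough to enumerate. The toy data and the fixed estimate `c_hat` are hypothetical. A real learner would update its estimate from the observed optimal actions, which this sketch does not model.

```python
from itertools import combinations


def uniform_matroid_bases(n, k):
    """Hypothetical helper: bases of the uniform matroid U_{k,n} as 0/1 tuples."""
    return [tuple(1 if i in s else 0 for i in range(n)) for s in combinations(range(n), k)]


def value(c, x):
    """Linear objective c.x."""
    return sum(ci * xi for ci, xi in zip(c, x))


def best_action(c, feasible):
    """Brute-force maximizer of c.x over an explicitly enumerated feasible set."""
    return max(feasible, key=lambda x: value(c, x))


def cumulative_regret(c_true, feasible_sets, recommendations):
    """Sum over rounds of (agent's optimal value - value of the learner's action),
    both measured under the agent's true objective c_true."""
    total = 0
    for feasible, x_hat in zip(feasible_sets, recommendations):
        total += value(c_true, best_action(c_true, feasible)) - value(c_true, x_hat)
    return total


if __name__ == "__main__":
    c_true = (4, 1, 3, 2)   # agent's hidden objective (toy values)
    c_hat = (3, 1, 4, 2)    # hypothetical learner estimate with the top two entries swapped
    # Feasible sets change over time: rank-1, then rank-2 uniform matroids.
    feasible_sets = [uniform_matroid_bases(4, 1), uniform_matroid_bases(4, 2)]
    recs = [best_action(c_hat, X) for X in feasible_sets]
    print(cumulative_regret(c_true, feasible_sets, recs))  # 1
```

In this toy run, the wrong estimate costs regret only in the first round. In the second round, both objectives select the same pair of elements, which shows that a learner need not recover $c^*$ exactly to recommend optimal actions.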