🤖 AI Summary
This paper addresses the fundamental question of whether polynomial-time pivot rules exist for three canonical optimization and decision problems: the simplex method for linear programming, policy iteration for Markov decision processes (MDPs), and strategy improvement for parity games. To overcome the fragmentation of prior lower-bound constructions, each of which is tailored to an individual rule or a small set of rules, we introduce an information-access framework for ranking-based pivot rules, formally capturing their dependence on orderings induced by the input data. Within this framework, we establish: (1) a superpolynomial lower bound for strategy improvement in parity games; (2) a subexponential lower bound for policy iteration in MDPs; and (3) the transferability of both bounds to the simplex method under ranking-based pivoting. Our results unify lower-bound analysis across these three domains and reveal shared structural barriers to efficient pivot selection.
📝 Abstract
The existence of a polynomial pivot rule for the simplex method for linear programming, policy iteration for Markov decision processes, and strategy improvement for parity games are each prominent open problems in their respective fields. While numerous natural candidates for efficient rules have been eliminated, all existing lower bound constructions are tailored to individual or small sets of pivot rules. We introduce a unified framework for formalizing classes of rules according to the information about the input that they rely on. Within this framework, we show lower bounds for *ranking-based* classes of rules that base their decisions on orderings of the improving pivot steps induced by the underlying data. Our first result is a superpolynomial lower bound for strategy improvement, obtained via a family of sink parity games, which applies to memory-based generalizations of Bland's rule that only access the input by comparing the ranks of improving edges in some global order. Our second result is a subexponential lower bound for policy iteration, obtained via a family of Markov decision processes, which applies to memoryless rules that only access the input by comparing improving actions according to their ranks in a global order, their reduced costs, and the associated improvements in objective value. Both results carry over to the simplex method for linear programming.
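To make the notion of a ranking-based rule concrete, here is a minimal sketch (not taken from the paper) of Bland's-rule-style selection: among the currently improving pivot candidates, pick the one with the smallest rank in a fixed global order. The function name, the identification of candidates with integer indices, and the use of the index itself as the global order are all illustrative assumptions.

```python
def blands_rule(improving, rank):
    """Select the improving pivot candidate with the smallest rank
    in a fixed global order (Bland's-rule-style selection).

    improving: iterable of candidate pivot steps (hypothetically,
               indices of improving edges/actions/columns).
    rank:      function mapping a candidate to its position in the
               global order; the rule accesses the input only through
               comparisons of these ranks.
    """
    return min(improving, key=rank)


# Illustrative usage: candidates are identified by index, and the
# global order is simply the index itself.
candidates = [7, 2, 5]
print(blands_rule(candidates, rank=lambda e: e))  # prints 2
```

The point of the abstraction is that the rule never inspects the numerical data directly, only the relative order of candidates, which is exactly the access pattern the lower bounds in the paper target.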