🤖 AI Summary
This paper addresses the fundamental instability of rank-based input normalization under monotone transformations, batch variations, and small perturbations. To formalize the desiderata, we introduce three axioms characterizing the minimal invariance and stability requirements for valid rank-normalization operators. We prove that any operator satisfying these axioms must decompose into a rank representation followed by a monotone, Lipschitz-continuous scalarization map. This characterization reveals that mainstream differentiable sorting operators are inherently unstable because they depend on value gaps and pairwise interactions. Leveraging this theory, we construct the first minimal, differentiable rank-normalization operator that strictly satisfies all axioms. Empirical evaluation on multi-task learning and robust classification tasks demonstrates its superior stability over existing methods and its practical necessity.
📝 Abstract
Rank-based input normalization is a workhorse of modern machine learning, prized for its robustness to scale, monotone transformations, and batch-to-batch variation. In many real systems, the ordering of feature values matters far more than their raw magnitudes; yet the structural conditions that a rank-based normalization operator must satisfy to remain stable under these invariances have never been formally pinned down.
We show that widely used differentiable sorting and ranking operators fundamentally fail to deliver this stability. Because they rely on value gaps and batch-level pairwise interactions, they are intrinsically unstable under strictly monotone transformations, shifts in mini-batch composition, and even tiny input perturbations. Crucially, these failures stem from the operators' structural design, not from incidental implementation choices.
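To make the gap-dependence concrete, the sketch below implements a generic pairwise soft-rank relaxation (an illustrative stand-in, assuming the common sigmoid-over-pairwise-differences form; it is not any specific operator from the paper). A strictly monotone transformation leaves the true ranks untouched but changes the value gaps, and therefore changes the relaxed output:

```python
import numpy as np

def soft_rank(x, tau=0.1):
    """Generic pairwise soft-rank relaxation: approximates each entry's
    rank by summing sigmoids over pairwise value gaps (x_i - x_j)."""
    gaps = (x[:, None] - x[None, :]) / tau
    return (1.0 / (1.0 + np.exp(-gaps))).sum(axis=1)

x = np.array([0.1, 0.2, 0.9])
y = x ** 3  # strictly monotone on these values: ordering is preserved

print(np.argsort(np.argsort(x)))  # hard ranks of x: [0 1 2]
print(np.argsort(np.argsort(y)))  # hard ranks of y: [0 1 2], identical
print(soft_rank(x))               # depends on the gaps between values
print(soft_rank(y))               # different output for the same ordering
```

Because the first two entries of `y` are nearly tied in value (though not in rank), their soft ranks almost collapse, which is exactly the kind of gap-dependence the structural argument isolates.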
To address this, we propose three axioms that formalize the minimal invariance and stability properties required of rank-based input normalization. We prove that any operator satisfying these axioms must factor into (i) a feature-wise rank representation and (ii) a scalarization map that is both monotone and Lipschitz-continuous. We then construct a minimal operator that meets these criteria and empirically show that the resulting constraints are non-trivial in realistic setups. Together, our results sharply delineate the design space of valid rank-based normalization operators and formally separate them from existing continuous-relaxation-based sorting methods.
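To illustrate the factorization, here is a minimal sketch under our own assumptions: the rank representation uses a double `argsort`, and the scalarization `r / (n - 1)` is one illustrative monotone, 1-Lipschitz choice, not necessarily the operator constructed in the paper.

```python
import numpy as np

def rank_representation(x):
    """Step (i): map each value to its rank in the batch (0 to n-1)."""
    return np.argsort(np.argsort(x))

def scalarize(r, n):
    """Step (ii): a monotone, 1-Lipschitz map from ranks into [0, 1]."""
    return r / max(n - 1, 1)

def rank_normalize(x):
    """Factorized operator: rank representation, then scalarization."""
    r = rank_representation(x).astype(float)
    return scalarize(r, len(x))

x = np.array([3.0, -1.0, 7.0])
print(rank_normalize(x))          # [0.5 0.  1. ]
print(rank_normalize(np.exp(x)))  # identical: invariant to monotone maps
```

Because the scalarization sees only ranks, the output is unchanged by any strictly monotone transformation of the input, and moving an element by one rank position moves the output by at most 1/(n - 1).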