AI Summary
This work studies zeroth-order stochastic optimization that relies solely on ordinal feedback (e.g., pairwise preferences), motivated by practical settings such as human-in-the-loop reinforcement learning. For smooth stochastic objective functions, we propose a rank-based zeroth-order optimization algorithm and establish, for the first time, explicit non-asymptotic query complexity bounds. The analysis shows that, for both convex and non-convex objectives, the algorithm achieves query efficiency matching that of the best value-based methods; in other words, purely ordinal information suffices for optimal query efficiency in smooth stochastic settings. Departing from conventional drift analysis and information-geometric frameworks, we introduce new analytical tools to characterize the information carried by ordinal feedback. To our knowledge, this is the first work providing rigorous non-asymptotic guarantees for preference-driven optimization.
Abstract
Zeroth-order (ZO) optimization with ordinal feedback has emerged as a fundamental problem in modern machine learning systems, particularly in human-in-the-loop settings such as reinforcement learning from human feedback, preference learning, and evolutionary strategies. While rank-based ZO algorithms enjoy strong empirical success and robustness properties, their theoretical understanding, especially under stochastic objectives and standard smoothness assumptions, remains limited. In this paper, we study rank-based zeroth-order optimization of stochastic functions where only ordinal feedback on the stochastic function values is available. We propose a simple and computationally efficient rank-based ZO algorithm. Under standard assumptions (smoothness, bounded second moments of the stochastic gradients, and, in the convex case, strong convexity), we establish explicit non-asymptotic query complexity bounds for both convex and nonconvex objectives. Notably, our results match the best-known query complexities of value-based ZO algorithms, demonstrating that ordinal information alone is sufficient for optimal query efficiency in stochastic settings. Our analysis departs from existing drift-based and information-geometric techniques, offering new tools for the study of rank-based optimization under noise. These findings narrow the gap between theory and practice and provide a principled foundation for optimization driven by human preferences.
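To make the setting concrete, the sketch below shows one common way a rank-based ZO method can use only pairwise (ordinal) comparisons: probe two points along a random direction, ask a noisy comparison oracle which one is preferred, and step toward the preferred point. This is an illustrative construction under assumed names (`noisy_comparison`, `rank_based_zo_step`) and a Gaussian noise model, not necessarily the algorithm analyzed in the paper.

```python
import numpy as np

def noisy_comparison(f, x, y, rng):
    """Ordinal oracle: returns +1 if a stochastic evaluation of f at x
    exceeds one at y, else -1. Only the preference (sign) is exposed,
    never the function values themselves. Noise model is assumed."""
    return 1.0 if f(x) + rng.normal(scale=0.1) > f(y) + rng.normal(scale=0.1) else -1.0

def rank_based_zo_step(f, x, delta, eta, rng):
    """One comparison-driven update: compare two antipodal probes along a
    random unit direction and move toward the preferred (smaller) one."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    s = noisy_comparison(f, x + delta * u, x - delta * u, rng)
    return x - eta * s * u  # step away from the point judged worse

# Usage: minimize a simple quadratic using only ordinal feedback.
rng = np.random.default_rng(0)
f = lambda z: 0.5 * np.dot(z, z)
x = rng.standard_normal(10)
for _ in range(2000):
    x = rank_based_zo_step(f, x, delta=0.05, eta=0.01, rng=rng)
print(round(f(x), 4))
```

Each iteration consumes one comparison query, so query complexity in this setting counts how many such preferences are needed to reach a target accuracy.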