🤖 AI Summary
This work addresses worst-case client performance optimization in federated learning by studying distributed stochastic minimax optimization with stochastic constraints. The authors propose a single-loop, first-order method that updates only primal variables using a softmax-weighted switching gradient scheme, achieving a unified bound on both the optimality and feasibility errors under both full and partial client participation. Departing from conventional primal-dual and penalty-based frameworks, the approach avoids hyperparameter sensitivity and oscillatory convergence. By relaxing standard boundedness assumptions, the authors establish a strictly tighter lower bound on the softmax temperature parameter and obtain high-probability convergence guarantees under mild conditions. Theoretical analysis shows that the algorithm attains the standard oracle complexity of 𝒪(ε⁻⁴) and a high-probability convergence rate of 𝒪(log(1/δ)). Empirical validation on Neyman–Pearson and fair classification tasks demonstrates its practical effectiveness.
📝 Abstract
This paper addresses the distributed stochastic minimax optimization problem subject to stochastic constraints. We propose a novel first-order softmax-weighted switching gradient method tailored for federated learning. Under full client participation, our algorithm achieves the standard $\mathcal{O}(\epsilon^{-4})$ oracle complexity to satisfy a unified bound $\epsilon$ on both the optimality gap and the feasibility tolerance. We extend our theoretical analysis to the practical partial participation regime by quantifying client sampling noise through a stochastic superiority assumption. Furthermore, by relaxing standard boundedness assumptions on the objective functions, we establish a strictly tighter lower bound for the softmax hyperparameter. We provide a unified error decomposition and establish a sharp $\mathcal{O}(\log\frac{1}{\delta})$ high-probability convergence guarantee. Ultimately, our framework demonstrates that a single-loop, primal-only switching mechanism provides a stable alternative for optimizing worst-case client performance, effectively bypassing the hyperparameter sensitivity and convergence oscillations often encountered in traditional primal-dual or penalty-based approaches. We verify the efficacy of our algorithm via experiments on the Neyman–Pearson (NP) classification and fair classification tasks.
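To make the core mechanism concrete, the softmax-weighted aggregation underlying a primal-only update can be sketched as follows. This is a minimal illustration under assumed details, not the authors' exact algorithm: the function names, the form of the temperature parameter `tau`, and the reduction of the per-client max to a softmax-weighted average are all assumptions for exposition; the paper's actual switching rule between objective and constraint gradients is not reproduced here.

```python
import numpy as np

def softmax_weights(losses, tau):
    """Softmax weights over per-client losses; a larger temperature tau
    concentrates weight on the worst-performing client (illustrative)."""
    z = tau * np.asarray(losses, dtype=float)
    z -= z.max()  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def softmax_weighted_step(x, client_grads, client_losses, tau, lr):
    """One hypothetical primal-only update: aggregate the participating
    clients' gradients with softmax weights, then take a gradient step."""
    w = softmax_weights(client_losses, tau)
    g = sum(wi * gi for wi, gi in zip(w, client_grads))
    return x - lr * g
```

As `tau` grows, the weighted average approaches the gradient of the worst-case client, which is the smooth surrogate intuition behind optimizing worst-case client performance without dual variables.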