🤖 AI Summary
This work addresses the high communication overhead in federated learning when heterogeneous agents collaboratively optimize Linear Quadratic Regulator (LQR) policies. The authors propose ScalarFedLQR, a novel algorithm based on zeroth-order optimization and projected gradient estimation, in which each agent uploads only a scalar projection of its local gradient. The server aggregates these scalars to reconstruct a global descent direction. This approach reduces the per-round uplink communication complexity from O(d) to O(1), with projection errors diminishing as the number of participating agents increases. All iterates remain stabilizing, and ScalarFedLQR achieves linear convergence in the expected LQR cost, substantially lowering communication while matching the performance of full-gradient federated LQR methods.
📝 Abstract
We propose ScalarFedLQR, a communication-efficient federated algorithm for model-free learning of a common policy in linear quadratic regulator (LQR) control of heterogeneous agents. The method builds on a decomposed projected gradient mechanism, in which each agent communicates only a scalar projection of a local zeroth-order gradient estimate. The server aggregates these scalar messages to reconstruct a global descent direction, reducing per-agent uplink communication from O(d) to O(1), independent of the policy dimension. Crucially, the projection-induced approximation error diminishes as the number of participating agents increases, yielding a favorable scaling law: larger fleets enable more accurate gradient recovery, admit larger stepsizes, and achieve faster linear convergence despite high dimensionality. Under standard regularity conditions, all iterates remain stabilizing and the average LQR cost decreases linearly fast. Numerical results demonstrate performance comparable to full-gradient federated LQR with substantially reduced communication.
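To make the scalar-projection idea concrete, the following is a toy sketch (not the paper's exact mechanism): each agent projects its local gradient estimate onto a random unit direction and uploads only the resulting scalar, and the server averages the rescaled projections to recover a descent direction whose error shrinks as the fleet grows. The dimensions, noise model, and direction sampling below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 50, 2000  # policy dimension, number of agents (illustrative)

# A common "true" gradient plus per-agent heterogeneity noise stands in
# for the agents' local zeroth-order gradient estimates.
g_true = rng.normal(size=d)
local_grads = g_true + 0.1 * rng.normal(size=(m, d))

# Each agent i draws a random unit direction v_i (e.g. from a shared seed)
# and uploads only the scalar s_i = <v_i, g_i>: O(1) uplink per agent.
V = rng.normal(size=(m, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)
scalars = np.einsum('ij,ij->i', V, local_grads)  # one float per agent

# Server-side reconstruction: for isotropic unit v_i, E[d * s_i * v_i] = g_i,
# so averaging over many agents recovers the mean gradient, with error
# diminishing as m grows.
g_hat = d * (scalars[:, None] * V).mean(axis=0)

# Alignment with the true descent direction improves with fleet size.
cos = g_hat @ g_true / (np.linalg.norm(g_hat) * np.linalg.norm(g_true))
```

With m = 2000 agents and d = 50, the reconstructed direction is already well aligned with the true gradient, illustrating the scaling law: larger fleets enable more accurate gradient recovery from scalar messages.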