🤖 AI Summary
This work addresses the function-space inconsistency that parameter averaging causes in federated reinforcement learning, particularly when clients employ heterogeneous or nonlinear encoders. To resolve this, the authors propose FedQHD, a method that combines hyperdimensional random-feature state encoding with a linear readout architecture. The resulting Q-functions are nonlinear in states yet linear in parameters, which enables closed-form aggregation in function space. FedQHD achieves, for the first time, function-space-consistent federated Q-learning and incorporates a federated teacher distillation mechanism to handle heterogeneous encoders. The study also provides a formal analysis of the federation gap and its origins. Experiments on four continuous-state, discrete-action tasks demonstrate that FedQHD matches or exceeds the performance of FedAvg and distillation-based baselines with lower computational overhead, while empirically validating the theoretical relationship between the federation gap and encoder dimensionality.
📝 Abstract
Federated reinforcement learning enables decentralized agents to collaboratively improve policies or value estimates without exchanging raw trajectories. However, FedAvg-style parameter averaging is not function-space consistent: when clients use heterogeneous encoders, or even identical nonlinear networks, the averaged parameters need not correspond to the weighted average of the client value functions in any common function space. We propose FedQHD, a federated Q-learning method that uses hyperdimensional (random-feature) state encoders with a linear readout, so that Q-functions are nonlinear in the state yet linear in the trainable parameters. This linear structure enables closed-form aggregation. With a shared encoder, the function-space consensus update coincides exactly with weighted averaging of the local readout matrices. With heterogeneous encoders, the server constructs a global teacher by averaging client Q-values on a shared anchor-state set, and each client compiles this teacher into its local representation via a single ridge projection. We formalize the federation gap, the error incurred when compiling a federated teacher into a heterogeneous client representation, relative to a client-specific oracle projection. We show that this gap decomposes into subspace misalignment, anchor-set conditioning, and regularization bias. We further identify the anchor-to-dimension regime $m \geq D_i$ (at least as many anchor states $m$ as client encoder dimensions $D_i$) as the well-conditioned regime in which the gap reduces to a multiple of the encoder heterogeneity floor. On four continuous-state, discrete-action control benchmarks, FedQHD matches or outperforms FedAvg-style baselines and distillation-based alternatives while requiring substantially less computation, and the empirical dependence of the federation gap on encoder dimension matches our theoretical analysis.
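The two aggregation modes described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the cosine random-feature encoder, the helper names `make_encoder` and `ridge_compile`, the standard ridge objective $\|\Phi A - T\|^2 + \lambda \|A\|^2$, and all dimensions, weights, and the value of $\lambda$ are assumptions chosen only for illustration, and random readouts stand in for locally trained ones.

```python
import numpy as np

def make_encoder(state_dim, feat_dim, seed):
    """Hypothetical random-feature encoder phi(s) = cos(W s + b)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(feat_dim, state_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=feat_dim)
    # Maps states of shape (n, state_dim) to features of shape (n, feat_dim).
    return lambda states: np.cos(states @ W.T + b)

def ridge_compile(encode, anchors, teacher_q, lam):
    """Fit a readout A minimizing ||Phi A - teacher_q||^2 + lam * ||A||^2."""
    Phi = encode(anchors)                               # (m, D_i)
    gram = Phi.T @ Phi + lam * np.eye(Phi.shape[1])     # (D_i, D_i)
    return np.linalg.solve(gram, Phi.T @ teacher_q)     # (D_i, n_actions)

rng = np.random.default_rng(0)
state_dim, n_actions = 4, 3
weights = np.array([0.5, 0.3, 0.2])   # assumed client weights, summing to 1

# Case 1: shared encoder. Q(s, .) = phi(s) @ A is linear in the readout A,
# so averaging readouts gives the same function as averaging Q-values.
shared = make_encoder(state_dim, 64, seed=1)
readouts = [rng.normal(size=(64, n_actions)) for _ in weights]
avg_readout = sum(w * A for w, A in zip(weights, readouts))
test_states = rng.uniform(-1.0, 1.0, size=(10, state_dim))
q_from_avg_params = shared(test_states) @ avg_readout
avg_of_q = sum(w * (shared(test_states) @ A) for w, A in zip(weights, readouts))
shared_case_consistent = np.allclose(q_from_avg_params, avg_of_q)

# Case 2: heterogeneous encoders. The server averages client Q-values on a
# shared anchor set; each client projects that teacher onto its own features.
dims = [32, 48, 64]
encoders = [make_encoder(state_dim, d, seed=10 + i) for i, d in enumerate(dims)]
local = [rng.normal(size=(d, n_actions)) for d in dims]
anchors = rng.uniform(-1.0, 1.0, size=(128, state_dim))   # m = 128 >= every D_i
teacher = sum(w * (enc(anchors) @ A)
              for w, enc, A in zip(weights, encoders, local))
compiled = [ridge_compile(enc, anchors, teacher, lam=1e-2) for enc in encoders]

# Residual of each compiled client against the teacher on the anchor set.
residuals = [np.linalg.norm(enc(anchors) @ A - teacher)
             for enc, A in zip(encoders, compiled)]
```

In this sketch, `shared_case_consistent` checks the shared-encoder identity numerically, and `residuals` measures how closely each heterogeneous client reproduces the teacher on the anchors. It is a rough stand-in for the compilation error, not the paper's federation gap, which is defined relative to a client-specific oracle projection.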