🤖 AI Summary
In decentralized deep learning, parameter averaging often generalizes poorly and converges slowly when the networks are high-dimensional and the communication topology is sparse. To address this, we propose a paradigm shift from parameter consistency to output consistency, introducing a Deep Relative Trust (DRT)-driven distributed diffusion algorithm. Our method is the first to explicitly model output-space similarity as a collaborative optimization objective, integrating DRT-based trust quantification, distributed diffusion updates, and a non-convex convergence analysis framework. Experiments on image classification demonstrate that our approach significantly improves generalization in sparse networks, converges faster, and attains higher final accuracy, especially under low-connectivity topologies. The core contribution is establishing both the theoretical feasibility and the empirical effectiveness of output consistency, thereby introducing a novel paradigm for decentralized learning.
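To make the "trust quantification" step concrete, the sketch below computes a deep-relative-trust-style distance between two networks, using the multiplicative per-layer form (product over layers of 1 + relative parameter change, minus 1) commonly associated with deep relative trust. This is an illustrative reading, not the paper's exact implementation; the function name and layer-list representation are assumptions.

```python
import numpy as np

def drt_distance(w, w_prime):
    """Illustrative deep relative trust (DRT)-style distance between two
    networks, each given as a list of per-layer weight arrays.

    Computes prod_l (1 + ||W'_l - W_l|| / ||W_l||) - 1, so identical
    networks are at distance 0 and the measure compounds relative
    per-layer changes multiplicatively across depth.
    This is a hypothetical sketch, not the paper's code.
    """
    d = 1.0
    for wl, wpl in zip(w, w_prime):
        d *= 1.0 + np.linalg.norm(wpl - wl) / np.linalg.norm(wl)
    return d - 1.0

# Identical networks sit at distance 0; doubling every layer gives
# a factor of 2 per layer, so a 2-layer network lands at 2*2 - 1 = 3.
w = [np.ones((2, 2)), np.full((2,), 3.0)]
print(drt_distance(w, [a.copy() for a in w]))   # → 0.0
print(drt_distance(w, [2.0 * a for a in w]))    # → 3.0
```

The multiplicative form matters here: a small relative change in every layer of a deep network compounds into a large overall distance, which is one motivation for trusting output-space similarity over raw parameter distance.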
📝 Abstract
Decentralized learning strategies allow a collection of agents to learn efficiently from local data sets without the need for central aggregation or orchestration. Current decentralized learning paradigms typically rely on an averaging mechanism to encourage agreement in the parameter space. We argue that in the context of deep neural networks, which are often over-parameterized, encouraging consensus of the neural network outputs, as opposed to their parameters, can be more appropriate. This motivates the development of a new decentralized learning algorithm, termed DRT diffusion, based on deep relative trust (DRT), a recently introduced similarity measure for neural networks. We provide a convergence analysis for the proposed strategy, and numerically establish its benefit to generalization, especially with sparse topologies, in an image classification task.
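The diffusion family of algorithms the abstract builds on follows an adapt-then-combine pattern: each agent takes a local gradient step, then averages with its neighbors. The sketch below shows that skeleton with generic nonnegative combination weights standing in for the DRT-derived trust weights; the function name, dictionary layout, and weighting scheme are assumptions for illustration, not the paper's actual update rule.

```python
import numpy as np

def diffusion_step(params, grads, neighbors, trust, lr=0.1):
    """One adapt-then-combine diffusion step over a network of agents.

    params[k]    : agent k's parameter vector (np.ndarray)
    grads[k]     : agent k's local gradient at params[k]
    neighbors[k] : agent k's neighborhood, including k itself
    trust[k][j]  : nonnegative weight agent k places on neighbor j
                   (a hypothetical stand-in for DRT-based trust).
    """
    # Adapt: each agent takes a local gradient step.
    psi = {k: params[k] - lr * grads[k] for k in params}
    # Combine: trust-weighted average over each agent's neighborhood,
    # normalized so the weights sum to one.
    new_params = {}
    for k in params:
        total = sum(trust[k][j] for j in neighbors[k])
        new_params[k] = sum(trust[k][j] * psi[j] for j in neighbors[k]) / total
    return new_params

# Two fully trusting agents with zero gradients simply average.
params = {0: np.array([0.0]), 1: np.array([2.0])}
grads = {0: np.array([0.0]), 1: np.array([0.0])}
neighbors = {0: [0, 1], 1: [0, 1]}
trust = {0: {0: 1.0, 1: 1.0}, 1: {0: 1.0, 1: 1.0}}
print(diffusion_step(params, grads, neighbors, trust))  # both agents → [1.0]
```

In standard diffusion the combination weights come from a fixed mixing matrix; the paper's contribution, per the abstract, is to drive these weights toward output consistency via DRT rather than plain parameter averaging.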