AI Summary
To address the challenges of poor synergy between blockchain and distributed machine learning, computational waste due to chain forks, limited edge-node computational capacity, and non-IID data distributions, this paper proposes BagChain, a dual-functional blockchain. Methodologically, it deeply integrates consensus mechanisms with Bagging ensemble learning: (i) a three-layer on-chain architecture replaces energy-intensive mining with model training; (ii) a cross-fork model parameter sharing mechanism enables efficient collaborative training of heterogeneous weak base models and robust ensemble formation under non-IID conditions; and (iii) a non-IID adaptive training strategy is introduced. Experiments demonstrate that BagChain consistently outperforms baseline methods under both IID and non-IID settings, while maintaining high robustness and accuracy under low-compute, sparse-connectivity, and high-latency conditions. This work establishes the first decentralized, energy-efficient, and scalable on-chain ensemble learning paradigm.
Abstract
This work proposes a dual-functional blockchain framework named BagChain for bagging-based decentralized learning. BagChain integrates blockchain with distributed machine learning by replacing the computationally costly hash operations in proof-of-work with machine-learning model training. BagChain uses individual miners' private data samples and limited computing resources to train base models, which may be very weak, and then aggregates them into strong ensemble models. Specifically, we design a three-layer blockchain structure with corresponding generation and validation mechanisms to enable distributed machine learning among uncoordinated miners in a permissionless and open setting. To reduce the computational waste caused by blockchain forking, we further propose a cross-fork sharing mechanism for practical networks with lengthy delays. Extensive experiments demonstrate the superiority and efficacy of BagChain on various machine learning tasks with both independent and identically distributed (IID) and non-IID datasets. BagChain remains robust and effective even under constrained local computing capability, heterogeneous private user data, and sparse network connectivity.
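The aggregation idea in the abstract (many weak base models trained on private data, then combined into one ensemble) can be illustrated with a minimal sketch. This is not BagChain's implementation: the 1-D threshold classifier, the synthetic data, the number of miners, and the majority-vote rule are all assumptions chosen for illustration, and no blockchain, validation, or cross-fork logic is modeled. Each hypothetical miner's private dataset plays the role of one bagging sample.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a 1-D threshold classifier (a weak base model) on local samples.

    Predicts label 1 when x > threshold; candidate thresholds are the
    sample points themselves.
    """
    best_threshold, best_errors = 0.0, len(samples) + 1
    for threshold, _ in samples:
        errors = sum((x > threshold) != bool(y) for x, y in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def ensemble_predict(thresholds, x):
    """Aggregate the base models' predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

def accuracy(predict, samples):
    """Fraction of samples for which predict(x) matches the label."""
    return sum(predict(x) == y for x, y in samples) / len(samples)

rng = random.Random(0)

def make_samples(n):
    """Synthetic data (assumption): label is 1 exactly when x > 0.5."""
    return [(x, int(x > 0.5)) for x in (rng.random() for _ in range(n))]

# Each hypothetical miner holds a small private dataset and trains locally.
# An odd number of miners avoids ties in the majority vote.
miner_data = [make_samples(8) for _ in range(15)]
base_models = [train_stump(data) for data in miner_data]

# Evaluate every base model and the majority-vote ensemble on shared test data.
test_set = make_samples(500)
base_accs = [accuracy(lambda x, t=t: int(x > t), test_set) for t in base_models]
ens_acc = accuracy(lambda x: ensemble_predict(base_models, x), test_set)

print("worst base model accuracy:", min(base_accs))
print("best base model accuracy: ", max(base_accs))
print("ensemble accuracy:        ", ens_acc)
```

For threshold classifiers, the majority vote is equivalent to using the median threshold, so the ensemble cannot do worse than the weakest base model and is insensitive to a few badly trained ones. Richer base models and non-IID partitions, as studied in the paper, would need a more elaborate setup than this sketch.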