🤖 AI Summary
This work addresses a tension in cross-institutional federated learning: participants collaboratively train models yet compete in downstream markets. Under non-IID data distributions in particular, contributing data may inadvertently strengthen rivals and dampen participation incentives. To resolve this, the authors propose CoCoGen+, a framework that jointly models inter-organizational competition, non-IID data, and generative AI–based synthetic data generation, endogenizing data generation strategies within the learning process. Each training round is formulated as a weighted potential game in which every participant balances model performance gains against computational costs and competitive utility losses. A tailored payoff redistribution mechanism further incentivizes sustained collaboration by compensating participants for both their contributions and the competitive harm they incur. Experiments across diverse tasks and data distributions show that CoCoGen+ outperforms baseline methods in efficiency and illustrate how non-IID data, competition intensity, and incentives shape participation and social welfare.
📝 Abstract
In data-sensitive domains such as healthcare, cross-silo federated learning (CFL) allows organizations to collaboratively train AI models without sharing raw data. However, practical CFL deployments are inherently coopetitive: organizations cooperate during model training while competing in downstream markets. In such settings, training contributions, including data volume, quality, and diversity, can improve the global model yet inadvertently strengthen rivals. This dilemma is amplified by non-IID data, which leads to asymmetric learning gains and undermines sustained participation. While existing competition-aware CFL and incentive-design approaches reward organizations based on marginal training contributions, they fail to account for the costs of strengthening competitors. In this paper, we introduce CoCoGen+, a coopetition-compatible data generation and incentivization framework that jointly models non-IID data and inter-organizational competition while endogenizing GenAI-based synthetic data generation as a strategic decision. Specifically, CoCoGen+ formulates each training round as a weighted potential game, where organizations strategically decide how much synthetic data to generate by balancing learning performance gains against computational costs and competition-caused utility losses. We then provide a tractable equilibrium characterization and derive implementable generation strategies to maximize social welfare. To promote long-term collaboration, we integrate a payoff redistribution-based incentive mechanism that compensates organizations for their contributions and for competition-caused utility degradation. Experiments on a variety of learning tasks validate the feasibility of CoCoGen+. The results show how non-IID data, competition intensity, and incentives shape organizational strategies and social welfare, while CoCoGen+ outperforms baselines in efficiency.
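To make the weighted-potential-game idea concrete, the following is a minimal toy sketch and not the paper's model. Every name and number in it is a hypothetical assumption: a logarithmic quality function, linear generation costs, and a competition loss that is simply folded into each organization's net weight. Under these assumptions, any unilateral change in an organization's utility equals its weight times the change in one shared potential function, so round-robin best responses over a finite grid settle at a point where no organization gains by deviating.

```python
import math

# Hypothetical toy parameters, chosen only for illustration (not from the paper).
GAIN = [1.0, 0.8, 0.6]       # learning-gain coefficient per organization
COMP = [0.2, 0.3, 0.1]       # competition-caused loss coefficient per organization
COST = [0.10, 0.05, 0.08]    # cost per unit of synthetic data generated
WEIGHT = [g - c for g, c in zip(GAIN, COMP)]   # net weights, all positive here
GRID = [0.5 * k for k in range(41)]            # allowed amounts: 0.0, 0.5, ..., 20.0


def model_quality(total):
    """Concave stand-in for global-model quality as a function of total synthetic data."""
    return math.log1p(total)


def utility(i, x):
    """Net benefit of organization i: weighted quality minus its own generation cost."""
    return WEIGHT[i] * model_quality(sum(x)) - COST[i] * x[i]


def potential(x):
    """Shared potential: a unilateral utility change equals WEIGHT[i] times its change."""
    return model_quality(sum(x)) - sum(COST[j] / WEIGHT[j] * x[j] for j in range(len(x)))


def best_response(i, x):
    """Grid amount that maximizes organization i's utility, holding the others fixed."""
    return max(GRID, key=lambda a: utility(i, x[:i] + [a] + x[i + 1:]))


def best_response_dynamics(x, max_rounds=100):
    """Round-robin best responses; stops once no organization strictly improves."""
    x = list(x)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(x)):
            a = best_response(i, x)
            if utility(i, x[:i] + [a] + x[i + 1:]) > utility(i, x) + 1e-12:
                x[i] = a
                changed = True
        if not changed:
            break
    return x


if __name__ == "__main__":
    equilibrium = best_response_dynamics([0.0, 0.0, 0.0])
    print("generation amounts:", equilibrium)
    print("potential value:", round(potential(equilibrium), 4))
```

In this toy setting the organization with the most favorable weight-to-cost ratio ends up generating most of the data while the others free-ride, which is the kind of imbalance a payoff redistribution mechanism is meant to offset. The actual utilities, weights, and equilibrium characterization in CoCoGen+ may differ substantially.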