FedSDR: Federated Self-Distillation with Rectification

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

This work addresses performance degradation and hallucination in federated fine-tuning of large language models caused by data heterogeneity. The authors propose FedSD, a federated self-distillation framework, along with its enhanced variant FedSDR, introducing self-distillation as a foundational strategy in federated learning. By mapping client representations into a smoothed “model understanding space,” the approach mitigates distribution shift. A dual-stream LoRA architecture is designed: a local LoRA-S captures heterogeneous knowledge, while a global LoRA-R corrects factual consistency using original data; only LoRA-R is aggregated to ensure global coherence and faithfulness. Experiments demonstrate that the method significantly outperforms existing approaches across multiple benchmarks, effectively suppressing hallucinations and redundancy while improving accuracy and robustness. The study also uncovers the “rewriting paradox” phenomenon.

📝 Abstract

Federated fine-tuning of Large Language Models faces severe statistical heterogeneity. However, existing model-level defenses often overlook the root cause: intrinsic data distribution mismatches. In this work, we first establish Federated Self-Distillation (FedSD) as a fundamental and potent strategy. By projecting client representations into a smoothed ``model-understanding space,'' FedSD alone serves as a universal booster, demonstrating superior performance over conventional algorithms. Despite its success, we identify a subtle trade-off termed the Rewrite Paradox -- unconstrained self-distillation can inadvertently increase hallucinations and redundancy. To refine this paradigm, we further propose FedSDR (Federated Self-Distillation with Rectification), the ultimate reinforced framework. It augments FedSD with a dual-stream mechanism: a local LoRA-S (Smoothing) branch to implicitly absorb heterogeneity via distilled data, and a parallel global LoRA-R (Rectification) branch anchored to raw data to enforce factual correctness. By selectively aggregating only LoRA-R, FedSDR yields a globally aligned and faithful model. Extensive experiments verify its superior performance.

Problem

Research questions and friction points this paper is trying to address.

federated learning

statistical heterogeneity

data distribution mismatch

large language models

hallucination

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Self-Distillation

Rectification

LoRA

Statistical Heterogeneity

Rewrite Paradox

🔎 Similar Papers

FedDTG: Federated Data-Free Knowledge Distillation via Three-Player Generative Adversarial Networks

2022-01-10arXiv.orgCitations: 15