🤖 AI Summary
Diagnosing rare genetic syndromes from facial images faces two challenges: preserving the privacy of patient facial images and overcoming cross-institutional data silos. Method: We propose Federated GestaltMatcher, a horizontal federated learning framework that enables distributed facial feature extraction and privacy-preserving kernel matrix aggregation without uploading or sharing raw images. It features a plug-and-play federated service architecture for low-overhead institutional onboarding and employs a global ensemble model update strategy to enhance generalizability. Results: Experiments under heterogeneous data distributions and multi-institutional settings show that the federated service retains over 90% of the performance of centralized training and remains robust as the number of participating institutions varies, all without sharing raw images. The framework offers a scalable federated approach to AI-assisted rare disease diagnosis that is designed to work within strict privacy regulations.
📝 Abstract
Machine learning has shown promise in facial dysmorphology, where characteristic facial features provide diagnostic clues for rare genetic disorders. GestaltMatcher, a leading framework in this field, has demonstrated clinical utility across multiple studies, but its reliance on centralized datasets limits further development, as patient data are siloed across institutions and subject to strict privacy regulations. We introduce a federated GestaltMatcher service based on a cross-silo horizontal federated learning framework, which allows hospitals to collaboratively train a global ensemble feature extractor without sharing patient images. Patient data are mapped into a shared latent space, and a privacy-preserving kernel matrix computation framework enables syndrome inference and discovery while safeguarding confidentiality. New participants can directly benefit from and contribute to the system by adopting the global feature extractor and kernel configuration from previous training rounds. Experiments show that the federated service retains over 90% of centralized performance and remains robust to both varying silo numbers and heterogeneous data distributions.
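To make the two ingredients named above more concrete, here is a minimal sketch of (1) aggregating per-hospital model parameters into a global model and (2) building a kernel matrix over embeddings in a shared latent space. It rests on several assumptions that do not come from the abstract: the aggregation is shown as a sample-size-weighted average (FedAvg-style), the kernel is plain cosine similarity, and the names `fed_avg` and `cosine_kernel` are hypothetical. The privacy-preserving protocol for the kernel computation is not described in the abstract and is deliberately left out, so this illustrates the data flow only, not the authors' implementation.

```python
from math import sqrt


def fed_avg(silo_weights, silo_sizes):
    """Sample-size-weighted average of per-silo parameter vectors.

    Assumption: FedAvg-style aggregation; the abstract does not state
    which aggregation rule the service actually uses.
    """
    total = sum(silo_sizes)
    dim = len(silo_weights[0])
    return [
        sum(w[i] * n for w, n in zip(silo_weights, silo_sizes)) / total
        for i in range(dim)
    ]


def cosine_kernel(xs, ys):
    """Cosine-similarity kernel matrix between two lists of embeddings.

    Assumption: cosine similarity as the kernel. No privacy mechanism
    is applied here; a real deployment would need one.
    """
    def norm(v):
        return sqrt(sum(a * a for a in v))

    return [
        [sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y)) for y in ys]
        for x in xs
    ]


# Toy example: two hospitals holding 30 and 10 patients respectively.
# Each hospital trains locally and sends only parameters, never images.
global_w = fed_avg([[1.0, 2.0], [3.0, 6.0]], [30, 10])

# Embeddings of one query patient compared against two reference patients.
K = cosine_kernel([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the larger hospital dominates the average, and a new participant could reuse `global_w` and the same kernel configuration without retraining from scratch, which mirrors the onboarding idea described in the abstract.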