🤖 AI Summary
To address challenges in AI-driven scientific research, including cross-institutional collaboration barriers, platform heterogeneity, and model irreproducibility, this paper proposes the first cross-domain federated cloud platform architecture built specifically for scientific research. The architecture unifies interactive development (Jupyter/VS Code), GPU-accelerated training (with annotation support and MLflow experiment tracking), federated learning (FATE), lightweight inference (Triton), and coordinated deployment across cloud, edge, and device. It embeds a GAIA-X-compliant identity and data governance framework to ensure end-to-end traceability and reproducibility across the research lifecycle. Orchestrated via Kubernetes Federation, the platform has been integrated with 12 European e-Infrastructures and supports over 200 scientific projects. Empirical evaluation shows a 76% reduction in model reproduction time and a 41% decrease in communication overhead during cross-institutional federated training.
📝 Abstract
In this paper, we describe a federated compute platform dedicated to supporting Artificial Intelligence in scientific workloads. With an emphasis on reproducible deployments, it delivers consistent, transparent access to a federation of physically distributed e-Infrastructures. Through a comprehensive service catalogue, the platform offers an integrated user experience covering the full Machine Learning lifecycle, including model development (with dedicated interactive development environments), training (with GPU resources, annotation tools, experiment tracking, and federated learning support), and deployment (covering a wide range of deployment options all along the Cloud Continuum). The platform also provides tools for the traceability and reproducibility of AI models, and it integrates with different Artificial Intelligence model providers, datasets, and storage resources, allowing users to interact with the broader Machine Learning ecosystem. Finally, it is easily customizable, which lowers the adoption barrier for external communities.