AI Summary
Current large language model-based agents for health data modeling exhibit limited generalization, rely heavily on predefined templates, and lack robust uncertainty quantification, thereby undermining the reliability of clinical decision-making. This work proposes the first multi-agent system that integrates uncertainty quantification throughout the entire modeling pipeline. By orchestrating five specialized agents in a closed-loop collaborative framework, the system enables template-free, task-adaptive exploration, modeling, and optimization of health data, simultaneously enhancing both predictive performance and the quality of uncertainty estimates. Moreover, it generates interpretable reports to support risk-aware clinical decisions. Evaluated across 17 real-world health tasks, the proposed system outperforms state-of-the-art baselines by 29.2% in prediction performance and by 50.2% in uncertainty estimation.
Abstract
LLM-based agents have demonstrated strong potential for autonomous machine learning, yet their applicability to health data remains limited. Existing systems often struggle to generalize across heterogeneous health data modalities, rely heavily on predefined solution templates with insufficient adaptation to task-specific objectives, and largely overlook uncertainty estimation, which is essential for reliable decision-making in healthcare. To address these challenges, we propose AutoHealth, a novel uncertainty-aware multi-agent system that autonomously models health data and assesses model reliability. AutoHealth employs closed-loop coordination among five specialized agents to perform data exploration, task-conditioned model construction, training, and optimization, while jointly prioritizing predictive performance and uncertainty quantification. Beyond producing ready-to-use models, the system generates comprehensive reports to support trustworthy interpretation and risk-aware decision-making. To rigorously evaluate its effectiveness, we curate a challenging real-world benchmark comprising 17 tasks across diverse data modalities and learning settings. AutoHealth completes all tasks and outperforms state-of-the-art baselines by 29.2% in prediction performance and 50.2% in uncertainty estimation.