🤖 AI Summary
This work proposes an autonomous coding-agent framework that lets clinicians develop AI models end to end from natural-language descriptions of their clinical intent, reducing dependence on specialized AI developers. It targets the slow, misalignment-prone iteration of traditional clinician–AI expert collaboration: the framework translates natural-language requests into automated model design and training, and supports weakly supervised learning and bias mitigation. Across five clinical tasks, the generated models achieve promising performance; notably, in a debiased pneumothorax classification task on chest radiographs, the system nearly halves the model's reliance on chest drains, a known confounder.
📝 Abstract
Clinical AI development has traditionally followed a collaborative paradigm that depends on close interaction between clinicians and specialized AI teams. This paradigm imposes a practical challenge: clinicians must repeatedly communicate and refine their requirements with AI developers before those requirements can be translated into executable model development. This iterative process is time-consuming, and even after repeated discussion, misalignment may persist because neither side fully shares the other's expertise. Autonomous coding agents may change this paradigm, raising the possibility that clinicians could develop clinical AI models independently through natural-language interaction alone. In this study, we present a prototype of such an autonomous system for clinician-driven clinical AI development. We evaluated the system on five clinical tasks spanning dermoscopic lesion classification, melanoma-versus-nevus triage, wrist-fracture detection (including a weakly supervised variant with only 5% bounding-box annotations), and debiased pneumothorax classification on chest radiographs. Across these settings, the system consistently developed models from clinician requests and achieved promising performance. Notably, in the pneumothorax task, where chest drains can act as a major confounder, the system mitigated shortcut learning and nearly halved the model's reliance on chest drains. These findings provide proof of concept that autonomous coding agents may help shift clinical AI development toward a more clinician-driven paradigm, reducing communication overhead and dependence on specialized AI developers. Although further validation and robustness assessment are needed, this study suggests a promising path toward making clinical AI development more accessible.
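The abstract does not state how reliance on chest drains was quantified. As a minimal sketch of one possible measure, and not the authors' metric, the Python snippet below compares sensitivity on pneumothorax-positive cases with and without a chest drain. The names `Case`, `sensitivity`, and `drain_reliance_gap`, the 0.5 threshold, and the toy data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Case:
    has_pneumothorax: bool  # ground-truth label
    has_chest_drain: bool   # confounder annotation (assumed to be available)
    score: float            # model's predicted probability of pneumothorax


def sensitivity(cases, threshold=0.5):
    """Fraction of pneumothorax-positive cases scored at or above the threshold."""
    positives = [c for c in cases if c.has_pneumothorax]
    if not positives:
        raise ValueError("no positive cases in this subgroup")
    return sum(c.score >= threshold for c in positives) / len(positives)


def drain_reliance_gap(cases, threshold=0.5):
    """Sensitivity with a drain minus sensitivity without one.

    Hypothetical proxy: a large positive gap suggests the model detects
    pneumothorax mainly when a drain is visible (shortcut learning).
    """
    with_drain = [c for c in cases if c.has_chest_drain]
    without_drain = [c for c in cases if not c.has_chest_drain]
    return sensitivity(with_drain, threshold) - sensitivity(without_drain, threshold)


# Toy data, invented for illustration only.
cases = [
    Case(True, True, 0.9),
    Case(True, True, 0.8),
    Case(True, False, 0.7),
    Case(True, False, 0.3),
    Case(False, False, 0.1),
    Case(False, True, 0.4),
]
gap = drain_reliance_gap(cases)
```

On this toy data the gap is 0.5 (sensitivity 1.0 with a drain versus 0.5 without); under this proxy, a successfully debiased model would be expected to shrink the gap toward zero.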