AI Summary
This study addresses the fairness risks in clinical AI models for early-onset colorectal cancer detection, which often arise from algorithmic bias and a lack of effective auditing mechanisms. To mitigate these issues, the authors propose the first retrieval-augmented generation (RAG)-based dual-agent architecture, which integrates a domain-expert agent with a fairness-advisor agent to automatically identify sensitive attributes and evaluate fairness metrics. Leveraging Ollama large language models (8B–120B parameters) and semantic similarity analysis, the system outperforms baseline methods in identifying health disparities, achieving the highest semantic alignment with expert-derived reference statements. Ablation studies confirm the approach's effectiveness and scalability in enhancing fairness auditing for clinical AI systems.
Abstract
Artificial intelligence (AI) is increasingly used in clinical settings, yet limited oversight and domain expertise can allow algorithmic bias and safety risks to persist. This study evaluates whether an agentic AI system can support auditing biomedical machine learning models for fairness in early-onset colorectal cancer (EO-CRC), a condition with documented demographic disparities. We implemented a two-agent architecture consisting of a Domain Expert Agent that synthesizes literature on EO-CRC disparities and a Fairness Consultant Agent that recommends sensitive attributes and fairness metrics for model evaluation. An ablation study compared three Ollama large language models (8B, 20B, and 120B parameters) across three configurations: pretrained LLM-only, Agent without Retrieval-Augmented Generation (RAG), and Agent with RAG. Across models, the Agent with RAG achieved the highest semantic similarity to expert-derived reference statements, particularly for disparity identification, suggesting agentic systems with retrieval may help scale fairness auditing in clinical AI.
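The evaluation compares each configuration's output against expert-derived reference statements by semantic similarity. The abstract does not specify the similarity model, so the following is only a minimal, self-contained sketch of that comparison: the function names (`cosine_similarity`) and the example statements are illustrative, and a term-frequency cosine stands in for whatever embedding-based similarity the study actually used.

```python
# Sketch of the ablation's semantic-similarity comparison (assumed setup).
# A bag-of-words cosine substitutes for the paper's (unspecified) semantic
# similarity measure so the example runs with only the standard library.
from collections import Counter
import math


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts using term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reference statement and model outputs for illustration only.
reference = ("Early-onset colorectal cancer shows documented disparities "
             "by race and socioeconomic status")
agent_with_rag = ("EO-CRC exhibits documented disparities by race and "
                  "socioeconomic status in incidence")
llm_only = "Colorectal cancer is a common malignancy of the large intestine"

# A retrieval-grounded response shares more disparity-specific terms with
# the reference, so it scores higher than the generic LLM-only response.
print(cosine_similarity(reference, agent_with_rag))
print(cosine_similarity(reference, llm_only))
```

In the study itself, scores like these would be averaged per configuration (pretrained LLM-only, Agent without RAG, Agent with RAG) and per model size to produce the ablation comparison.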