🤖 AI Summary
Medical interns face challenges in concurrently developing spatial localization, visual attention, knowledge integration, and diagnostic reasoning skills during chest X-ray (CXR) interpretation training.
Method: We propose the first multi-agent instructional framework—built on AutoGen—that integrates gaze tracking, anatomical segmentation (TensorFlow U-Net), and Bayesian knowledge tracing. The system incorporates NV-Reason-CXR-3B multimodal reasoning, PubMed real-time literature retrieval, REFLACX case matching, and safety-aware prompting to deliver context-aware, dynamic tutoring and personalized feedback.
Contribution/Results: Our innovation lies in jointly leveraging eye-tracking data, pixel-level lobar segmentation, and cognitive state modeling to drive adaptive pedagogical strategies, enabling precise skill assessment and sub-second response latency. Experiments demonstrate statistically significant improvements over baselines in lesion localization accuracy (+12.4%) and diagnostic reasoning quality (+18.7%), with validated clinical deployability and rigorous information leakage control.
📝 Abstract
IMACT-CXR is an interactive multi-agent conversational tutor that helps trainees interpret chest X-rays by unifying spatial annotation, gaze analysis, knowledge retrieval, and image-grounded reasoning in a single AutoGen-based workflow. The tutor simultaneously ingests learner bounding boxes, gaze samples, and free-text observations. Specialized agents evaluate localization quality, generate Socratic coaching, retrieve PubMed evidence, suggest similar cases from REFLACX, and trigger NV-Reason-CXR-3B for vision-language reasoning when mastery remains low or the learner explicitly asks. Bayesian Knowledge Tracing (BKT) maintains skill-specific mastery estimates that drive both knowledge reinforcement and case similarity retrieval. A lung-lobe segmentation module derived from a TensorFlow U-Net enables anatomically aware gaze feedback, and safety prompts prevent premature disclosure of ground-truth labels. We describe the system architecture, implementation highlights, and integration with the REFLACX dataset for real DICOM cases. IMACT-CXR demonstrates responsive tutoring flows with bounded latency, precise control over answer leakage, and extensibility toward live residency deployment. Preliminary evaluation shows improved localization and diagnostic reasoning compared to baselines.