🤖 AI Summary
This study addresses the low accuracy and frequent hallucinations in automatically generated radiographic findings for jaw cysts on dental panoramic radiographs. We propose the Self-correction Loop with Structured Output (SLSO) framework, which uses GPT-4o's multimodal capabilities in an iterative self-correction process. SLSO combines image analysis, structured data generation, tooth number extraction, consistency checking, and regeneration when inconsistencies are detected, with the aim of suppressing hallucinations and enforcing descriptions of negative findings. Implemented as a 10-step process and compared with a conventional Chain-of-Thought (CoT) baseline on 22 cases, SLSO yielded improvement rates of 66.9%, 33.3%, and 28.6% for tooth number, tooth movement, and root resorption, respectively, although the differences did not reach statistical significance because of the small dataset. In the successful cases, a consistently structured output was obtained after at most five regenerations. Accurate identification of extensive lesions spanning multiple teeth remains limited.
📝 Abstract
In this study, we used the multimodal capabilities of OpenAI GPT-4o to automatically generate jaw cyst findings from dental panoramic radiographs. To improve accuracy, we constructed a Self-correction Loop with Structured Output (SLSO) framework and verified its effectiveness. A 10-step process was applied to 22 cases of jaw cysts, including image input and analysis, structured data generation, tooth number extraction and consistency checking, iterative regeneration when inconsistencies were detected, and finding generation with subsequent restructuring and consistency verification. The framework was compared with the conventional Chain-of-Thought (CoT) method across seven evaluation items: transparency, internal structure, borders, root resorption, tooth movement, relationships with other structures, and tooth number. The results showed that the proposed SLSO framework improved output accuracy for many items, with improvement rates of 66.9%, 33.3%, and 28.6% for tooth number, tooth movement, and root resorption, respectively. In the successful cases, a consistently structured output was achieved after up to five regenerations. Although statistical significance was not reached because of the small size of the dataset, the overall SLSO framework enforced negative finding descriptions, suppressed hallucinations, and improved tooth number identification accuracy. However, accurate identification of extensive lesions spanning multiple teeth remains limited, and further refinement is required to enhance overall performance and move toward a practical finding generation system.
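The core idea of the self-correction loop (generate a finding, extract the tooth numbers it mentions, compare them with the structured data, and regenerate on mismatch) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `StructuredFinding` fields, the two-digit FDI tooth notation, the regular expression, and the `stub_writer` function that stands in for the GPT-4o call are all assumptions made for illustration. Only the limit of five regenerations is taken from the abstract.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumption: teeth are written in two-digit FDI notation (quadrant 1-4, tooth 1-8).
# The abstract does not state which numbering system is used.
TOOTH_PATTERN = re.compile(r"\b[1-4][1-8]\b")


@dataclass(frozen=True)
class StructuredFinding:
    """Hypothetical structured output; field names are illustrative only."""
    tooth_numbers: Tuple[int, ...]
    root_resorption: bool
    tooth_movement: bool


def extract_tooth_numbers(text: str) -> Tuple[int, ...]:
    """Return the sorted, de-duplicated tooth numbers mentioned in free text."""
    return tuple(sorted({int(m) for m in TOOTH_PATTERN.findall(text)}))


def is_consistent(structured: StructuredFinding, finding_text: str) -> bool:
    """Check that the text mentions exactly the teeth in the structured data."""
    return extract_tooth_numbers(finding_text) == tuple(sorted(set(structured.tooth_numbers)))


def self_correction_loop(
    structured: StructuredFinding,
    write_finding: Callable[[StructuredFinding, int], str],
    max_regenerations: int = 5,
) -> Optional[Tuple[str, int]]:
    """Generate once, then regenerate up to `max_regenerations` times.

    Returns (finding_text, regenerations_used), or None if no attempt
    passed the consistency check.
    """
    for attempt in range(max_regenerations + 1):
        text = write_finding(structured, attempt)
        if is_consistent(structured, text):
            return text, attempt
    return None


def stub_writer(structured: StructuredFinding, attempt: int) -> str:
    """Stand-in for the model call: wrong teeth on the first two attempts."""
    if attempt < 2:
        return "Radiolucent lesion near teeth 36, 37."
    teeth = ", ".join(str(t) for t in structured.tooth_numbers)
    resorption = "root resorption" if structured.root_resorption else "no root resorption"
    movement = "tooth movement" if structured.tooth_movement else "no tooth movement"
    return f"Radiolucent lesion involving teeth {teeth}; {resorption} and {movement} observed."


example = StructuredFinding(tooth_numbers=(46, 47), root_resorption=False, tooth_movement=False)
result = self_correction_loop(example, stub_writer)
```

In this sketch the negative findings ("no root resorption", "no tooth movement") are written out explicitly from the structured fields, mirroring the abstract's point that the framework enforces negative finding descriptions. A real system would replace `stub_writer` with a multimodal model call and would likely check more fields than tooth numbers.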