🤖 AI Summary
Clinical fundus image reports lack unified standards for terminology, format, and style, which makes ophthalmic data difficult for large language models (LLMs) to interpret and integrate. To address this, the authors propose a two-stage LLM-based standardization framework: first, they construct a bilingual standard terminology of fundus clinical terms and common diagnostic descriptions, and fine-tune RetSTA-7B-Zero on an augmented dataset simulating clinical scenarios; second, they train RetSTA-7B on a large corpus of standardized reports generated by RetSTA-7B-Zero together with corresponding English data, a self-distillation-style step that extends coverage to diverse, complex clinical scenarios and achieves report-level standardization of bilingual fundus reports for the first time. Experiments show that RetSTA-7B consistently outperforms compared state-of-the-art LLMs on the bilingual standardization task, and the model checkpoints are publicly released.
📝 Abstract
Standardization of clinical reports is crucial for improving healthcare quality and facilitating data integration. Clinical fundus diagnostic reports lack unified standards for format, terminology, and style, which makes the data difficult for large language models (LLMs) to understand. To address this, we construct a bilingual standard terminology containing fundus clinical terms and descriptions commonly used in clinical diagnosis. We then build two models, RetSTA-7B-Zero and RetSTA-7B. RetSTA-7B-Zero, fine-tuned on an augmented dataset simulating clinical scenarios, demonstrates strong standardization behavior but covers only a limited range of diseases. To further improve standardization performance, we build RetSTA-7B, which integrates a large volume of standardized data generated by RetSTA-7B-Zero together with corresponding English data, covering diverse and complex clinical scenarios and achieving report-level standardization for the first time. Experimental results show that RetSTA-7B outperforms the compared LLMs on the bilingual standardization task, validating its superior performance and generalizability. Checkpoints are available at https://github.com/AB-Story/RetSTA-7B.
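To make the terminology-standardization idea concrete, here is a minimal rule-based sketch of mapping free-text fundus findings onto a standard lexicon. The paper performs this with a fine-tuned LLM rather than string substitution, and the lexicon entries, term mappings, and function below are invented for illustration only:

```python
# Toy stand-in for the bilingual standard terminology: maps nonstandard
# free-text findings to standardized terms. Entries are illustrative
# examples, NOT the paper's actual lexicon.
LEXICON = {
    "C/D ratio enlarged": "increased cup-to-disc ratio",
    "macula reflex absent": "absent foveal reflex",
    "DR": "diabetic retinopathy",
}

def standardize_report(report: str) -> str:
    """Replace nonstandard findings in a report with standard terminology."""
    out = report
    # Replace longer phrases first so a short key (e.g. "DR") cannot
    # clobber part of a longer matching phrase.
    for raw in sorted(LEXICON, key=len, reverse=True):
        out = out.replace(raw, LEXICON[raw])
    return out

print(standardize_report("OD: DR, C/D ratio enlarged"))
# → OD: diabetic retinopathy, increased cup-to-disc ratio
```

An LLM-based standardizer like RetSTA-7B replaces this brittle lookup with learned rewriting, which is what lets it handle style and format variation (not just vocabulary) and generalize to report-level standardization.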