🤖 AI Summary
Automated analysis of otolaryngology (ENT) endoscopic images has long been hindered by inter-device and inter-operator variability, subtle and localized pathologies, and fine-grained distinctions such as left/right laterality and vocal-fold state. Existing public benchmarks also rarely support similar-case retrieval, whether by image query or by concise textual description. To address these gaps, we introduce ENTRep, the ACM Multimedia 2025 Grand Challenge on ENT endoscopy analysis, built on an expert-annotated, bilingual (Vietnamese–English) dataset. The challenge combines fine-grained anatomical-region classification with cross-modal retrieval (image-to-image and text-to-image), organized into three standardized benchmark tasks. Submissions are evaluated on public and private test splits using server-side scoring, giving a reproducible evaluation setting for ENT classification and retrieval, and results from the top-performing teams are reported and discussed.
📝 Abstract
Automated analysis of endoscopic imagery is a critical yet underdeveloped component of ENT (ear, nose, and throat) care, hindered by variability in devices and operators, subtle and localized findings, and fine-grained distinctions such as laterality and vocal-fold state. In addition to classification, clinicians require reliable retrieval of similar cases, both visually and through concise textual descriptions. These capabilities are rarely supported by existing public benchmarks. To this end, we introduce ENTRep, the ACM Multimedia 2025 Grand Challenge on ENT endoscopy analysis, which integrates fine-grained anatomical classification with image-to-image and text-to-image retrieval under bilingual (Vietnamese and English) clinical supervision. Specifically, the dataset comprises expert-annotated images, labeled for anatomical region and normal or abnormal status, and accompanied by dual-language narrative descriptions. In addition, we define three benchmark tasks, standardize the submission protocol, and evaluate performance on public and private test splits using server-side scoring. Finally, we report results from the top-performing teams and provide an insightful discussion.