ELM: Ensemble of Language Models for Predicting Tumor Group from Pathology Reports

2025-03-24
AI Summary
Population-based cancer registries (PBCRs) face a critical bottleneck in manually classifying tumor topographies from unstructured pathology reports; for example, processing 100,000 reports requires about 900 person-hours. To address this, we propose the first SLM-LLM collaborative framework tailored to real-world PBCR operations. Our method uses a region-aware, six-model ensemble (three models read the upper part of each report and three read the lower part) with a five-of-six consensus rule, plus an LLM-based arbitration module that resolves inter-model disagreements. We further introduce domain-specific prompt engineering and co-deploy the six lightweight, fine-tuned SLMs alongside an LLM at scale. Evaluated across 19 tumor topographies, the system achieves mean precision and recall of 0.94. Deployed at the British Columbia Cancer Registry, it reduces annual manual effort by over 100 person-hours. This work marks the first end-to-end, production-ready integration of SLMs and LLMs for automated cancer topography coding in PBCRs.

Abstract
Population-based cancer registries (PBCRs) face a significant bottleneck in manually extracting data from unstructured pathology reports. This extraction is crucial for tasks such as tumor group assignment, which can consume 900 person-hours for approximately 100,000 reports. To address this, we introduce ELM (Ensemble of Language Models), a novel ensemble-based approach that leverages both small language models (SLMs) and large language models (LLMs). ELM uses six fine-tuned SLMs: three read the top part of the pathology report and three read the bottom part, which maximizes report coverage. ELM requires five-out-of-six agreement for a tumor group classification; disagreements are arbitrated by an LLM with a carefully curated prompt. Our evaluation across nineteen tumor groups shows that ELM achieves an average precision and recall of 0.94, outperforming single-model and ensemble-without-LLM approaches. Deployed at the British Columbia Cancer Registry, ELM demonstrates how LLMs can be applied in a PBCR setting to achieve state-of-the-art results and improve operational efficiency, saving hundreds of person-hours annually.
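The decision rule described in the abstract (accept a label when at least five of the six SLMs agree, otherwise hand the case to an LLM) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `elm_decide`, `Arbiter`, and `stub_arbiter` are hypothetical, and the stub merely stands in for the prompted LLM, whose prompt and interface are not given here.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical interface: takes the report text and the six SLM votes and
# returns one tumor-group label. It stands in for the prompted LLM.
Arbiter = Callable[[str, Sequence[str]], str]


def elm_decide(
    report: str,
    votes: Sequence[str],
    arbitrate: Arbiter,
    min_agreement: int = 5,
) -> Tuple[str, bool]:
    """Return (label, used_llm) for a single report."""
    if len(votes) != 6:
        raise ValueError("expected one vote from each of the six SLMs")
    # Most frequent label among the six votes and how often it occurs.
    label, count = Counter(votes).most_common(1)[0]
    if count >= min_agreement:
        return label, False  # consensus reached, no LLM call needed
    return arbitrate(report, votes), True  # disagreement goes to the LLM


def stub_arbiter(report: str, votes: Sequence[str]) -> str:
    # Placeholder only: a real arbiter would prompt an LLM with the report.
    return Counter(votes).most_common(1)[0][0]


agreed = elm_decide("report text", ["breast"] * 5 + ["lung"], stub_arbiter)
disputed = elm_decide("report text", ["breast"] * 4 + ["lung"] * 2, stub_arbiter)
```

With these inputs, the first call resolves by consensus and the second is routed to the arbiter, so the LLM is only invoked for the reports on which the SLMs disagree.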
Problem

Research questions and friction points this paper is trying to address.

Automating tumor group classification from pathology reports
Reducing manual extraction workload in cancer registries
Improving accuracy and efficiency with ensemble language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble of fine-tuned small language models
LLM arbitration for classification disagreements
Top and bottom report parts for coverage
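One plausible way to produce the top and bottom report parts is to take a fixed-size window from each end of the text. The summary does not say how the parts are defined, so everything in this sketch is an assumption: the function name `split_report`, the whitespace tokenization, and the 512-token default are illustrative only.

```python
from typing import Tuple


def split_report(text: str, max_tokens: int = 512) -> Tuple[str, str]:
    """Return (top, bottom) views of a report.

    Assumption: each view is a window of at most `max_tokens`
    whitespace-separated tokens taken from the start or end of the text.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = text.split()
    top = " ".join(tokens[:max_tokens])      # first max_tokens tokens
    bottom = " ".join(tokens[-max_tokens:])  # last max_tokens tokens
    return top, bottom


top_view, bottom_view = split_report("gross description ... final diagnosis", 2)
```

For reports shorter than the window, both views contain the whole text, so the two groups of models see the same input; the windows only diverge on long reports, which is where covering both ends matters.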
Lovedeep Gondara
British Columbia Cancer Registry, Provincial Health Services Authority, Vancouver, Canada
Jonathan Simkin
Director, BC Cancer Registry
Epidemiology, Machine Learning, Natural Language Processing
Shebnum Devji
British Columbia Cancer Registry, Provincial Health Services Authority, Vancouver, Canada
Gregory Arbour
Data Science Institute, University of British Columbia, Vancouver, Canada
Raymond Ng
University of British Columbia
data mining, health informatics, genomics, NLP, text mining