ELM: Ensemble of Language Models for Predicting Tumor Group from Pathology Reports

📅 2025-03-24

🤖 AI Summary

Population-based cancer registries (PBCRs) face a critical bottleneck in manually assigning tumor groups from unstructured pathology reports: processing about 100,000 reports requires roughly 900 person-hours. To address this, we propose ELM, a framework that combines small language models (SLMs) with a large language model (LLM) for real-world PBCR operations. It uses a region-aware ensemble of six fine-tuned SLMs, three reading the upper part of each report and three reading the lower part, and accepts a tumor group only when at least five of the six models agree. An LLM with domain-specific prompt engineering arbitrates the remaining disagreements. Evaluated across 19 tumor groups, the system achieves average precision and recall of 0.94, outperforming single-model and ensemble-without-LLM approaches. Deployed at the British Columbia Cancer Registry, it saves hundreds of person-hours of manual effort annually, showing that SLMs and LLMs can be integrated in production for automated tumor group classification in PBCRs.
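The upper/lower report partition can be pictured with a minimal sketch. It assumes each SLM accepts a fixed input budget (512 tokens is a hypothetical value, typical of BERT-style encoders) and approximates tokens with whitespace-separated words. `split_report` is a hypothetical helper, not the authors' code.

```python
def split_report(text: str, max_tokens: int = 512) -> tuple[str, str]:
    """Return (top, bottom) views of a pathology report.

    Hypothetical helper: tokens are approximated by whitespace-separated
    words; a real SLM would use its own subword tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    top = words[:max_tokens]      # leading max_tokens words
    bottom = words[-max_tokens:]  # trailing max_tokens words
    return " ".join(top), " ".join(bottom)


# Usage with a synthetic 1,000-word "report" (placeholder words, not real data).
report = " ".join(f"w{i}" for i in range(1000))
top, bottom = split_report(report)
print(top.split()[0], top.split()[-1])        # w0 w511
print(bottom.split()[0], bottom.split()[-1])  # w488 w999
```

For reports shorter than the budget the two views overlap or coincide, so the split only adds coverage for long reports, which is the case the partition is meant to handle.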

📝 Abstract

Population-based cancer registries (PBCRs) face a significant bottleneck in manually extracting data from unstructured pathology reports, a process crucial for tasks like tumor group assignment, which can consume 900 person-hours for approximately 100,000 reports. To address this, we introduce ELM (Ensemble of Language Models), a novel ensemble-based approach leveraging both small language models (SLMs) and large language models (LLMs). ELM uses six fine-tuned SLMs: three process the top part of the pathology report and three process the bottom part, to maximize report coverage. ELM requires five-out-of-six agreement for a tumor group classification. Disagreements are arbitrated by an LLM with a carefully curated prompt. Our evaluation across nineteen tumor groups demonstrates that ELM achieves an average precision and recall of 0.94, outperforming single-model and ensemble-without-LLM approaches. Deployed at the British Columbia Cancer Registry, ELM demonstrates how LLMs can be successfully applied in a PBCR setting to achieve state-of-the-art results and significantly enhance operational efficiency, saving hundreds of person-hours annually.
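The abstract reports an average precision and recall across nineteen tumor groups but does not say here how the average is taken. One common reading is a macro average (the unweighted mean over groups), sketched below with hypothetical labels. `macro_precision_recall` is an illustrative helper, and the choice to score an unpredicted label's precision as 0.0 is an assumed convention.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Unweighted (macro) mean of per-label precision and recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    precisions, recalls = [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Assumed convention: 0.0 when a label is never predicted / never present.
        precisions.append(tp[label] / predicted if predicted else 0.0)
        recalls.append(tp[label] / actual if actual else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy labels for illustration only (not data from the paper).
y_true = ["breast", "breast", "lung", "colorectal"]
y_pred = ["breast", "lung", "lung", "colorectal"]
precision, recall = macro_precision_recall(y_true, y_pred)
print(round(precision, 3), round(recall, 3))  # 0.833 0.833
```

A macro average weights rare tumor groups as heavily as common ones; a micro average (pooling all reports) would instead be dominated by the most frequent groups.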

Problem

Automating tumor group classification from pathology reports

Reducing manual extraction workload in cancer registries

Improving accuracy and efficiency with ensemble language models
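The workload figure quoted in the abstract (900 person-hours for about 100,000 reports) translates into a per-report cost. The sketch below does that arithmetic; `hours_saved` is a hypothetical helper that assumes effort scales linearly and that auto-coded reports need no manual review, so its output is illustrative only and not a figure from the paper.

```python
PERSON_HOURS = 900    # manual effort quoted in the abstract
REPORTS = 100_000     # approximate report volume quoted in the abstract

seconds_per_report = PERSON_HOURS * 3600 / REPORTS
print(f"{seconds_per_report:.1f} seconds per report")  # 32.4 seconds per report


def hours_saved(auto_fraction: float) -> float:
    """Manual hours avoided if a (hypothetical) fraction of reports is auto-coded."""
    if not 0.0 <= auto_fraction <= 1.0:
        raise ValueError("auto_fraction must be in [0, 1]")
    return PERSON_HOURS * auto_fraction


print(hours_saved(0.5))  # 450.0
```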

Innovation

Ensemble of fine-tuned small language models

LLM arbitration for classification disagreements

Separate SLMs for the top and bottom parts of each report, for coverage
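The ensemble and arbitration steps listed above can be sketched as a decision rule: accept the majority label when at least five of the six SLMs agree, otherwise hand the report to an LLM. This is a minimal illustration, not the authors' implementation; `elm_decide`, the `arbitrate` callback, `stub_arbiter`, and passing the SLM votes to the LLM are all assumptions.

```python
from collections import Counter
from typing import Callable, Sequence

NUM_SLMS = 6             # three top-part and three bottom-part SLMs
AGREEMENT_THRESHOLD = 5  # five-out-of-six agreement, per the abstract


def elm_decide(
    slm_votes: Sequence[str],
    report_text: str,
    arbitrate: Callable[[str, Sequence[str]], str],
) -> str:
    """Pick a tumor group from six SLM votes, deferring to an arbiter on disagreement."""
    if len(slm_votes) != NUM_SLMS:
        raise ValueError(f"expected {NUM_SLMS} SLM predictions")
    label, count = Counter(slm_votes).most_common(1)[0]
    if count >= AGREEMENT_THRESHOLD:
        return label  # strong consensus: accept without calling the LLM
    # Disagreement: hand the report (and, by assumption, the votes) to the LLM.
    return arbitrate(report_text, slm_votes)


def stub_arbiter(report_text: str, votes: Sequence[str]) -> str:
    """Hypothetical placeholder for an LLM call with a curated prompt."""
    return "arbitrated"


print(elm_decide(["breast"] * 5 + ["lung"], "report text", stub_arbiter))      # breast
print(elm_decide(["breast"] * 4 + ["lung"] * 2, "report text", stub_arbiter))  # arbitrated
```

Because the threshold is five of six, a tie between labels can never pass the consensus check, so `most_common` ordering does not affect accepted labels; every split vote goes to the arbiter.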
