A chart review process aided by natural language processing and multi-wave adaptive sampling to expedite validation of code-based algorithms for large database studies

📅 2025-07-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Manual review of unstructured electronic health record (EHR) text to construct reference standards for large-scale database studies is time-consuming and labor-intensive. Method: We propose an NLP-driven, multi-wave adaptive sampling validation framework that integrates NLP-assisted annotation, quantitative bias analysis, and a predefined termination rule based on error convergence, dynamically optimizing both sample selection and stopping time while preserving measurement accuracy. Results: Empirical evaluation shows that NLP reduces per-record review time by 40%, and multi-wave sampling with the termination criterion avoids manual review of 77% of records, with negligible impact (<0.5 percentage points) on bias in the final algorithm performance estimates. The framework substantially improves validation efficiency, feasibility, and scalability, offering a reproducible, resource-efficient, and standardized validation pathway for code-based measurement of health outcomes.
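The quantitative bias analysis mentioned in the summary typically uses validated sensitivity and specificity to back-correct an observed outcome rate for misclassification. As an illustrative sketch only (not the paper's implementation), the standard Rogan-Gladen estimator does this correction:

```python
def corrected_prevalence(observed: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen back-correction of an observed outcome prevalence
    for misclassification by a code-based algorithm."""
    denom = sensitivity + specificity - 1  # Youden's index; must be > 0
    if denom <= 0:
        raise ValueError("algorithm must be better than chance (sens + spec > 1)")
    return (observed + specificity - 1) / denom

# e.g. an outcome flagged in 12% of claims, with sensitivity 0.90 and
# specificity 0.95 from a validation study:
true_rate = corrected_prevalence(0.12, 0.90, 0.95)
```

The specific numbers here are hypothetical; in practice the sensitivity and specificity would come from a chart-review validation study like the one this paper expedites.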

📝 Abstract
Background: One of the ways to enhance analyses conducted with large claims databases is by validating the measurement characteristics of code-based algorithms used to identify health outcomes or other key study parameters of interest. These metrics can be used in quantitative bias analyses to assess the robustness of results for an inferential study given potential bias from outcome misclassification. However, extensive time and resource allocation are typically required to create reference-standard labels through manual chart review of free-text notes from linked electronic health records.

Methods: We describe an expedited process that introduces efficiency in a validation study using two distinct mechanisms: 1) use of natural language processing (NLP) to reduce time spent by human reviewers to review each chart, and 2) a multi-wave adaptive sampling approach with pre-defined criteria to stop the validation study once performance characteristics are identified with sufficient precision. We illustrate this process in a case study that validates the performance of a claims-based outcome algorithm for intentional self-harm in patients with obesity.

Results: We empirically demonstrate that the NLP-assisted annotation process reduced the time spent on review per chart by 40% and use of the pre-defined stopping rule with multi-wave samples would have prevented review of 77% of patient charts with limited compromise to precision in derived measurement characteristics.

Conclusion: This approach could facilitate more routine validation of code-based algorithms used to define key study parameters, ultimately enhancing understanding of the reliability of findings derived from database studies.
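The multi-wave adaptive sampling idea can be sketched as follows. This is a hypothetical illustration, not the authors' protocol: charts are reviewed in fixed-size waves, and review stops once a confidence interval for the positive predictive value (PPV) is narrow enough. The wave size, target half-width, and choice of the Wilson interval are all assumptions for the sketch.

```python
import math
import random

def wilson_halfwidth(successes: int, n: int, z: float = 1.96) -> float:
    """Half-width of the Wilson score confidence interval for a proportion."""
    if n == 0:
        return float("inf")
    p = successes / n
    denom = 1 + z**2 / n
    return (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))

def review_in_waves(records, wave_size=50, target_halfwidth=0.05, seed=0):
    """Review algorithm-flagged records in waves; stop early once the
    Wilson CI half-width for PPV drops below `target_halfwidth`.

    `records` is an iterable of booleans: True if manual chart review
    confirms the outcome (a true positive of the code-based algorithm).
    Returns (estimated PPV, number of charts actually reviewed).
    """
    pool = list(records)
    random.Random(seed).shuffle(pool)  # random sampling order
    reviewed = successes = 0
    while reviewed < len(pool):
        wave = pool[reviewed:reviewed + wave_size]
        successes += sum(wave)
        reviewed += len(wave)
        if wilson_halfwidth(successes, reviewed) <= target_halfwidth:
            break  # pre-defined stopping rule met; skip remaining charts
    return successes / reviewed, reviewed
```

With a clear-cut sample (every flagged record confirmed), the rule stops after the first wave, so the remaining charts are never reviewed; noisier samples trigger additional waves until the precision target is met.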
Problem

Research questions and friction points this paper is trying to address.

Expediting validation of code-based algorithms in large database studies
Reducing manual chart review time using NLP and adaptive sampling
Enhancing reliability of findings from claims-based outcome algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Natural language processing for faster chart review
Multi-wave adaptive sampling to stop validation early
Pre-defined stopping criteria ensuring sufficient precision
Shirley V Wang
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
Georg Hahn
Harvard University
Sushama Kattinakere Sreedhara
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
Mufaddal Mahesri
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
Haritha S. Pillai
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
Rajendra Aldis
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
Joyce Lii
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
Sarah K. Dutcher
Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD
Rhoda Eniafe
Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD
Jamal T. Jones
Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD
Keewan Kim
Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD
Jiwei He
Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD
Hana Lee
Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD
Sengwee Toh
Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA
Rishi J Desai
Brigham & Women's Hospital/Harvard Medical School
Pharmacoepidemiology
Jie Yang
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA