AgentScore: Autoformulation of Deployable Clinical Scoring Systems

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Although current machine learning models exhibit strong predictive performance, their deployment as clinical scoring systems is hindered by a lack of memorability, auditability, and bedside executability. This work proposes a semantics-guided optimization approach that automatically generates clinically deployable scoring systems within the constrained space of unit-weighted, binary-rule checklists. The method leverages large language models to propose candidate rules and employs a data-driven, deterministic validation mechanism to select those satisfying real-world deployment constraints. By integrating large language models with rigorous empirical validation, the approach outperforms existing scoring-system generation methods across eight clinical prediction tasks, achieving AUCs comparable to more flexible interpretable models. Furthermore, in two external validations, it demonstrates superior discriminative performance relative to current clinical guideline-based scores.

📝 Abstract
Modern clinical practice relies on evidence-based guidelines implemented as compact scoring systems composed of a small number of interpretable decision rules. While machine-learning models achieve strong performance, many fail to translate into routine clinical use due to misalignment with workflow constraints such as memorability, auditability, and bedside execution. We argue that this gap arises not from insufficient predictive power, but from optimizing over model classes that are incompatible with guideline deployment. Deployable guidelines often take the form of unit-weighted clinical checklists, formed by thresholding the sum of binary rules, but learning such scores requires searching an exponentially large discrete space of possible rule sets. We introduce AgentScore, which performs semantically guided optimization in this space by using LLMs to propose candidate rules and a deterministic, data-grounded verification-and-selection loop to enforce statistical validity and deployability constraints. Across eight clinical prediction tasks, AgentScore outperforms existing score-generation methods and achieves AUC comparable to more flexible interpretable models despite operating under stronger structural constraints. On two additional externally validated tasks, AgentScore achieves higher discrimination than established guideline-based scores.
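The model class the abstract describes, a unit-weighted checklist formed by thresholding the sum of binary rules, can be made concrete with a minimal sketch. The rules, variable names, and threshold below are hypothetical illustrations, not rules learned by AgentScore:

```python
# Minimal sketch of a unit-weighted, binary-rule checklist score:
# each satisfied rule contributes exactly 1 point, and the prediction
# is formed by thresholding the sum. Rules and threshold are
# hypothetical, for illustration only.

def checklist_score(patient: dict) -> int:
    """Sum of binary rules; each rule that fires adds 1 point."""
    rules = [
        patient["age"] >= 65,          # hypothetical rule 1
        patient["sbp"] < 90,           # hypothetical rule 2 (systolic BP)
        patient["resp_rate"] > 30,     # hypothetical rule 3
    ]
    return sum(rules)

def predict_high_risk(patient: dict, threshold: int = 2) -> bool:
    """Threshold the checklist sum to obtain a binary prediction."""
    return checklist_score(patient) >= threshold

patient = {"age": 70, "sbp": 85, "resp_rate": 22}
score = checklist_score(patient)       # rules 1 and 2 fire -> score = 2
high_risk = predict_high_risk(patient) # 2 >= 2 -> True
```

Learning such a score amounts to searching the exponentially large discrete space of possible rule sets, which is where the paper's LLM-proposed candidates and deterministic verification-and-selection loop come in.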
Problem

Research questions and friction points this paper is trying to address.

clinical scoring systems
deployable guidelines
interpretable models
workflow constraints
unit-weighted checklists
Innovation

Methods, ideas, or system contributions that make the work stand out.

AgentScore
clinical scoring systems
interpretable AI
large language models
discrete rule optimization