Who Benefits From Sinus Surgery? Comparing Generative AI and Supervised Machine Learning for Predicting Surgical Outcomes in Chronic Rhinosinusitis

📅 2026-01-20
🤖 AI Summary
This study addresses the challenge of predicting clinically meaningful postoperative improvement in chronic rhinosinusitis patients using preoperative structured clinical data, with the goal of avoiding ineffective surgeries. We present the first systematic comparison between supervised learning models—logistic regression, tree ensembles, and multilayer perceptrons—and leading generative AI systems (ChatGPT, Claude, Gemini, Perplexity) on a real-world clinical decision task, using identical structured inputs and constraining outputs to binary recommendations with confidence scores. The best-performing multilayer perceptron achieved 85% accuracy, demonstrating superior calibration and net benefit on decision curve analysis compared to all generative AI models. Although generative AI exhibited suboptimal predictive performance, its reasoning aligned closely with clinical expertise and feature importance rankings. We propose an interpretable clinical workflow that prioritizes machine learning for prediction while leveraging generative AI for explanatory support, offering a novel paradigm for precision preoperative assessment.
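
To compare calibration, each "binary recommendation with confidence" has to be turned into a probability. The summary does not say how this was done, so the sketch below is only one plausible convention: it assumes the confidence is a number in [0, 1] describing the model's certainty in its own recommendation, and it uses the Brier score as one common calibration summary (the study may have used other metrics). The function names `to_probability` and `brier_score` are hypothetical.

```python
from typing import Sequence


def to_probability(recommend_surgery: bool, confidence: float) -> float:
    """Map a binary recommendation plus confidence to P(patient benefits).

    Assumption (not from the source): confidence lies in [0, 1] and expresses
    certainty in the stated recommendation, so a "do not operate" answer with
    confidence c corresponds to a benefit probability of 1 - c.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return confidence if recommend_surgery else 1.0 - confidence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(p_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)
```

With this mapping, the same scoring function can be applied to an MLP's predicted probabilities and to a generative model's recommendation-plus-confidence output; lower Brier scores indicate probabilities closer to the observed outcomes.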

📝 Abstract
Artificial intelligence has reshaped medical imaging, yet the use of AI on clinical data for prospective decision support remains limited. We study pre-operative prediction of clinically meaningful improvement in chronic rhinosinusitis (CRS), defining success as a reduction of more than 8.9 points in SNOT-22 at 6 months (the minimal clinically important difference, MCID). In a prospectively collected cohort where all patients underwent surgery, we ask whether models using only pre-operative clinical data could have identified those who would have poor outcomes, i.e., those who should have avoided surgery. We benchmark supervised ML (logistic regression, tree ensembles, and an in-house MLP) against generative AI (ChatGPT, Claude, Gemini, Perplexity), giving each the same structured inputs and constraining outputs to binary recommendations with confidence. Our best ML model (MLP) achieves 85% accuracy with superior calibration and decision-curve net benefit. GenAI models underperform on discrimination and calibration in the zero-shot setting. Notably, GenAI justifications align with clinician heuristics and the MLP's feature importance, repeatedly highlighting baseline SNOT-22, CT/endoscopy severity, polyp phenotype, and psychology/pain comorbidities. We provide a reproducible tabular-to-GenAI evaluation protocol and subgroup analyses. Findings support an ML-first, GenAI-augmented workflow: deploy calibrated ML for primary triage of surgical candidacy, with GenAI as an explainer to enhance transparency and shared decision-making.
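
The outcome definition and the decision-curve comparison can be illustrated with a minimal sketch. It assumes the label is computed from pre- and post-operative SNOT-22 totals using the 8.9-point threshold stated above, and it uses the standard net-benefit formula from decision curve analysis, NB = TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability. The function names and the example numbers are hypothetical and are not taken from the study.

```python
from typing import Sequence

MCID_SNOT22 = 8.9  # threshold stated in the abstract


def achieved_mcid(snot22_pre: float, snot22_post: float,
                  mcid: float = MCID_SNOT22) -> bool:
    """True if the SNOT-22 reduction strictly exceeds the MCID."""
    return (snot22_pre - snot22_post) > mcid


def net_benefit(y_true: Sequence[int], p_pred: Sequence[float],
                threshold: float) -> float:
    """Net benefit of recommending surgery when p_pred >= threshold.

    y_true: 1 if the patient achieved the MCID, else 0.
    p_pred: predicted probability of achieving the MCID.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Illustrative values only (not study data).
labels = [achieved_mcid(30, 21), achieved_mcid(30, 22)]
nb = net_benefit([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2], threshold=0.5)
```

Evaluating `net_benefit` over a range of thresholds for each model, alongside the "treat all" and "treat none" strategies, yields the decision curves on which such comparisons are usually based.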
Problem

Research questions and friction points this paper is trying to address.

chronic rhinosinusitis
surgical outcome prediction
pre-operative decision support
artificial intelligence
clinical prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative AI
supervised machine learning
surgical outcome prediction
clinical decision support
model calibration
Sayeed Shafayet Chowdhury
Graduate Research Assistant, Purdue University
Spiking Neural Nets, Computer Vision, Neuromorphic Computing
Snehasis Mukhopadhyay
Professor of Computer and Information Science, Purdue University Indianapolis
Artificial Intelligence, Data Mining, Machine Learning
Shiaofen Fang
Department of Computer Science, Indiana University Indianapolis, IN, USA
Vijay R. Ramakrishnan
Department of Otolaryngology—Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, IN, USA