A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the capacity of large language models (LLMs) to evaluate clinical trial reports against the CONSORT guidelines, exposing practical limitations in automating medical compliance assessment. Using a behavior–metacognitive analytical framework and expert-annotated data, we systematically compare zero-shot, few-shot, and chain-of-thought prompting strategies across three dimensions: reasoning trace fidelity, uncertainty articulation, and generation of alternative interpretations. Results reveal substantial heterogeneity in LLM performance across CONSORT items, with pervasive logical leaps, selective evidence omission, and attribution biases—particularly under conditions requiring multi-step causal inference or interpretation of ambiguous phrasing. Notably, this work introduces, for the first time, a metacognitive lens to medical text evaluation, establishing both a methodological foundation for explainable, verifiable clinical AI and empirically grounded boundaries on current LLM capabilities in regulatory assessment contexts.
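The summary above contrasts zero-shot, few-shot, and chain-of-thought prompting for CONSORT-item assessment. As a minimal sketch of how such conditions differ structurally, the snippet below assembles the three prompt variants for a single checklist item; the item text, example, and wording are illustrative assumptions, not the authors' actual prompts.

```python
# Hypothetical sketch of the three prompt conditions compared in the study.
# All item text, excerpts, and instructions are illustrative assumptions.

def build_prompt(item: str, excerpt: str, condition: str) -> str:
    """Assemble a CONSORT-item compliance prompt under one of three conditions."""
    base = (
        f"CONSORT item: {item}\n"
        f"Trial report excerpt: {excerpt}\n"
        "Question: Does the report satisfy this item? Answer Yes/No with justification."
    )
    if condition == "zero-shot":
        return base
    if condition == "few-shot":
        # A single worked example prepended to the task (hypothetical content).
        example = (
            "Example:\n"
            "CONSORT item: 8a (method used to generate the random allocation sequence)\n"
            "Excerpt: 'Participants were assigned via computer-generated random numbers.'\n"
            "Answer: Yes - the sequence-generation method is explicitly stated.\n\n"
        )
        return example + base
    if condition == "chain-of-thought":
        # Elicit intermediate reasoning before the final verdict.
        return base + "\nThink step by step before giving your final answer."
    raise ValueError(f"unknown condition: {condition}")

conditions = ["zero-shot", "few-shot", "chain-of-thought"]
prompts = {
    c: build_prompt(
        "1a (title identifies the study as a randomised trial)",
        "A randomised controlled trial of drug X in adults with condition Y...",
        c,
    )
    for c in conditions
}
```

Comparing model outputs across the three entries of `prompts` (per item, per model) is the kind of systematic contrast the study describes.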

📝 Abstract
Despite the rapid expansion of Large Language Models (LLMs) in healthcare, the ability of these systems to assess clinical trial reporting against CONSORT standards remains unclear, particularly with respect to their cognitive and reasoning strategies. This study applies a behavioral and metacognitive analytic approach with expert-validated data, systematically comparing two representative LLMs under three prompt conditions. Clear differences emerged in how the models approached individual CONSORT items and prompt types, with shifts in reasoning style, explicit uncertainty, and alternative interpretations shaping response patterns. Our results highlight the current limitations of these systems in clinical compliance automation and underscore the importance of understanding their cognitive adaptations and strategic behavior when developing more explainable and reliable medical AI.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to assess clinical trial reporting
Comparing cognitive strategies under different prompt conditions
Identifying limitations in clinical compliance automation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applying behavioral and metacognitive analytic approach
Systematically comparing two LLMs under three prompt conditions
Highlighting cognitive adaptations for reliable medical AI
Sohyeon Jeon
Seoul National University, Interdisciplinary Program of Medical Informatics
Hyung-Chul Lee
Seoul National University College of Medicine
Anesthesiology · Patient Monitoring · Perioperative Medicine · Machine Learning