A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the capacity of large language models (LLMs) to evaluate clinical trial reports against the CONSORT guidelines, exposing practical limitations in automating medical compliance assessment. Using a behavior–metacognitive analytical framework and expert-annotated data, we systematically compare zero-shot, few-shot, and chain-of-thought prompting strategies across three dimensions: reasoning trace fidelity, uncertainty articulation, and generation of alternative interpretations. Results reveal substantial heterogeneity in LLM performance across CONSORT items, with pervasive logical leaps, selective evidence omission, and attribution biases—particularly under conditions requiring multi-step causal inference or interpretation of ambiguous phrasing. Notably, this work introduces, for the first time, a metacognitive lens to medical text evaluation, establishing both a methodological foundation for explainable, verifiable clinical AI and empirically grounded boundaries on current LLM capabilities in regulatory assessment contexts.
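The summary above contrasts zero-shot, few-shot, and chain-of-thought prompting for CONSORT-item assessment. As a minimal sketch of how such conditions differ structurally, the snippet below assembles the three prompt variants for a single checklist item; the item text, example, and wording are illustrative assumptions, not the authors' actual prompts.

```python
# Hypothetical sketch of the three prompt conditions compared in the study.
# All item text, excerpts, and instructions are illustrative assumptions.

def build_prompt(item: str, excerpt: str, condition: str) -> str:
    """Assemble a CONSORT-item compliance prompt under one of three conditions."""
    base = (
        f"CONSORT item: {item}\n"
        f"Trial report excerpt: {excerpt}\n"
        "Question: Does the report satisfy this item? Answer Yes/No with justification."
    )
    if condition == "zero-shot":
        return base
    if condition == "few-shot":
        # A single worked example prepended to the task (hypothetical content).
        example = (
            "Example:\n"
            "CONSORT item: 8a (method used to generate the random allocation sequence)\n"
            "Excerpt: 'Participants were assigned via computer-generated random numbers.'\n"
            "Answer: Yes - the sequence-generation method is explicitly stated.\n\n"
        )
        return example + base
    if condition == "chain-of-thought":
        # Elicit intermediate reasoning before the final verdict.
        return base + "\nThink step by step before giving your final answer."
    raise ValueError(f"unknown condition: {condition}")

conditions = ["zero-shot", "few-shot", "chain-of-thought"]
prompts = {
    c: build_prompt(
        "1a (title identifies the study as a randomised trial)",
        "A randomised controlled trial of drug X in adults with condition Y...",
        c,
    )
    for c in conditions
}
```

Comparing model outputs across the three entries of `prompts` (per item, per model) is the kind of systematic contrast the study describes.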

📝 Abstract
Despite the rapid expansion of Large Language Models (LLMs) in healthcare, the ability of these systems to assess clinical trial reporting against CONSORT standards remains unclear, particularly with respect to their cognitive and reasoning strategies. This study applies a behavioral and metacognitive analytic approach with expert-validated data, systematically comparing two representative LLMs under three prompt conditions. Clear differences emerged in how the models approached individual CONSORT items and prompt types, with shifts in reasoning style, explicit uncertainty, and alternative interpretations shaping response patterns. Our results highlight the current limitations of these systems in clinical compliance automation and underscore the importance of understanding their cognitive adaptations and strategic behavior when developing more explainable and reliable medical AI.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to assess clinical trial reporting
Comparing cognitive strategies under different prompt conditions
Identifying limitations in clinical compliance automation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applying behavioral and metacognitive analytic approach
Systematically comparing two LLMs under three prompt conditions
Highlighting cognitive adaptations for reliable medical AI
Sohyeon Jeon
Seoul National University, Interdisciplinary Program of Medical Informatics
Hyung-Chul Lee
Seoul National University College of Medicine
Anesthesiology · Patient Monitoring · Perioperative Medicine · Machine Learning