🤖 AI Summary
This work addresses a key limitation of existing clinical reasoning methods: they rely on static knowledge and struggle to dynamically internalize the fine-grained semantic and structural characteristics of individual cases. To overcome this, we propose a Dual-Stream Calibration (DSC) framework that introduces test-time training into clinical reasoning for the first time. DSC jointly optimizes the model's internal representations through two streams: a semantic calibration stream that minimizes output entropy to stabilize generation trajectories, and a structural calibration stream that models reasoning dependencies via meta-learning. Leveraging a test-time support set, DSC enables deep contextual internalization, shifting the paradigm from passive matching to active refinement of the reasoning space. Extensive experiments across 13 clinical datasets and three task paradigms show that DSC significantly outperforms both state-of-the-art training-dependent models and existing test-time learning approaches.
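The semantic calibration stream described above minimizes entropy to sharpen the model's output distribution at test time. The paper's exact objective is not reproduced here, but a minimal NumPy sketch (all function names and values are illustrative, not from the paper) shows how gradient descent on the entropy of a softmax distribution concentrates probability mass and thereby stabilizes the generation trajectory:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def entropy_min_step(z, lr=0.5):
    """One gradient-descent step on H(softmax(z)).

    Analytic gradient: dH/dz_i = -p_i * (log p_i + H).
    """
    p = softmax(z)
    H = entropy(p)
    grad = -p * (np.log(p + 1e-12) + H)
    return z - lr * grad

# Toy logits for four candidate tokens; near-uniform, i.e. high entropy.
logits = np.array([1.0, 0.9, 1.1, 0.8])
h_before = entropy(softmax(logits))
for _ in range(50):
    logits = entropy_min_step(logits)
h_after = entropy(softmax(logits))
# Entropy drops as the distribution sharpens toward the dominant token.
```

In a real system this update would be applied to the language model's parameters or adapters rather than to raw logits, but the mechanism is the same: lower predictive entropy yields more committed, stable generation.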
📝 Abstract
Contextual clinical reasoning demands robust inference grounded in complex, heterogeneous clinical records. State-of-the-art fine-tuning, in-context learning (ICL), and retrieval-augmented generation (RAG) expose a model to relevant knowledge, yet they often fall short of genuine contextual internalization: dynamically adjusting the model's internal representations to the subtle nuances of individual cases at inference time. To address this, we propose Dual-Stream Calibration (DSC), a test-time training framework that moves beyond superficial knowledge exposure to achieve deep internalization during inference. DSC aligns two complementary calibration streams. The Semantic Calibration Stream enforces deliberative reflection on core evidence, internalizing semantic anchors by minimizing entropy to stabilize generative trajectories. Simultaneously, the Structural Calibration Stream assimilates latent inferential dependencies through an iterative meta-learning objective; by training on specialized support sets at test time, it bridges the gap between external evidence and internal logic, synthesizing fragmented data into a coherent response. Our approach shifts the reasoning paradigm from passive attention-based matching to active refinement of the latent inferential space. Across thirteen clinical datasets and three distinct task paradigms, DSC consistently outperforms state-of-the-art baselines, from training-dependent models to test-time learning frameworks.