PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head

📅 2026-05-12
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing methods struggle to diagnose representation drift in post-training variants of large language models, such as quantized or LoRA fine-tuned models: they can assess performance degradation but cannot identify root causes or guide mitigation. This work proposes PRISM, a method that leverages the linear output head and near-isometric backbone structure of LLMs to derive a closed-form upper bound on the cross-entropy risk gap between a target model and its variant. PRISM decomposes representation drift into three geometrically interpretable and independently measurable axes: scale, shape, and head. The shape component is differentiable and can be used as a regularizer to mitigate catastrophic forgetting. Across two model families and five benchmarks, PRISM achieves mean Spearman correlations of 0.820 and 0.831 in risk ranking for quantized and LoRA-adapted models, respectively; shape-based regularization also outperforms experience replay in aggregate at alleviating downstream forgetting.
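The ranking metric quoted in this summary can be illustrated with a minimal sketch. Spearman correlation compares the ordering that a proxy score assigns to a set of variants with the ordering of their measured risk gaps. The function names and all numbers below are hypothetical and are not taken from the paper; the sketch uses only the Python standard library.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five variants (illustration only, not paper data).
proxy_bound = [0.12, 0.35, 0.20, 0.80, 0.50]   # assumed proxy score per variant
measured_gap = [0.02, 0.10, 0.12, 0.40, 0.25]  # assumed measured risk gap
rho = spearman(proxy_bound, measured_gap)
print(f"Spearman rho = {rho:.3f}")
```

A value near 1 means the proxy orders the variants almost exactly as the measured risk gap does, which is the sense in which a bound can be useful for ranking even when its absolute value is loose.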
📝 Abstract
Comparing post-training LLM variants, such as quantized, LoRA-adapted, and distilled models, requires a diagnostic that identifies how a variant has drifted, not only whether it has degraded. Existing similarity scores such as CKA and SVCCA can flag degradation, but they do not directly link representation drift to risk or mechanism. We propose PRISM, Proxy Risk Inference via Structural Mapping, which exploits the linear output head of LLMs and the empirically near-isometric structure of their backbones to derive a closed-form upper bound on the cross-entropy risk gap between a target model and a post-training variant. The bound is calibrated for variant ranking and decomposes drift into three independently measurable axes: scale mismatch, shape mismatch, and head divergence. Each axis corresponds to a distinct failure mode, including shape distortion under low-bit quantization, scale separability under LoRA forgetting, and head divergence under GGUF k-quantization. As a result, the dominant axis suggests a remediation direction rather than merely raising a degradation flag. Because the shape term is differentiable, the same geometry can also serve as a training-time regularizer against catastrophic forgetting. Across two model families and five benchmarks, PRISM ranks variants with mean Spearman correlations of 0.820 for post-training quantization and 0.831 for LoRA forgetting, and its axis-guided shape regularizer outperforms experience replay in aggregate at mitigating downstream forgetting.
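As background on the similarity scores the abstract contrasts PRISM with, here is a minimal sketch of linear CKA on two small feature matrices (rows are samples, columns are feature dimensions). This is the standard linear CKA formula, not PRISM's bound; the helper names and toy matrices are assumptions made for illustration. Linear CKA is invariant to uniform rescaling of the features, so a variant whose representations differ from the target only in scale still scores 1, which is one way a single similarity score can leave the mechanism of drift unidentified.

```python
def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(matrix)
    dims = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(dims)]
    return [[v - mu for v, mu in zip(row, means)] for row in matrix]


def cross_cov_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a and b."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            entry = sum(a[r][i] * b[r][j] for r in range(len(a)))
            total += entry * entry
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    xc = center_columns(x)
    yc = center_columns(y)
    numerator = cross_cov_sq_norm(yc, xc)
    denominator = (cross_cov_sq_norm(xc, xc) * cross_cov_sq_norm(yc, yc)) ** 0.5
    return numerator / denominator


# Toy features for four inputs (hypothetical values, illustration only).
target = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [4.0, 3.5]]
rescaled = [[3.0 * v for v in row] for row in target]          # scale-only drift
reshaped = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]    # different geometry

cka_rescaled = linear_cka(target, rescaled)
cka_reshaped = linear_cka(target, reshaped)
print(cka_rescaled, cka_reshaped)
```

In this toy setup the rescaled copy is indistinguishable from the target under linear CKA, while the reshaped one scores strictly lower.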
Problem

Research questions and friction points this paper is trying to address.

representation drift
post-training variants
risk bound
LLM diagnostics
model degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

PRISM
representation drift decomposition
geometric risk bound
catastrophic forgetting mitigation
LLM post-training analysis