PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods struggle to diagnose representation drift in post-training variants of large language models, such as quantized or LoRA-fine-tuned models: they can detect that performance has degraded, but they cannot identify the root cause or guide mitigation. This work proposes PRISM (Proxy Risk Inference via Structural Mapping), which uses the linear output head and the empirically near-isometric backbone structure of LLMs to derive a closed-form upper bound on the cross-entropy risk gap between a target model and a variant. PRISM decomposes representation drift into three geometrically interpretable, independently measurable axes: scale, shape, and head. Because the shape component is differentiable, it can also be used as a training-time regularizer against catastrophic forgetting. Across two model families and five benchmarks, PRISM ranks variants with mean Spearman correlations of 0.820 for post-training quantization and 0.831 for LoRA forgetting, and shape-based regularization outperforms experience replay in aggregate at mitigating downstream forgetting.
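The ranking results above are reported as Spearman correlations, which measure how well one ordering of variants agrees with another. The following is a minimal sketch of that computation, not the paper's evaluation code. The function names and the numbers in `bound_values` and `measured_gap` are hypothetical and chosen only for illustration.

```python
import numpy as np

def average_ranks(x):
    """Return 1-based ranks; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    return float(np.corrcoef(ra, rb)[0, 1])

# Hypothetical values for five variants (illustration only, not paper data):
# a proxy bound per variant and the risk gap actually measured for it.
bound_values = [0.12, 0.30, 0.45, 0.80, 1.10]
measured_gap = [0.02, 0.05, 0.04, 0.20, 0.35]

rho = spearman(bound_values, measured_gap)
print(f"Spearman rho = {rho:.3f}")
```

In this toy example one adjacent pair of variants is swapped between the two orderings, so the correlation is high but below 1. A value near 1 means the proxy orders variants almost exactly as the measured risk gap does.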

📝 Abstract

Comparing post-training LLM variants, such as quantized, LoRA-adapted, and distilled models, requires a diagnostic that identifies how a variant has drifted, not only whether it has degraded. Existing similarity scores such as CKA and SVCCA can flag degradation, but they do not directly link representation drift to risk or mechanism. We propose PRISM, Proxy Risk Inference via Structural Mapping, which exploits the linear output head of LLMs and the empirically near-isometric structure of their backbones to derive a closed-form upper bound on the cross-entropy risk gap between a target model and a post-training variant. The bound is calibrated for variant ranking and decomposes drift into three independently measurable axes: scale mismatch, shape mismatch, and head divergence. Each axis corresponds to a distinct failure mode, including shape distortion under low-bit quantization, scale separability under LoRA forgetting, and head divergence under GGUF k-quantization. As a result, the dominant axis suggests a remediation direction rather than merely raising a degradation flag. Because the shape term is differentiable, the same geometry can also serve as a training-time regularizer against catastrophic forgetting. Across two model families and five benchmarks, PRISM ranks variants with mean Spearman correlations of 0.820 for post-training quantization and 0.831 for LoRA forgetting, and its axis-guided shape regularizer outperforms experience replay in aggregate at mitigating downstream forgetting.
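To make the three axes concrete, here is a rough sketch of how scale, shape, and head drift could be measured separately from hidden representations and output-head weights. The abstract does not give PRISM's formulas, so every definition below is an assumption: the log norm ratio for scale, the orthogonal Procrustes residual for shape, and the relative Frobenius distance for head are stand-ins, and `drift_axes`, `H_t`, `H_v`, `W_t`, and `W_v` are hypothetical names.

```python
import numpy as np

def drift_axes(H_t, H_v, W_t, W_v):
    """Illustrative three-axis drift measurement (NOT the paper's definitions).

    H_t, H_v: (n_samples, d) final hidden states of target and variant.
    W_t, W_v: (vocab, d) linear output-head weights of target and variant.
    """
    # Center both representation clouds.
    H_t = H_t - H_t.mean(axis=0)
    H_v = H_v - H_v.mean(axis=0)
    norm_t = np.linalg.norm(H_t)
    norm_v = np.linalg.norm(H_v)

    # Scale axis (assumed): absolute log ratio of Frobenius norms.
    scale = abs(np.log(norm_v / norm_t))

    # Shape axis (assumed): residual after removing scale and the best
    # orthogonal alignment Q = argmin ||B Q - A||_F (orthogonal Procrustes).
    A = H_t / norm_t
    B = H_v / norm_v
    U, _, Vt = np.linalg.svd(B.T @ A)
    Q = U @ Vt
    shape = np.linalg.norm(B @ Q - A)

    # Head axis (assumed): relative Frobenius distance between head weights.
    head = np.linalg.norm(W_v - W_t) / np.linalg.norm(W_t)

    return {"scale": float(scale), "shape": float(shape), "head": float(head)}

# Synthetic check: a variant whose representations are only rescaled.
rng = np.random.default_rng(0)
H_t = rng.normal(size=(64, 8))
W_t = rng.normal(size=(16, 8))
H_v = 2.0 * H_t          # pure scale drift
W_v = W_t.copy()         # unchanged head

axes = drift_axes(H_t, H_v, W_t, W_v)
print(axes)
```

On this synthetic input only the scale axis is nonzero (about log 2), which mirrors the idea that a dominant axis points to a specific failure mode. The shape term is built from differentiable operations, so a similar quantity could in principle be used as a training loss in an autodiff framework, though this NumPy version is for measurement only.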

Problem

Research questions and friction points this paper is trying to address.

representation drift

post-training variants

risk bound

LLM diagnostics

model degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

PRISM

representation drift decomposition

geometric risk bound