Breaking the Correlation Plateau: On the Optimization and Capacity Limits of Attention-Based Regressors

📅 2026-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a persistent challenge in attention-based regression models: optimization conflicts between the mean squared error (MSE) and Pearson correlation coefficient (PCC) objectives often cause PCC to saturate prematurely, particularly on highly homogeneous data. The conflict is exacerbated by the softmax attention mechanism and further constrained by an inherent upper bound on PCC improvement imposed by convex aggregators. To overcome these limitations, we propose the Extrapolative Correlation Attention (ECA) mechanism, which introduces a non-convex aggregation strategy that can extrapolate beyond the convex hull of its inputs and mitigates the gradient conflict between MSE and PCC. Theoretical analysis and extensive experiments demonstrate that ECA consistently breaks through the PCC plateau across multiple benchmarks, achieving substantial gains in correlation metrics while preserving competitive MSE performance.
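As a rough sketch of the joint objective described above (not the paper's actual loss; the trade-off weight `lam` and the `eps` stabilizer are assumptions for illustration), MSE penalizes the magnitude of errors while the PCC term penalizes mismatch in the shape of predictions relative to targets:

```python
import numpy as np

def mse_loss(pred, target):
    # Magnitude matching: mean of squared errors.
    return np.mean((pred - target) ** 2)

def pcc_loss(pred, target, eps=1e-8):
    # Shape matching: 1 - Pearson correlation; minimized when
    # predictions are a positive linear transform of the targets.
    p = pred - pred.mean()
    t = target - target.mean()
    pcc = (p * t).sum() / (np.sqrt((p ** 2).sum() * (t ** 2).sum()) + eps)
    return 1.0 - pcc

def joint_loss(pred, target, lam=0.5):
    # Hypothetical weighted combination of the two objectives;
    # `lam` is an illustrative trade-off weight, not from the paper.
    return (1 - lam) * mse_loss(pred, target) + lam * pcc_loss(pred, target)
```

Note that a constant shift of all predictions changes the MSE term but leaves the PCC term untouched, which is one way the two objectives can pull gradients in different directions.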

📝 Abstract
Attention-based regression models are often trained by jointly optimizing a Mean Squared Error (MSE) loss and a Pearson correlation coefficient (PCC) loss, which emphasize the magnitude of errors and the order or shape of targets, respectively. A common but poorly understood phenomenon during training is the PCC plateau: PCC stops improving early in training, even as MSE continues to decrease. We provide the first rigorous theoretical analysis of this behavior, revealing fundamental limitations in both optimization dynamics and model capacity. First, regarding the flattened PCC curve, we uncover a critical conflict whereby lowering MSE (magnitude matching) can paradoxically suppress the PCC gradient (shape matching). This issue is exacerbated by the softmax attention mechanism, particularly when the data to be aggregated are highly homogeneous. Second, we identify a limitation in model capacity: we derive a PCC improvement limit for any convex aggregator (including softmax attention), showing that the convex hull of the inputs strictly bounds the achievable PCC gain. We demonstrate that data homogeneity intensifies both limitations. Motivated by these insights, we propose Extrapolative Correlation Attention (ECA), which incorporates novel, theoretically motivated mechanisms to improve PCC optimization and extrapolate beyond the convex hull. Across diverse benchmarks, including a challenging homogeneous-data setting, ECA consistently breaks the PCC plateau, achieving significant improvements in correlation without compromising MSE performance.
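The convex-hull limitation can be illustrated with a minimal sketch: softmax weights are nonnegative and sum to one, so the aggregated output can never leave the convex hull of the value set, whereas weights that merely sum to one but may be negative can extrapolate past it. The affine aggregator below is a hypothetical illustration of that capacity difference, not ECA's actual mechanism:

```python
import numpy as np

def softmax_aggregate(scores, values):
    # Convex aggregation: weights are nonnegative and sum to 1,
    # so the output always lies inside the convex hull of `values`.
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ values

def affine_aggregate(scores, values):
    # Hypothetical non-convex aggregation: centering the scores and
    # adding 1/n keeps the weights summing to 1 while allowing
    # negative entries, so the output can leave the convex hull.
    w = scores - scores.mean() + 1.0 / len(scores)
    return w @ values
```

With values {1, 2}, any softmax output stays in [1, 2], while the affine weighting can produce values outside that interval; this is the kind of extrapolation a convex aggregator cannot express, whatever its scores.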
Problem

Research questions and friction points this paper is trying to address.

PCC plateau
attention-based regressors
optimization dynamics
model capacity
data homogeneity
Innovation

Methods, ideas, or system contributions that make the work stand out.

attention mechanism
Pearson correlation coefficient
convex aggregator
optimization dynamics
extrapolative attention