Learning-to-Context Slope: Evaluating In-Context Learning Effectiveness Beyond Performance Illusions

📅 2025-06-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing in-context learning (ICL) evaluation methods rely on performance changes and suffer from ambiguous attribution, strong data dependence, and unreliability under few-shot conditions. To address these limitations, this paper introduces the Learning-to-Context Slope (LCS) metric, the first to disentangle ICL effectiveness into two orthogonal components: contextual alignment capability and output calibration capability. LCS models learning gain as a slope, grounded in demonstration-input relevance and loss-trajectory dynamics, enabling label-efficient or even unsupervised evaluation. Crucially, LCS requires no ground-truth labels, quantifies ICL's fundamental efficacy, identifies critical capability bottlenecks, and yields actionable efficacy thresholds. Extensive experiments within synthetic-data and continuous-loss analytical frameworks demonstrate that LCS correlates strongly with true performance gains and remains robust under data scarcity and distributional shift, significantly outperforming conventional metrics.

📝 Abstract
In-context learning (ICL) has emerged as an effective approach to enhance the performance of large language models (LLMs). However, its effectiveness varies significantly across models and tasks, posing challenges for practitioners to determine when ICL reliably improves performance. Current evaluation approaches, reliant on performance change after applying ICL, suffer from low reliability, poor attribution, and impracticality in data-insufficient scenarios. We propose the Learning-to-Context Slope (LCS), a novel metric that quantifies ICL effectiveness by modeling the slope between learning gain (loss decrease from demonstrations) and contextual relevance (demonstration-input relevance). LCS addresses key limitations of performance-based metrics: (1) it captures continuous loss changes even when outputs are incorrect, improving reliability; (2) its formulation attributes ICL failures to weak contextual alignment (inability to adapt inputs to demonstrations) or strong output calibration (self-verification of correctness); and (3) it minimizes reliance on labeled data via synthetic evaluation. Extensive experiments demonstrate that LCS strongly correlates with performance improvements in labeled settings and reliably reflects true effectiveness in biased or data-scarce scenarios. Further analysis reveals actionable thresholds for LCS and identifies model capabilities critical to ICL success.
Problem

Research questions and friction points this paper is trying to address.

Evaluating in-context learning effectiveness reliably across models and tasks
Addressing limitations of performance-based metrics in unreliable and data-scarce scenarios
Proposing a novel metric that quantifies ICL effectiveness via the relationship between learning gain and contextual relevance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Learning-to-Context Slope (LCS) metric
Quantifies ICL effectiveness via loss-relevance slope
Minimizes labeled data reliance via synthetic evaluation
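The core idea, a slope relating learning gain to contextual relevance, can be sketched in a few lines. This is an illustrative reading of the summary above, not the paper's actual implementation: the function name, the least-squares formulation, and the toy inputs are all assumptions. It treats learning gain as the zero-shot loss minus the loss with demonstrations, and fits its slope against demonstration-input relevance.

```python
# Illustrative, simplified LCS-style estimate (assumed formulation, not
# the paper's code): least-squares slope of learning gain against
# demonstration-input relevance.
from statistics import fmean

def lcs_slope(relevance, loss_zero_shot, loss_with_demos):
    """Least-squares slope of learning gain vs. contextual relevance."""
    # Learning gain: how much the loss drops once demonstrations are added.
    gains = [z - d for z, d in zip(loss_zero_shot, loss_with_demos)]
    r_bar, g_bar = fmean(relevance), fmean(gains)
    cov = sum((r - r_bar) * (g - g_bar) for r, g in zip(relevance, gains))
    var = sum((r - r_bar) ** 2 for r in relevance)
    # A positive slope means more relevant demonstrations yield larger gains,
    # i.e. ICL is effective for this model/task pair.
    return cov / var

# Toy example: the gain grows with relevance, so the slope is positive.
relevance = [0.1, 0.4, 0.6, 0.9]
loss_zero_shot = [2.0, 2.0, 2.0, 2.0]
loss_with_demos = [1.95, 1.7, 1.5, 1.2]
print(round(lcs_slope(relevance, loss_zero_shot, loss_with_demos), 3))
```

Note that a slope fit over continuous losses uses signal from every example, even those whose final outputs are wrong, which is how a metric of this shape can stay informative in few-shot or unlabeled settings.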