Non-Asymptotic Analysis of Efficiency in Conformalized Regression

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the non-asymptotic efficiency of conformal prediction sets—specifically, the deviation of their length from the ideal interval length. Focusing on the joint influence of training set size $n$, calibration set size $m$, and miscoverage level $\alpha$, we establish, for the first time, non-asymptotic efficiency bounds for conformalized quantile and median regression trained via stochastic gradient descent. Under mild distributional assumptions and leveraging concentration inequalities, we derive a tight bound on the excess length with precision $\mathcal{O}(1/\sqrt{n} + 1/(\alpha^2 n) + 1/\sqrt{m} + \exp(-\alpha^2 m))$. The analysis uncovers a phase-transition phenomenon in efficiency convergence as $\alpha$ varies. Our theoretical results provide verifiable, practical guidance for calibrating the split between training and calibration sets and for selecting $\alpha$. Extensive experiments confirm both the validity and tightness of the derived bound.

📝 Abstract
Conformal prediction provides prediction sets with coverage guarantees. The informativeness of conformal prediction depends on its efficiency, typically quantified by the expected size of the prediction set. Prior work on the efficiency of conformalized regression commonly treats the miscoverage level $\alpha$ as a fixed constant. In this work, we establish non-asymptotic bounds on the deviation of the prediction set length from the oracle interval length for conformalized quantile and median regression trained via SGD, under mild assumptions on the data distribution. Our bounds of order $\mathcal{O}(1/\sqrt{n} + 1/(\alpha^2 n) + 1/\sqrt{m} + \exp(-\alpha^2 m))$ capture the joint dependence of efficiency on the proper training set size $n$, the calibration set size $m$, and the miscoverage level $\alpha$. The results identify phase transitions in convergence rates across different regimes of $\alpha$, offering guidance for allocating data to control excess prediction set length. Empirical results are consistent with our theoretical findings.
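As context for the setting the abstract analyzes, the following is a minimal sketch of split conformalized quantile regression with linear quantile models trained by SGD on the pinball loss. This is not the authors' implementation; the function names, the linear model class, and all hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_quantile_sgd(X, y, tau, lr=0.05, epochs=100):
    """Fit a linear quantile regressor for level tau by SGD on the pinball loss
    (illustrative stand-in for the SGD-trained quantile model in the paper)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            r = y[i] - (X[i] @ w + b)
            # subgradient of the pinball loss with respect to the prediction
            g = tau if r > 0 else tau - 1.0
            w += lr * g * X[i]
            b += lr * g
    return w, b

def cqr_interval(X_tr, y_tr, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformalized quantile regression: train lower/upper quantile
    models on n proper-training points, calibrate on m held-out points,
    and return prediction intervals with marginal coverage >= 1 - alpha."""
    w_lo, b_lo = fit_quantile_sgd(X_tr, y_tr, alpha / 2)
    w_hi, b_hi = fit_quantile_sgd(X_tr, y_tr, 1 - alpha / 2)
    q_lo, q_hi = X_cal @ w_lo + b_lo, X_cal @ w_hi + b_hi
    # conformity score: signed distance of y outside the quantile band
    scores = np.maximum(q_lo - y_cal, y_cal - q_hi)
    m = len(y_cal)
    k = int(np.ceil((1 - alpha) * (m + 1)))  # finite-sample quantile index
    qhat = np.sort(scores)[min(k, m) - 1]
    return X_test @ w_lo + b_lo - qhat, X_test @ w_hi + b_hi + qhat
```

The paper's bound concerns exactly the quantities visible here: how the width of the returned interval, in excess of the oracle width, shrinks as the training size `n = len(y_tr)`, the calibration size `m = len(y_cal)`, and the level `alpha` vary.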
Problem

Research questions and friction points this paper is trying to address.

How far does the conformal prediction set length deviate from the oracle interval length, non-asymptotically?
How does efficiency depend jointly on the training set size, the calibration set size, and the miscoverage level?
Do convergence rates exhibit distinct regimes (phase transitions) as the miscoverage level varies?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-asymptotic efficiency bounds for conformalized regression
Joint analysis of dependence on training size, calibration size, and miscoverage level
Identification of phase transitions in convergence rates