Class conditional conformal prediction for multiple inputs by p-value aggregation

📅 2025-07-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Conformal prediction for classification tasks in which each sample comprises multiple observations (a multi-input) and class-conditional coverage is required.

Method: We propose a general p-value aggregation framework that constructs compact prediction sets while guaranteeing both marginal and class-conditional coverage. The approach exploits the known exact distribution of conformal p-values to design statistically rigorous aggregation rules that apply to arbitrary nonconformity scores, and it yields refined, calibrated versions of standard strategies such as majority voting. The aggregation is expressed through an abstract scoring function that encompasses many classical statistical tools, so the method applies naturally to multi-view settings such as identifying a plant from several images.

Contribution/Results: Experiments on synthetic data and on the real-world Pl@ntNet dataset show that the method substantially improves prediction set compactness (average reduction of 20–35%) and identification accuracy while maintaining per-class coverage guarantees. The framework is provably valid under minimal assumptions and supports flexible score design.
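
The single-observation building block can be sketched as follows. This is a minimal illustration, assuming split conformal prediction with class-conditional calibration and a nonconformity score for which larger values mean "less conforming"; the function names `class_conditional_p_value` and `prediction_set` are hypothetical and not taken from the paper. The last lines show the exact distribution the summary refers to: with n = 9 calibration scores, the test score can fall into any of the 10 rank positions, so the p-value takes each value k/10 (k = 1, ..., 10).

```python
import numpy as np


def class_conditional_p_value(cal_scores_y, test_score):
    """Conformal p-value of candidate label y for a single observation.

    cal_scores_y: nonconformity scores of the n calibration points whose
                  true label is y (class-conditional calibration).
    test_score:   nonconformity score s(x, y) of the test observation.
    Assumption of this sketch: a larger score means "less conforming".
    """
    cal_scores_y = np.asarray(cal_scores_y, dtype=float)
    n = cal_scores_y.size
    # Count calibration points at least as nonconforming as the test point,
    # plus one for the test point itself.
    return float((1 + np.sum(cal_scores_y >= test_score)) / (n + 1))


def prediction_set(cal_scores_by_class, test_scores_by_class, alpha):
    """Keep every candidate label whose p-value exceeds alpha."""
    return [
        y
        for y, s in test_scores_by_class.items()
        if class_conditional_p_value(cal_scores_by_class[y], s) > alpha
    ]


# Exact distribution: place the test score in each of the n + 1 = 10 rank
# positions among n = 9 calibration scores. Under exchangeability each
# position is equally likely, so the p-value is uniform on {1/10, ..., 1}.
cal = np.arange(1, 10)
p_values = [class_conditional_p_value(cal, t) for t in np.arange(0.5, 10.0, 1.0)]
print(sorted(p_values))
```

Because the p-value of the true label is at most alpha with probability floor(alpha(n+1))/(n+1) <= alpha, keeping every label whose p-value exceeds alpha gives coverage of at least 1 - alpha within each class.
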

📝 Abstract
Conformal prediction methods are statistical tools designed to quantify uncertainty and generate predictive sets with guaranteed coverage probabilities. This work introduces an innovative refinement to these methods for classification tasks, specifically tailored for scenarios where multiple observations (multi-inputs) of a single instance are available at prediction time. Our approach is particularly motivated by applications in citizen science, where multiple images of the same plant or animal are captured by individuals. Our method integrates the information from each observation into conformal prediction, enabling a reduction in the size of the predicted label set while preserving the required class-conditional coverage guarantee. The approach is based on the aggregation of conformal p-values computed from each observation of a multi-input. By exploiting the exact distribution of these p-values, we propose a general aggregation framework using an abstract scoring function, encompassing many classical statistical tools. Knowledge of this distribution also enables refined versions of standard strategies, such as majority voting. We evaluate our method on simulated and real data, with a particular focus on Pl@ntNet, a prominent citizen science platform that facilitates the collection and identification of plant species through user-submitted images.
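
The aggregation idea can be illustrated with a short sketch. It assumes that, for a candidate label y, the n_y calibration scores of class y and the K scores of the multi-input are exchangeable with no ties, so the joint law of the K p-values depends only on (n_y, K). The paper exploits this distribution exactly; the sketch below only approximates it by Monte Carlo simulation. Fisher's combination and a calibrated vote count are two illustrative choices of scoring function, not necessarily the ones studied by the authors, and all function names are hypothetical.

```python
import numpy as np


def conformal_p_values(cal_scores, test_scores):
    """Conformal p-values of K test scores against one shared calibration set."""
    cal = np.asarray(cal_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    # counts[k] = number of calibration scores >= test score k
    counts = (cal[None, :] >= test[:, None]).sum(axis=1)
    return (1 + counts) / (cal.size + 1)


def fisher_score(p):
    """Fisher's combination -2 * sum(log p): large when many p-values are small."""
    return float(-2.0 * np.sum(np.log(p)))


def make_vote_score(vote_level):
    """Score = number of observations voting against the label (p <= vote_level)."""
    def score(p):
        return float(np.sum(np.asarray(p) <= vote_level))
    return score


def null_quantile(score_fn, n, K, alpha, n_sim=10_000, seed=0):
    """Monte Carlo estimate of the (1 - alpha)-quantile of score_fn under the null.

    Assumption of this sketch: the n calibration scores and K test scores are
    exchangeable and tie-free, so i.i.d. uniform scores reproduce the joint
    law of the K p-values (which depends only on n and K).
    """
    rng = np.random.default_rng(seed)
    sims = np.empty(n_sim)
    for b in range(n_sim):
        u = rng.random(n + K)
        sims[b] = score_fn(conformal_p_values(u[:n], u[n:]))
    return float(np.quantile(sims, 1.0 - alpha, method="higher"))


def aggregated_prediction_set(cal_scores_by_class, test_scores_by_class, score_fn, alpha):
    """Keep label y unless the aggregated score of its K p-values is extreme under the null."""
    kept = []
    for y, test_scores in test_scores_by_class.items():
        cal = np.asarray(cal_scores_by_class[y], dtype=float)
        p = conformal_p_values(cal, test_scores)
        threshold = null_quantile(score_fn, cal.size, p.size, alpha)
        if score_fn(p) <= threshold:
            kept.append(y)
    return kept


# Usage: K = 3 observations per label; "b" has scores above every calibration score.
cal = {"a": np.linspace(0, 1, 50), "b": np.linspace(0, 1, 50)}
test = {"a": [0.5, 0.4, 0.6], "b": [5.0, 6.0, 7.0]}
print(aggregated_prediction_set(cal, test, fisher_score, alpha=0.1))
print(aggregated_prediction_set(cal, test, make_vote_score(0.2), alpha=0.1))
```

Calibrating the number of allowed "against" votes from the null law, rather than fixing it at K/2, is one way to read the refined majority voting mentioned in the abstract. Since the threshold depends only on (n_y, K), it could be computed once per class and cached.
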

Problem

Extends conformal prediction to multi-input classification tasks
Reduces the size of the predicted label set while preserving class-conditional coverage
Applies p-value aggregation to citizen science image data

Innovation

Aggregates the conformal p-values computed from each observation of a multi-input
Reduces the size of the predicted label set
Preserves the class-conditional coverage guarantee