🤖 AI Summary
Standard conformal prediction methods under-cover challenging samples in egocentric camera pose estimation, leaving conditional coverage well below the nominal level. This work makes two contributions. First, it introduces a geodesic SE(3) nonconformity score that identifies physically harder frames than Euclidean scoring. Second, it proposes DINOv2-Bridge adaptive conformal prediction, whose two-stage difficulty estimator is trained on a single source participant, needs no images at test time, and transfers across participants to adjust prediction regions dynamically. On the EPIC-Fields dataset, the method raises coverage on the hardest 25% of samples from approximately 0.75 to 0.93 while maintaining 90% overall coverage, substantially narrowing the conditional coverage gap.
📝 Abstract
Egocentric pose estimation for Augmented Reality (AR) and assistive devices requires not just accurate predictions but uncertainty regions with coverage guarantees. Conformal prediction (CP) provides such guarantees without retraining, but we show that standard CP with a single fixed threshold achieves nominal 90% overall coverage while covering only ~60% of the hardest 25% of frames (Q4), a ~30 percentage-point conditional coverage gap that is consistent across 12 participants, 3 predictors, and 3 horizons (108 evaluations) on EPIC-Fields. We further show that a geodesic SE(3) nonconformity score identifies physically harder frames than Euclidean scoring, with only 15-26% Q4 overlap and 2-3x higher ground-truth camera displacement for geodesic Q4 frames. To close the coverage gap, we propose DINOv2-Bridge adaptive CP: a two-stage difficulty estimator, trained on a single source participant, that transfers across participants without any images at test time, improving Q4 coverage from ~0.75 to ~0.93 while maintaining overall coverage at the 90% target.
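
The gap between overall and Q4 coverage can be reproduced on synthetic data. The sketch below is not the paper's implementation: the rotation-plus-translation form of `geodesic_se3_score`, its `trans_weight` parameter, the synthetic heteroscedastic scores, and the use of an oracle difficulty value in place of the learned DINOv2-Bridge estimator are all assumptions made for illustration. It compares a single fixed split-conformal threshold with a threshold scaled by per-frame difficulty.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:  # too few calibration points for a finite threshold
        return np.inf
    return float(np.sort(cal_scores)[k - 1])

def geodesic_se3_score(R_pred, t_pred, R_true, t_true, trans_weight=1.0):
    """Assumed SE(3) distance: relative rotation angle (radians) combined
    with a weighted translation error. The paper's exact score may differ."""
    R_rel = R_pred.T @ R_true
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return float(np.hypot(theta, trans_weight * np.linalg.norm(t_true - t_pred)))

def q4_coverage(scores, threshold, difficulty):
    """Coverage on the hardest 25% of frames, ranked by `difficulty`."""
    hard = difficulty >= np.quantile(difficulty, 0.75)
    thr = np.broadcast_to(threshold, scores.shape)
    return float(np.mean(scores[hard] <= thr[hard]))

# Synthetic stand-in for per-frame nonconformity scores: harder frames
# (larger hypothetical `difficulty`) produce larger errors.
rng = np.random.default_rng(0)
n = 20000
difficulty = rng.uniform(0.5, 3.0, size=n)
scores = difficulty * np.abs(rng.normal(size=n))
cal, test = slice(0, n // 2), slice(n // 2, n)

# Standard CP: one fixed threshold for every frame.
q_fixed = conformal_threshold(scores[cal])

# Adaptive CP: calibrate on difficulty-normalized scores, then rescale
# the threshold per test frame by its (here, oracle) difficulty.
q_norm = conformal_threshold(scores[cal] / difficulty[cal])
adaptive_thr = q_norm * difficulty[test]

fixed_overall = float(np.mean(scores[test] <= q_fixed))
adaptive_overall = float(np.mean(scores[test] <= adaptive_thr))
fixed_q4 = q4_coverage(scores[test], q_fixed, difficulty[test])
adaptive_q4 = q4_coverage(scores[test], adaptive_thr, difficulty[test])

print(f"fixed:    overall={fixed_overall:.3f}  Q4={fixed_q4:.3f}")
print(f"adaptive: overall={adaptive_overall:.3f}  Q4={adaptive_q4:.3f}")
```

In this toy setting both variants sit near the 90% target overall, but only the difficulty-scaled threshold keeps Q4 coverage close to it. With a learned difficulty estimate instead of the oracle value used here, the improvement would depend on how well the estimate tracks the true error scale.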