🤖 AI Summary
This work addresses a critical challenge in deploying vision-language models in high-stakes scenarios: mitigating high-cost errors caused by model uncertainty and keeping error rates controllable under distribution shift. The authors propose a confidence-threshold-based active abstention mechanism that enables selective prediction in video question answering, dynamically trading off coverage against error rate. Using the Gemini 2.0 Flash model and the NExT-QA dataset, they show that this approach significantly reduces error rates under in-distribution conditions while maintaining predictable error control under distribution shift, thereby offering a reliable deployment pathway for safety-critical applications.
📝 Abstract
High-stakes deployment of vision-language models (VLMs) requires selective prediction, where systems abstain when uncertain rather than risk costly errors. We investigate whether confidence-based abstention provides reliable control over error rates in video question answering, and whether that control remains robust under distribution shift. Using NExT-QA and Gemini 2.0 Flash, we establish two findings. First, confidence thresholding provides mechanistic control in-distribution. Sweeping the threshold ε produces smooth risk-coverage tradeoffs, reducing error rates f
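A minimal sketch of the thresholding mechanism described in the abstract, assuming per-question confidence scores and correctness labels are already available; the function and variable names (`risk_coverage_curve`, `confidences`, `correct`) are illustrative, not taken from the paper:

```python
import numpy as np

def risk_coverage_curve(confidences, correct, thresholds):
    """For each threshold eps, answer only when confidence >= eps;
    report coverage (fraction answered) and risk (error rate on answered items)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    points = []
    for eps in thresholds:
        answered = confidences >= eps              # abstain on everything below eps
        coverage = answered.mean()
        if answered.any():
            risk = 1.0 - correct[answered].mean()  # error rate among answered questions
        else:
            risk = 0.0                             # full abstention: no errors incurred
        points.append((eps, coverage, risk))
    return points

# Toy usage: confidences would come from the VLM's self-reported probability
# for its chosen answer on each NExT-QA question (values here are made up).
conf = [0.95, 0.40, 0.80, 0.55, 0.99, 0.30]
corr = [True, False, True, False, True, False]
for eps, cov, risk in risk_coverage_curve(conf, corr, thresholds=[0.0, 0.5, 0.9]):
    print(f"eps={eps:.1f}  coverage={cov:.2f}  risk={risk:.2f}")
```

Sweeping ε over a grid and plotting coverage against risk yields the risk-coverage curve referred to above; raising ε lowers coverage and, when confidence is well calibrated, lowers the error rate on the questions the model still answers.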