Explicit Abstention Knobs for Predictable Reliability in Video Question Answering

📅 2025-12-31
🏛️ arXiv.org
🤖 AI Summary
This work addresses two problems in deploying vision-language models in high-stakes scenarios: costly errors made when the model is uncertain, and keeping error rates controllable under distribution shift. The authors propose a confidence-threshold abstention mechanism for selective prediction in video question answering, in which the threshold trades coverage against error rate. Using the Gemini 2.0 Flash model and the NExT-QA dataset, they demonstrate for the first time that this approach significantly reduces error rates in-distribution while maintaining predictable error control under distribution shift, offering a reliable deployment pathway for safety-critical applications.
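
The coverage versus error-rate tradeoff described above can be illustrated with a minimal sketch. This is not the authors' implementation: the rule "answer only when confidence is at least the threshold", the function name `risk_coverage`, and the toy confidences and correctness labels are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def risk_coverage(
    confidences: Sequence[float],
    correct: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, coverage, risk) for each threshold.

    Assumed rule (hypothetical): the model answers only when its
    confidence is >= threshold, and abstains otherwise.
    coverage = fraction of questions answered
    risk     = error rate among the answered questions
    """
    n = len(confidences)
    points = []
    for eps in thresholds:
        # Keep the correctness flags of the questions that were answered.
        answered = [ok for c, ok in zip(confidences, correct) if c >= eps]
        coverage = len(answered) / n
        # Risk is reported as 0.0 by convention when nothing is answered.
        risk = 1 - sum(answered) / len(answered) if answered else 0.0
        points.append((eps, coverage, risk))
    return points


# Toy data (made up): per-question confidence and whether the answer was right.
confidences = [0.95, 0.9, 0.8, 0.6, 0.55, 0.3]
correct = [True, True, False, True, False, False]

points = risk_coverage(confidences, correct, thresholds=[0.0, 0.5, 0.7, 0.9])
for eps, coverage, risk in points:
    print(f"threshold={eps:.2f}  coverage={coverage:.2f}  risk={risk:.2f}")
```

On this toy data, raising the threshold lowers coverage and lowers the error rate among answered questions, which is the behavior a risk-coverage curve summarizes. Real confidence scores need not produce a monotone curve.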

📝 Abstract
High-stakes deployment of vision-language models (VLMs) requires selective prediction, where systems abstain when uncertain rather than risk costly errors. We investigate whether confidence-based abstention provides reliable control over error rates in video question answering, and whether that control remains robust under distribution shift. Using NExT-QA and Gemini 2.0 Flash, we establish two findings. First, confidence thresholding provides mechanistic control in-distribution. Sweeping threshold epsilon produces smooth risk-coverage tradeoffs, reducing error rates f
Problem

Research questions and friction points this paper is trying to address.

video question answering
selective prediction
abstention
distribution shift
reliable control
Innovation

Methods, ideas, or system contributions that make the work stand out.

explicit abstention
confidence thresholding
selective prediction
distribution shift
video question answering