Knowing When to Answer: Adaptive Confidence Refinement for Reliable Audio-Visual Question Answering

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the reliability of audio-visual question answering (AVQA) systems, which often generate answers regardless of input quality and therefore degrade when confronted with unreliable inputs. To mitigate this issue, the authors propose Adaptive Confidence Refinement (ACR), a method that preserves the maximum softmax probability (MSP) as the primary confidence signal while introducing an input-adaptive residual correction mechanism. Specifically, ACR employs a residual risk head and a confidence gating head that jointly refine MSP and determine whether to abstain from answering. Experimental results show that ACR consistently improves both abstention capability and overall system reliability across three mainstream AVQA models under in-distribution, out-of-distribution, and data-bias scenarios.
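To make the abstention setting concrete, the following is a minimal sketch of the plain MSP baseline that ACR builds on: answer only when the maximum softmax probability clears a threshold. The function names and the threshold value are illustrative assumptions, not taken from the paper.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def answer_or_abstain(logits, threshold=0.5):
    """Return (answer_index, msp), or (None, msp) when abstaining.

    `threshold` is a hypothetical operating point; in practice it would be
    tuned on held-out data for a target risk or coverage level.
    """
    probs = softmax(logits)
    msp = max(probs)  # Maximum Softmax Probability used as confidence
    if msp < threshold:
        return None, msp  # abstain rather than risk a wrong answer
    return probs.index(msp), msp

confident = answer_or_abstain([4.0, 0.0, 0.0])
uncertain = answer_or_abstain([0.1, 0.0, 0.05])
```

Here `confident` yields an answer index, while `uncertain` abstains because its softmax distribution is nearly flat.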

📝 Abstract
We present a formal problem formulation for Reliable Audio-Visual Question Answering ($\mathcal{R}$-AVQA), where we prefer abstention over answering incorrectly. While recent AVQA models achieve high accuracy, their ability to identify when they are likely wrong, and to abstain from answering in those cases, remains underexplored. To fill this gap, we explore several approaches and then propose Adaptive Confidence Refinement (ACR), a lightweight method to further enhance the performance of $\mathcal{R}$-AVQA. Our key insight is that the Maximum Softmax Probability (MSP) is Bayes-optimal only under strong calibration, a condition usually not met in deep neural networks, particularly in multimodal models. Instead of replacing MSP, ACR maintains it as the primary confidence signal and applies input-adaptive residual corrections when MSP is deemed unreliable. ACR introduces two learned heads: i) a Residual Risk Head that predicts low-magnitude correctness residuals that MSP does not capture, and ii) a Confidence Gating Head that determines MSP trustworthiness. Our experiments and theoretical analysis show that ACR consistently outperforms existing methods in in-distribution, out-of-distribution, and data-bias settings across three different AVQA architectures, establishing a solid foundation for the $\mathcal{R}$-AVQA task. The code and checkpoints will be available upon acceptance at https://github.com/PhuTran1005/R-AVQA.
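The abstract does not give the exact equations for combining the two heads, so the following is only a sketch of one plausible form: MSP stays the base signal, and a bounded residual is applied in proportion to how little the gate trusts MSP. The combination rule, the `max_residual` bound, and all names are assumptions for illustration, not the authors' implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def refine_confidence(msp, residual_logit, gate_logit, max_residual=0.2):
    """Refine MSP with a gated, low-magnitude residual (hypothetical form).

    residual_logit: raw output of an assumed Residual Risk Head.
    gate_logit:     raw output of an assumed Confidence Gating Head.
    max_residual:   assumed bound that keeps the correction small.
    """
    # Bounded correction in (-max_residual, +max_residual).
    residual = max_residual * math.tanh(residual_logit)
    # Trust near 1 means "keep MSP as is"; near 0 means "apply the correction".
    trust = sigmoid(gate_logit)
    refined = msp + (1.0 - trust) * residual
    # Keep the refined score a valid confidence in [0, 1].
    return min(1.0, max(0.0, refined))

trusted = refine_confidence(0.9, residual_logit=-3.0, gate_logit=10.0)
distrusted = refine_confidence(0.9, residual_logit=-3.0, gate_logit=-10.0)
```

With a high gate value the refined score stays essentially at the original MSP, while a low gate value lets the negative residual pull the confidence down, which would make abstention more likely at a fixed threshold.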
Problem

Research questions and friction points this paper is trying to address.

Reliable AVQA
abstention
confidence calibration
audio-visual question answering
model reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Confidence Refinement
Reliable AVQA
Maximum Softmax Probability
Confidence Gating
Residual Risk Prediction