🤖 AI Summary
This work addresses the problem that modality reliability in multimodal fusion often varies with context (for example, sensor degradation or class-specific noise), which undermines traditional methods that assume static reliability. The authors propose the C²MF framework, which introduces a Context-Specific Information Credibility (CSIC) metric to assess the reliability of each modality at the instance level, given the current input. CSIC is a KL-divergence-based measure computed exactly from a Conditional Probabilistic Circuit (CPC), and it lets fusion adapt to each instance. Experimental results show that C²MF achieves up to a 29% accuracy improvement over static-reliability baselines in high-noise and modality-conflict scenarios, while preserving the interpretability of probabilistic-circuit-based fusion.
📝 Abstract
Multimodal fusion requires integrating information from multiple sources that may conflict depending on context. Existing fusion approaches typically rely on static assumptions about source reliability, limiting their ability to resolve conflicts when a modality becomes unreliable due to situational factors such as sensor degradation or class-specific corruption. We introduce C$^2$MF, a context-specific credibility-aware multimodal fusion framework that models per-instance source reliability using a Conditional Probabilistic Circuit (CPC). We formalize instance-level reliability through Context-Specific Information Credibility (CSIC), a KL-divergence-based measure computed exactly from the CPC. CSIC generalizes conventional static credibility estimates, recovering them as a special case, and enables principled, adaptive reliability assessment. To evaluate robustness under cross-modal conflicts, we propose the Conflict benchmark, in which class-specific corruptions deliberately induce discrepancies between modalities. Experimental results show that C$^2$MF improves predictive accuracy by up to 29% over static-reliability baselines in high-noise settings, while preserving the interpretability advantages of probabilistic circuit-based fusion.
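The abstract does not give the CSIC formula, so the following is only an illustrative sketch of what a KL-divergence-based, per-instance credibility score could look like. It assumes a leave-one-out form: the score of a modality is the KL divergence between the class posterior computed with all modalities and the posterior computed without that modality. The naive product-of-likelihoods `fuse` function is a hypothetical stand-in for an exact CPC query, and all names (`fuse`, `leave_one_out_scores`, the toy numbers) are assumptions, not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def fuse(likelihoods, prior):
    """Toy class posterior: prior times per-modality likelihoods, then normalized.

    Hypothetical stand-in for querying a conditional probabilistic circuit.
    """
    scores = list(prior)
    for lik in likelihoods:
        scores = [s * l for s, l in zip(scores, lik)]
    total = sum(scores)
    return [s / total for s in scores]


def leave_one_out_scores(likelihoods, prior):
    """Assumed credibility score per modality for a single instance.

    Score j = KL(posterior with all modalities || posterior without modality j).
    """
    full = fuse(likelihoods, prior)
    scores = []
    for j in range(len(likelihoods)):
        rest = likelihoods[:j] + likelihoods[j + 1:]
        scores.append(kl_divergence(full, fuse(rest, prior)))
    return scores


# One instance, two classes, two modalities (toy numbers).
prior = [0.5, 0.5]
likelihoods = [
    [0.9, 0.1],  # modality 0: informative for this instance
    [0.5, 0.5],  # modality 1: uninformative for this instance
]
scores = leave_one_out_scores(likelihoods, prior)
print(scores)
```

In this toy instance, removing the uninformative modality leaves the posterior unchanged, so its score is zero, while removing the informative one shifts the posterior and yields a positive score. Because the scores are recomputed for every input, a modality can score high on one instance and low on another, which is the per-instance behavior the abstract contrasts with static reliability estimates.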