🤖 AI Summary
Multimodal fake news detection must fuse heterogeneous modalities, such as images and text, whose large representational discrepancies hinder integration. To address this, the paper proposes a hierarchical multimodal fusion network coupled with a Targeted Pareto (TPareto) optimization algorithm. TPareto is presented as the first approach to bring hierarchy-aware Pareto gradient integration to this task: fusion-level-specific losses and cross-modal gradient coordination let intermediate fusion guide global training in a positive direction. On the FakeSV and FVC benchmarks, the method improves accuracy by 2.40% and 1.89%, respectively, surpassing state-of-the-art baselines. The core contributions are: (1) a hierarchical fusion architecture that aligns modality representations across abstraction levels; (2) TPareto, a novel optimization framework that resolves conflicting multimodal gradient objectives; and (3) a gradient coordination and target decoupling mechanism applied during fusion that enhances modality synergy while preserving task-specific semantics.
📝 Abstract
Multimodal fake news detection is essential for maintaining the authenticity of multimedia information on the Internet. Significant differences in the form and content of multimodal information intensify optimization conflicts, which hinders effective model training and reduces the effectiveness of existing fusion methods designed for two modalities. To address this problem, we propose the MTPareto framework to optimize multimodal fusion, using a Targeted Pareto (TPareto) optimization algorithm for targeted, fusion-level-specific objective learning. Built on the designed hierarchical fusion network, the algorithm defines three fusion levels with corresponding losses and performs all-modal-oriented Pareto gradient integration at each level. This approach achieves superior multimodal fusion by using the information obtained from intermediate fusion to benefit the entire process. Experimental results on the FakeSV and FVC datasets show that the proposed framework outperforms the baselines and that the TPareto optimization algorithm improves accuracy by 2.40% and 1.89%, respectively.
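As a rough illustration of what Pareto gradient integration means in general, the sketch below combines two loss gradients with the standard closed-form min-norm weighting used in multi-objective gradient descent. It is not the paper's TPareto algorithm: the abstract does not give the update rule, so the function names (`min_norm_weight`, `pareto_combine`) and the restriction to two objectives are assumptions made purely for illustration.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_weight(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Weight gamma in [0, 1] minimizing ||gamma*g1 + (1-gamma)*g2||^2.

    Generic two-objective closed form; hypothetical helper, not taken
    from the paper.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: any weight gives the same combination.
        return 0.5
    gamma = dot([y - x for x, y in zip(g1, g2)], g2) / denom
    # Clip so the result stays a convex combination of the two gradients.
    return min(1.0, max(0.0, gamma))


def pareto_combine(g1: Sequence[float], g2: Sequence[float]) -> List[float]:
    """Min-norm convex combination of two gradients."""
    gamma = min_norm_weight(g1, g2)
    return [gamma * x + (1.0 - gamma) * y for x, y in zip(g1, g2)]


if __name__ == "__main__":
    # Two partially conflicting gradients, e.g. from two fusion-level losses.
    g_a = [1.0, 0.0]
    g_b = [-1.0, 1.0]
    combined = pareto_combine(g_a, g_b)
    # A non-negative inner product with each gradient means that a small
    # step along -combined does not increase either loss to first order.
    print(combined, dot(combined, g_a), dot(combined, g_b))
```

When the min-norm combination is nonzero, it has a non-negative inner product with both input gradients, so the shared update does not trade one objective off against the other to first order. How TPareto selects and weights the losses of its three fusion levels is specified in the paper itself, not here.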