🤖 AI Summary
Multimodal fake news detection has long been constrained by shallow feature fusion, which struggles to capture high-level semantics and complex cross-modal interactions. This work systematically reviews the paradigm shift driven by Large Vision-Language Models (LVLMs), tracing the evolution from traditional approaches to end-to-end unified reasoning frameworks, and presents the first structured taxonomy covering model architectures, datasets, and evaluation benchmarks. By mapping the trajectory of technical advances, the study offers an in-depth analysis of critical open challenges, including interpretability, temporal reasoning, and domain generalization, and constructs a comprehensive technical landscape for multimodal fake news detection in the LVLM era, providing both theoretical insight and practical guidance for future research.