Physics-based phenomenological characterization of cross-modal bias in multimodal models

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of systemic biases in multimodal large language models that arise from cross-modal interactions and often evade detection by conventional representation analysis, thereby compromising algorithmic fairness. To tackle this issue, the authors propose an interpretable framework grounded in physical phenomenology, which replaces traditional symbolic explanations by integrating embodied experience into fairness assessment through a physics-informed surrogate model that captures Transformer dynamics—including self- and cross-attention mechanisms and semantic network structures. Experiments across heterogeneous architectures such as Qwen2.5-Omni and Gemma 3n reveal that multimodal inputs can exacerbate, rather than mitigate, modality dominance. By combining perturbation analysis, dynamical systems theory, and chaotic time-series modeling, the work uncovers the emergence of structured error attractors under label perturbations, offering a novel paradigm for understanding and intervening in cross-modal bias.

📝 Abstract
The term 'algorithmic fairness' is used to evaluate whether AI models operate fairly in both comparative (where fairness is understood as formal equality, such as "treat like cases as like") and non-comparative (where unfairness arises from the model's inaccuracy, arbitrariness, or inscrutability) contexts. Recent advances in multimodal large language models (MLLMs) are breaking new ground in multimodal understanding, reasoning, and generation; however, we argue that inconspicuous distortions arising from complex multimodal interaction dynamics can lead to systematic bias. The purpose of this position paper is twofold: first, to acquaint AI researchers with phenomenological explainable approaches that rely on the physical entities the machine experiences during training/inference, as opposed to the traditional cognitivist symbolic account or metaphysical approaches; second, to argue that this phenomenological doctrine is practically useful for tackling algorithmic fairness issues in MLLMs. We develop a surrogate physics-based model that describes transformer dynamics (i.e., semantic network structure and self-/cross-attention) to analyze the dynamics of cross-modal bias in MLLMs, which are not fully captured by conventional embedding- or representation-level analyses. We support this position through multi-input diagnostic experiments: 1) perturbation-based analyses of emotion classification using Qwen2.5-Omni and Gemma 3n, and 2) dynamical analysis of Lorenz chaotic time-series prediction through the physical surrogate. Across two architecturally distinct MLLMs, we show that multimodal inputs can reinforce modality dominance rather than mitigate it, as revealed by structured error-attractor patterns under systematic label perturbation, complemented by dynamical analysis.
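The paper's second diagnostic experiment uses the Lorenz system, a standard benchmark for chaotic time-series prediction. The paper's physics-based surrogate model itself is not described here; the sketch below only reproduces the well-known Lorenz equations in the classic chaotic regime (sigma = 10, rho = 28, beta = 8/3) with a fourth-order Runge-Kutta integrator, to illustrate the kind of sensitive dynamics the dynamical analysis operates on. Function names and step sizes are illustrative choices, not the authors' setup.

```python
# Lorenz system in the classic chaotic parameter regime.
# This is a generic illustration, not the paper's surrogate model.

def lorenz_deriv(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz_deriv(state)
    k2 = lorenz_deriv(shift(state, k1, dt / 2))
    k3 = lorenz_deriv(shift(state, k2, dt / 2))
    k4 = lorenz_deriv(shift(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def trajectory(state, steps, dt=0.01):
    """Integrate the system and return the list of visited states."""
    out = [state]
    for _ in range(steps):
        state = rk4_step(state, dt)
        out.append(state)
    return out

# Sensitive dependence on initial conditions: two trajectories
# starting 1e-8 apart decorrelate after a few thousand steps,
# while both remain bounded on the attractor.
a = trajectory((1.0, 1.0, 1.0), 3000)
b = trajectory((1.0 + 1e-8, 1.0, 1.0), 3000)
```

Exponential divergence of nearby trajectories is what makes this system a demanding test for any predictive model, which is presumably why the authors pair it with their dynamical analysis of error-attractor structure.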
Problem

Research questions and friction points this paper is trying to address.

cross-modal bias
algorithmic fairness
multimodal large language models
systematic bias
transformer dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

phenomenological explainability
physics-based surrogate model
cross-modal bias
algorithmic fairness
transformer dynamics
Hyeongmo Kim
Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea; Department of Physics and Astronomy, Seoul National University, Seoul, Republic of Korea
Sohyun Kang
Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea
Yerin Choi
Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea
Seungyeon Ji
Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea; Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
Junhyuk Woo
Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea
Hyunsuk Chung
University of Melbourne
Data Mining, Knowledge-based Systems, Multimodal Understanding, Knowledge Capture
Soyeon Caren Han
University of Melbourne, University of Sydney, Postech
Natural Language Processing, Multimodal Learning, Vision and Language, Natural Language Understanding
Kyungreem Han
Korea Institute of Science and Technology
Molecular/Quantum Mechanics, Biophysics, Artificial Intelligence, Philosophy of Mind, Free Will