TouchFormer: A Robust Transformer-based Framework for Multimodal Material Perception

📅 2025-11-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the significant degradation in material perception performance under visually constrained conditions, this paper proposes a Transformer-based robust multimodal fusion framework. Unlike naive modality concatenation, our approach introduces a modality-adaptive gating mechanism and cross-instance embedding regularization to dynamically weight modal contributions, explicitly suppress modality-specific noise, and gracefully handle partial modality absence. Additionally, we design both intra- and inter-modality attention modules to enable fine-grained cross-modal feature alignment and complementarity. Evaluated on the SSMC and USMC benchmarks, the framework achieves absolute accuracy improvements of 2.48% and 6.83%, respectively. Furthermore, comprehensive experiments on a real-world robotic platform demonstrate its robustness and generalization capability in complex, visually degraded environments.

📝 Abstract
Traditional vision-based material perception methods often suffer substantial performance degradation under visually impaired conditions, motivating a shift toward non-visual multimodal material perception. Despite this, existing approaches frequently perform naive fusion of multimodal inputs, overlooking key challenges such as modality-specific noise, missing modalities common in real-world scenarios, and the dynamically varying importance of each modality depending on the task. These limitations lead to suboptimal performance across several benchmark tasks. In this paper, we propose a robust multimodal fusion framework, TouchFormer. Specifically, we employ a Modality-Adaptive Gating (MAG) mechanism and intra- and inter-modality attention mechanisms to adaptively integrate cross-modal features, enhancing model robustness. Additionally, we introduce a Cross-Instance Embedding Regularization (CER) strategy, which significantly improves classification accuracy in fine-grained subcategory material recognition tasks. Experimental results demonstrate that, compared to existing non-visual methods, the proposed TouchFormer framework achieves classification accuracy improvements of 2.48% and 6.83% on the SSMC and USMC tasks, respectively. Furthermore, real-world robotic experiments validate TouchFormer's effectiveness in enabling robots to better perceive and interpret their environment, paving the way for deployment in safety-critical applications such as emergency response and industrial automation. The code and datasets will be open-sourced, and videos are available in the supplementary materials.
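The gating idea described in the abstract can be illustrated with a minimal sketch. Assumptions not stated on this page: per-modality embeddings are precomputed, the paper's learned gating network is replaced here by a norm-based stand-in score, and all names (`modality_adaptive_gating`, `present`) are hypothetical, not the authors' API.

```python
import numpy as np

def modality_adaptive_gating(embeddings, present):
    """Sketch of a modality-adaptive gate: each available modality
    embedding gets a scalar weight from a softmax over per-modality
    scores; missing modalities are masked out before normalization.

    embeddings: (M, D) array, one D-dim embedding per modality
    present:    (M,) boolean mask, True where the modality is available
    """
    # Stand-in score: the paper would use a learned gating network here;
    # the embedding norm is only a placeholder for illustration.
    scores = np.linalg.norm(embeddings, axis=1)
    scores = np.where(present, scores, -np.inf)  # suppress missing modalities
    weights = np.exp(scores - scores[present].max())  # stable softmax numerator
    weights = np.where(present, weights, 0.0)
    weights = weights / weights.sum()
    # Fused representation: convex combination of the available modalities
    return weights @ embeddings

# Three modalities, the second one missing (e.g. a dropped sensor stream)
emb = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
mask = np.array([True, False, True])
fused = modality_adaptive_gating(emb, mask)
```

The mask-before-softmax step is what lets this kind of gate "gracefully handle partial modality absence": an unavailable modality receives exactly zero weight instead of contributing noise to the fused vector.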
Problem

Research questions and friction points this paper is trying to address.

Addresses the performance degradation of vision-based material perception under visually impaired conditions
Tackles naive multimodal fusion that ignores modality-specific noise, missing modalities, and dynamically varying modality importance
Improves the robustness of material recognition in real-world robotic applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-Adaptive Gating mechanism for cross-modal integration
Intra- and inter-modality attention for robust fusion
Cross-Instance Embedding Regularization improves fine-grained recognition
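The CER bullet above admits a simple illustrative reading, sketched below. The paper's actual regularizer is not specified on this page; the same-class squared-distance penalty is only one plausible instantiation, and all names are hypothetical.

```python
import numpy as np

def cross_instance_regularizer(embeddings, labels):
    """Average squared distance between embeddings of *different*
    instances that share a material subcategory label. Minimizing it
    pulls same-class instances together, which is one way a
    cross-instance regularizer could aid fine-grained recognition."""
    n = len(labels)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                loss += float(np.sum((embeddings[i] - embeddings[j]) ** 2))
                pairs += 1
    return loss / max(pairs, 1)  # avoid division by zero with no same-class pairs

# Two instances of subcategory 0 and one of subcategory 1
emb = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
reg = cross_instance_regularizer(emb, labels=[0, 0, 1])
```

Only same-label pairs enter the penalty, so embeddings of distinct subcategories are free to spread apart while instances of the same fine-grained material are drawn together.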
Kailin Lyu
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Long Xiao
University of Cambridge, Engineering Department, Cavendish Laboratory
Jianing Zeng
Institute of Automation, Chinese Academy of Sciences
Junhao Dong
Nanyang Technological University
Xuexin Liu
Institute of Automation, Chinese Academy of Sciences
Zhuojun Zou
Institute of Automation, Chinese Academy of Sciences
Haoyue Yang
Institute of Automation, Chinese Academy of Sciences
Lin Shu
Institute of Automation, Chinese Academy of Sciences
Jie Hao
Institute of Automation, Chinese Academy of Sciences