TacMamba: A Tactile History Compression Adapter Bridging Fast Reflexes and Slow VLA Reasoning

📅 2026-03-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the temporal mismatch between high-frequency tactile feedback and low-frequency visual planning by proposing a hierarchical fusion architecture that aligns rapid tactile reflexes with slower vision-language-action (VLA) reasoning. The core innovations are a plug-and-play high-frequency tactile interface, a Mamba-based state-space model that serves as a tactile history compressor with O(1) inference latency, and a tactile-guided two-stage self-supervised training strategy that combines temporal contrastive learning with phase-uniform sampling. Evaluated on button-press counting and latent-state switching tasks, the system achieves 100% success rates, significantly outperforming vision-only baselines, while meeting hard real-time constraints at 0.45 ms latency.
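The O(1)-latency claim follows from the recurrent form of a state-space model: at inference, each new tactile frame folds into a fixed-size hidden state instead of reprocessing the whole history (as a Transformer would). A minimal sketch of this idea, assuming a toy diagonal SSM; dimensions and dynamics are illustrative, not the paper's actual parameters:

```python
import numpy as np

class RecurrentSSM:
    """Toy diagonal state-space model in recurrent (inference) form.

    Per-step cost is O(state_dim * input_dim) regardless of how many
    tactile frames have already been seen, i.e. constant in sequence
    length -- the property behind the O(1) inference latency claim.
    """

    def __init__(self, input_dim=6, state_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Discretized dynamics; |a| < 1 keeps the state stable (assumed).
        self.a = np.full(state_dim, 0.95)
        self.B = rng.normal(size=(state_dim, input_dim)) * 0.1
        self.C = rng.normal(size=(1, state_dim)) * 0.1
        self.h = np.zeros(state_dim)  # the compressed tactile history

    def step(self, x):
        """Fold one tactile frame x into the fixed-size state."""
        self.h = self.a * self.h + self.B @ x
        return self.C @ self.h        # compact readout for the planner

ssm = RecurrentSSM()
for t in range(1000):                 # 1000 frames, identical cost per step
    y = ssm.step(np.sin(0.01 * t) * np.ones(6))
print(ssm.h.shape)                    # state stays fixed-size: (16,)
```

The low-frequency visual policy only ever reads the compact state `h`, which is what makes plug-and-play fusion with a VLA model plausible without joint pre-training.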

📝 Abstract
In visually ambiguous manipulation, such as detecting a button click, tactile feedback is often the sole source of ground truth. However, fusing tactile data poses a significant challenge due to a spatiotemporal mismatch: tactile perception requires high-frequency processing with long-horizon memory (System 1), whereas visual policies operate at low control frequencies (System 2). Existing architectures struggle to bridge this gap: Transformers are computationally prohibitive for high-frequency loops (>100 Hz), while LSTMs suffer from forgetting over extended interaction histories. In this paper, we introduce TacMamba, a hierarchical architecture that aligns high-bandwidth tactile reflexes with low-frequency visual planning. Our approach comprises three core contributions: (1) a custom high-frequency tactile interface designed for flexible integration; (2) a Mamba-based Tactile History Compressor that encodes continuous force history into a compact state with O(1) inference latency (0.45 ms), enabling plug-and-play fusion with VLA models without joint pre-training; and (3) a Tactile-Guided Dual-Stage Training strategy that leverages temporal discrimination for self-supervised representation learning and phase-uniform sampling to mitigate data sparsity. Experiments on discrete counting and implicit state switching demonstrate that TacMamba achieves 100% success rates, significantly outperforming the visual-only pi_0.5 baseline, while strictly satisfying hard real-time constraints.
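The phase-uniform sampling mentioned in the abstract counters data sparsity by drawing training windows uniformly over interaction phases rather than over raw time, so brief but informative phases (e.g. the instant of a button click) are not swamped by long idle stretches. A hedged sketch of that idea; the per-frame phase labels and round-robin scheme are assumptions for illustration, not the paper's exact procedure:

```python
import random
from collections import defaultdict

def phase_uniform_sample(frames, phases, n_samples, seed=0):
    """Sample frame indices uniformly over phase labels.

    Rare phases are drawn as often as long ones, instead of in
    proportion to their duration. `phases` is an assumed per-frame
    label; the paper's actual phase definition is not given here.
    """
    rng = random.Random(seed)
    by_phase = defaultdict(list)
    for idx, p in enumerate(phases):
        by_phase[p].append(idx)
    labels = sorted(by_phase)
    # Round-robin over phases: each phase contributes ~n_samples/len(labels).
    return [rng.choice(by_phase[labels[i % len(labels)]])
            for i in range(n_samples)]

# Mostly 'idle' frames with a rare 5-frame 'click' phase:
phases = ["idle"] * 95 + ["click"] * 5
picks = phase_uniform_sample(list(range(100)), phases, 20)
clicks = sum(phases[i] == "click" for i in picks)
print(clicks)  # -> 10: half the draws hit the rare phase
```

Time-uniform sampling would have hit the click phase in only ~5% of draws; phase-uniform sampling raises that to 50% in this two-phase toy example.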
Problem

Research questions and friction points this paper is trying to address.

tactile perception
spatiotemporal mismatch
high-frequency processing
visual-tactile fusion
real-time control
Innovation

Methods, ideas, or system contributions that make the work stand out.

TacMamba
Tactile History Compression
Mamba-based Architecture
VLA Fusion
Dual-Stage Training
Zhenan Wang
Zhejiang University, Hangzhou, China
Yanzhe Wang
Zhejiang University, Hangzhou, China
Meixuan Ren
Zhejiang University, Hangzhou, China
Peng Li
GigaAI, Beijing, China
Yang Liu
Zhejiang University
Multimedia, Data mining
Yifei Nie
Zhejiang University, Hangzhou, China
Limin Long
GigaAI, Beijing, China
Yun Ye
Intel
Computer Vision, Deep Learning, Semiconductor Physics
Xiaofeng Wang
GigaAI, Beijing, China
Zhen Zhu
University of Illinois at Urbana-Champaign
Computer Vision, Deep Learning
Huixu Dong
Zhejiang University, Hangzhou, China