🤖 AI Summary
This work addresses the temporal mismatch between high-frequency tactile feedback and low-frequency visual planning by proposing a hierarchical fusion architecture that aligns rapid tactile reflexes with slower vision-language-action (VLA) reasoning. The core innovations include a plug-and-play high-frequency tactile interface, a Mamba-based state-space model serving as a tactile history compressor with O(1) inference latency, and a tactile-guided two-stage self-supervised training strategy that integrates temporal contrastive learning with phase-uniform sampling. Evaluated on button-press counting and latent-state switching tasks, the system achieves 100% success rates, significantly outperforming vision-only baselines, while meeting hard real-time constraints at 0.45 ms latency.
📝 Abstract
In visually ambiguous manipulation, such as detecting a button click, tactile feedback is often the sole source of ground truth. However, fusing tactile data poses a significant challenge due to a spatiotemporal mismatch: tactile perception requires high-frequency processing with long-horizon memory (System 1), whereas visual policies operate at low control frequencies (System 2). Existing architectures struggle to bridge this gap: Transformers are computationally prohibitive for high-frequency loops (>100 Hz), while LSTMs suffer from forgetting over extended interaction histories. In this paper, we introduce TacMamba, a hierarchical architecture that aligns high-bandwidth tactile reflexes with low-frequency visual planning. Our approach comprises three core contributions: (1) a custom high-frequency tactile interface designed for flexible integration; (2) a Mamba-based Tactile History Compressor that encodes continuous force history into a compact state with O(1) inference latency (0.45 ms), enabling plug-and-play fusion with VLA models without joint pre-training; and (3) a Tactile-Guided Dual-Stage Training strategy that leverages temporal discrimination for self-supervised representation learning and phase-uniform sampling to mitigate data sparsity. Experiments on discrete counting and implicit state switching demonstrate that TacMamba achieves 100% success rates, significantly outperforming the vision-only pi_0.5 baseline, while strictly satisfying hard real-time constraints.
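To make the O(1)-latency claim concrete, the sketch below shows the kind of linear state-space recurrence that Mamba-style models build on: each new tactile sample folds into a fixed-size hidden state in constant time, independent of history length. This is an illustrative toy, not the paper's implementation; all dimensions and matrices (`STATE_DIM`, `INPUT_DIM`, `A`, `B`, `C`) are arbitrary placeholders, and real Mamba additionally makes the dynamics input-dependent (selective).

```python
import numpy as np

STATE_DIM = 16   # size of the compressed tactile state (assumed)
INPUT_DIM = 4    # e.g. one force reading per taxel (assumed)

rng = np.random.default_rng(0)
A = 0.9 * np.eye(STATE_DIM)                             # state transition (decaying memory)
B = rng.normal(scale=0.1, size=(STATE_DIM, INPUT_DIM))  # input projection
C = rng.normal(scale=0.1, size=(1, STATE_DIM))          # readout to the slow policy

def step(h, x):
    """One O(1) recurrent update: fold a new tactile sample x into state h.

    Cost per step is fixed, unlike Transformer attention, which grows with
    the length of the tactile history it must attend over.
    """
    h_next = A @ h + B @ x
    y = C @ h_next          # compact summary exposed to the low-frequency VLA loop
    return h_next, y

# Stream 1000 high-frequency samples; memory stays one fixed-size vector.
h = np.zeros(STATE_DIM)
for _ in range(1000):
    x = rng.normal(size=INPUT_DIM)
    h, y = step(h, x)
```

Because the compressor's entire interaction history lives in `h`, the slow visual policy can read a single vector at its own control rate, which is what makes the plug-and-play fusion described above feasible.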