🤖 AI Summary
Diffusion models in robotic manipulation often suffer from performance degradation due to overly large denoising decoders that introduce task-irrelevant redundancy and noise. To address this, this work proposes a lightweight variational regularization module that imposes a timestep-conditioned Gaussian distribution over backbone features and constructs an adaptive information bottleneck via a KL-divergence constraint. This mechanism dynamically suppresses noise while preserving task-relevant features during inference, without requiring any modifications to the training pipeline. Evaluated on the RoboTwin2.0, Adroit, and MetaWorld simulation benchmarks, the method achieves new state-of-the-art results, improving success rates over the DP3 baseline by 6.1% on RoboTwin2.0 and by 4.1% on Adroit and MetaWorld, and demonstrates strong performance in real-world robot deployment.
📝 Abstract
Diffusion-based visuomotor policies built on 3D visual representations have achieved strong performance in learning complex robotic skills. However, most existing methods employ an oversized denoising decoder. While increasing model capacity can improve denoising, empirical evidence suggests that it also introduces redundancy and noise in intermediate feature blocks. Crucially, we find that randomly masking backbone features at inference time (without changing training) can improve performance, confirming the presence of task-irrelevant noise in intermediate features. Motivated by this, we propose Variational Regularization (VR), a lightweight module that imposes a timestep-conditioned Gaussian over backbone features and applies a KL-divergence regularizer, forming an adaptive information bottleneck. Extensive experiments on three simulation benchmarks (RoboTwin2.0, Adroit, and MetaWorld) show that, compared to the baseline DP3, our approach improves the success rate by 6.1% on RoboTwin2.0 and by 4.1% on Adroit and MetaWorld, achieving new state-of-the-art results. Real-world experiments further demonstrate that our method performs well in practical deployments. Code will be released.
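To make the mechanism concrete, here is a minimal, self-contained sketch of what a timestep-conditioned Gaussian bottleneck over backbone features could look like. This is not the authors' implementation: the layer sizes, the sinusoidal timestep embedding, and the random linear maps (stand-ins for learned weights) are all illustrative assumptions. The module predicts a mean and log-variance conditioned on the diffusion timestep, draws a reparameterized sample to pass downstream, and returns the KL divergence to a standard normal that serves as the bottleneck regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)

def timestep_embedding(t, dim=32):
    """Sinusoidal timestep embedding, as commonly used in diffusion models."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    ang = t[:, None] * freqs[None, :]
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)  # (B, dim)

class VariationalRegularizationSketch:
    """Illustrative stand-in for the VR module (random weights, no training):
    maps backbone features plus a timestep embedding to the mean/log-variance
    of a Gaussian, samples via the reparameterization trick, and computes the
    KL penalty to N(0, I) that forms the information bottleneck."""

    def __init__(self, feat_dim, t_dim=32):
        scale = 1.0 / np.sqrt(feat_dim + t_dim)
        self.W_mu = rng.normal(0.0, scale, (feat_dim + t_dim, feat_dim))
        self.W_lv = rng.normal(0.0, scale, (feat_dim + t_dim, feat_dim))

    def __call__(self, feats, t):
        # feats: (B, D) backbone features; t: (B,) diffusion timesteps
        h = np.concatenate([feats, timestep_embedding(t)], axis=-1)
        mu, logvar = h @ self.W_mu, h @ self.W_lv
        # Reparameterized sample replaces the raw backbone features downstream.
        z = mu + rng.standard_normal(mu.shape) * np.exp(0.5 * logvar)
        # KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch.
        kl = 0.5 * (mu**2 + np.exp(logvar) - 1.0 - logvar).sum(-1).mean()
        return z, kl
```

At inference, only the (timestep-dependent) Gaussian and its sample are used, so no change to the training pipeline is needed; during training, `kl` would be added to the denoising loss as the regularizer.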