Wavelet Policy: Imitation Policy Learning in Frequency Domain with Wavelet Transforms

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing imitation learning methods predominantly focus on spatial-domain modeling, neglecting the temporal dynamics of action sequences and consequently underutilizing frequency-domain information. This work pioneers frequency-domain modeling for robotic manipulation trajectory prediction. We propose the Wavelet Policy framework: (1) leveraging wavelet transforms to extract multi-scale frequency-domain features; (2) introducing SE2MD (Single Encoder to Multiple Decoder), an architecture that jointly models visual and motor representations; and (3) proposing Learnable Frequency-Domain Filters (LFDF), a module that improves generalization under visual perturbations. Evaluated on four challenging robotic arm tasks, our method surpasses state-of-the-art approaches by over 10% while maintaining a comparable parameter count; notably, its performance degrades more slowly in long-horizon tasks. The code will be made publicly available.

📝 Abstract
Recent imitation learning policies, often framed as time series prediction tasks, directly map robotic observations, such as high-dimensional visual data and proprioception, into the action space. While time series prediction primarily relies on spatial domain modeling, the underutilization of frequency domain analysis in robotic manipulation trajectory prediction may lead to neglecting the inherent temporal information embedded within action sequences. To address this, we reframe imitation learning policies through the lens of the frequency domain and introduce the Wavelet Policy. This novel approach employs wavelet transforms (WT) for feature preprocessing and extracts multi-scale features from the frequency domain using the SE2MD (Single Encoder to Multiple Decoder) architecture. Furthermore, to enhance feature mapping in the frequency domain and increase model capacity, we introduce a Learnable Frequency-Domain Filter (LFDF) after each frequency decoder, improving adaptability under different visual conditions. Our results show that the Wavelet Policy outperforms state-of-the-art (SOTA) end-to-end methods by over 10% on four challenging robotic arm tasks, while maintaining a comparable parameter count. In long-range settings, its performance declines more slowly as task volume increases. The code will be publicly available.
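To make the idea of multi-scale wavelet features concrete, here is a minimal sketch of a multi-level discrete wavelet transform applied to a single action trajectory. It assumes the Haar wavelet and a made-up 8-step joint trajectory; the abstract does not state which wavelet family or decomposition depth the Wavelet Policy uses, so the function names (`haar_dwt`, `wavedec`, `waverec`) and all values are illustrative only.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar normalisation


def haar_dwt(signal):
    """One Haar level: split an even-length sequence into (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one Haar level by interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SCALE)
        out.append((a - d) * SCALE)
    return out


def wavedec(signal, levels):
    """Multi-level decomposition, ordered [coarsest approx, coarsest detail, ..., finest detail]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return [approx] + details[::-1]


def waverec(coeffs):
    """Rebuild the time-domain sequence from the bands returned by wavedec."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt(approx, detail)
    return approx


# Hypothetical 8-step trajectory of one joint (not data from the paper).
trajectory = [0.0, 0.1, 0.3, 0.6, 1.0, 1.1, 1.05, 1.0]
bands = wavedec(trajectory, levels=2)
reconstructed = waverec(bands)
```

The coarse approximation band carries the slow overall motion, while the detail bands carry progressively faster corrections; because the transform is invertible, a policy could in principle predict each band separately and still recover the full action sequence.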
Problem

Research questions and friction points this paper is trying to address.

Enhancing imitation learning by utilizing frequency domain analysis
Addressing neglect of temporal information in action sequences
Improving robotic manipulation trajectory prediction accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wavelet transforms for frequency domain preprocessing
SE2MD architecture for multi-scale feature extraction
Learnable Frequency-Domain Filter (LFDF) for adaptability under different visual conditions
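The single-encoder, multi-decoder layout with a filter after each decoder can be sketched roughly as follows. This is not the authors' implementation: the listing gives no architectural details, so the filter here is assumed to be a simple per-coefficient trainable gain, and `ElementwiseBandFilter`, `multi_decoder_head`, and the slicing "decoders" are hypothetical names chosen for illustration.

```python
class ElementwiseBandFilter:
    """Hypothetical stand-in for a learnable filter: one trainable gain per coefficient."""

    def __init__(self, size):
        self.gains = [1.0] * size  # starts as the identity filter

    def __call__(self, band):
        return [g * c for g, c in zip(self.gains, band)]

    def sgd_step(self, band, target, lr=0.1):
        """One gradient step on 0.5 * sum((g*c - t)^2); returns the loss before the update."""
        out = self(band)
        loss = 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))
        self.gains = [
            g - lr * (o - t) * c
            for g, o, t, c in zip(self.gains, out, target, band)
        ]
        return loss


def multi_decoder_head(shared_feature, decoders, filters):
    """Feed one shared encoder feature to every decoder, then filter each decoded band."""
    return [flt(dec(shared_feature)) for dec, flt in zip(decoders, filters)]


# Hypothetical shared feature and two toy "decoders" that just slice it.
shared_feature = [1.0, 2.0, 3.0, 4.0]
decoders = [lambda h: h[:2], lambda h: h[2:]]
filters = [ElementwiseBandFilter(2), ElementwiseBandFilter(2)]
band_outputs = multi_decoder_head(shared_feature, decoders, filters)

# Fit the first filter toward a made-up target band.
losses = [filters[0].sgd_step([1.0, 2.0], [0.5, 2.0]) for _ in range(3)]
```

Keeping one filter per band lets each frequency scale be reweighted independently, which is one plausible way a model could adapt its output mapping without changing the shared encoder.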
Authors
Changchuan Yang
Zhejiang University
Yuhang Dong
Zhejiang University
Guanzhong Tian
Ningbo Research Institute, Zhejiang University
Computer Vision, Model Compression, Pattern Recognition
Haizhou Ge
Tsinghua University
Hongrui Zhu
Zhejiang University