Frequency-Enhanced Diffusion Models: Curriculum-Guided Semantic Alignment for Zero-Shot Skeleton Action Recognition

📅 2026-04-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two problems in zero-shot skeleton-based action recognition that stem from the spectral bias of diffusion models: excessive smoothing of high-frequency motion details and limited generalization. The authors propose a spectral-aware diffusion generation framework that, by their account, is the first to combine spectral modeling with curriculum learning. It comprises a semantic-guided spectral residual module, a timestep-adaptive spectral loss, and a curriculum-based semantic abstraction mechanism. Together, these components improve the model's ability to recover fine-grained motion features and to align textual semantics with action representations. On the NTU RGB+D, PKU-MMD, and Kinetics-skeleton benchmarks, the method is reported to improve zero-shot recognition accuracy and achieve state-of-the-art performance.

πŸ“ Abstract
Human action recognition is pivotal in computer vision, with applications ranging from surveillance to human-robot interaction. Despite the effectiveness of supervised skeleton-based methods, their reliance on exhaustive annotation limits generalization to novel actions. Zero-Shot Skeleton Action Recognition (ZSAR) emerges as a promising paradigm, yet it faces challenges due to the spectral bias of diffusion models, which oversmooth high-frequency dynamics. Here, we propose Frequency-Aware Diffusion for Skeleton-Text Matching (FDSM), integrating a Semantic-Guided Spectral Residual Module, a Timestep-Adaptive Spectral Loss, and Curriculum-based Semantic Abstraction to address these challenges. Our approach effectively recovers fine-grained motion details, achieving state-of-the-art performance on NTU RGB+D, PKU-MMD, and Kinetics-skeleton datasets. Code has been made available at https://github.com/yuzhi535/FDSM. Project homepage: https://yuzhi535.github.io/FDSM.github.io/
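To make the notion of "oversmoothed high-frequency dynamics" concrete, the following is a minimal sketch, not the FDSM implementation. It measures how much of a skeleton sequence's temporal spectral energy lies above a cutoff frequency and shows that a simple moving-average smoother lowers that share. The function names (`high_freq_energy_ratio`, `moving_average`), the `(T, J, C)` layout, the 0.25 cycles-per-frame cutoff, and the synthetic motion are all assumptions chosen for illustration; the moving average is only a stand-in for a smoothing effect and is not a diffusion model.

```python
import numpy as np


def high_freq_energy_ratio(seq, cutoff=0.25):
    """Share of temporal spectral energy above `cutoff` (cycles per frame).

    seq: array of shape (T, J, C) = frames, joints, coordinates (assumed layout).
    """
    seq = np.asarray(seq, dtype=np.float64)
    # Remove the per-channel mean so the DC term does not dominate.
    centered = seq - seq.mean(axis=0, keepdims=True)
    # Real FFT along the time axis: shape (T // 2 + 1, J, C).
    power = np.abs(np.fft.rfft(centered, axis=0)) ** 2
    # Frequencies in cycles per frame, ranging from 0 to 0.5.
    freqs = np.fft.rfftfreq(seq.shape[0])
    total = power.sum()
    return float(power[freqs > cutoff].sum() / total) if total > 0 else 0.0


def moving_average(seq, window=5):
    """Smooth each joint coordinate over time (a stand-in for oversmoothing)."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), 0, seq
    )


# Synthetic motion: a slow component plus a weaker fast component and noise.
rng = np.random.default_rng(0)
T, J, C = 64, 25, 3
t = np.arange(T)[:, None, None]
slow = np.sin(2 * np.pi * 2 * t / T)          # 2 cycles per clip
fast = 0.3 * np.sin(2 * np.pi * 24 * t / T)   # 24 cycles per clip (0.375 cycles/frame)
motion = (slow + fast) * np.ones((1, J, C)) + 0.01 * rng.standard_normal((T, J, C))

smoothed = moving_average(motion, window=5)

print("high-frequency share, original:", high_freq_energy_ratio(motion))
print("high-frequency share, smoothed:", high_freq_energy_ratio(smoothed))
```

A diagnostic of this kind could be used to compare generated and ground-truth sequences. How the paper's Timestep-Adaptive Spectral Loss is actually defined is not stated in this summary.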
Problem

Research questions and friction points this paper is trying to address.

Zero-Shot Skeleton Action Recognition
spectral bias
diffusion models
high-frequency dynamics
skeleton-based action recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-Enhanced Diffusion
Zero-Shot Action Recognition
Spectral Residual Module
Curriculum Learning
Skeleton-Text Alignment
🔎 Similar Papers
Yuxi Zhou
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China.
Zhengbo Zhang
Singapore University of Technology and Design
Generative Models, Reinforcement Learning
Jingyu Pan
School of Geodesy and Geomatics, Wuhan University, Wuhan, China; School of Mathematics and Statistics, Wuhan University, Wuhan, China.
Zhiyu Lin
Beijing Jiaotong University
Zhigang Tu
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China; Wuhan University Shenzhen Research Institute, Shenzhen, China.