🤖 AI Summary
This work addresses the challenge of modeling long-range dynamics from sparse skeleton data, a task hampered by the loss of fine-grained spatiotemporal details and by reliance on predefined topologies. To overcome these limitations, the authors propose a kinematics-driven anisotropic Gaussian splatting module that maps discrete joints into a continuous generative representation. They further introduce a graph convolutional network grounded in a probabilistic topology derived from the Bhattacharyya distance, thereby transcending the constraints of fixed graph structures and sparse inputs. Additionally, a visual context gating mechanism and multi-view continuous heatmap rendering are incorporated to substantially enhance the modeling of complex spatiotemporal dynamics. The resulting framework achieves notably more robust action recognition performance under low-fidelity skeletal inputs.
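To make the splatting idea concrete, below is a minimal NumPy sketch of rendering joints as anisotropic Gaussians on a heatmap. The specific covariance form (an isotropic base `sigma**2 * I` elongated along the velocity by `alpha * v @ v.T`), the parameter names, and the per-pixel max-compositing are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def velocity_covariance(v, sigma=2.0, alpha=0.5):
    """Anisotropic 2x2 covariance elongated along the joint's instantaneous
    velocity v. The form sigma^2*I + alpha*v v^T is our assumption."""
    return sigma**2 * np.eye(2) + alpha * np.outer(v, v)

def splat_joint(grid_x, grid_y, mu, cov):
    """Evaluate an unnormalized Gaussian density for one joint on a pixel grid."""
    inv = np.linalg.inv(cov)
    dx = grid_x - mu[0]
    dy = grid_y - mu[1]
    # Quadratic form (p - mu)^T cov^{-1} (p - mu), expanded per pixel
    q = inv[0, 0] * dx * dx + 2 * inv[0, 1] * dx * dy + inv[1, 1] * dy * dy
    return np.exp(-0.5 * q)

H, W = 64, 64
ys, xs = np.mgrid[0:H, 0:W]                      # pixel coordinates
joints = np.array([[20.0, 30.0], [40.0, 25.0]])  # toy 2D joint positions (x, y)
vels   = np.array([[3.0, 0.0], [0.0, -2.0]])     # finite-difference velocities
heatmap = np.zeros((H, W))
for mu, v in zip(joints, vels):
    # Composite joints by per-pixel max to keep peak responses at 1
    heatmap = np.maximum(heatmap, splat_joint(xs, ys, mu, velocity_covariance(v)))
```

Because the covariance is stretched along `v`, the rendered blob decays more slowly in the direction of motion, which is the intuition behind encoding kinematics into the splat shape.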
📝 Abstract
Skeleton-based action recognition is widely used in sensing applications such as human-computer interaction and intelligent surveillance. Nevertheless, current sensor devices typically produce sparse skeleton data as discrete coordinates, which inevitably discards fine-grained spatiotemporal details during highly dynamic movements. Moreover, the rigid constraints of predefined physical sensor topologies hinder the modeling of latent long-range dependencies. To overcome these limitations, we propose KGS-GCN, a graph convolutional network that integrates kinematics-driven Gaussian splatting with a probabilistic topology. Our framework explicitly addresses the challenges of sensor data sparsity and topological rigidity by transforming discrete joints into continuous generative representations. First, a kinematics-driven Gaussian splatting module is designed to dynamically construct anisotropic covariance matrices from instantaneous joint velocity vectors. This module enhances visual representation by rendering sparse skeleton sequences into multi-view continuous heatmaps rich in spatiotemporal semantics. Second, to transcend the limitations of fixed physical connections, a probabilistic topology construction method is proposed. This approach generates an adaptive prior adjacency matrix by quantifying statistical correlations via the Bhattacharyya distance between joint Gaussian distributions. Finally, the GCN backbone is adaptively modulated by the rendered visual features through a visual context gating mechanism. Empirical results demonstrate that KGS-GCN significantly enhances the modeling of complex spatiotemporal dynamics. By addressing the inherent limitations of sparse inputs, our framework offers a robust solution for processing low-fidelity sensor data and establishes a practical pathway toward improving perceptual reliability in real-world sensing applications.
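The probabilistic topology step can be sketched as follows. The Bhattacharyya distance between two multivariate Gaussians is standard; mapping distances to affinities with `exp(-beta * D_B)` and row-normalizing the result into a GCN prior are our assumptions about how such an adjacency might be formed, since the abstract does not specify these details:

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Standard Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    # Mahalanobis-like term plus a covariance-overlap term
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def probabilistic_adjacency(mus, covs, beta=1.0):
    """Build A_ij = exp(-beta * D_B(i, j)) over joint Gaussians, then
    row-normalize so A can serve as a prior adjacency for a GCN.
    The exponential affinity mapping is an illustrative assumption."""
    n = len(mus)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = np.exp(-beta * bhattacharyya(mus[i], covs[i],
                                                   mus[j], covs[j]))
    return A / A.sum(axis=1, keepdims=True)

# Toy example: three joints with unit covariances; two near, one far
mus = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2), np.eye(2)]
A = probabilistic_adjacency(mus, covs)
```

Because the distance depends on both means and covariances, two joints whose velocity-elongated Gaussians overlap, even without a physical bone between them, receive a strong edge, which is how the method escapes the fixed skeletal topology.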