Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional skeleton-based action recognition methods used in Industry 4.0 collaborative robotic assembly lack semantic information and cannot model human-object interactions. To address this, the paper proposes a semantics-enhanced skeleton representation: joint and object categories are mapped to dense semantic vectors via word embeddings instead of sparse one-hot encodings, and a “semantic volume” is explicitly constructed to capture deep semantic associations between joints and objects, enabling a joint representation of heterogeneous skeleton and object categories. The method integrates end-to-end into mainstream skeleton-sequence models. Evaluations on multiple assembly datasets show significant improvements in classification accuracy (+3.2% on average) and better generalization across environments and tasks. The core contribution is the integration of language-level semantic modeling into industrial assembly action recognition, advancing skeleton data representation from geometric encoding to semantic understanding.
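
The core transformation is simple to sketch. The snippet below is a minimal illustration of the idea rather than the paper's implementation: toy 4-dimensional vectors stand in for real pretrained word embeddings (e.g., GloVe), and the names `EMB` and `semantic_volume` are hypothetical. Each keypoint, whether a body joint or a detected object, carries its 3D coordinates plus the dense embedding of its category, so heterogeneous skeletons and objects share one representation.

```python
# Minimal sketch: dense category embeddings instead of one-hot codes.
# Assumption: toy 4-d vectors stand in for pretrained word embeddings.
import numpy as np

EMB = {  # hypothetical word embeddings for joint/object categories
    "left_hand":   np.array([0.21, -0.37, 0.55, 0.08]),
    "screwdriver": np.array([0.63, 0.44, -0.12, 0.30]),
    "head":        np.array([-0.42, 0.13, 0.29, -0.51]),
}

def semantic_volume(keypoints, categories):
    """Concatenate each keypoint's 3D position with the dense word
    embedding of its category (joints and objects treated alike).

    keypoints:  (N, 3) array of 3D positions.
    categories: list of N category names, keys into EMB.
    Returns an (N, 3 + D) array, where D is the embedding size.
    """
    embs = np.stack([EMB[c] for c in categories])     # (N, D)
    return np.concatenate([keypoints, embs], axis=1)  # (N, 3 + D)

pts = np.array([[0.10, 0.90, 0.30],   # left hand
                [0.14, 0.88, 0.28],   # screwdriver held by that hand
                [0.20, 1.50, 0.30]])  # head
vol = semantic_volume(pts, ["left_hand", "screwdriver", "head"])
print(vol.shape)  # (3, 7): geometry plus semantics per keypoint
```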

📝 Abstract
Effective human action recognition is widely used for cobots in Industry 4.0 to assist in assembly tasks. However, conventional skeleton-based methods often lose keypoint semantics, limiting their effectiveness in complex interactions. In this work, we introduce a novel approach to skeleton-based action recognition that enriches input representations by leveraging word embeddings to encode semantic information. Our method replaces one-hot encodings with semantic volumes, enabling the model to capture meaningful relationships between joints and objects. Through extensive experiments on multiple assembly datasets, we demonstrate that our approach significantly improves classification performance and enhances generalization capabilities by simultaneously supporting different skeleton types and object classes. Our findings highlight the potential of incorporating semantic information to enhance skeleton-based action recognition in dynamic and diverse environments.
Problem

Research questions and friction points this paper is trying to address.

Enhancing skeleton-based action recognition with semantic word embeddings
Addressing the loss of keypoint semantics in complex human-object interactions
Improving generalization across diverse skeleton types and object classes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses word embeddings for semantic encoding
Replaces one-hot encodings with semantic volumes (see the sketch after this list)
Supports multiple skeleton and object types
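
To make the one-hot-versus-embedding contrast concrete, here is a small hypothetical comparison (toy vectors again, not the paper's actual embeddings). With one-hot codes every pair of distinct categories is equally dissimilar, whereas dense word vectors keep related categories close, which is what allows a model to transfer knowledge across skeleton types and object classes.

```python
# Toy comparison: why dense embeddings generalize where one-hot
# codes cannot (vectors are illustrative assumptions, not learned).
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot: all distinct categories are orthogonal, hence unrelated.
onehot = {"left_hand": np.eye(3)[0],
          "right_hand": np.eye(3)[1],
          "screwdriver": np.eye(3)[2]}
print(cosine(onehot["left_hand"], onehot["right_hand"]))  # 0.0

# Dense embeddings: related categories stay close, so what a model
# learns about one hand (or one tool) transfers to similar ones.
emb = {"left_hand":   np.array([0.21, -0.37, 0.55, 0.08]),
       "right_hand":  np.array([0.20, -0.35, 0.57, 0.10]),
       "screwdriver": np.array([0.63, 0.44, -0.12, 0.30])}
print(cosine(emb["left_hand"], emb["right_hand"]))   # ~0.99
print(cosine(emb["left_hand"], emb["screwdriver"]))  # much lower
```
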
Dustin Aganian
Ilmenau University of Technology
Deep Learning, Machine Learning, Computer Vision, Robotics
Erik Franze
Ilmenau University of Technology, Neuroinformatics and Cognitive Robotics Lab
Markus Eisenbach
Ilmenau University of Technology, Neuroinformatics and Cognitive Robotics Lab
Horst-Michael Gross
Full Professor of Computer Science, Technische Universitaet Ilmenau
Robotics, Cognitive Robotics, Neural Networks, Deep Learning, Human-Robot Interaction