🤖 AI Summary
This study addresses the lack of reliable automatic evaluation methods for sign language AI systems by examining how to meaningfully evaluate sign language utterances represented as human skeletal poses. It compares three families of metrics: keypoint distance-based, pose embedding-based, and back-translation-based. Through automatic meta-evaluation on sign-level retrieval and a human correlation study of text-to-pose translation across different sign languages, it characterizes where each metric works well and where it does not. Key contributions include: (1) showing the trade-offs among evaluation metrics in different scenarios; (2) releasing an open-source pose-evaluation toolkit; and (3) providing a practical, reproducible basis for developing and evaluating sign language translation and generation systems.
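To make the keypoint distance family concrete, the following is a minimal sketch of one such metric: the mean Euclidean distance between aligned keypoints of two pose sequences. The function name `mean_keypoint_distance`, the `missing_penalty` parameter, and the handling of undetected keypoints and unequal lengths are illustrative assumptions, not the toolkit's API or the exact metric definitions used in the study.

```python
import math
from typing import Optional, Sequence, Tuple

Keypoint = Optional[Tuple[float, float]]  # None marks an undetected keypoint
Frame = Sequence[Keypoint]
Pose = Sequence[Frame]


def mean_keypoint_distance(a: Pose, b: Pose, missing_penalty: float = 1.0) -> float:
    """Average Euclidean distance over aligned frames and keypoints.

    Assumptions (hypothetical, for illustration only):
    - both poses are already normalized and share one keypoint layout;
    - frames are aligned by index, and frames missing from the shorter
      sequence are treated as fully undetected;
    - a keypoint present in only one pose costs `missing_penalty`;
    - a keypoint missing in both poses is skipped.
    """
    n_frames = max(len(a), len(b))
    if n_frames == 0:
        raise ValueError("both poses are empty")
    n_keypoints = len(a[0]) if a else len(b[0])

    total = 0.0
    count = 0
    for t in range(n_frames):
        frame_a = a[t] if t < len(a) else [None] * n_keypoints
        frame_b = b[t] if t < len(b) else [None] * n_keypoints
        if len(frame_a) != n_keypoints or len(frame_b) != n_keypoints:
            raise ValueError("all frames must have the same number of keypoints")
        for p, q in zip(frame_a, frame_b):
            if p is None and q is None:
                continue  # nothing to compare
            if p is None or q is None:
                total += missing_penalty
            else:
                total += math.dist(p, q)
            count += 1
    return total / count if count else 0.0


reference = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), None]]
hypothesis = [[(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), None]]
score = mean_keypoint_distance(reference, hypothesis)
```

Lower scores mean the two poses are closer. How missing keypoints and length mismatches are penalized is a design choice that can change a metric's ranking of systems, which is one reason a comparison across metric variants is useful.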
📝 Abstract
We present a comprehensive study on meaningfully evaluating sign language utterances in the form of human skeletal poses. The study covers keypoint distance-based, embedding-based, and back-translation-based metrics. We show tradeoffs between different metrics in different scenarios through automatic meta-evaluation of sign-level retrieval and a human correlation study of text-to-pose translation across different sign languages. Our findings and the open-source pose-evaluation toolkit provide a practical and reproducible way of developing and evaluating sign language translation or generation systems.
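The retrieval-based meta-evaluation can be sketched as follows: each sign is used as a query, the remaining signs are ranked by the metric under test, and the metric is scored by how highly it ranks other instances of the same sign. The sketch uses mean average precision, a common retrieval measure that is assumed here rather than taken from the study; `mean_average_precision`, the toy one-dimensional "poses", and the gloss labels are all hypothetical.

```python
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def mean_average_precision(
    items: Sequence[T],
    labels: Sequence[Hashable],
    distance: Callable[[T, T], float],
) -> float:
    """Score a distance metric by same-label retrieval over all queries."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")

    ap_scores = []
    for i, query in enumerate(items):
        others = [j for j in range(len(items)) if j != i]
        n_relevant = sum(1 for j in others if labels[j] == labels[i])
        if n_relevant == 0:
            continue  # a query with no other instance of its sign is skipped

        # Rank all other items from nearest to farthest under the metric.
        ranked = sorted(others, key=lambda j: distance(query, items[j]))

        hits = 0
        precision_sum = 0.0
        for rank, j in enumerate(ranked, start=1):
            if labels[j] == labels[i]:
                hits += 1
                precision_sum += hits / rank
        ap_scores.append(precision_sum / n_relevant)

    if not ap_scores:
        raise ValueError("no query has a relevant item to retrieve")
    return sum(ap_scores) / len(ap_scores)


# Toy data: scalars stand in for poses, strings for sign glosses.
toy_items = [0.0, 0.1, 5.0, 5.2]
toy_labels = ["HELLO", "HELLO", "THANKS", "THANKS"]

good_metric_score = mean_average_precision(
    toy_items, toy_labels, lambda x, y: abs(x - y)
)
bad_metric_score = mean_average_precision(
    toy_items, toy_labels, lambda x, y: -abs(x - y)
)
```

A metric that places same-sign instances first scores close to 1, while one that ranks them last scores much lower, so the measure gives an automatic way to compare candidate metrics without human judgments. In practice `distance` would be a pose metric such as a keypoint distance or an embedding distance.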