KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

career value

173K/year

🤖 AI Summary

This study addresses the challenge of efficiently generating structurally coherent and detailed 2D human pose animations from symbolic sign language notations such as HamNoSys. To this end, the authors propose KANMultiSign, a framework employing a coarse-to-fine multiscale generation strategy: it first ensures global structural consistency through a body–hands–face skeletal hierarchy and subsequently refines finger-level articulations. The approach integrates Kolmogorov–Arnold Networks (KANs) with a Transformer backbone to model the nonlinear mapping from discrete symbols to continuous poses and incorporates multiscale supervision to enhance output fidelity. Notably, this work presents the first application of KANs to sequence generation tasks, achieving competitive or superior performance against strong baselines while significantly reducing parameter count. Experiments across multiple sign language corpora demonstrate consistent reductions in joint error, validating the efficacy of multiscale supervision complemented by KAN-based efficient modeling.

📝 Abstract

Sign language production from symbolic notation offers a scalable route to accessible sign animation. We present KANMultiSign, a multi-scale sequence generator that translates HamNoSys notation into two-dimensional human pose sequences. Our framework makes two complementary contributions. First, we introduce a coarse-to-fine generation strategy with multi-scale supervision: the model is first guided by an intermediate body--hand--face scaffold to encourage global structural coherence, and then refines fine-grained hand articulation to improve finger-level detail. Second, we investigate integrating Kolmogorov--Arnold Network modules into a Transformer backbone, using learnable univariate function primitives to model the highly non-linear mapping from discrete phonological symbols to continuous body kinematics with a compact parameterization. Experiments on multiple public corpora spanning Polish, German, Greek, and French sign languages show consistent reductions in dynamic time warping based joint error compared with a strong notation-to-pose baseline, while using substantially fewer parameters. Controlled ablations further indicate that KAN-based variants substantially reduce parameter count while maintaining competitive performance when coupled with multi-scale supervision, rather than serving as the main driver of accuracy gains. These findings position multi-scale supervision as the key mechanism for improving notation-conditioned pose generation, with KAN offering a compact alternative for efficient modeling. Our code will be publicly available.

Problem

Research questions and friction points this paper is trying to address.

sign language animation

pose generation

symbolic notation

multi-scale modeling

Kolmogorov-Arnold Networks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Kolmogorov-Arnold Networks

multi-scale supervision

sign language animation