AsymK-Talker: Real-Time and Long-Horizon Talking Head Generation via Asymmetric Kernel Distillation

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing audio-driven talking head generation methods suffer from causal inefficiency, poor compatibility with temporally coherent conditioning, and drift over long generation horizons, which limits their use in real-time scenarios. This work proposes AsymK-Talker, a diffusion-distillation framework with three components: Kernel-Conditioned Loop Generation (KCLG), Temporal Reference Encoding (TRE), and Asymmetric Kernel Distillation (AKD). KCLG generates video causally, chunk by chunk, using motion kernels to propagate temporal context; TRE converts a static identity reference into a time-aware latent representation; and AKD trains a student that conditions on its own generated kernels under the supervision of a teacher that conditions on ground-truth kernels. The design targets low-latency, temporally stable generation over extended sequences while preserving lip-sync accuracy. The authors report promising results on both visual fidelity and lip synchronization metrics.
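The motion-kernel-guided causal chunking mentioned in the summary can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the paper's implementation: `toy_generator` is a hypothetical stand-in for the diffusion model, and treating the motion kernel as the trailing `kernel_len` frames of the previous chunk is an assumption made here for illustration.

```python
import numpy as np


def toy_generator(audio_chunk, kernel):
    """Hypothetical stand-in for the generator (NOT the paper's model).

    Each output frame blends the previous frame with the current audio
    feature; the first frame continues from the last kernel frame.
    """
    prev = kernel[-1]
    frames = []
    for audio_feature in audio_chunk:
        prev = 0.5 * prev + 0.5 * audio_feature
        frames.append(prev)
    return np.stack(frames)


def generate_long(audio, chunk_len, kernel_len, init_kernel):
    """Causal, chunk-wise generation that carries a motion kernel forward.

    Assumption: the kernel is the trailing `kernel_len` frames of the
    chunk that was just generated.
    """
    kernel = init_kernel
    chunks = []
    for start in range(0, len(audio), chunk_len):
        chunk = toy_generator(audio[start:start + chunk_len], kernel)
        chunks.append(chunk)
        kernel = chunk[-kernel_len:]  # only past frames feed the next chunk
    return np.concatenate(chunks)


# Usage: 10 audio frames with 3 features each, generated in chunks of 4.
rng = np.random.default_rng(0)
audio = rng.normal(size=(10, 3))
video = generate_long(audio, chunk_len=4, kernel_len=2,
                      init_kernel=np.zeros((2, 3)))
```

Because each chunk depends only on its own audio and on frames that were already generated, frames can be emitted as soon as their chunk is finished, which is the property a real-time system needs.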

📝 Abstract

Recent advances in diffusion models have markedly enhanced the visual fidelity of audio-driven talking head generation. Nevertheless, existing methods are constrained by three critical limitations: causal inefficiency that impedes real-time inference, incompatibility with temporally coherent conditioning, and progressive drift over long-horizon generation, collectively hindering their deployment in real-time applications. To overcome these challenges, we introduce AsymK-Talker, a novel diffusion-distillation method designed for real-time and long-horizon talking head generation. AsymK-Talker comprises three key components: (1) Kernel-Conditioned Loop Generation (KCLG), a causal, chunk-wise generation paradigm that leverages motion kernels to enable temporally consistent propagation; (2) Temporal Reference Encoding (TRE), which converts a static identity reference into a time-aware latent representation to enhance audio-visual synchronization; and (3) Asymmetric Kernel Distillation (AKD), a teacher-student distillation framework wherein the teacher model conditions on ground-truth motion kernels for supervision, while the student learns to generate from generated kernels, thereby ensuring robustness during extended generation sequences. AsymK-Talker achieves promising results on both visual fidelity and lip synchronization metrics.
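The asymmetry in AKD (teacher conditioned on ground-truth kernels, student conditioned on its own generated kernels) can be sketched as a toy loss computation. Everything here is an assumption for illustration: `predict_chunk` is a hypothetical linear stand-in for both networks, the kernel is taken to be the trailing `kernel_len` frames, and a plain mean-squared error replaces whatever distillation objective the paper actually uses.

```python
import numpy as np


def predict_chunk(weight, audio_chunk, kernel):
    """Hypothetical generator: linear audio map plus the mean kernel frame."""
    return audio_chunk @ weight + kernel.mean(axis=0)


def asymmetric_distill_loss(w_teacher, w_student, audio, gt_frames,
                            chunk_len, kernel_len):
    """Toy asymmetric distillation loss over one long rollout.

    Teacher: conditions on the ground-truth kernel at every chunk.
    Student: conditions on the kernel it generated itself, so its own
    errors can accumulate exactly as they would at inference time.
    """
    student_kernel = gt_frames[:kernel_len]  # shared seed kernel
    losses = []
    for start in range(kernel_len, len(audio), chunk_len):
        stop = start + chunk_len
        gt_kernel = gt_frames[start - kernel_len:start]
        teacher_out = predict_chunk(w_teacher, audio[start:stop], gt_kernel)
        student_out = predict_chunk(w_student, audio[start:stop], student_kernel)
        # Assumed objective: student output matches the teacher's output.
        losses.append(np.mean((student_out - teacher_out) ** 2))
        student_kernel = student_out[-kernel_len:]  # self-generated kernel
    return float(np.mean(losses))


# Usage: 12 frames, 4 audio features, 3 frame features.
rng = np.random.default_rng(0)
audio = rng.normal(size=(12, 4))
gt_frames = rng.normal(size=(12, 3))
w_teacher = rng.normal(size=(4, 3))
w_student = rng.normal(size=(4, 3))
loss = asymmetric_distill_loss(w_teacher, w_student, audio, gt_frames,
                               chunk_len=4, kernel_len=2)
```

The point of the sketch is the data flow: the student never sees ground-truth kernels after the seed, so the training signal covers the drifted conditions it will meet during long-horizon generation.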

Problem

Research questions and friction points this paper is trying to address.

talking head generation

real-time inference

temporal coherence

long-horizon generation

diffusion models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Asymmetric Kernel Distillation

Kernel-Conditioned Loop Generation

Temporal Reference Encoding