Micro-Expression-Aware Avatar Fingerprinting via Inter-Frame Feature Differencing

📅 2026-04-25

🤖 AI Summary

This work addresses the challenge of verifying who drives a synthesized talking-head video, rather than merely detecting whether the video is authentic. It proposes an end-to-end, preprocessing-free method that extracts identity-specific motion fingerprints directly from raw video frames. Instead of the conventional non-differentiable landmark-extraction pipeline, the approach combines a micro-expression-aware F5C backbone with inter-frame differencing of deep features, which preserves driver-specific motion dynamics while cancelling temporally stable appearance cues. On the NVFAIR dataset, the method achieves an overall AUC of 0.877 and matches or exceeds the landmark-based baseline on the majority of cross-generator pairs.

📝 Abstract

Avatar fingerprinting, i.e., verifying who drives a synthetic talking-head video rather than whether it is real, is a critical safeguard for authorized use of face-reenactment technology. Existing methods rely on a fixed, non-differentiable landmark extraction stage that prevents the fingerprinting model from being optimized end-to-end from raw pixels. We propose a preprocessing-free system built on a micro-expression-aware backbone operating on raw video frames, with inter-frame feature differencing as the core design principle: consecutive feature maps are subtracted in the learned deep feature space, so that temporally stable appearance dimensions contribute zero to the output while driver-specific motion dynamics are preserved. A controlled ablation on NVFAIR confirms that temporal motion accounts for the large majority of discriminative performance, and that raw appearance features actively degrade identity separation. Both the choice of backbone and the differencing principle are essential: differencing alone is insufficient when applied to a generic encoder, as appearance-dominated features collapse to near-identical representations across adjacent frames, while the micro-expression-aware F5C backbone retains measurable motion variation that the differencing operation can exploit. Without any external preprocessing, our model achieves an overall AUC of 0.877 on NVFAIR and matches or exceeds the landmark-based baseline on the majority of cross-generator pairs.
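
The differencing principle described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature shape `(T, D)`, the synthetic `appearance` and `motion` arrays, and the `pool_fingerprint` pooling step are all assumptions made for illustration, and the random features stand in for whatever a real backbone would produce. The sketch only shows that subtracting consecutive feature vectors cancels any component that is constant over time.

```python
import numpy as np


def inter_frame_difference(features: np.ndarray) -> np.ndarray:
    """Subtract consecutive per-frame feature vectors.

    features: array of shape (T, D), one D-dimensional feature per frame.
    Returns an array of shape (T - 1, D).
    """
    if features.ndim != 2 or features.shape[0] < 2:
        raise ValueError("expected shape (T, D) with T >= 2")
    return features[1:] - features[:-1]


def pool_fingerprint(diffs: np.ndarray) -> np.ndarray:
    """Hypothetical pooling step (an assumption, not from the paper):
    mean absolute difference over time."""
    return np.abs(diffs).mean(axis=0)


# Synthetic stand-in for backbone features (assumed, for illustration only).
rng = np.random.default_rng(0)
T, D = 8, 4
appearance = rng.normal(size=(1, D))         # constant across all frames
motion = rng.normal(scale=0.1, size=(T, D))  # varies from frame to frame
motion[:, 0] = 0.0                           # make dimension 0 purely static

features = appearance + motion
diffs = inter_frame_difference(features)
fingerprint = pool_fingerprint(diffs)

# The static appearance term cancels: differencing the full features gives
# the same result (up to rounding) as differencing the motion term alone.
print(np.allclose(diffs, inter_frame_difference(motion)))
# The purely static dimension contributes zero to the pooled output.
print(fingerprint[0])
```

The same cancellation explains the failure mode the abstract mentions for generic encoders: if nearly every feature dimension behaves like dimension 0 here, the differences are close to zero everywhere and little signal is left to discriminate between drivers.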

Problem

Research questions and friction points this paper is trying to address.

avatar fingerprinting

micro-expression

inter-frame feature differencing

face reenactment

identity verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

micro-expression-aware

inter-frame feature differencing

avatar fingerprinting

end-to-end optimization

motion dynamics
