Omni-ID: Holistic Identity Representation Designed for Generative Tasks

📅 2024-12-12
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
This work addresses the limited robustness of identity representations in generative face synthesis and their difficulty in fusing multi-view, multi-expression inputs. The authors propose Omni-ID, a holistic, generation-oriented facial identity representation. The approach rests on three ideas: (1) a few-to-many reconstruction training paradigm that models global and local identity features across poses and expressions from an unordered set of input images; (2) a multi-decoder architecture optimized with a generative objective, in contrast to the discriminative or contrastive losses behind conventional representations; and (3) a fixed-size, structured output representation in which each entry captures certain global or local identity features. Trained on the MFHQ multi-view facial dataset, Omni-ID substantially outperforms conventional representations such as CLIP and ArcFace in identity preservation, fine-detail recovery, and cross-pose generalization on generative tasks, enabling more robust and nuanced facial identity modeling.

📝 Abstract
We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. It consolidates information from a variable number of unstructured input images into a structured representation, where each entry captures certain global or local identity features. Our approach uses a few-to-many identity reconstruction training paradigm, in which a limited set of input images is used to reconstruct multiple target images of the same individual in various poses and expressions. A multi-decoder framework is further employed to leverage the complementary strengths of diverse decoders during training. Unlike conventional representations, such as CLIP and ArcFace, which are typically learned through discriminative or contrastive objectives, Omni-ID is optimized with a generative objective, resulting in a more comprehensive and nuanced identity capture for generative tasks. Trained on our MFHQ dataset -- a multi-view facial image collection -- Omni-ID demonstrates substantial improvements over conventional representations across various generative tasks.
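The abstract's claim that a varied number of unstructured inputs can be consolidated into a fixed-size, entry-wise structured representation can be illustrated with cross-attention pooling against a fixed set of learned queries. This is a minimal sketch under that assumption; the query matrix, entry count, and dimensions below are hypothetical, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64           # feature dimension (hypothetical)
NUM_ENTRIES = 8  # fixed number of structured identity slots (hypothetical)

# Stand-in for learned queries, one per global/local identity entry.
queries = rng.normal(size=(NUM_ENTRIES, D))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def consolidate(image_features):
    """Cross-attention pooling: any number N of per-image feature
    vectors is reduced to a fixed (NUM_ENTRIES, D) representation."""
    attn = softmax(queries @ image_features.T / np.sqrt(D))  # (E, N)
    return attn @ image_features                             # (E, D)

# Three input images or seven: the output representation size is identical.
rep_a = consolidate(rng.normal(size=(3, D)))
rep_b = consolidate(rng.normal(size=(7, D)))
print(rep_a.shape, rep_b.shape)  # (8, 64) (8, 64)
```

The key property shown is order- and count-invariance of the output shape, which is what lets downstream generative decoders consume a fixed-size identity input.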
Problem

Research questions and friction points this paper is trying to address.

Holistic facial representation for generative tasks
Encoding diverse expressions and poses in fixed-size format
Consolidating unstructured images into structured identity features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Omni-ID encodes holistic facial identity features
Uses few-to-many identity reconstruction training
Multi-decoder framework enhances generative performance
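The few-to-many reconstruction paradigm and multi-decoder framework listed above can be sketched as a single loss computation: a small input set is encoded into one identity representation, and every decoder must reconstruct many targets of the same person from it. This is a toy illustration only; the mean-pool encoder, linear decoders, and squared-error loss are stand-in assumptions, not the paper's actual generative objective.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 64  # feature dimension (hypothetical)

def encode(inputs):
    # Stand-in encoder: pool a few-shot input set into one fixed-size vector.
    return inputs.mean(axis=0)

# Two stand-in "decoders"; in Omni-ID these would be distinct generative
# decoders with complementary strengths, not linear maps.
decoders = [rng.normal(scale=0.1, size=(D, D)) for _ in range(2)]

def few_to_many_loss(few_shot, many_targets):
    """Reconstruct every target (a different pose/expression) from the
    same identity representation; sum the loss over all decoders."""
    rep = encode(few_shot)                        # fixed-size identity rep
    loss = 0.0
    for W in decoders:
        recon = rep @ W                           # one decode per decoder
        loss += ((recon - many_targets) ** 2).mean()
    return float(loss)

few = rng.normal(size=(2, D))   # 2 input images of one individual
many = rng.normal(size=(5, D))  # 5 targets in varied poses/expressions
loss = few_to_many_loss(few, many)
```

Because the loss is purely reconstructive, gradients would push the representation to retain whatever identity detail the decoders need, which is the intuition behind training with a generative rather than discriminative objective.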