🤖 AI Summary
To address the computational cost of the Softmax layer, which grows linearly with the number of identities in large-scale face recognition (FR), this paper proposes Identity Tokenization: replacing scalar identity labels with structured integer sequences, thereby reformulating identity prediction as discrete sequence decoding rather than single-label classification. The work integrates generative modeling principles into FR, moving away from the conventional dot-product similarity paradigm. A lightweight sequence decoder reduces the identity-prediction complexity from *O(N)* to *O(log N)*, and the backbone network and tokenization module are jointly optimized end-to-end. On the IJB-B and IJB-C benchmarks, the method achieves absolute improvements of +1.52% and +0.6% in TAR@FAR=1e−4, respectively, while substantially lowering the computational overhead of large-scale deployment.
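The summary does not spell out how the integer sequences are constructed. A natural reading is a positional encoding of each identity label, so below is a minimal sketch assuming a simple base-B tokenization; the function names `tokenize_identity` / `detokenize_identity` and the choice of base are illustrative assumptions, not details from the paper.

```python
import math

def tokenize_identity(label: int, num_ids: int, base: int = 16) -> list[int]:
    """Encode a scalar identity label as a fixed-length base-`base` digit
    sequence. Code length is ceil(log_base(num_ids)), so it grows
    logarithmically with the number of identities.
    (Hypothetical construction; the paper may use a different scheme.)"""
    length = max(1, math.ceil(math.log(num_ids, base)))
    code = []
    for _ in range(length):
        code.append(label % base)
        label //= base
    return code[::-1]  # most-significant digit first

def detokenize_identity(code: list[int], base: int = 16) -> int:
    """Invert tokenize_identity: map a digit sequence back to the scalar label."""
    label = 0
    for digit in code:
        label = label * base + digit
    return label

# With 1M identities and base 16, each identity becomes a 5-digit code
# (ceil(log_16(1e6)) = 5) instead of a single 1M-way class index.
code = tokenize_identity(123456, num_ids=1_000_000)
assert detokenize_identity(code) == 123456
```

Under this assumption, each decoding step chooses among `base` symbols rather than among all N identities, which is where the logarithmic scaling comes from.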
📝 Abstract
Aiming to reduce the computational cost of Softmax over the massive label spaces of Face Recognition (FR) benchmarks, recent studies estimate the output using a subset of identities. Although promising, the computational cost still grows linearly with the number of identities in the dataset, only with a reduced constant factor. A characteristic shared by existing FR methods is the use of atomic scalar labels during training; consequently, input-to-label matching is performed via a dot product between the input's feature vector and the Softmax centroids. Inspired by generative modeling, we present a simple yet effective method that substitutes scalar labels with a structured identity code, i.e., a sequence of integers. Specifically, we propose a tokenization scheme that transforms atomic scalar labels into structured identity codes. We then train an FR backbone to predict the code for each input instead of its scalar label. As a result, the associated computational cost becomes logarithmic w.r.t. the number of identities. We demonstrate the benefits of the proposed method through experiments: it outperforms its competitors by 1.52% and 0.6% at TAR@FAR$=1e-4$ on IJB-B and IJB-C, respectively, while transforming the relationship between computational cost and the number of identities from linear to logarithmic. Code is available at https://github.com/msed-Ebrahimi/GIF
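To make the complexity claim concrete, here is a hedged sketch of how an N-way Softmax head might be replaced by per-digit classifiers over the identity code. The paper describes a sequence decoder; this sketch simplifies it to independent per-position classifiers in PyTorch, so `IdentityCodeHead` and `code_loss` are hypothetical names, not the authors' implementation.

```python
import torch
import torch.nn as nn

class IdentityCodeHead(nn.Module):
    """Predict each digit of the identity code with a small `base`-way
    classifier, replacing a single N-way Softmax. Parameters and compute
    scale with code_len * base ~ O(log N) rather than O(N).
    (Simplified, non-autoregressive stand-in for the paper's decoder.)"""

    def __init__(self, feat_dim: int, code_len: int, base: int = 16):
        super().__init__()
        self.classifiers = nn.ModuleList(
            [nn.Linear(feat_dim, base) for _ in range(code_len)]
        )

    def forward(self, features: torch.Tensor) -> list[torch.Tensor]:
        # One (batch, base) logit tensor per code position.
        return [clf(features) for clf in self.classifiers]

def code_loss(logits: list[torch.Tensor], codes: torch.Tensor) -> torch.Tensor:
    """Cross-entropy summed over code positions; `codes` is (batch, code_len)."""
    return sum(
        nn.functional.cross_entropy(pos_logits, codes[:, i])
        for i, pos_logits in enumerate(logits)
    )

# Toy usage: 512-d embeddings, 1M identities -> code_len 5, base 16.
head = IdentityCodeHead(feat_dim=512, code_len=5, base=16)
feats = torch.randn(8, 512)
codes = torch.randint(0, 16, (8, 5))
loss = code_loss(head(feats), codes)
loss.backward()
```

An actual sequence decoder would condition each digit on the previously decoded ones; the independent-head variant above only illustrates how the head's size and cost depend on code_len × base, i.e., logarithmically rather than linearly in the number of identities.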