🤖 AI Summary
Pedestrian re-identification (ReID) is hindered by the scarcity of high-quality annotated data and stringent privacy constraints. To address this, we propose OmniPerson—the first unified identity-preserving controllable pedestrian generation framework, supporting cross-modal (visible/infrared), multi-reference image, and text-guided synthesis for both images and videos. Our method introduces a novel Multi-Refer Fuser module that enables flexible identity fusion and unified representation from an arbitrary number of reference images, while integrating fine-grained controls including textual descriptions, pose, attributes, RGB-to-IR translation, and super-resolution. Leveraging this framework, we construct PersonSyn—the first large-scale multi-reference controllable pedestrian generation dataset—with automated annotation ensuring strict identity consistency and visual fidelity. Experiments demonstrate that OmniPerson achieves state-of-the-art performance in both generation quality and identity preservation. When applied to ReID data augmentation, it significantly improves the performance of mainstream ReID models.
📝 Abstract
Person re-identification (ReID) suffers from a lack of large-scale, high-quality training data due to data privacy concerns and annotation costs. While previous approaches have explored pedestrian generation for data augmentation, they often fail to ensure identity consistency and offer insufficient controllability, limiting their effectiveness for dataset augmentation. To address this, we introduce OmniPerson, the first unified identity-preserving pedestrian generation pipeline for visible/infrared image and video ReID tasks. Our contributions are threefold: 1) We propose OmniPerson, a unified generation model offering holistic, fine-grained control over all key pedestrian attributes. It supports RGB/IR image and video generation conditioned on any number of reference images, two kinds of person poses, and text, and additionally provides RGB-to-IR transfer and image super-resolution. 2) We design the Multi-Refer Fuser for robust identity preservation with any number of reference images as input, enabling OmniPerson to distill a unified identity from a set of multi-view references and generate high-fidelity pedestrians. 3) We introduce PersonSyn, the first large-scale dataset for multi-reference, controllable pedestrian generation, together with an automated curation pipeline that transforms public, ID-only ReID benchmarks into a richly annotated resource with the dense, multi-modal supervision this task requires. Experimental results demonstrate that OmniPerson achieves state-of-the-art pedestrian generation, excelling in both visual fidelity and identity consistency. Furthermore, augmenting existing datasets with our generated data consistently improves the performance of ReID models. We will open-source the full codebase, pretrained models, and the PersonSyn dataset.
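To make the multi-reference idea concrete: fusing a variable number of reference-image features into one identity embedding can be sketched as attention pooling. The abstract does not specify the Multi-Refer Fuser's internals, so the function name, the dot-product attention scheme, and the L2-normalized output below are all illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def multi_refer_fuse(ref_feats: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: pool N reference features into one identity vector.

    ref_feats: (N, D) array of features from N >= 1 reference images.
    query:     (D,) learned identity query (assumed; could be a class token).
    Returns an L2-normalized (D,) unified identity embedding.
    """
    d = ref_feats.shape[1]
    # Scaled dot-product attention scores of the query against each reference.
    scores = ref_feats @ query / np.sqrt(d)          # (N,)
    # Softmax over references; works for any N, including a single image.
    w = np.exp(scores - scores.max())
    w /= w.sum()
    # Weighted average of reference features = fused identity representation.
    fused = w @ ref_feats                            # (D,)
    return fused / np.linalg.norm(fused)
```

With one reference the softmax weight is 1 and the output is just the normalized input feature, so the same code path covers single- and multi-reference conditioning — the property the paper attributes to its fuser.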