🤖 AI Summary
Pedestrian re-identification (ReID) is hindered by the scarcity of high-quality annotated data and stringent privacy constraints. To address this, we propose OmniPerson—the first unified identity-preserving controllable pedestrian generation framework, supporting cross-modal (visible/infrared), multi-reference image, and text-guided synthesis for both images and videos. Our method introduces a novel Multi-Refer Fuser module that enables flexible identity fusion and unified representation from an arbitrary number of reference images, while integrating fine-grained controls including textual descriptions, pose, attributes, RGB-to-IR translation, and super-resolution. Leveraging this framework, we construct PersonSyn—the first large-scale multi-reference controllable pedestrian generation dataset—with automated annotation ensuring strict identity consistency and visual fidelity. Experiments demonstrate that OmniPerson achieves state-of-the-art performance in both generation quality and identity preservation. When applied to ReID data augmentation, it significantly improves the performance of mainstream ReID models.
📝 Abstract
Person re-identification (ReID) suffers from a lack of large-scale, high-quality training data due to data privacy concerns and annotation costs. While previous approaches have explored pedestrian generation for data augmentation, they often fail to ensure identity consistency and offer insufficient controllability, limiting their effectiveness for dataset augmentation. To address this, we introduce OmniPerson, the first unified identity-preserving pedestrian generation pipeline for visible/infrared image and video ReID tasks. Our contributions are threefold: 1) We propose OmniPerson, a unified generation model offering holistic, fine-grained control over all key pedestrian attributes. It supports RGB/IR image and video generation conditioned on any number of reference images, two kinds of person poses, and text, and additionally provides RGB-to-IR transfer and image super-resolution. 2) We design the Multi-Refer Fuser for robust identity preservation with any number of reference images as input, enabling OmniPerson to distill a unified identity from a set of multi-view references and generate high-fidelity pedestrians. 3) We introduce PersonSyn, the first large-scale dataset for multi-reference, controllable pedestrian generation, together with an automated curation pipeline that transforms public, ID-only ReID benchmarks into a richly annotated resource with the dense, multi-modal supervision this task requires. Experimental results demonstrate that OmniPerson achieves state-of-the-art pedestrian generation, excelling in both visual fidelity and identity consistency. Furthermore, augmenting existing datasets with our generated data consistently improves the performance of ReID models. We will open-source the full codebase, pretrained models, and the PersonSyn dataset.
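To make the multi-reference idea concrete: fusing a variable number of reference-image features into one identity embedding can be sketched as attention pooling. The abstract does not specify the Multi-Refer Fuser's internals, so the function name, the dot-product attention scheme, and the L2-normalized output below are all illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def multi_refer_fuse(ref_feats: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: pool N reference features into one identity vector.

    ref_feats: (N, D) array of features from N >= 1 reference images.
    query:     (D,) learned identity query (assumed; could be a class token).
    Returns an L2-normalized (D,) unified identity embedding.
    """
    d = ref_feats.shape[1]
    # Scaled dot-product attention scores of the query against each reference.
    scores = ref_feats @ query / np.sqrt(d)          # (N,)
    # Softmax over references; works for any N, including a single image.
    w = np.exp(scores - scores.max())
    w /= w.sum()
    # Weighted average of reference features = fused identity representation.
    fused = w @ ref_feats                            # (D,)
    return fused / np.linalg.norm(fused)
```

With one reference the softmax weight is 1 and the output is just the normalized input feature, so the same code path covers single- and multi-reference conditioning — the property the paper attributes to its fuser.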