Pu Cao

Google Scholar ID: i_R1l9UAAAAJ

Beijing University of Posts and Telecommunications

Computer Vision

Citations & Impact

All-time citations: 178

Contact

Email: caopu@bupt.edu.cn

Publications

8 items

Fourier Series Coder: A Novel Perspective on Angle Boundary Discontinuity Problem for Oriented Object Detection

2026

A Tilted Seesaw: Revisiting Autoencoder Trade-off for Controllable Diffusion

2026

ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers

2025

Initialize to Generalize: A Stronger Initialization Pipeline for Sparse-View 3DGS

2025

Preliminary Explorations with GPT-4o(mni) Native Image Generation

2025

Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

2025

Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

2023

LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space

arXiv.org · 2022

Resume

Academic Achievements

Publications:
- November 2025, one paper accepted by AAAI 2026 (oral).
- January 2025, one paper accepted by CVPR 2025.
Projects:
- UniDiffusion: A diffusion training toolbox built on diffusers and existing state-of-the-art (SOTA) methods.
- Awesome Controllable T2I Diffusion Models: A collection of resources on controllable generation with text-to-image diffusion models.
- GAN Inverter: A GAN inversion toolbox based on the PyTorch library.

Research Experience

Currently a fourth-year Ph.D. student at Beijing University of Posts and Telecommunications, working on visual synthesis and multimodal large language models.

Education

- Ph.D. in Artificial Intelligence, Beijing University of Posts and Telecommunications, 2022-present; supervised by Prof. Qing Song and Dr. Lu Yang.
- BSc in Information and Computational Science, University of Science and Technology Beijing, 2018.

Background

Research interests include image synthesis, multimodal large language models, visual representation, image detection and segmentation, and computer vision more broadly.

Miscellany

Service: Reviewer for TPAMI, TMM, TNNLS, TCSVT, ECCV 2024 (Outstanding Reviewer), WACV 2024/2025, CVPR 2025.
