Over 1,200 citations on Google Scholar with an h-index of 12
Published multiple papers in top-tier conferences including ECCV, ICCV, CVPR, and AAAI
Notable work: 'PyramidBox: A Context-assisted Single Shot Face Detector' (ECCV 2018, 563 citations)
Contributed to the VOT2021 and VOT2022 visual object tracking challenges
Recent research includes multimodal understanding and generation (e.g., TokenFlow), text-to-image customization (e.g., PhotoVerse), and 3D avatar generation (e.g., AvatarVerse)