Published papers include 'Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis' (with Bingda Tang, Boyang Zheng, Xichen Pan, and Saining Xie), 'SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation' (in collaboration with NVIDIA), 'From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning' (with Le Zhuo, Liangbing Zhao, and others), 'Fine-Grained Perturbation Guidance via Attention Head Selection' (with Donghoon Ahn and others), and 'Factuality Matters: When Image Generation and Editing Meet Structured Visuals' (with Le Zhuo and others).
Research Experience
Contributed to multiple research projects, including work presented at CVPR'25, ICCV'25, and NeurIPS'25.
Background
Research Interests: Diffusion models. Currently working at Hugging Face.
Miscellany
Maintains a Google Doc that answers frequently asked questions. The structure of this website is inspired by Omar’s site.