Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding (open-source project).
EVTAR: End-to-End Try on with Additional Unpaired Visual Reference, Preprint, 2025.
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves, Preprint, 2025.
AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents, Preprint, 2025.
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation, Preprint, 2025.
AffordanceSAM: Segment Anything Once More in Affordance Grounding, Preprint, 2025.
Low-Biased General Annotated Dataset Generation, Preprint, 2025.
Education
School of Automation, Northwestern Polytechnical University, 2022-2026, B.S.E., supervised by Prof. Lei Zhang, as a member of the research team led by Prof. Yanning Zhang.
Background
Research interests include deep learning and computer vision. Happy to discuss and collaborate with anyone interested in these fields.