Haoge Deng

Google Scholar ID: S2sbvjgAAAAJ

Institute of Automation, Chinese Academy of Sciences & Beijing Academy of Artificial Intelligence

Computer Vision

Citations & Impact

All-time citations: 201

Contact

Email: denghaoge666@gmail.com

Publications

9 items

- UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models (2026)
- LINA: Linear Autoregressive Image Generative Models with Continuous Tokens (2026)
- Emu3.5: Native Multimodal Models are World Learners (2025)
- Uniform Discrete Diffusion with Metric Path for Video Generation (2025)
- CI-VID: A Coherent Interleaved Text-Video Dataset (2025)
- OmniGen2: Exploration to Advanced Multimodal Generation (2025)
- EVEv2: Improved Baselines for Encoder-Free Vision-Language Models (2025)
- Autoregressive Video Generation without Vector Quantization (arXiv.org, 2024)

Resume

Academic Achievements

- Emu3.5: A natively multimodal world model that unifies vision and language through end-to-end next-token prediction on interleaved video-derived data, enhanced by reinforcement learning and DiDA-based parallel decoding for efficient, spatiotemporally consistent generation.
- Uniform Discrete Diffusion with Metric Path for Video Generation: A simple yet powerful discrete framework that formulates video generation as an iterative process of global refinement over spatiotemporal tokens, enabling efficient scaling to long-duration videos.
- Autoregressive Video Generation without Vector Quantization: A non-quantized autoregressive model that enables efficient video generation by reformulating it as frame-by-frame and set-by-set prediction.
- You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale: A scalable visual-conditional MVD model for open-world 3D creation, which can be trained on web-scale video collections without camera pose annotations.
- GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation: A 3D generation method that integrates explicit generalized 3D priors with 2D diffusion priors to obtain unambiguous, 3D-consistent geometric structures without sacrificing diversity or fidelity.
- SketchKnitter: Vectorized Sketch Generation with Diffusion Models: A method that achieves vectorized sketch generation by reversing the stroke deformation process using a diffusion model learned from real sketches.

Research Experience

His current PhD research focuses on generative models and multimodal generation.

Education

- PhD student at the Institute of Automation, Chinese Academy of Sciences (CASIA), and the Beijing Academy of Artificial Intelligence (BAAI), jointly supervised by Prof. Zhaoxiang Zhang and Dr. Xinlong Wang
- MSc degree, BUPT, supervised by Prof. Yonggang Qi
- Bachelor's degree in Electronics Information Science and Technology, BUPT, 2022

Background