Yatian Pang

Google Scholar ID: AZQyNWkAAAAJ
National University of Singapore
Multi-modal understanding, Multi-modal generation, Unified models
Citations & Impact
All-time
Citations: 1,674
H-index: 10
i10-index: 10
Publications: 14
Co-authors: 6
Resume (English only)
Academic Achievements
  • Achieved state-of-the-art results on various video understanding benchmarks with Qwen3-VL
  • Proposed UniWorld, a unified framework connecting frozen VLMs with diffusion generators via a novel semantic encoder
  • Key contributor to Open-Sora-Plan, releasing high-quality video generation architecture and data
  • “Video Sparse Attention for Streaming Long Video Understanding” (2025, under submission)
  • “Unified Autoregressive Pretraining for Image Generation and Representation Learning” (2025, under submission)
  • “Next Patch Prediction for Autoregressive Visual Generation” (arXiv, 2024)
  • “DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses” (ICCV 2025)
  • “Envision3D: One Image to 3D with Anchor Views Interpolation” (arXiv, 2024)
  • “Masked Autoencoders for Point Cloud Self-supervised Learning” (ECCV, 2022)
  • Co-authored “MoE-LLaVA: Mixture of Experts for Large Vision-Language Models” (IEEE TMM, 2024)
  • Co-authored “LanguageBind: Extending Video-Language Pretraining to N-Modality by Language-Based Semantic Alignment” (ICLR 2024)