Scholar
Yuxin Song
Google Scholar ID: 1uL_9HAAAAAJ
Baidu
Computer Vision
Vision-Language Model
Generative Model
Video Understanding
Citations & Impact
All-time
Citations
440
H-index
11
i10-index
11
Publications
18
Co-authors
7
list available
Contact
No contact links provided.
Publications
13 items
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
2026
Cited
0
CoLoGen: Progressive Learning of Concept-Localization Duality for Unified Image Generation
2026
Cited
0
ViSS-R1: Self-Supervised Reinforcement Video Reasoning
2025
Cited
0
Query-Kontext: An Unified Multimodal Model for Image Generation and Editing
2025
Cited
0
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
2025
Cited
0
Balanced Actor Initialization: Stable RLHF Training of Distillation-Based Reasoning Models
2025
Cited
0
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
2025
Cited
0
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
2025
Cited
0
Co-authors
7 total
Wenhao Wu (吴文灏)
Scientist @ Amazon AGI
Jingdong Wang (王井东), Fellow of CAE & IEEE & IAPR
Baidu
Haocheng Feng
Baidu
Yifan Sun
Zhejiang University
Errui Ding
Baidu Inc.
Min Yang
Bytedance
Yu Lu
University of Technology Sydney