Scholar
Yuqian Yuan
Google Scholar ID: 7D7QL9MAAAAJ
PhD student, Zhejiang University
Computer Vision
Machine Learning
Citations & Impact
All-time
Citations: 516
H-index: 8
i10-index: 7
Publications: 13
Co-authors: 12 (list available)
Publications
19 items
VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies
2026 · Cited: 0
InstructSAM: Segment Any Instance with Any Instructions
2026 · Cited: 0
CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark
2026 · Cited: 0
LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation
2026 · Cited: 0
RynnBrain: Open Embodied Foundation Models
2026 · Cited: 0
MAU-GPT: Enhancing Multi-type Industrial Anomaly Understanding via Anomaly-aware and Generalist Experts Adaptation
2026 · Cited: 0
Unified Personalized Understanding, Generating and Editing
arXiv.org · 2026 · Cited: 1
PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models
arXiv.org · 2026 · Cited: 0
Resume (English only)
Academic Achievements
Oct 2025: Released PixelRefer, a unified pixel-level MLLM framework for fine-grained regional understanding.
Sep 2025: Paper 'EOC-Bench' accepted by NeurIPS 2025.
Aug 2025: Released RynnEC, a video MLLM designed for embodied cognition tasks.
Jun 2025: Released EOC-Bench, an object-centric embodied cognition benchmark in dynamic egocentric scenarios.
May 2025: Paper 'TokenPacker' accepted by IJCV 2025.
Apr 2025: VideoRefer and VideoRefer-Bench adopted by NVIDIA & UC Berkeley in their DAM work.
Feb 2025: Two papers accepted by CVPR 2025; released VideoRefer-700K dataset on HuggingFace.
Jan 2025: Released VideoLLaMA 3, a frontier multimodal foundation model for image and video understanding.
Published works include PixelRefer (arXiv 2025), RynnEC (Technical Report 2025), EOC-Bench (NeurIPS 2025), VideoRefer Suite (CVPR 2025), ECBench (CVPR 2025), TokenPacker (IJCV 2025), and VideoLLaMA 3 (Technical Report 2025).
Co-authors
12 total
Wentong Li
Nanjing University of Aeronautics and Astronautics
Lidong Bing
MiroMind, Alibaba DAMO, Tencent, CMU, CUHK
Jianke Zhu
Professor of Computer Science, Zhejiang University
Boqiang Zhang
Tencent AILab
Xin Li
Alibaba Group
Zesen Cheng
Peking University
Yuming Jiang
Alibaba DAMO Academy
Hang Zhang
Qwen Team; Zhejiang University; Sichuan University