Scholar
Shangzhe Di
Google Scholar ID: qkO6rFQAAAAJ
Shanghai Jiao Tong University
Video Understanding
Multimodal Learning
Computer Vision
Citations & Impact
All-time
Citations: 346
H-index: 8
i10-index: 7
Publications: 11
Co-authors: 7 (list available)
Publications
8 items
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams (2026). Cited by: 0
VersaViT: Enhancing MLLM Vision Backbones via Task-Guided Optimization (2026). Cited by: 0
Weaver: End-to-End Agentic System Training for Video Interleaved Reasoning (2026). Cited by: 0
Revisiting Multi-Task Visual Representation Learning (2026). Cited by: 1
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models (2025). Cited by: 0
Learning Streaming Video Representation via Multitask Training (2025). Cited by: 0
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval (2025). Cited by: 0
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation (2024). Cited by: 1
Co-authors
7 total
Weidi Xie
Shanghai Jiao Tong University | VGG, University of Oxford
Si Liu
Beihang University
Shuicheng Yan, Fellow of AAAI, ACM, SAEng, IEEE, IAPR | Hunting Robotics and Cuda Researchers
Professor@National University of Singapore | Looking for lab members targeting beyond papers
Zeren Jiang
University of Oxford
Zhaokai Wang
Shanghai Jiao Tong University; Shanghai AI Laboratory
Yulu Gao
BUAA
Zongheng Tang
Beihang University