Ziyao Zeng

Google Scholar ID: FYL2DYEAAAAJ

Yale University

Computer Vision, Machine Learning, Robotics, Multimodal Learning

Citations & Impact

Citations (all-time): 631

Co-authors

list available

Publications

12 items

Artificial Foveated Perception for Mitigating Shortcut Learning in Robotic Foundation Models

2026

OpenLongTail: Generative Scaling of Long-Tail Driving Data

2026

UniTac: A Unified Multimodal Model for Cross-Sensor Tactile Understanding and Generation

2026

A Physics-Grounded Benchmark for Multi-Agent Dynamics in World Models

2026

Benchmarking AI Agents for Addressing Scientific Challenges Across Scales

2026

4DP-QA: Scalable QA for 4D Perception in Vision Language Models

2026

RuleSmith: Multi-Agent LLMs for Automated Game Balancing

2026

Coffee: Controllable Diffusion Fine-tuning

2025

Resume (English only)

Academic Achievements

Paper 'ETA: Energy-based Test-time Adaptation for Depth Completion' accepted to ICCV 2025
Paper 'ProtoDepth: Unsupervised Continual Depth Completion with Prototypes' accepted to CVPR 2025
Paper 'HOMER: Homography-Based Efficient Multi-view 3D Object Removal' published as an arXiv technical report in 2025
Paper 'PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation' published as an arXiv technical report in 2024
Paper 'RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions' published in 2024
Served as a reviewer for CVPR (2022, 2025 [Outstanding Reviewer]), ICCV (2023, 2025), ECCV (2024), ICML (2025), ICLR (2025, 2026), NeurIPS (2024, 2025), ACM MM (2023, 2025), AISTATS (2024, 2025), ICASSP (2024, 2025), and TCSVT

Background

Third-year Ph.D. student in Computer Science at Yale University (2023–expected 2027), advised by Prof. Alex Wong
Research interests include Computer Vision, Machine Learning, and Robotics
Focuses on Multimodal Embodied AI inspired by human learning
Current research centers on Vision-Language Models for 3D Vision
Research vision: empower embodied AI with multimodal sensing, leveraging pre-trained multimodal representations so that it can interact with the physical world as humans do

Co-authors

10 total